What is overfitting?

Last updated: April 1, 2026

Quick Answer: Overfitting occurs when a machine learning model learns training data too closely, memorizing noise and irrelevant patterns, which reduces its ability to generalize to new, unseen data. This results in high training accuracy but poor real-world performance.

Overview

Overfitting is a fundamental problem in machine learning where a model learns the training data too well, including its noise and irrelevant patterns. Rather than capturing the underlying relationship between input features and target outputs, an overfitted model essentially memorizes the training data. This leads to excellent performance on training data but poor performance on new, unseen data, making the model unreliable in real-world applications.

Causes of Overfitting

Several factors contribute to overfitting:

  - Excessive model complexity: too many parameters relative to the amount of training data.
  - Insufficient training data: small datasets make noise easy to memorize.
  - Noisy or mislabeled examples: the model fits errors as if they were signal.
  - Training for too long: extended training lets the model latch onto training-specific quirks.
  - Irrelevant features: spurious inputs give the model extra ways to memorize rather than generalize.

Detecting Overfitting

Overfitting manifests clearly when comparing training and validation metrics. Training accuracy may reach 95% while validation accuracy stagnates at 70%. This divergence indicates the model has learned training-specific patterns rather than generalizable features. Plotting training and validation loss over epochs typically shows validation loss increasing while training loss continues decreasing—a clear overfitting signal.
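The train/validation divergence described above can be surfaced with a few lines of code. This is a minimal sketch assuming scikit-learn is available; the synthetic dataset, the decision tree model, and the split sizes are illustrative choices, not from the article:

```python
# Sketch: detect overfitting by comparing train vs. validation accuracy.
# Dataset and model are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)  # 20% noisy labels
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# An unconstrained tree can memorize the training set, noise included.
model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

train_acc = model.score(X_tr, y_tr)
val_acc = model.score(X_val, y_val)
print(f"train={train_acc:.2f}  val={val_acc:.2f}  gap={train_acc - val_acc:.2f}")
```

A large gap between the two scores is the overfitting signal; with label noise deliberately injected, the unconstrained tree reaches perfect training accuracy while validation accuracy lags well behind.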

Prevention Strategies

Regularization techniques constrain model complexity and prevent overfitting:

  - L1/L2 regularization: penalizes large weights, pushing the model toward simpler solutions.
  - Dropout: randomly disables neurons during training so the network cannot rely on any single pathway.
  - Early stopping: halts training when validation performance stops improving.
  - Data augmentation: expands the training set with transformed copies of existing examples.
  - Reducing model size: fewer parameters leave less capacity to memorize noise.
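As a minimal sketch of L2 regularization in action (assuming scikit-learn; the random data and penalty strengths are illustrative), increasing the penalty weight `alpha` shrinks the learned coefficients, constraining the model:

```python
# Sketch: stronger L2 penalty (larger alpha) shrinks coefficient magnitudes.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=50)  # noisy linear target

# Fit ridge regression at several penalty strengths and record the weight norm.
norms = {alpha: np.linalg.norm(Ridge(alpha=alpha).fit(X, y).coef_)
         for alpha in (0.01, 1.0, 100.0)}
for alpha, n in norms.items():
    print(f"alpha={alpha:>6}: ||w|| = {n:.3f}")
```

The weight norm decreases monotonically as `alpha` grows, which is exactly the "limiting flexibility" effect regularization is meant to provide.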

Bias-Variance Tradeoff

Overfitting represents one extreme of the bias-variance tradeoff. High-complexity models exhibit low bias (accurate on training data) but high variance (sensitive to training data fluctuations). Conversely, overly simple models exhibit high bias (inaccurate everywhere) and low variance. Optimal models balance these extremes, achieving reasonable accuracy while maintaining generalization.
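The tradeoff can be observed numerically by sweeping model complexity on a small problem. In this sketch (all data and degree choices are illustrative), polynomial degree stands in for model complexity: training error falls monotonically with degree, while the low-degree fit underfits and the high-degree fit overfits the noise:

```python
# Sketch: polynomial degree vs. train/test error on noisy 1-D data.
import numpy as np

rng = np.random.default_rng(0)
x_tr = np.linspace(0, 1, 15)
y_tr = np.sin(2 * np.pi * x_tr) + rng.normal(scale=0.3, size=15)  # noisy samples
x_te = np.linspace(0.02, 0.98, 50)
y_te = np.sin(2 * np.pi * x_te)  # clean target for evaluation

def mse(deg):
    """Fit a degree-`deg` polynomial to the training set; return (train, test) MSE."""
    coefs = np.polyfit(x_tr, y_tr, deg)
    return (np.mean((np.polyval(coefs, x_tr) - y_tr) ** 2),
            np.mean((np.polyval(coefs, x_te) - y_te) ** 2))

for deg in (1, 3, 12):
    tr, te = mse(deg)
    print(f"degree {deg:>2}: train MSE={tr:.3f}  test MSE={te:.3f}")
```

Degree 1 is high-bias (poor everywhere), degree 12 is high-variance (excellent training error, unstable on test points), and a moderate degree sits near the optimum.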

Practical Impact

In production environments, overfitted models fail when deployed on real data differing from training conditions. A fraud detection model overfitted to historical patterns won't detect novel fraud schemes. An image classifier overfitted to specific training images will misclassify similar images with different lighting or angles. Preventing overfitting ensures models provide reliable, consistent performance across diverse real-world scenarios.

Related Questions

What is underfitting in machine learning?

Underfitting occurs when a model is too simple to capture the underlying pattern in data, resulting in poor accuracy on both training and test sets. It represents the opposite extreme of overfitting on the bias-variance spectrum.

What is cross-validation?

Cross-validation is a technique that divides data into multiple subsets for training and validation, evaluating model performance across different data splits. This helps detect overfitting and provides more reliable performance estimates than single train-test splits.
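A minimal k-fold example with scikit-learn's `cross_val_score` (the iris dataset and logistic regression model are illustrative stand-ins):

```python
# Sketch: 5-fold cross-validation; each fold trains on 4/5 of the data
# and evaluates on the held-out 1/5.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(f"fold accuracies: {scores.round(2)}, mean={scores.mean():.2f}")
```

Averaging across folds gives a more stable performance estimate than a single split, and a large spread between folds is itself a warning sign that the model's performance depends heavily on which data it happened to see.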

What is regularization in machine learning?

Regularization modifies the learning algorithm by adding constraints or penalties that discourage overly complex models. Techniques like L1/L2 regularization, dropout, and early stopping reduce overfitting by limiting model flexibility and capacity to memorize data.
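Of these, early stopping is simple enough to sketch in a few lines. The per-epoch validation losses below are synthetic stand-ins, and the `patience` parameter is an illustrative convention (stop after that many epochs without improvement):

```python
# Sketch: early stopping on a sequence of per-epoch validation losses.
def early_stop(val_losses, patience=2):
    """Return the 0-based epoch index with the best (lowest) validation loss,
    scanning until `patience` consecutive epochs fail to improve on it."""
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Validation loss improves, then rises as the model begins to overfit.
losses = [0.9, 0.6, 0.45, 0.40, 0.43, 0.50, 0.61]
print(early_stop(losses))  # prints 3: the epoch where validation loss bottomed out
```

Training would be halted (and weights restored) at epoch 3, before the rising validation loss indicates memorization has begun.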
