Learning Curve Simulator — Diagnose Overfitting & Underfitting

Diagnose whether a machine-learning model is overfitting or underfitting using the learning curve — the training error and validation error plotted against the amount of training data. Adjust model complexity, data size, noise and regularisation and watch in real time whether the two curves converge.

Parameters
True function
The underlying function f(x) that generates the data
Model complexity (polynomial degree)
Degree of the fitted polynomial. Higher = more complex
Training-set size (pts)
Right end of the learning curve = maximum data
Data noise standard deviation
Magnitude σ of Gaussian noise added to the observations
Regularisation strength λ
Ridge (L2) regularisation. Higher values curb overfitting
Results
Training error (final)
Validation error (final)
Generalisation gap
Diagnosis
Recommended action
Model complexity
Data and model fit

The bold cyan curve is the true function, the dots are the training data, and the orange curve is the fitted model. When the model overfits it wiggles through the noise; when it underfits it is too smooth to follow the true curve.

Learning curve — error vs training-set size
Validation error vs model complexity (U-curve)
Theory & Key Formulas

$$E_{val}\;=\;\underbrace{\text{bias}^2}_{\text{large when underfitting}}+\underbrace{\text{variance}}_{\text{large when overfitting}}+\sigma^2$$

In expectation over training sets, the validation error decomposes into squared bias, variance and the irreducible error σ². A small train-validation gap with high error signals underfitting; a large gap signals overfitting.
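
The decomposition can be checked numerically. The sketch below is a minimal Monte Carlo estimate, not the simulator's own code: the true function sin(πx) on [-1, 1], the helper name `bias_variance` and every default value are assumptions chosen for illustration. It refits a ridge polynomial on many independently drawn training sets and measures the squared bias and the variance of the predictions on a fixed grid.

```python
import numpy as np

def bias_variance(degree, lam, m, sigma, n_trials=200, seed=0):
    """Estimate squared bias and variance of a ridge polynomial fit.

    Assumes the true function is sin(pi * x) on [-1, 1] (illustrative only).
    """
    rng = np.random.default_rng(seed)
    x_test = np.linspace(-1.0, 1.0, 50)
    f_test = np.sin(np.pi * x_test)
    X_test = np.vander(x_test, degree + 1, increasing=True)

    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        # Draw a fresh noisy training set of m points.
        x = rng.uniform(-1.0, 1.0, m)
        y = np.sin(np.pi * x) + rng.normal(0.0, sigma, m)
        X = np.vander(x, degree + 1, increasing=True)
        w = np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)
        preds[t] = X_test @ w

    mean_pred = preds.mean(axis=0)
    bias2 = float(np.mean((mean_pred - f_test) ** 2))  # squared bias
    variance = float(np.mean(preds.var(axis=0)))       # spread across training sets
    return bias2, variance

b1, v1 = bias_variance(degree=1, lam=1e-3, m=30, sigma=0.2)
b9, v9 = bias_variance(degree=9, lam=1e-3, m=30, sigma=0.2)
print(f"degree 1: bias^2={b1:.4f} variance={v1:.4f}")
print(f"degree 9: bias^2={b9:.4f} variance={v9:.4f}")
```

Adding σ² to the two returned terms gives an estimate of the expected validation error for that setting.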

$$\hat{w}\;=\;\big(X^{\!\top}X+\lambda I\big)^{-1}X^{\!\top}y$$

The normal equations for ridge-regularised least squares. X is the Vandermonde design matrix and λ is the regularisation strength. Raising λ keeps the weights w small and curbs overfitting.
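
As a minimal sketch of the formula above, the following NumPy code builds the Vandermonde design matrix and solves the ridge system. The function names `fit_ridge_poly` and `predict_poly`, the sin(πx) target and the sample sizes are assumptions for illustration, not the simulator's implementation. Like the formula, it penalises every weight, including the intercept.

```python
import numpy as np

def fit_ridge_poly(x, y, degree, lam):
    """Solve (X^T X + lam * I) w = X^T y for a polynomial of the given degree."""
    # Vandermonde design matrix with columns 1, x, x^2, ..., x^degree.
    X = np.vander(x, degree + 1, increasing=True)
    A = X.T @ X + lam * np.eye(degree + 1)
    # Solving the linear system avoids forming the explicit inverse.
    return np.linalg.solve(A, X.T @ y)

def predict_poly(w, x):
    """Evaluate the fitted polynomial at the points x."""
    return np.vander(x, len(w), increasing=True) @ w

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 30)
y = np.sin(np.pi * x) + rng.normal(0.0, 0.1, x.size)  # assumed true function

w_weak = fit_ridge_poly(x, y, degree=9, lam=1e-6)
w_strong = fit_ridge_poly(x, y, degree=9, lam=1.0)
# A larger lambda shrinks the weight vector.
print(np.linalg.norm(w_weak), np.linalg.norm(w_strong))
```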

$$\text{MSE}\;=\;\frac{1}{m}\sum_{i=1}^{m}\big(\hat{f}(x_i)-y_i\big)^2$$

The training error is the mean squared error on the m training points; the validation error is evaluated on the fixed held-out validation set.
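
Putting the three formulas together, a learning curve can be sketched as a loop over training-set sizes. This is an illustrative reconstruction under stated assumptions (true function sin(πx), a validation set of 500 points, hypothetical helper names), not the tool's actual code. Each fit uses the first m points of one fixed pool, and every fit is scored on the same held-out validation set.

```python
import numpy as np

def true_f(x):
    return np.sin(np.pi * x)  # assumed true function, for illustration only

def fit_ridge_poly(x, y, degree, lam):
    X = np.vander(x, degree + 1, increasing=True)
    return np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)

def mse(w, x, y):
    pred = np.vander(x, len(w), increasing=True) @ w
    return float(np.mean((pred - y) ** 2))

def learning_curve(degree, lam, sigma, sizes, n_val=500, seed=0):
    """Return (m, training MSE, validation MSE) for each training-set size m."""
    rng = np.random.default_rng(seed)
    # Fixed held-out validation set, shared by every point on the curve.
    x_val = rng.uniform(-1.0, 1.0, n_val)
    y_val = true_f(x_val) + rng.normal(0.0, sigma, n_val)
    # One training pool; the first m points are used for size m.
    x_pool = rng.uniform(-1.0, 1.0, max(sizes))
    y_pool = true_f(x_pool) + rng.normal(0.0, sigma, max(sizes))
    rows = []
    for m in sizes:
        w = fit_ridge_poly(x_pool[:m], y_pool[:m], degree, lam)
        rows.append((m, mse(w, x_pool[:m], y_pool[:m]), mse(w, x_val, y_val)))
    return rows

curve = learning_curve(degree=3, lam=1e-3, sigma=0.2,
                       sizes=[10, 20, 50, 100, 200, 500])
for m, e_train, e_val in curve:
    print(f"m={m:4d}  train={e_train:.4f}  val={e_val:.4f}  gap={e_val - e_train:.4f}")
```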

What is a Learning Curve?

🙋
I hear "learning curve" all the time — but what exactly is it a plot of?
🎓
Roughly, it is a plot of "how the model's error changes as you add data". The horizontal axis is the number of training points used, the vertical axis is the error. There are two lines: one is the error on the training data itself, the other is the error on a separate held-out validation set. With few data points the model can simply memorise the training set, so the training error is near zero — but the validation error is large. As you add data the training error creeps up and the validation error comes down. Watching how those two lines move is the learning curve.
🙋
And what do those two lines tell you?
🎓
This is the really useful part — some people call it the single most useful diagnostic in applied machine learning. The key things are "the final height of the two lines" and "the gap between them". If both sit high and lie almost on top of each other, the model is too simple to express the true relationship — that is underfitting. If the training error is near zero but the validation error stays high and the gap never closes — that is overfitting. The model has memorised the noise too.
🙋
When it is underfitting, I feel like collecting more data should fix it…?
🎓
That is exactly the biggest payoff of reading a learning curve. With underfitting (high bias) the two lines already meet at a "high place", not a low one. No matter how much data you add, both lines stay glued to that high ceiling — so collecting data is wasted effort. To fix it you have to make the model more complex, add features, or weaken regularisation. For example, in house-price prediction, a linear regression on floor area alone plateaus in accuracy — a classic high-bias case, and the fix is to add features like location and building age.
🙋
So conversely, for overfitting, adding data is what works. When I push the degree up to 15 the validation error shoots up.
🎓
Right. Overfitting (high variance) is often driven by lack of data, so adding data brings the validation error down and closes the gap. Other levers are strengthening regularisation λ, or simply lowering the model degree. In fact, set degree 15 and 20 data points in this tool and the training error is near zero while the validation error explodes — a degree-15 polynomial is threading perfectly through the noise of 20 points. Add data from there and the polynomial loses room to wiggle and settles down.
🙋
The "validation error vs model complexity" chart is U-shaped. What does that mean?
🎓
That U is the bias-variance trade-off itself. On the left, low complexity means high bias and high error — underfitting. Raise the complexity and the validation error falls to a best value at some valley. Raise it further and variance grows and the error climbs again — overfitting. So model selection is really "finding the valley of the U". In practice you locate that valley numerically with cross-validation. The valley is the right amount of complexity; anything to the right of it is a sign the model is starting to memorise noise.
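
Locating the valley with cross-validation might look like the sketch below. It assumes k-fold cross-validation with k = 5, a sin(πx) true function and a near-zero λ; the function name `cv_error` and all numeric settings are illustrative choices rather than anything the simulator specifies.

```python
import numpy as np

def cv_error(x, y, degree, lam, k=5):
    """Mean validation MSE over k folds for one polynomial degree."""
    folds = np.array_split(np.arange(x.size), k)
    errors = []
    for fold in folds:
        train = np.ones(x.size, dtype=bool)
        train[fold] = False  # hold this fold out for validation
        X_tr = np.vander(x[train], degree + 1, increasing=True)
        w = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(degree + 1),
                            X_tr.T @ y[train])
        X_va = np.vander(x[fold], degree + 1, increasing=True)
        errors.append(np.mean((X_va @ w - y[fold]) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 60)                      # already in random order
y = np.sin(np.pi * x) + rng.normal(0.0, 0.2, 60)    # assumed true function

degrees = list(range(1, 13))
errors = [cv_error(x, y, d, lam=1e-6) for d in degrees]
best_degree = degrees[int(np.argmin(errors))]       # bottom of the U

for d, e in zip(degrees, errors):
    print(f"degree {d:2d}: CV error {e:.4f}")
print("valley at degree", best_degree)
```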

Frequently Asked Questions

What is a learning curve?
A learning curve plots the training error and the validation error (vertical axis) against the amount of training data (horizontal axis). As you add data, the training error rises from near zero toward a plateau, while the validation error falls from a high value toward a plateau. It is sometimes called the single most useful diagnostic in applied machine learning because, just by reading the final height of the two curves and the gap between them, you can tell at a glance whether the model is underfitting or overfitting, and whether collecting more data will help.
What is the difference between high bias and high variance?
High bias means the training error and the validation error are both high and the gap between them (the generalisation gap) is small: the two curves sit almost on top of each other at a high plateau. That signals a model too simple to capture the true relationship, and adding data will not help. High variance means the training error is very low but the validation error is large and the gap never closes: the model has memorised the noise in the training data. This tool diagnoses the two automatically.
Does collecting more data fix underfitting?
No. A high-bias model has its two learning curves already meeting at a high plateau, not a low one. Adding more data leaves both curves stuck against that high ceiling. The fixes for underfitting are to increase model complexity (raise the polynomial degree), add features, or reduce regularisation. Conversely, if the model is overfitting (high variance), then collecting more data is one of the effective remedies.
How do you fix overfitting?
There are three main remedies for overfitting (high variance). (1) Add more training data: the validation error drops and the gap closes. (2) Strengthen regularisation: raising λ in this tool keeps the weight magnitudes small and curbs overfitting to noise. (3) Simplify the model: lower the polynomial degree. The valley of the validation-error-vs-complexity U-curve marks the right amount of complexity; the further right of the valley you go, the worse the overfitting.

Real-World Applications

The "should we collect more data" decision: One of the most expensive judgements in a machine-learning project is "collect more data, or change the model". Data collection costs time and money. Draw a learning curve and you can decide instantly: if the validation error is still falling, more data helps; if it has already met the training error at a plateau, more data is wasted. The rule is to look at the learning curve before commissioning blind labelling work.
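
The decision rule in this paragraph can be written as a small heuristic. The sketch below is one possible reading: the function name `more_data_verdict`, the 5% relative tolerance and the "inconclusive" fallback are assumptions, not something the text prescribes.

```python
def more_data_verdict(train_errors, val_errors, rel_tol=0.05):
    """Classify the tail of a learning curve.

    Both lists are ordered by increasing training-set size and need at
    least two entries. rel_tol is a hypothetical tolerance.
    """
    prev_val, last_val = val_errors[-2], val_errors[-1]
    last_train = train_errors[-1]
    # Validation error fell by more than rel_tol between the last two sizes.
    still_falling = (prev_val - last_val) > rel_tol * prev_val
    # The curves have met: the remaining gap is small relative to the error.
    curves_met = (last_val - last_train) <= rel_tol * last_val
    if still_falling:
        return "more data helps"
    if curves_met:
        return "more data is wasted"
    return "inconclusive"

print(more_data_verdict([0.01, 0.02, 0.03], [0.50, 0.30, 0.20]))
print(more_data_verdict([0.28, 0.29, 0.295], [0.33, 0.305, 0.30]))
```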

Model selection and hyperparameter tuning: The validation-error-vs-complexity U-curve is a microcosm of every hyperparameter choice — decision-tree depth, the number of layers and units in a neural network, polynomial degree, regularisation strength. In practice the valley is found numerically with cross-validation. Simpler than the valley means too much bias; more complex means too much variance; the best generalisation is at the valley.

Diagnosing deep-learning training: When training a neural network you also plot the training loss and validation loss against the amount of training (epochs or data size). When the validation loss stops falling and turns upward, that point marks the onset of overfitting and is used for early-stopping decisions. The learning curve is the most basic visualisation, continuously monitored in experiment-tracking tools such as TensorBoard or Weights & Biases.
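
A common way to turn that turning point into a stopping rule is patience-based early stopping. The sketch below is framework-independent and assumes a plain list of per-epoch validation losses; the name `early_stopping_epoch` and the patience value are illustrative.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch with the best validation loss seen before stopping.

    Training is considered stopped once `patience` epochs in a row have
    failed to improve on the best loss.
    """
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_epoch

# Hypothetical loss trace: improves, then turns upward.
losses = [1.00, 0.80, 0.60, 0.55, 0.57, 0.60, 0.65, 0.50]
print(early_stopping_epoch(losses, patience=3))
```

With a patience of 3 the loop stops before it reaches the last entry, so a late improvement is never seen; a larger patience trades extra training time for a lower chance of stopping too soon.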

A sanity check for data quality and leakage: If a learning curve comes out in a shape very different from what you expect, suspect a problem on the data side. A validation error lower than the training error, or both being unnaturally small, is a classic sign of data leakage (validation information bleeding into training). The shape of the curve itself becomes a mirror that reflects bugs in the data pipeline, not just in the model.

Common Misconceptions and Pitfalls

The most common misconception is "a small training error means a good model". The training error only shows how much of the training data the model has memorised — it is a different thing from performance on unseen data. Give a degree-15 polynomial 20 points and the training error drops to near zero, but that is the result of rote memorisation and the validation error actually explodes. What you should always evaluate is the validation error (the generalisation error); the training error alone guarantees nothing. Training and validation error must always be read together, gap and all.

The next misconception is that "for both overfitting and underfitting, you should just collect more data". Adding data only helps in the overfitting (high variance) case. An underfitting (high bias) model simply lacks expressive power, so even tenfold data will not move it off its high plateau. Rushing to collect data without first checking whether the two curves meet low or high on the learning curve leads to costs piling up while accuracy does not improve at all. Diagnose first, act second.

Finally, there is the trap of overfitting to the validation set by inspecting it repeatedly while tuning. If you tune hyperparameters endlessly while watching the validation error, you fit to the accidental quirks of the validation data and performance does not hold on truly unseen data. To prevent this, practitioners split the data three ways (training, validation and test) and use the test data only once, for the final evaluation. The learning curve too should be read with an eye on whether the validation data has been over-used.
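
A three-way split can be sketched in a few lines of NumPy. The 60/20/20 proportions and the function name `three_way_split` are assumptions for illustration; the text does not prescribe particular fractions.

```python
import numpy as np

def three_way_split(n, val_frac=0.2, test_frac=0.2, seed=0):
    """Return disjoint index arrays (train, val, test) covering range(n)."""
    idx = np.random.default_rng(seed).permutation(n)  # shuffle once
    n_test = int(round(n * test_frac))
    n_val = int(round(n * val_frac))
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = three_way_split(100)
print(len(train_idx), len(val_idx), len(test_idx))
# Tune on val_idx as often as needed; touch test_idx once, at the end.
```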

How to Use

  1. Set polynomial degree (degNum: 1–10) to control model complexity; higher degrees risk overfitting on small datasets.
  2. Specify training set size (sizeNum: 20–500 samples) and noise level (noiseNum: 0–50% of signal variance) to simulate real data conditions.
  3. Adjust L2 regularization strength (lamNum: 0.0–1.0) to penalize large coefficients; observe how training error, validation error, and generalisation gap change.
  4. Compare the final metrics: if the validation error exceeds the training error by more than 15%, the model is overfitting; if both errors remain high, the model is underfitting.
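
The rule in step 4 can be sketched as a small function. The 15% ratio and the labels "high variance" and "high bias" come from this page; the `high_error` threshold, the "good fit" label and the function name `diagnose` are assumptions, since the text does not say what counts as a high error.

```python
def diagnose(train_err, val_err, high_error, gap_ratio=0.15):
    """Label a fit from its final training and validation errors.

    high_error is a caller-supplied threshold (hypothetical) above which
    the training error counts as "high".
    """
    if val_err > (1.0 + gap_ratio) * train_err:
        return "high variance"   # overfitting: validation error well above training
    if train_err > high_error:
        return "high bias"       # underfitting: both errors high, small gap
    return "good fit"

print(diagnose(0.032, 0.156, high_error=0.1))
print(diagnose(0.30, 0.31, high_error=0.1))
```

A sensible choice for `high_error` depends on the noise level, because the validation error cannot fall below σ² on average.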

Worked Example

Train a polynomial regressor on 100 samples with degree=5, noise=10%, lambda=0.01. Training error converges to 0.032 MSE, validation error plateaus at 0.156 MSE, yielding a generalisation gap of 0.124. Diagnosis: overfitting. Increase lambda to 0.5; validation error drops to 0.078 MSE (gap = 0.046), indicating that stronger regularisation helps. Reduce degree to 3; both errors stabilise near 0.065 MSE with gap = 0.008, confirming overfitting was eliminated.
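
The gaps quoted above follow from the definition gap = validation error minus training error. The training error after raising lambda is not stated in the example, so the second value below is only what the quoted figures imply:

$$\text{gap}\;=\;E_{val}-E_{train}:\qquad 0.156-0.032=0.124,\qquad E_{train}=0.078-0.046=0.032\;\;(\lambda=0.5,\ \text{implied})$$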

Practical Notes

  1. Use sizeRange slider to simulate data scarcity: models trained on <50 samples almost always overfit unless heavily regularised (lambda ≥0.3).
  2. When noiseRange exceeds 30%, validation error typically dominates; prioritise noise reduction in data collection over model tuning.
  3. Monitor the Diagnosis output: "high bias" signals underfitting (increase degNum or reduce lambda); "high variance" signals overfitting (increase sizeNum or raise lambda).
  4. In production, use the Recommended action field to automate hyperparameter adjustment before retraining on fresh batches.