A learning curve plots $E_{\text{in}}(N)$ and $E_{\text{out}}(N)$ as functions of training-set size $N$. The two curves converge as $N \to \infty$, to the noise floor $\sigma^2$ for the right model class. The shape (gap and slope) reveals whether a model is underfitting, overfitting, or hitting irreducible error.

The Two Curves

For a fixed hypothesis set $\mathcal{H}$ and learning algorithm $\mathcal{A}$:

  • $E_{\text{in}}(N)$: expected training error, averaging over training sets of size $N$.
  • $E_{\text{out}}(N)$: expected test error of the same trained hypothesis on new data.

Plot both as functions of $N$; a minimal sketch for estimating them empirically follows the list below. Universal qualitative behaviour:

  • $E_{\text{in}}(N)$ increases with $N$. With few points the model interpolates; adding points forces compromises.
  • $E_{\text{out}}(N)$ decreases with $N$. More data means a more representative training set and a hypothesis that generalises better.
  • Both curves converge to the same asymptote: the best achievable error within $\mathcal{H}$, which equals $\sigma^2$ (irreducible noise) for a well-specified model.
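
A minimal sketch of how the two curves can be estimated empirically, assuming a synthetic noisy linear target fit by ordinary least squares; the dimension, noise level, sample sizes and trial counts below are arbitrary illustrative choices, not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, d=5, noise=0.5):
    """Synthetic noisy linear target y = w*.x + eps (illustrative choice)."""
    X = rng.normal(size=(n, d))
    w_star = np.ones(d)
    y = X @ w_star + rng.normal(scale=noise, size=n)
    return X, y

def fit_least_squares(X, y):
    """Ordinary least squares via numpy's lstsq."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

def estimate_learning_curve(sizes, trials=200, n_test=2000):
    """Average E_in(N) and E_out(N) over many random training sets of each size."""
    X_test, y_test = make_data(n_test)          # large held-out set as a proxy for E_out
    e_in, e_out = [], []
    for n in sizes:
        ins, outs = [], []
        for _ in range(trials):
            X_tr, y_tr = make_data(n)
            w = fit_least_squares(X_tr, y_tr)
            ins.append(mse(X_tr, y_tr, w))
            outs.append(mse(X_test, y_test, w))
        e_in.append(np.mean(ins))
        e_out.append(np.mean(outs))
    return np.array(e_in), np.array(e_out)

sizes = [10, 20, 50, 100, 200, 500]
e_in, e_out = estimate_learning_curve(sizes)
for n, a, b in zip(sizes, e_in, e_out):
    print(f"N={n:4d}  E_in={a:.3f}  E_out={b:.3f}")
```

With these settings the noise floor is $\sigma^2 = 0.25$: the printed $E_{\text{in}}$ rises towards it and $E_{\text{out}}$ falls towards it as $N$ grows.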

Simple vs Complex Model

The shape of the curves changes qualitatively with model complexity:

Simple model (low VC dimension).

  • Curves come together quickly: small gap even for small $N$.
  • Both stabilise high above zero error: the model is structurally limited (high bias).
  • Asymptote: $\text{bias} + \sigma^2$, with the bias contribution often dominant.

Complex model (high VC dimension).

  • Curves are far apart for small $N$: large generalisation gap. With $N$ less than $d_{\mathrm{VC}}$, $E_{\text{in}}$ can be near zero (the model interpolates) while $E_{\text{out}}$ is huge.
  • Need $N \gg d_{\mathrm{VC}}$ before they meet.
  • Asymptote: lower than the simple model's, closer to $\sigma^2$, because $\mathcal{H}$ is expressive enough to fit $f$.

This visualises the regime distinction:

  • Underfitting region: simple model, both curves near asymptote, asymptote is high.
  • Overfitting region: complex model with $N$ too small, large gap.
  • Sweet spot: complex model with $N$ large enough that the gap has closed (the two regimes are contrasted concretely in the sketch below).
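
One way to see the contrast concretely, assuming scikit-learn is available: estimate empirical learning curves for a low-degree and a high-degree polynomial fit to the same noisy data. The target function, degrees, sizes and scoring choice below are illustrative, not prescribed by the notes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Noisy sinusoidal target (illustrative choice).
X = rng.uniform(-1, 1, size=(400, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.3, size=400)

for degree in (1, 12):  # simple vs complex hypothesis set
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    sizes, train_scores, val_scores = learning_curve(
        model, X, y,
        train_sizes=np.linspace(0.1, 1.0, 6),
        cv=5,
        scoring="neg_mean_squared_error",
    )
    e_in = -train_scores.mean(axis=1)   # training error at each N
    e_val = -val_scores.mean(axis=1)    # cross-validated proxy for E_out
    print(f"degree {degree}")
    for n, a, b in zip(sizes, e_in, e_val):
        print(f"  N={n:3d}  E_in={a:.3f}  E_val={b:.3f}")
```

The degree-1 curves should meet quickly at a relatively high error, while the degree-12 curves should show a large gap at small $N$ that narrows as $N$ grows.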

Linear Regression (Closed Form)

For linear regression with a noisy target $y = \mathbf{w}^{*\top}\mathbf{x} + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma^2)$, with $d+1$ parameters and $N$ training points, the learning curve has an exact form:

$$
\mathbb{E}\!\left[E_{\text{in}}(N)\right] = \sigma^2\left(1 - \frac{d+1}{N}\right),
\qquad
\mathbb{E}\!\left[E_{\text{out}}(N)\right] = \sigma^2\left(1 + \frac{d+1}{N}\right),
\qquad N \ge d+1.
$$

Reading off:

  • $E_{\text{in}}$ starts at $0$ when $N = d+1$ (perfect fit) and rises towards $\sigma^2$.
  • $E_{\text{out}}$ starts above $\sigma^2$ (at $2\sigma^2$ when $N = d+1$) and decays towards $\sigma^2$.
  • Generalisation gap: $E_{\text{out}}(N) - E_{\text{in}}(N) = \frac{2(d+1)\sigma^2}{N}$, decaying as $1/N$.

Each parameter costs you $2\sigma^2/N$ of generalisation gap. With $N \gg d+1$, the gap is negligible and both errors sit at $\sigma^2$.
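
A quick Monte-Carlo check of these expressions. This is a hedged sketch, not the notes' own derivation: $E_{\text{out}}$ is measured on the same inputs with fresh noise (a convention under which the closed form holds exactly), and the values of $d$, $\sigma$, $N$ and the trial count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def trial(N, d, sigma):
    """One draw of the noisy linear-regression experiment (illustrative setup)."""
    X = np.column_stack([np.ones(N), rng.normal(size=(N, d))])   # d + 1 parameters
    w_star = rng.normal(size=d + 1)
    y = X @ w_star + rng.normal(scale=sigma, size=N)
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)                # ordinary least squares
    e_in = np.mean((X @ w_hat - y) ** 2)
    y_fresh = X @ w_star + rng.normal(scale=sigma, size=N)       # same inputs, fresh noise
    e_out = np.mean((X @ w_hat - y_fresh) ** 2)
    return e_in, e_out

d, sigma, N, trials = 5, 0.5, 40, 5000
results = np.array([trial(N, d, sigma) for _ in range(trials)])
e_in, e_out = results.mean(axis=0)
print(f"simulated : E_in={e_in:.4f}  E_out={e_out:.4f}")
print(f"predicted : E_in={sigma**2 * (1 - (d + 1) / N):.4f}"
      f"  E_out={sigma**2 * (1 + (d + 1) / N):.4f}")
```

With $\sigma^2 = 0.25$, $d+1 = 6$ and $N = 40$, the predicted values are $E_{\text{in}} \approx 0.2125$ and $E_{\text{out}} \approx 0.2875$; the simulated averages should land close to them.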

VC vs Bias–Variance View

Two equivalent ways to draw the same curves:

VC view. Shade the region between the curves as the generalisation error, which is what the VC bound controls. The bound is a uniform statement about how far apart the curves can be.

Bias–variance view. Draw a horizontal line at the bias level (the error of the best approximation in $\mathcal{H}$), separating the area under it (= bias) from the area between that line and the $E_{\text{out}}$ curve (= variance, with noise setting the asymptote). Now the picture decomposes the error into the structural minimum (bias) and the data-dependent fluctuation (variance).

Both pictures live on top of the same two curves — they’re complementary lenses, not competing analyses.
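
For reference, a standard squared-error form of the decomposition behind the bias–variance picture (not spelled out in the original notes), with $g^{(\mathcal{D})}$ the hypothesis learned from training set $\mathcal{D}$ and $\bar g$ its average over training sets:

$$
\mathbb{E}_{\mathcal{D}}\!\left[E_{\text{out}}\big(g^{(\mathcal{D})}\big)\right]
= \underbrace{\mathbb{E}_{\mathbf{x}}\!\big[(\bar g(\mathbf{x}) - f(\mathbf{x}))^2\big]}_{\text{bias}}
+ \underbrace{\mathbb{E}_{\mathbf{x},\mathcal{D}}\!\big[(g^{(\mathcal{D})}(\mathbf{x}) - \bar g(\mathbf{x}))^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{noise}},
\qquad
\bar g(\mathbf{x}) = \mathbb{E}_{\mathcal{D}}\!\big[g^{(\mathcal{D})}(\mathbf{x})\big].
$$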

Diagnostic Use

Plotting empirical learning curves (using held-out validation as a proxy for $E_{\text{out}}$) is one of the most useful debugging tools in ML:

  • Both curves high, close together: high bias. Try a more expressive model.
  • Training curve low, validation curve high: high variance. Get more data, regularise, or simplify.
  • Both flatten out at the same value: hitting the noise floor — diminishing returns from more data.
  • Curves still descending at the right edge: more data would help.

This is also how people decide whether to invest in collecting more data: if the validation curve has plateaued, more data won’t help; if it’s still falling, it might.
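
One possible way to turn that decision into a rule of thumb; the threshold and the example curve values below are arbitrary illustrative choices, not a standard recipe.

```python
import numpy as np

def more_data_likely_helps(train_sizes, val_errors, rel_threshold=0.02):
    """Heuristic: compare validation error at the two largest training sizes.

    If the relative improvement from the penultimate to the last point exceeds
    `rel_threshold`, the curve is still descending and more data is likely to
    help; otherwise it has roughly plateaued.
    """
    order = np.argsort(train_sizes)
    errs = np.asarray(val_errors, dtype=float)[order]
    rel_improvement = (errs[-2] - errs[-1]) / errs[-2]
    return rel_improvement > rel_threshold

# Made-up curves: still falling -> True, plateaued -> False.
print(more_data_likely_helps([100, 200, 400, 800], [0.40, 0.31, 0.26, 0.23]))    # True
print(more_data_likely_helps([100, 200, 400, 800], [0.30, 0.26, 0.251, 0.250]))  # False
```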

Active Recall