The growth function $m_{\mathcal{H}}(N)$ is the largest number of distinct dichotomies that $\mathcal{H}$ can produce on any choice of $N$ inputs. Bounded by $2^N$ in the worst case, often polynomial in $N$ for structured hypothesis sets, and the quantity that replaces $M$ in the generalisation bound when $|\mathcal{H}| = \infty$.

Definition

For a hypothesis set $\mathcal{H}$ of binary classifiers on an input space $\mathcal{X}$, define

$$m_{\mathcal{H}}(N) = \max_{x_1, \dots, x_N \in \mathcal{X}} \bigl|\mathcal{H}(x_1, \dots, x_N)\bigr|,$$

where $|\mathcal{H}(x_1, \dots, x_N)|$ is the number of distinct labellings $\mathcal{H}$ can produce on those specific inputs.

Two key properties:

  • $m_{\mathcal{H}}(N) \le 2^N$: there are only $2^N$ binary labellings of $N$ points.
  • $m_{\mathcal{H}}(N)$ depends only on $\mathcal{H}$ and $N$. It does not depend on the input distribution $P(x)$, the learning algorithm, or the target function.

The “max over inputs” eliminates dependence on the specific training set. We’re asking: with the most adversarial possible placement of $N$ points, how many distinct labellings can $\mathcal{H}$ realise?
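To make the definition concrete, here is a minimal brute-force sketch (my own illustration, not from the source): count the distinct labellings a finite pool of hypotheses produces on a fixed placement, then take the maximum over a supplied set of candidate placements. The threshold grid and integer placements below are arbitrary stand-ins for “all of $\mathcal{H}$” and “all placements”.

```python
import itertools

def num_dichotomies(hypotheses, xs):
    """|H(x_1, ..., x_N)|: distinct labellings the hypotheses produce on the points xs."""
    return len({tuple(h(x) for x in xs) for h in hypotheses})

def growth_function(hypotheses, candidate_placements):
    """m_H(N), approximated by maximising over a finite list of N-point placements."""
    return max(num_dichotomies(hypotheses, xs) for xs in candidate_placements)

# Illustrative pool: positive rays h_a(x) = sign(x - a) on a grid of thresholds a.
thresholds = [i / 10 for i in range(-50, 51)]
rays = [lambda x, a=a: +1 if x > a else -1 for a in thresholds]

N = 4
placements = list(itertools.combinations([float(v) for v in range(-4, 5)], N))
print(growth_function(rays, placements))  # 5, i.e. N + 1 for positive rays
```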

Worked Examples

Positive Rays

$\mathcal{H} = \{\, h_a(x) = \operatorname{sign}(x - a) : a \in \mathbb{R} \,\}$: threshold functions on the real line.

Place $N$ points on $\mathbb{R}$, sorted: $x_1 < x_2 < \dots < x_N$. Sliding the threshold $a$ from $-\infty$ to $+\infty$, the labelling pattern changes only when $a$ crosses one of the points. There are $N + 1$ “regions” for $a$, each producing a distinct dichotomy:

$$m_{\mathcal{H}}(N) = N + 1.$$

The "$N - 1$" comes from the flip points between consecutive inputs; the "$+2$" from the two end regions.

Positive Intervals

$\mathcal{H} = \{\, h_{a,b} : h_{a,b}(x) = +1 \text{ if } a \le x \le b,\ -1 \text{ otherwise} \,\}$.

Place $N$ points on $\mathbb{R}$. A dichotomy is determined by which two of the $N + 1$ “gaps between/around points” the endpoints $a$ and $b$ fall into. Choosing 2 distinct gaps from the $N + 1$ gives

$$m_{\mathcal{H}}(N) = \binom{N + 1}{2} + 1 = \tfrac{1}{2}N^2 + \tfrac{1}{2}N + 1.$$

The extra "$+1$" is the all-negative labelling (both endpoints fall in the same gap, so the interval captures no points).

2D Perceptrons

$\mathcal{H} =$ all linear classifiers (perceptrons) in $\mathbb{R}^2$. By direct enumeration:

| $N$ | $m_{\mathcal{H}}(N)$ | $2^N$ |
|-----|----------------------|-------|
| 1   | 2                    | 2     |
| 2   | 4                    | 4     |
| 3   | 8                    | 8     |
| 4   | 14                   | 16    |

At $N = 4$ the count drops below $2^N$ for the first time: no placement of four points lets a line realise all 16 labellings. With more careful analysis, $m_{\mathcal{H}}(N) = N^2 - N + 2$ for 2D perceptrons: polynomial, not exponential.
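The table can be reproduced by brute force (an illustrative sketch of my own; it assumes NumPy and SciPy are available, and the point placement is an arbitrary general-position choice). Each candidate labelling is tested for strict linear separability with a small feasibility LP.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def is_linearly_separable(points, labels):
    """True iff some line sign(w . x + b) realises the given +/-1 labelling strictly."""
    # Feasibility LP: find (w, b) with y_i * (w . x_i + b) >= 1 for every point.
    X = np.hstack([np.asarray(points, dtype=float), np.ones((len(points), 1))])
    A_ub = -np.asarray(labels, dtype=float)[:, None] * X
    res = linprog(c=np.zeros(3), A_ub=A_ub, b_ub=-np.ones(len(points)),
                  bounds=[(None, None)] * 3)
    return res.status == 0  # 0 = feasible, 2 = infeasible

def perceptron_dichotomies(points):
    """Number of labellings of `points` realisable by a line."""
    return sum(is_linearly_separable(points, labels)
               for labels in itertools.product([+1, -1], repeat=len(points)))

# Four points in general position (arbitrary choice; the last lies inside the others' triangle).
pts = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (1.0, 1.0)]
for n in range(1, 5):
    print(n, perceptron_dichotomies(pts[:n]), 2 ** n)  # 2/2, 4/4, 8/8, 14/16
```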

Convex Sets in $\mathbb{R}^2$

$\mathcal{H} = \{\, h : h(x) = +1 \text{ if } x \in C,\ -1 \text{ otherwise, for some convex region } C \subseteq \mathbb{R}^2 \,\}$.

Place the $N$ inputs on a circle. For any labelling, take the convex hull of the $+1$ points; since every point lies on the circle, no $-1$ point can fall inside that hull, so every dichotomy is realisable:

$$m_{\mathcal{H}}(N) = 2^N \quad \text{for all } N.$$

Convex sets are maximally expressive on circles: they shatter any $N$ points placed on a circle. The hypothesis set has no break point and infinite VC dimension.
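Shattering on the circle can be verified directly (a self-contained sketch of my own): for every labelling, check that no $-1$ point lands inside the convex hull of the $+1$ points.

```python
import itertools
import math

def points_on_circle(n):
    """n distinct points evenly spaced on the unit circle."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)) for k in range(n)]

def in_convex_hull(q, pts):
    """True iff q lies in the convex hull of pts (pts assumed to lie on one circle)."""
    if len(pts) < 3:
        return q in pts  # hull is empty, a point, or a chord: contains q only if q is a vertex
    hull = sorted(pts, key=lambda p: math.atan2(p[1], p[0]))  # circle points sorted = hull order
    crosses = []
    for i in range(len(hull)):
        (ax, ay), (bx, by) = hull[i], hull[(i + 1) % len(hull)]
        crosses.append((bx - ax) * (q[1] - ay) - (by - ay) * (q[0] - ax))
    return all(c >= 0 for c in crosses) or all(c <= 0 for c in crosses)

def convex_set_dichotomies(pts):
    """Labellings realisable as +1 inside some convex region, -1 outside."""
    count = 0
    for labels in itertools.product([+1, -1], repeat=len(pts)):
        pos = [p for p, y in zip(pts, labels) if y == +1]
        neg = [p for p, y in zip(pts, labels) if y == -1]
        if not any(in_convex_hull(q, pos) for q in neg):
            count += 1
    return count

for n in range(1, 7):
    print(n, convex_set_dichotomies(points_on_circle(n)), 2 ** n)  # the two counts match
```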

The Break Point

A break point for $\mathcal{H}$ is the smallest $k$ such that no set of $k$ inputs is shattered, i.e.,

$$m_{\mathcal{H}}(k) < 2^k.$$

If $k$ is a break point, every $k' > k$ is also a break point: if you can’t shatter $k$ inputs, you can’t shatter $k'$ either, because shattering a larger set would require shattering every $k$-point subset inside it.

Break points for the examples above:

| Hypothesis set | Growth function | Break point |
|----------------|-----------------|-------------|
| Positive rays | $N + 1$ | 2 |
| Positive intervals | $\tfrac{1}{2}N^2 + \tfrac{1}{2}N + 1$ | 3 |
| 2D perceptrons | $N^2 - N + 2$ | 4 |
| Convex sets in $\mathbb{R}^2$ | $2^N$ | None |
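A tiny helper (my own sketch) recovers these break points from the growth functions by finding the first $k$ where the count falls short of $2^k$.

```python
def break_point(growth_fn, k_max=10):
    """Smallest k with m_H(k) < 2^k, or None if none is found up to k_max."""
    for k in range(1, k_max + 1):
        if growth_fn(k) < 2 ** k:
            return k
    return None

print(break_point(lambda N: N + 1))                 # positive rays      -> 2
print(break_point(lambda N: N * (N + 1) // 2 + 1))  # positive intervals -> 3
print(break_point(lambda N: N * N - N + 2))         # 2D perceptrons     -> 4
print(break_point(lambda N: 2 ** N))                # convex sets        -> None
```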

The Polynomial Bound

The deep result of VC theory: if $\mathcal{H}$ has any break point $k$, then $m_{\mathcal{H}}(N)$ is bounded by a polynomial in $N$. Concretely, with $d_{\mathrm{VC}} = k - 1$ (the VC dimension):

$$m_{\mathcal{H}}(N) \le \sum_{i=0}^{d_{\mathrm{VC}}} \binom{N}{i} \le N^{d_{\mathrm{VC}}} + 1.$$

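Numerically, the gap between the polynomial bound and $2^N$ opens up almost immediately (a small sketch; $d_{\mathrm{VC}} = 3$ is an arbitrary example value).

```python
from math import comb

def sauer_bound(N, d_vc):
    """Sum_{i=0}^{d_vc} C(N, i): the polynomial bound on m_H(N) implied by a break point."""
    return sum(comb(N, i) for i in range(d_vc + 1))

d_vc = 3  # e.g. the 2D perceptron
for N in [5, 10, 20, 50]:
    print(N, sauer_bound(N, d_vc), N ** d_vc + 1, 2 ** N)
# The first two columns grow polynomially; 2^N dwarfs them almost at once.
```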
The drop from exponential ($2^N$) to polynomial ($N^{d_{\mathrm{VC}}}$) is what makes the generalisation bound non-vacuous for infinite hypothesis sets. Plug a polynomial into

$$\mathbb{P}\bigl[\, |E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon \,\bigr] \le 4\, m_{\mathcal{H}}(2N)\, e^{-\frac{1}{8}\epsilon^2 N}$$

(the VC bound), and the right-hand side $\to 0$ as $N \to \infty$: learning is feasible.
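Plugging in numbers makes the decay visible (a sketch; the constants follow the bound above, with $m_{\mathcal{H}}(2N)$ replaced by the polynomial bound $(2N)^{d_{\mathrm{VC}}} + 1$, and $d_{\mathrm{VC}} = 3$, $\epsilon = 0.1$ as arbitrary example values).

```python
import math

def vc_bound(N, d_vc, eps):
    """RHS of the VC bound with m_H(2N) upper-bounded by (2N)**d_vc + 1."""
    return 4 * ((2 * N) ** d_vc + 1) * math.exp(-eps ** 2 * N / 8)

d_vc, eps = 3, 0.1
for N in [1_000, 10_000, 100_000, 1_000_000]:
    print(N, vc_bound(N, d_vc, eps))
# Vacuous (huge) for small N, but the exponential decay eventually crushes the polynomial.
```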

Without a break point ($m_{\mathcal{H}}(N) = 2^N$ for all $N$, as for convex sets), the $m_{\mathcal{H}}(2N) = 2^{2N}$ factor overwhelms the exponential decay and the bound is vacuous: the hypothesis set is too expressive to generalise from finite data.

Why Max Over Inputs?

The growth function takes the maximum over all input arrangements, not the typical case. Three reasons:

  1. Worst-case bound. The generalisation bound must hold uniformly over all distributions — no probabilistic assumption rules out adversarial input placement.
  2. Algorithm- and distribution-free. Removing the dependence on $P(x)$ makes $m_{\mathcal{H}}(N)$ a property of $\mathcal{H}$ alone, computable without reference to data.
  3. Structural complexity, not statistical. $m_{\mathcal{H}}(N)$ measures what $\mathcal{H}$ is capable of expressing, divorced from whatever data we happen to see.

Active Recall