Given a hypothesis $h \in \mathcal{H}$ and inputs $x_1, \dots, x_N$, the dichotomy generated by $h$ on those inputs is the tuple of labels $\big(h(x_1), \dots, h(x_N)\big) \in \{-1, +1\}^N$. The set of all dichotomies replaces the infinite cardinality $|\mathcal{H}|$ with a finite count of at most $2^N$.

Definition

A hypothesis $h \in \mathcal{H}$ maps every point in $\mathcal{X}$ to $\{-1, +1\}$. The dichotomy of $h$ on a fixed set of inputs $x_1, \dots, x_N$ is what $h$ looks like restricted to those inputs:

$$\big(h(x_1),\, h(x_2),\, \dots,\, h(x_N)\big) \in \{-1, +1\}^N$$

Two different hypotheses can produce the same dichotomy: any two lines in $\mathbb{R}^2$ that assign the same labels to the training points produce the same dichotomy, even if they are different decision boundaries. The dichotomy is a coarser object than the hypothesis itself: each dichotomy is an equivalence class of hypotheses.
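
A minimal sketch of this many-to-one collapse in Python (the helper name `dichotomy` and the specific weight vectors are illustrative assumptions, not from the source):

```python
import numpy as np

def dichotomy(w, b, X):
    """Labels the line sign(w.x + b) assigns to the fixed inputs X."""
    return tuple(1 if w @ x + b >= 0 else -1 for x in X)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

# Two clearly different decision boundaries...
print(dichotomy(np.array([1.0, 1.0]), -0.5, X))  # (-1, 1, 1)
print(dichotomy(np.array([2.0, 3.0]), -0.7, X))  # (-1, 1, 1): the same dichotomy
```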

The set of all dichotomies that $\mathcal{H}$ can implement on these inputs is

$$\mathcal{H}(x_1, \dots, x_N) = \big\{ \big(h(x_1), \dots, h(x_N)\big) : h \in \mathcal{H} \big\}$$

This set has size at most $2^N$ — the total number of binary labellings of $N$ points — and is finite even when $|\mathcal{H}| = \infty$.
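
To make the set concrete, here is a sketch that approximates $\mathcal{H}(x_1, \dots, x_N)$ for lines in $\mathbb{R}^2$ by sampling random $(w, b)$ pairs; the sampling scheme and the helper name are my assumptions, not from the source:

```python
import numpy as np

rng = np.random.default_rng(0)

def dichotomy_set(X, n_hypotheses=50_000):
    """Approximate H(x_1, ..., x_N) for H = lines in R^2 by random sampling."""
    seen = set()
    for _ in range(n_hypotheses):
        w, b = rng.normal(size=2), rng.normal()  # a random line sign(w.x + b)
        seen.add(tuple(1 if w @ x + b >= 0 else -1 for x in X))
    return seen

X = rng.normal(size=(3, 2))          # 3 points, almost surely in general position
D = dichotomy_set(X)
print(len(D), "of at most", 2 ** 3)  # typically prints: 8 of at most 8
```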

Why It Matters

The generalisation bound from a union bound is

$$\mathbb{P}\big[\, \lvert E_{\text{in}}(g) - E_{\text{out}}(g) \rvert > \epsilon \,\big] \le 2M e^{-2\epsilon^2 N},$$

where $M = \lvert \mathcal{H} \rvert$. For continuously-parameterised hypothesis sets (perceptrons, SVMs, neural networks), $M = \infty$ and the bound is vacuous.
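
Plugging numbers in shows how the bound dies as $M$ grows. A quick numeric check (the function name is mine):

```python
import math

def hoeffding_union_bound(M, N, eps):
    """Right-hand side of the union bound: 2 * M * exp(-2 * eps^2 * N)."""
    return 2 * M * math.exp(-2 * eps**2 * N)

print(hoeffding_union_bound(M=1_000, N=1_000, eps=0.1))   # ~4.1e-06: a real guarantee
print(hoeffding_union_bound(M=10**12, N=1_000, eps=0.1))  # ~4.1e+03: vacuous, exceeds 1
# As M -> infinity (continuous hypothesis sets), the bound is useless at any N.
```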

The fix: most of the $M$ in the union bound is wasteful. If two hypotheses produce the same labelling on the training set, their bad events overlap massively — in the union bound we’re double-counting them.

Counting distinct dichotomies rather than distinct hypotheses gives a much smaller, finite quantity, and (with a more careful argument than the simple union bound) replaces $M$ in the generalisation bound. The maximum number of dichotomies over all possible input choices is the growth function $m_{\mathcal{H}}(N) = \max_{x_1, \dots, x_N} \lvert \mathcal{H}(x_1, \dots, x_N) \rvert$.

A Concrete Count

Take $\mathcal{H}$ to be all lines in $\mathbb{R}^2$ (i.e., 2D perceptrons). Place $N = 3$ inputs in general position. How many of the $2^3 = 8$ possible labellings can lines actually realise?

All 8: for any binary labelling of three non-collinear points, there’s a line that puts the $+1$ points on one side and the $-1$ points on the other. So $m_{\mathcal{H}}(3) = 8$.

Now place $N = 4$ points. Of the $2^4 = 16$ labellings, lines can realise only 14. The two unrealisable ones are the “XOR” labellings: $(+,-,+,-)$ and $(-,+,-,+)$ around a square — no line separates them. The infinite hypothesis set “all lines in $\mathbb{R}^2$” produces only 14 dichotomies on 4 points.

| $N$ | All possible labellings ($2^N$) | Lines in $\mathbb{R}^2$ realise |
|-----|---------------------------------|---------------------------------|
| 1   | 2                               | 2                               |
| 2   | 4                               | 4                               |
| 3   | 8                               | 8                               |
| 4   | 16                              | 14                              |
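
The last column of this table can be checked by brute force: sample many random lines and count the distinct labellings they induce on the corners of a square. A sketch under my assumption that Gaussian sampling of $(w, b)$ explores enough boundaries (it gives a lower bound that typically reaches the true counts):

```python
import numpy as np

rng = np.random.default_rng(0)

def count_line_dichotomies(X, n_lines=100_000):
    """Distinct labellings of X realised by sampled lines sign(w.x + b)."""
    W = rng.normal(size=(n_lines, 2))          # one sampled line per row
    b = rng.normal(size=n_lines)
    signs = np.where(X @ W.T + b >= 0, 1, -1)  # shape (N, n_lines)
    return len({tuple(col) for col in signs.T})

square = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]])
for N in (1, 2, 3, 4):
    print(N, 2 ** N, count_line_dichotomies(square[:N]))
# Typically prints 2, 4, 8, and then 14 out of 16 for N = 4.
```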

The structural restriction begins at $N = 4$. This is the break-point behaviour that ultimately makes $m_{\mathcal{H}}(N)$ polynomial rather than exponential, and learning with infinite $\mathcal{H}$ feasible.
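
The standard result behind this sentence, stated here without proof (Sauer’s lemma; $k$ denotes the break point, a symbol the note doesn’t otherwise use):

$$m_{\mathcal{H}}(N) \;\le\; \sum_{i=0}^{k-1} \binom{N}{i} \;=\; O\!\big(N^{k-1}\big)$$

For lines in $\mathbb{R}^2$ the break point is $k = 4$, so the growth function is $O(N^3)$ rather than $2^N$.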

Shattering

When $\mathcal{H}$ produces all $2^N$ possible dichotomies on a particular set of inputs, we say $\mathcal{H}$ shatters those inputs. Lines in $\mathbb{R}^2$ shatter any 3 non-collinear points, but cannot shatter any 4 points in general position. Convex sets in $\mathbb{R}^2$, on the other hand, shatter any number of points placed on a circle — so the dichotomy count stays at $2^N$ for all $N$.

Shattering is the strongest possible expressivity claim: there is no labelling we can’t fit. The largest $N$ for which $\mathcal{H}$ can shatter some set of $N$ inputs is the VC dimension of $\mathcal{H}$.
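
A shattering check for lines in the same sampling style as above (`shatters` is a name I made up; the check is one-sided, as the comment notes):

```python
import numpy as np

rng = np.random.default_rng(0)

def shatters(X, n_lines=100_000):
    """True if sampled lines realise all 2^N labellings of X. One-sided:
    False can also mean the sampling simply missed a realisable labelling."""
    W = rng.normal(size=(n_lines, 2))
    b = rng.normal(size=n_lines)
    signs = np.where(X @ W.T + b >= 0, 1, -1)
    return len({tuple(col) for col in signs.T}) == 2 ** len(X)

triangle = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
square = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]])
print(shatters(triangle))  # typically True: lines shatter 3 non-collinear points
print(shatters(square))    # False: the two XOR labellings are never realised
```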

  • growth-function — $m_{\mathcal{H}}(N)$, the maximum number of dichotomies over all input choices; replaces $M$ in the generalisation bound.
  • vc-dimension — the largest $N$ for which some set of $N$ inputs is shattered.
  • break-point — the smallest $N$ at which shattering fails for every input set.
  • generalization-bound — what the dichotomy-counting trick rescues.

Active Recall