A break point for a hypothesis set $\mathcal{H}$ is any $k$ such that no set of $k$ inputs is shattered — equivalently, the growth function satisfies $m_{\mathcal{H}}(k) < 2^k$. Once $\mathcal{H}$ has a break point $k$, every larger $k' > k$ is also a break point, $m_{\mathcal{H}}(N)$ is bounded by a polynomial of degree $k - 1$, and the VC bound becomes non-vacuous.
Definition
A break point for $\mathcal{H}$ is a positive integer $k$ such that

$$m_{\mathcal{H}}(k) < 2^k.$$

The smallest such $k$, written $k^*$, is *the* break point. By convention, if no such $k$ exists ($\mathcal{H}$ shatters arbitrarily large input sets), $\mathcal{H}$ has no break point — equivalently, $d_{\mathrm{VC}} = \infty$.
The break point and VC dimension are linked exactly:

$$k^* = d_{\mathrm{VC}} + 1.$$

If $d_{\mathrm{VC}} = d$, then $\mathcal{H}$ shatters some set of $d$ inputs ($m_{\mathcal{H}}(d) = 2^d$) but no set of $d + 1$ ($m_{\mathcal{H}}(d + 1) < 2^{d+1}$), making $d + 1$ the smallest break point.
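To make the definition concrete, here is a minimal Python sketch that brute-forces the growth function for positive rays and locates the smallest $k$ with $m_{\mathcal{H}}(k) < 2^k$; the names `growth` and `break_point` are illustrative, not from any standard library.

```python
# Sketch (illustrative names): brute-force the growth function for positive
# rays, h_a(x) = +1 iff x > a, on N distinct points, then locate the
# smallest k with m_H(k) < 2^k.

def growth(N):
    xs = list(range(N))                            # any N distinct points work
    thresholds = [i - 0.5 for i in range(N + 1)]   # one threshold per gap
    return len({tuple(x > a for x in xs) for a in thresholds})

def break_point(max_k=10):
    for k in range(1, max_k + 1):
        if growth(k) < 2 ** k:
            return k
    return None

print([growth(N) for N in range(1, 6)])  # [2, 3, 4, 5, 6], i.e. m_H(N) = N + 1
print(break_point())                     # 2, since m_H(2) = 3 < 4
```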
Examples
| Hypothesis set | Break point $k^*$ | $d_{\mathrm{VC}}$ | $m_{\mathcal{H}}(N)$ |
|---|---|---|---|
| Positive rays | $2$ ($m_{\mathcal{H}}(2) = 3 < 2^2$) | $1$ | $N + 1$ |
| Positive intervals | $3$ ($m_{\mathcal{H}}(3) = 7 < 2^3$) | $2$ | $\tfrac{1}{2}N^2 + \tfrac{1}{2}N + 1$ |
| 2D perceptrons | $4$ ($m_{\mathcal{H}}(4) = 14 < 2^4$) | $3$ | $\le \sum_{i=0}^{3} \binom{N}{i}$ |
| Convex sets in $\mathbb{R}^2$ | none | $\infty$ | $2^N$ |
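The table rows can be checked by brute force. As a sketch, the snippet below counts the dichotomies that positive intervals realise on $N$ collinear points, confirming $m_{\mathcal{H}}(N) = \tfrac{1}{2}N^2 + \tfrac{1}{2}N + 1$ and the break point $k^* = 3$; `growth_intervals` is an illustrative name.

```python
import itertools

# Sketch: count the dichotomies positive intervals h_{a,b}(x) = +1 iff a < x < b
# realise on N collinear points.
def growth_intervals(N):
    xs = list(range(N))
    cuts = [i - 0.5 for i in range(N + 1)]  # interval endpoints live in the gaps
    dichotomies = {tuple(a < x < b for x in xs)
                   for a, b in itertools.combinations(cuts, 2)}
    dichotomies.add((False,) * N)           # the empty interval
    return len(dichotomies)

for N in range(1, 5):
    print(N, growth_intervals(N), 2 ** N)
# 1 2 2 / 2 4 4 / 3 7 8 / 4 11 16  ->  smallest k with m_H(k) < 2^k is 3
```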
Why It Matters
The Sauer–Shelah lemma turns a break point at $k$ into a polynomial bound on the growth function:

$$m_{\mathcal{H}}(N) \le \sum_{i=0}^{k-1} \binom{N}{i} = O(N^{k-1}).$$
The transition from “could be exponential” ($m_{\mathcal{H}}(N) = 2^N$) to “guaranteed polynomial” ($m_{\mathcal{H}}(N) = O(N^{k-1})$) is the crucial step that makes generalisation provable for infinite hypothesis sets:
$$\mathbb{P}\big[\,|E_{\mathrm{in}} - E_{\mathrm{out}}| > \epsilon\,\big] \le 4\, m_{\mathcal{H}}(2N)\, e^{-\epsilon^2 N / 8}$$

shrinks to zero as $N \to \infty$ if $m_{\mathcal{H}}$ is polynomial. If there’s no break point, $m_{\mathcal{H}}(2N) = 2^{2N}$ and the exponential term overwhelms $e^{-\epsilon^2 N / 8}$: the right-hand side never decays — learning is not guaranteed to generalise.
A single discrete fact (“is there a size $k$ at which $\mathcal{H}$ shatters no input set?”) thus controls the entire qualitative behaviour of generalisation.
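As a numeric sketch of this transition, the snippet below compares the Sauer–Shelah polynomial with $2^N$ and plugs it into the bound above; the function names are illustrative, and the bound’s constants are taken as given from the formula.

```python
import math

# Sketch: the Sauer–Shelah polynomial sum_{i<k} C(N, i), and the bound
# 4 * m_H(2N) * exp(-eps^2 * N / 8) with the polynomial substituted for m_H.
def sauer(N, k):
    return sum(math.comb(N, i) for i in range(k))

def vc_bound(N, k, eps):
    return 4 * sauer(2 * N, k) * math.exp(-eps ** 2 * N / 8)

k, eps = 4, 0.1  # break point 4 (2D perceptrons), tolerance 0.1
for N in (100, 1_000, 100_000):
    print(N, sauer(N, k), vc_bound(N, k, eps))
# The bound is useless for small N but decays to 0 as N grows: a polynomial
# m_H(2N) loses to e^{-eps^2 N / 8}. With m_H(2N) = 2^{2N} (no break point)
# the product blows up instead.
```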
Once a Break Point, Always a Break Point
If $k$ is a break point for $\mathcal{H}$, so is every $k' > k$. Why? Suppose for contradiction that $\mathcal{H}$ shatters some set $S$ of size $k'$. Take any subset $S' \subseteq S$ of size $k$ — every dichotomy of $S'$ extends to a dichotomy of $S$, all of which $\mathcal{H}$ realises, so $\mathcal{H}$ shatters $S'$ too, contradicting that $k$ is a break point.
This monotonicity is why we focus on the smallest break point: it determines when shattering first fails, and everything beyond inherits the failure.
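The restriction argument can be seen directly in code: projecting all $2^{k'}$ dichotomies of a shattered superset onto any $k$ of its points recovers all $2^k$ dichotomies of the subset. A tiny sketch:

```python
import itertools

# If H shatters a set of size k' = 5, it realises all 2^5 dichotomies on it;
# restricting them to any k = 3 of the points recovers all 2^3 dichotomies,
# so the subset is shattered too.
k_prime, k = 5, 3
full = set(itertools.product([0, 1], repeat=k_prime))  # dichotomies on the superset
restricted = {d[:k] for d in full}                     # restrict to a 3-point subset
print(len(restricted) == 2 ** k)                       # True
```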
Related
- growth-function — the function $m_{\mathcal{H}}(N)$ whose break point we’re finding.
- vc-dimension — equals one less than the smallest break point: $d_{\mathrm{VC}} = k^* - 1$.
- dichotomy — the labelling patterns whose count saturates at $2^N$ before the break point.
Active Recall
Why is the break point of "all lines in $\mathbb{R}^2$" equal to 4 rather than 5, and what does this say about the VC dimension?
At $N = 4$, no set of 4 points is shattered: either one point lies inside the triangle formed by the other three (give it the opposite label), or the points form a quadrilateral whose XOR labelling across the diagonals (e.g. opposite corners of a square sharing a label) is not realisable by any line — so $m_{\mathcal{H}}(4) < 2^4$ and 4 is a break point, ruling out 5. At $N = 3$, three non-collinear points are shattered: all $2^3 = 8$ labellings are realisable. So 3 is not a break point; 4 is the smallest. By the relation break point $= d_{\mathrm{VC}} + 1$, $d_{\mathrm{VC}} = 3$.
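For an empirical check of the $m_{\mathcal{H}}(4) = 14$ figure on this particular point set, the sketch below tests linear separability of all 16 labellings of the square’s corners with a feasibility LP. It assumes SciPy is available, and it only examines the square, whereas the growth function maximises over all 4-point placements.

```python
import itertools
import numpy as np
from scipy.optimize import linprog  # assumes SciPy is available

points = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # corners of the unit square

def separable(labels):
    # Feasibility LP over z = (w1, w2, b): y_i * (w . x_i + b) >= 1 for all i.
    A_ub = np.array([[-y * x[0], -y * x[1], -y] for x, y in zip(points, labels)])
    res = linprog(c=[0, 0, 0], A_ub=A_ub, b_ub=-np.ones(len(labels)),
                  bounds=[(None, None)] * 3)
    return res.success

count = sum(separable(l) for l in itertools.product([-1, 1], repeat=4))
print(count)  # 14: only the two XOR labellings fail, so 14 < 16 = 2^4 here
```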
Suppose $\mathcal{H}$ has break point $k = 5$. Is $k = 7$ also a break point? Justify.
Yes. If no set of 5 inputs is shattered, no set of 7 inputs can be either: any 7-point set contains a 5-point subset, and shattering the 7-set would imply shattering the 5-set (every dichotomy of the subset extends to a dichotomy of the superset). The break-point property propagates upward: once shattering fails at some size, it fails at all larger sizes.
Convex sets in $\mathbb{R}^2$ have no break point. What does this imply about generalisation when learning with this hypothesis set, and why is it intuitive?
No break point means $m_{\mathcal{H}}(N) = 2^N$ for all $N$, so the VC bound is vacuous — no finite $N$ guarantees a small generalisation gap. Intuitively, for training points in convex position (e.g. on a circle), any labelling is realisable: the convex hull of the positive points yields a hypothesis with $E_{\mathrm{in}} = 0$, yet the learned set tells you nothing about how unseen points should be classified. The hypothesis set is too expressive to constrain the outcome, so the worst-case behaviour can be arbitrarily bad. This is the formal version of “any function works on the training data, so we have no information about new data”.
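As a sketch of why points in convex position are always shattered, the following brute-forces the dichotomies realised by convex hulls of subsets of $N = 5$ points on a circle; the helper `in_hull` is illustrative, written only for this demo.

```python
import itertools
import math

# N points on the unit circle are all extreme points, so the convex hull of
# any subset contains exactly that subset of the sample: all 2^N dichotomies.
N = 5
pts = [(math.cos(2 * math.pi * i / N), math.sin(2 * math.pi * i / N))
       for i in range(N)]

def in_hull(p, hull):
    # Membership test for a convex polygon with vertices listed in CCW order.
    if len(hull) == 0:
        return False
    if len(hull) == 1:
        return p == hull[0]
    if len(hull) == 2:  # degenerate hull: a chord; circle points hit only its ends
        (ax, ay), (bx, by) = hull
        cross = (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)
        return abs(cross) < 1e-12 and p in hull
    edges = zip(hull, hull[1:] + hull[:1])
    return all((bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax) >= -1e-12
               for (ax, ay), (bx, by) in edges)

dichotomies = set()
for r in range(N + 1):
    for subset in itertools.combinations(range(N), r):
        hull = [pts[i] for i in subset]  # increasing angle = CCW order
        dichotomies.add(tuple(in_hull(p, hull) for p in pts))
print(len(dichotomies))  # 32 = 2^5: every labelling is realised, no break point
```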