Given a hypothesis $h \in \mathcal{H}$ and inputs $x_1, \dots, x_N \in \mathcal{X}$, the dichotomy generated by $h$ on those inputs is the tuple of labels $(h(x_1), \dots, h(x_N)) \in \{-1, +1\}^N$. The set of all dichotomies replaces the infinite cardinality $|\mathcal{H}| = \infty$ with a finite count of at most $2^N$.
Definition
A hypothesis $h : \mathcal{X} \to \{-1, +1\}$ maps every point in $\mathcal{X}$ to $\pm 1$. The dichotomy of $h$ on a fixed set of inputs $x_1, \dots, x_N$ is what $h$ looks like restricted to those inputs:

$$(h(x_1), h(x_2), \dots, h(x_N)) \in \{-1, +1\}^N.$$
Two different hypotheses can produce the same dichotomy: any two lines in $\mathbb{R}^2$ that label the same training points the same way produce the same dichotomy even if they’re different decision boundaries. The dichotomy is a coarser equivalence class than the hypothesis itself.
The set of all dichotomies that $\mathcal{H}$ can implement on these inputs is

$$\mathcal{H}(x_1, \dots, x_N) = \big\{\, (h(x_1), \dots, h(x_N)) : h \in \mathcal{H} \,\big\}.$$
This set has size at most $2^N$ — the total number of binary labellings of $N$ points — and is finite even when $|\mathcal{H}| = \infty$.
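A tiny illustration of the definition, a sketch with made-up points and weights, assuming perceptron hypotheses of the form $h_{w,b}(x) = \mathrm{sign}(w \cdot x + b)$:

```python
import numpy as np

# Three fixed training inputs (arbitrary choice for illustration).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

def dichotomy(w, b):
    """The dichotomy of the perceptron h(x) = sign(w.x + b) on the fixed inputs X."""
    return tuple(np.where(X @ w + b >= 0, 1, -1))

# Two different decision boundaries...
d1 = dichotomy(np.array([1.0, 1.0]), -0.5)
d2 = dichotomy(np.array([2.0, 1.5]), -0.9)

# ...that induce the same dichotomy: distinct hypotheses, one labelling.
print(d1, d2, d1 == d2)   # (-1, 1, 1) (-1, 1, 1) True
```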
Why It Matters
The generalisation bound from a union bound is

$$\mathbb{P}\big[\,|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon\,\big] \;\le\; 2M\,e^{-2\epsilon^2 N},$$

where $M = |\mathcal{H}|$. For continuously parameterised hypothesis classes (perceptrons, SVMs, neural networks), $M = \infty$ and the bound is vacuous.
The fix: most of the $M$ in the union bound is wasteful. If two hypotheses produce the same labelling on the training set, their bad events overlap massively — in the union bound we’re double-counting them.
Counting distinct dichotomies rather than distinct hypotheses gives a much smaller, finite quantity, and (with a more careful argument than the simple union bound) replaces $M$ in the generalisation bound. The maximum number of dichotomies over all possible choices of $N$ inputs is the growth function $m_{\mathcal{H}}(N)$.
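For orientation, one standard form of the resulting bound (the VC generalisation bound; the extra factors come from the more careful argument mentioned above, not derived here) is

$$\mathbb{P}\big[\,|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon\,\big] \;\le\; 4\, m_{\mathcal{H}}(2N)\, e^{-\frac{1}{8}\epsilon^2 N},$$

which is non-vacuous whenever $m_{\mathcal{H}}(N)$ grows polynomially in $N$.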
A Concrete Count
Take $\mathcal{H}$ = all lines in $\mathbb{R}^2$ (i.e., 2D perceptrons). Place $N = 3$ inputs in general position. How many of the $2^3 = 8$ possible labellings can lines actually realise?
All 8: for any binary labelling of three non-collinear points, there’s a line that puts the $+1$ points on one side and the $-1$ points on the other. So $m_{\mathcal{H}}(3) = 8 = 2^3$.
Now place $N = 4$ points. Of the $2^4 = 16$ labellings, lines can realise only 14. The two unrealisable ones are the “XOR” labellings: $(+1, -1, +1, -1)$ and $(-1, +1, -1, +1)$ around a square — no line separates them. The infinite hypothesis set “all lines in $\mathbb{R}^2$” produces only 14 dichotomies on 4 points.
| $N$ | All possible labellings ($2^N$) | Lines in $\mathbb{R}^2$ realise |
|---|---|---|
| 1 | 2 | 2 |
| 2 | 4 | 4 |
| 3 | 8 | 8 |
| 4 | 16 | 14 |
The structural restriction begins at $N = 4$. This is the break-point behaviour that ultimately makes $m_{\mathcal{H}}(N)$ polynomial rather than exponential, and learning with an infinite $\mathcal{H}$ feasible.
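These counts are easy to check numerically. The sketch below samples many random lines and records the distinct labellings they produce on the two point sets; the point coordinates, seed, and sample count are my own choices. Random sampling can only undercount, but with this many samples it reliably finds every realisable labelling here.

```python
import numpy as np

rng = np.random.default_rng(0)

def count_dichotomies(points, n_samples=200_000):
    """Count distinct labellings that random lines sign(w.x + b) produce on `points`."""
    seen = set()
    for _ in range(n_samples):
        w = rng.normal(size=2)
        b = rng.normal()
        seen.add(tuple(np.where(points @ w + b >= 0, 1, -1)))
    return len(seen)

three_pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])               # non-collinear
square    = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])   # corners in cyclic order

print(count_dichotomies(three_pts))  # expect 8  (= 2^3: three points are shattered)
print(count_dichotomies(square))     # expect 14 (16 minus the two XOR labellings)
```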
Shattering
When $\mathcal{H}$ produces all $2^N$ possible dichotomies on a particular set of inputs, we say $\mathcal{H}$ shatters those inputs. Lines in $\mathbb{R}^2$ shatter any 3 non-collinear points, but cannot shatter any 4 points in general position. Convex sets in $\mathbb{R}^2$, on the other hand, shatter any number of points placed on a circle — so the dichotomy count stays at $2^N$ for all $N$.
Shattering is the strongest possible expressivity claim: there is no labelling we can’t fit. The largest $N$ for which $\mathcal{H}$ can shatter some set of $N$ inputs is the VC dimension of $\mathcal{H}$.
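The convex-sets claim can also be sanity-checked. A minimal sketch, assuming $N = 6$ equally spaced points on the unit circle and using the convex hull of the positively labelled points as the candidate convex set (if any negative point fell inside that hull, no convex set could realise the labelling):

```python
import numpy as np
from itertools import product
from scipy.spatial import Delaunay

# N equally spaced points on the unit circle (my choice of N and placement).
N = 6
theta = 2 * np.pi * np.arange(N) / N
pts = np.c_[np.cos(theta), np.sin(theta)]

realised = 0
for labels in product([-1, 1], repeat=N):
    labels = np.array(labels)
    pos, neg = pts[labels == 1], pts[labels == -1]
    if len(pos) < 3 or len(neg) == 0:
        # Degenerate cases: an empty set, a single point, or a chord between two
        # circle points never captures another circle point, and the all-positive
        # labelling is realised by the whole disc.
        realised += 1
        continue
    hull = Delaunay(pos)                      # triangulation of the positive points' hull
    if np.all(hull.find_simplex(neg) == -1):  # -1 means "outside the hull"
        realised += 1

print(realised, 2 ** N)  # expect 64 64: convex sets shatter the 6 circle points
```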
Related
- growth-function — $m_{\mathcal{H}}(N)$, the maximum of $|\mathcal{H}(x_1, \dots, x_N)|$ over all input choices; replaces $M$ in the generalisation bound.
- vc-dimension — the largest $N$ for which some set of $N$ inputs is shattered.
- break-point — the smallest $N$ at which shattering fails for every input set.
- generalization-bound — what the dichotomy-counting trick rescues.
Active Recall
Why isn't the size of $\mathcal{H}(x_1, \dots, x_N)$ always equal to $|\mathcal{H}|$, even when $\mathcal{H}$ is infinite?
Because many distinct hypotheses produce the same labelling on a fixed finite input set. Two slightly rotated lines in $\mathbb{R}^2$ that happen to put the same training points on the same sides are different hypotheses but the same dichotomy. The dichotomy count collapses the equivalence class “labels the training set identically” — it is capped at $2^N$ regardless of how many distinct hypotheses exist.
Place 4 points at the corners of a square in $\mathbb{R}^2$. Which two of the 16 binary labellings can no linear classifier (perceptron in 2D) realise, and why?
The XOR-style labellings: $+1$ on one diagonally opposite pair of corners and $-1$ on the other pair, and the complement labelling. A line in $\mathbb{R}^2$ partitions the plane into two half-planes; putting diagonally opposite corners on the same side and the other two corners on the other side would require a non-linear boundary. So $m_{\mathcal{H}}(4) = 14 < 2^4 = 16$, which is the source of the perceptron’s break point at 4.
Why does counting dichotomies rather than hypotheses help us bound the probability of bad generalisation?
The naive union bound multiplies the per-hypothesis bad-event probability by $M = |\mathcal{H}|$, which is infinite for any continuously parameterised model. But “bad event for $h_1$” and “bad event for $h_2$” almost coincide when $h_1$ and $h_2$ produce the same labelling on the training set. A finer accounting collapses each equivalence class of identically-behaving hypotheses into a single dichotomy. There are at most $2^N$ dichotomies, and for “structured” $\mathcal{H}$ far fewer — often polynomial in $N$. The corrected union bound uses this dichotomy count rather than $M$.