For unconstrained convex optimisation, $\nabla f(x^*) = 0$ is necessary and sufficient. Add inequality constraints, and a single gradient condition isn't enough: the Karush-Kuhn-Tucker (KKT) conditions extend optimality to four conditions that the primal and dual solutions must jointly satisfy.
The Conditions
For the problem $\min_x f(x)$ subject to $g_i(x) \le 0$, $i = 1, \dots, m$ (with $f$ and each $g_i$ convex, and a constraint qualification such as Slater's condition holding), a primal solution $x^*$ and dual multipliers $\alpha^*$ are jointly optimal iff:
1. Stationarity. The Lagrangian's gradient w.r.t. $x$ vanishes: $\nabla f(x^*) + \sum_i \alpha_i^* \nabla g_i(x^*) = 0$.
2. Complementary slackness. For every $i$: $\alpha_i^* \, g_i(x^*) = 0$.
Either $\alpha_i^* = 0$ or $g_i(x^*) = 0$ (or both). Constraints with $g_i(x^*) = 0$ are active (binding with equality at the optimum); inactive constraints ($g_i(x^*) < 0$) have $\alpha_i^* = 0$.
3. Primal feasibility. $g_i(x^*) \le 0$ for all $i$.
4. Dual feasibility. $\alpha_i^* \ge 0$ for all $i$.
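To make the four conditions mechanical, here is a minimal numeric sketch in Python. The toy problem is an assumption chosen for its closed-form answer, not anything from this note: minimise $f(x) = x_1^2 + x_2^2$ subject to $x_1 + x_2 \ge 1$, whose optimum is $x^* = (0.5, 0.5)$ with multiplier $\alpha^* = 1$.

```python
import numpy as np

# Hypothetical toy problem (chosen so the optimum is known in closed form):
#   minimise   f(x) = x1^2 + x2^2
#   subject to g(x) = 1 - x1 - x2 <= 0      (i.e. x1 + x2 >= 1)
# Optimum: x* = (0.5, 0.5), alpha* = 1.

def grad_f(x):                     # gradient of the objective
    return 2.0 * x

def g(x):                          # inequality constraint, g(x) <= 0
    return 1.0 - x[0] - x[1]

def grad_g(x):                     # gradient of the constraint
    return np.array([-1.0, -1.0])

x_star, alpha_star, tol = np.array([0.5, 0.5]), 1.0, 1e-9

# 1. Stationarity: grad f(x*) + alpha* grad g(x*) = 0
assert np.allclose(grad_f(x_star) + alpha_star * grad_g(x_star), 0.0, atol=tol)
# 2. Complementary slackness: alpha* * g(x*) = 0
assert abs(alpha_star * g(x_star)) < tol
# 3. Primal feasibility: g(x*) <= 0
assert g(x_star) <= tol
# 4. Dual feasibility: alpha* >= 0
assert alpha_star >= 0.0

print("all four KKT conditions hold at x* =", x_star, ", alpha* =", alpha_star)
```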
Why Complementary Slackness Matters
This is the structural condition that makes SVMs sparse. In the hard-margin SVM the constraint for each training point is $y_i(w \cdot x_i + b) \ge 1$ (i.e., the point is correctly classified beyond the margin), so $g_i(w, b) = 1 - y_i(w \cdot x_i + b) \le 0$. KKT says, for each training point, either:
- $\alpha_i = 0$: the point sits strictly outside the margin and contributes nothing to the dual sum, or
- $y_i(w \cdot x_i + b) = 1$: the point lies exactly on the margin, i.e. it is a support vector.
So at the optimum, only support vectors have $\alpha_i > 0$. Every other training point is structurally invisible to the prediction function $\operatorname{sign}\!\left(\sum_i \alpha_i y_i \, x_i \cdot x + b\right)$. This is the formal reason SVMs depend on only a small subset of the data.
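A quick empirical check of this sparsity, sketched with scikit-learn (the blob dataset and the large-$C$ hard-margin approximation are assumptions, not part of the note). `SVC` exposes exactly the structure above: `support_` indexes the support vectors, and `dual_coef_` stores $y_i \alpha_i$ for those points only.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated blobs so a (near) hard-margin fit is possible.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=0.8, random_state=0)

# A very large C approximates the hard-margin SVM discussed in the text.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print("training points:", len(X))
print("support vectors:", len(clf.support_))   # typically a small handful

# Every support vector sits (approximately) on the margin:
# y_i * (w . x_i + b) ~= 1, with signed labels y_i in {-1, +1}.
w, b = clf.coef_.ravel(), clf.intercept_[0]
margins = (2 * y[clf.support_] - 1) * (X[clf.support_] @ w + b)
print("margins of support vectors:", np.round(margins, 3))
```

If the data is separable, the support-vector count stays small while the remaining points carry $\alpha_i = 0$: deleting them would leave the fitted boundary unchanged.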
Stationarity in Practice (SVM Worked Example)
For the SVM Lagrangian $L(w, b, \alpha) = \tfrac{1}{2}\lVert w \rVert^2 - \sum_i \alpha_i \left[ y_i(w \cdot x_i + b) - 1 \right]$:
- $\partial L / \partial w = 0$ gives $w = \sum_i \alpha_i y_i x_i$.
- $\partial L / \partial b = 0$ gives $\sum_i \alpha_i y_i = 0$.
Substituting these back into $L$ eliminates $w$ and $b$, producing the SVM dual problem in $\alpha$ alone: maximise $\sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$ subject to $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$. KKT stationarity is what does the elimination; see the sketch below.
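As a sketch of that elimination end to end, the snippet below solves the dual for $\alpha$ directly (using SciPy's SLSQP; the four-point toy dataset and the solver choice are assumptions, not from the note) and then recovers $w$ from the stationarity condition $w = \sum_i \alpha_i y_i x_i$.

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable toy set, labels in {-1, +1} (hypothetical data).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Dual objective (negated, since we minimise):
#   maximise  sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j <x_i, x_j>
Q = (y[:, None] * X) @ (y[:, None] * X).T   # Q_ij = y_i y_j <x_i, x_j>

def neg_dual(alpha):
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

res = minimize(
    neg_dual,
    x0=np.zeros(len(y)),
    method="SLSQP",
    bounds=[(0.0, None)] * len(y),                       # dual feasibility
    constraints={"type": "eq", "fun": lambda a: a @ y},  # sum_i alpha_i y_i = 0
)
alpha = res.x

# Stationarity recovers the primal weights: w = sum_i alpha_i y_i x_i.
w = (alpha * y) @ X
sv = alpha > 1e-6                        # support vectors: non-zero alphas
b = np.mean(y[sv] - X[sv] @ w)           # from y_i (w . x_i + b) = 1
print("alpha =", np.round(alpha, 4))
print("w =", np.round(w, 4), " b =", round(b, 4))
```

Note how the primal variables never appear in the optimisation itself; they are reconstructed afterwards, exactly as stationarity promises.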
Active Recall
Why is complementary slackness the structural reason SVMs are sparse, while logistic regression isn't?
Logistic regression has no inequality constraints: every training point contributes a non-zero gradient term at every iteration, so every point shapes the final $w$. The SVM's margin constraint is an inequality, and KKT forces $\alpha_i \left[ y_i(w \cdot x_i + b) - 1 \right] = 0$. Most training points sit strictly inside their constraint ($y_i(w \cdot x_i + b) > 1$), forcing $\alpha_i = 0$: they drop out of the dual sum entirely. Sparsity is not an algorithmic accident; it's a structural consequence of having inequality constraints.
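The contrast can be seen numerically. The sketch below (synthetic data and a fixed $w$, both assumptions for illustration) compares per-point gradient contributions for the logistic loss against the hinge loss, whose zero subgradients mirror the $\alpha_i = 0$ pattern of the dual.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
y = rng.choice([-1.0, 1.0], size=n)
X = rng.normal(size=(n, 2)) + 2.0 * y[:, None]   # two shifted Gaussian classes
w = np.array([0.5, 0.5])                         # some fixed weight vector
margins = y * (X @ w)

# Logistic loss: the per-point gradient is -y_i x_i sigmoid(-margin_i),
# and sigmoid(-m) > 0 for every finite m, so NO point's gradient vanishes.
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
log_grads = -(y * sigmoid(-margins))[:, None] * X
print("logistic: zero-gradient points:",
      int(np.sum(np.linalg.norm(log_grads, axis=1) < 1e-12)))

# Hinge loss: the subgradient is exactly 0 wherever margin_i > 1,
# mirroring alpha_i = 0 for points strictly inside their constraint.
hinge_grads = np.where(margins[:, None] < 1.0, -(y[:, None] * X), 0.0)
print("hinge:    zero-gradient points:",
      int(np.sum(np.linalg.norm(hinge_grads, axis=1) < 1e-12)))
```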
KKT requires both stationarity and complementary slackness. Why isn't stationarity enough on its own, like in unconstrained optimisation?
Stationarity alone identifies critical points of the Lagrangian, but the Lagrangian is parameterised by the multipliers $\alpha$. Without complementary slackness, you could pick any $\alpha \ge 0$, find the corresponding stationary $x$, and call it a solution, yet most such pairs aren't primal-dual optimal. Complementary slackness is the link that ties $\alpha$ to the constraint geometry: $\alpha_i$ can be non-zero only when constraint $i$ is active. Together with primal and dual feasibility, the four conditions characterise exactly the primal-dual optimal pairs.
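A one-dimensional sketch makes the failure mode visible (a hypothetical example, not from the note): minimise $f(x) = x^2$ subject to $x \ge 1$, so $g(x) = 1 - x \le 0$ and the true optimum is $x^* = 1$ with $\alpha^* = 2$. Stationarity alone is satisfied by a whole family of $(x, \alpha)$ pairs; only complementary slackness plus feasibility selects the right one.

```python
# Lagrangian: L(x, alpha) = x^2 + alpha * (1 - x).
# Stationarity in x gives x(alpha) = alpha / 2 for EVERY alpha >= 0,
# so stationarity alone cannot identify the optimum.
for alpha in [0.0, 1.0, 2.0, 4.0]:
    x = alpha / 2.0                    # stationary point of L(., alpha)
    g = 1.0 - x                        # constraint value, want g <= 0
    print(f"alpha={alpha:4.1f}  x={x:4.1f}  "
          f"primal feasible: {str(g <= 1e-12):5}  "
          f"comp. slackness: {abs(alpha * g) < 1e-12}")
# Only alpha = 2 (x = 1) passes both extra checks: the KKT point.
```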
Related
- lagrangian — KKT conditions emerge from the Lagrangian framework
- support-vector-machine — KKT is what produces support-vector sparsity
- convex-function — KKT is necessary and sufficient only for convex problems; for non-convex it’s only necessary