For unconstrained convex optimisation, $\nabla f(x^*) = 0$ is necessary and sufficient. Add inequality constraints, and a single gradient condition isn't enough: the Karush-Kuhn-Tucker (KKT) conditions extend optimality to four conditions that the primal and dual solutions must jointly satisfy.
The Conditions
For the problem $\min_x f(x)$ subject to $g_i(x) \le 0$, $i = 1, \dots, m$ (with $f$ and each $g_i$ convex, and a constraint qualification such as Slater's condition holding), a primal solution $x^*$ and dual multipliers $\alpha^*$ are jointly optimal iff:
1. Stationarity. The Lagrangian's gradient w.r.t. $x$ vanishes: $\nabla f(x^*) + \sum_i \alpha_i^* \nabla g_i(x^*) = 0$.
2. Complementary slackness. For every $i$: $\alpha_i^* \, g_i(x^*) = 0$.
Either $\alpha_i^* = 0$ or $g_i(x^*) = 0$ (or both). Constraints with $g_i(x^*) = 0$ are active (binding with equality at the optimum); inactive constraints ($g_i(x^*) < 0$) have $\alpha_i^* = 0$.
3. Primal feasibility. $g_i(x^*) \le 0$ for all $i$.
4. Dual feasibility. $\alpha_i^* \ge 0$ for all $i$.
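To make the four conditions mechanical, here is a minimal numeric sketch in Python. The toy problem is an assumption chosen for its closed-form answer, not anything from this note: minimise $f(x) = x_1^2 + x_2^2$ subject to $x_1 + x_2 \ge 1$, whose optimum is $x^* = (0.5, 0.5)$ with multiplier $\alpha^* = 1$.

```python
import numpy as np

# Hypothetical toy problem (chosen so the optimum is known in closed form):
#   minimise   f(x) = x1^2 + x2^2
#   subject to g(x) = 1 - x1 - x2 <= 0      (i.e. x1 + x2 >= 1)
# Optimum: x* = (0.5, 0.5), alpha* = 1.

def grad_f(x):                     # gradient of the objective
    return 2.0 * x

def g(x):                          # inequality constraint, g(x) <= 0
    return 1.0 - x[0] - x[1]

def grad_g(x):                     # gradient of the constraint
    return np.array([-1.0, -1.0])

x_star, alpha_star, tol = np.array([0.5, 0.5]), 1.0, 1e-9

# 1. Stationarity: grad f(x*) + alpha* grad g(x*) = 0
assert np.allclose(grad_f(x_star) + alpha_star * grad_g(x_star), 0.0, atol=tol)
# 2. Complementary slackness: alpha* * g(x*) = 0
assert abs(alpha_star * g(x_star)) < tol
# 3. Primal feasibility: g(x*) <= 0
assert g(x_star) <= tol
# 4. Dual feasibility: alpha* >= 0
assert alpha_star >= 0.0

print("all four KKT conditions hold at x* =", x_star, ", alpha* =", alpha_star)
```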
Why Complementary Slackness Matters
This is the structural condition that makes SVMs sparse. In the hard-margin SVM the constraint for each training point is $y_i(w \cdot x_i + b) \ge 1$ (i.e., the point is correctly classified beyond the margin), so $g_i(w, b) = 1 - y_i(w \cdot x_i + b) \le 0$. KKT says, for each training point, either:
- $\alpha_i = 0$: the point sits strictly outside the margin and contributes nothing to the dual sum, or
- $y_i(w \cdot x_i + b) = 1$: the point lies exactly on the margin, i.e. it is a support vector.
So at the optimum, only support vectors have $\alpha_i > 0$. Every other training point is structurally invisible to the prediction function $\operatorname{sign}\!\left(\sum_i \alpha_i y_i \, x_i \cdot x + b\right)$. This is the formal reason SVMs depend on only a small subset of the data.
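A quick empirical check of this sparsity, sketched with scikit-learn (the blob dataset and the large-$C$ hard-margin approximation are assumptions, not part of the note). `SVC` exposes exactly the structure above: `support_` indexes the support vectors, and `dual_coef_` stores $y_i \alpha_i$ for those points only.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated blobs so a (near) hard-margin fit is possible.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=0.8, random_state=0)

# A very large C approximates the hard-margin SVM discussed in the text.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print("training points:", len(X))
print("support vectors:", len(clf.support_))   # typically a small handful

# Every support vector sits (approximately) on the margin:
# y_i * (w . x_i + b) ~= 1, with signed labels y_i in {-1, +1}.
w, b = clf.coef_.ravel(), clf.intercept_[0]
margins = (2 * y[clf.support_] - 1) * (X[clf.support_] @ w + b)
print("margins of support vectors:", np.round(margins, 3))
```

If the data is separable, the support-vector count stays small while the remaining points carry $\alpha_i = 0$: deleting them would leave the fitted boundary unchanged.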
Stationarity in Practice (SVM Worked Example)
For the SVM Lagrangian $L(w, b, \alpha) = \tfrac{1}{2}\lVert w \rVert^2 - \sum_i \alpha_i \left[ y_i(w \cdot x_i + b) - 1 \right]$:
- $\partial L / \partial w = 0$ gives $w = \sum_i \alpha_i y_i x_i$.
- $\partial L / \partial b = 0$ gives $\sum_i \alpha_i y_i = 0$.
Substituting these back into $L$ eliminates $w$ and $b$, producing the SVM dual problem in $\alpha$ alone: maximise $\sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$ subject to $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$. KKT stationarity is what does the elimination; see the sketch below.
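As a sketch of that elimination end to end, the snippet below solves the dual for $\alpha$ directly (using SciPy's SLSQP; the four-point toy dataset and the solver choice are assumptions, not from the note) and then recovers $w$ from the stationarity condition $w = \sum_i \alpha_i y_i x_i$.

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable toy set, labels in {-1, +1} (hypothetical data).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Dual objective (negated, since we minimise):
#   maximise  sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j <x_i, x_j>
Q = (y[:, None] * X) @ (y[:, None] * X).T   # Q_ij = y_i y_j <x_i, x_j>

def neg_dual(alpha):
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

res = minimize(
    neg_dual,
    x0=np.zeros(len(y)),
    method="SLSQP",
    bounds=[(0.0, None)] * len(y),                       # dual feasibility
    constraints={"type": "eq", "fun": lambda a: a @ y},  # sum_i alpha_i y_i = 0
)
alpha = res.x

# Stationarity recovers the primal weights: w = sum_i alpha_i y_i x_i.
w = (alpha * y) @ X
sv = alpha > 1e-6                        # support vectors: non-zero alphas
b = np.mean(y[sv] - X[sv] @ w)           # from y_i (w . x_i + b) = 1
print("alpha =", np.round(alpha, 4))
print("w =", np.round(w, 4), " b =", round(b, 4))
```

Note how the primal variables never appear in the optimisation itself; they are reconstructed afterwards, exactly as stationarity promises.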
Active Recall
Why is complementary slackness the structural reason SVMs are sparse, while logistic regression isn't?
Logistic regression has no inequality constraints: every training point contributes a non-zero gradient term at every iteration, so every point shapes the final $w$. The SVM's margin constraint is an inequality, and KKT forces $\alpha_i \left[ y_i(w \cdot x_i + b) - 1 \right] = 0$. Most training points sit strictly inside their constraint ($y_i(w \cdot x_i + b) > 1$), forcing $\alpha_i = 0$: they drop out of the dual sum entirely. Sparsity is not an algorithmic accident; it's a structural consequence of having inequality constraints.
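The contrast can be seen numerically. The sketch below (synthetic data and a fixed $w$, both assumptions for illustration) compares per-point gradient contributions for the logistic loss against the hinge loss, whose zero subgradients mirror the $\alpha_i = 0$ pattern of the dual.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
y = rng.choice([-1.0, 1.0], size=n)
X = rng.normal(size=(n, 2)) + 2.0 * y[:, None]   # two shifted Gaussian classes
w = np.array([0.5, 0.5])                         # some fixed weight vector
margins = y * (X @ w)

# Logistic loss: the per-point gradient is -y_i x_i sigmoid(-margin_i),
# and sigmoid(-m) > 0 for every finite m, so NO point's gradient vanishes.
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
log_grads = -(y * sigmoid(-margins))[:, None] * X
print("logistic: zero-gradient points:",
      int(np.sum(np.linalg.norm(log_grads, axis=1) < 1e-12)))

# Hinge loss: the subgradient is exactly 0 wherever margin_i > 1,
# mirroring alpha_i = 0 for points strictly inside their constraint.
hinge_grads = np.where(margins[:, None] < 1.0, -(y[:, None] * X), 0.0)
print("hinge:    zero-gradient points:",
      int(np.sum(np.linalg.norm(hinge_grads, axis=1) < 1e-12)))
```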
KKT requires both stationarity and complementary slackness. Why isn't stationarity enough on its own, like in unconstrained optimisation?
Stationarity alone identifies critical points of the Lagrangian, but the Lagrangian is parameterised by the multipliers $\alpha$. Without complementary slackness, you could pick any $\alpha \ge 0$, find the corresponding stationary $x$, and call it a solution, yet most such pairs aren't primal-dual optimal. Complementary slackness is the link that ties $\alpha$ to the constraint geometry: $\alpha_i$ can be non-zero only when constraint $i$ is active. Together with primal and dual feasibility, the four conditions characterise exactly the primal-dual optimal pairs.
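A one-dimensional sketch makes the failure mode visible (a hypothetical example, not from the note): minimise $f(x) = x^2$ subject to $x \ge 1$, so $g(x) = 1 - x \le 0$ and the true optimum is $x^* = 1$ with $\alpha^* = 2$. Stationarity alone is satisfied by a whole family of $(x, \alpha)$ pairs; only complementary slackness plus feasibility selects the right one.

```python
# Lagrangian: L(x, alpha) = x^2 + alpha * (1 - x).
# Stationarity in x gives x(alpha) = alpha / 2 for EVERY alpha >= 0,
# so stationarity alone cannot identify the optimum.
for alpha in [0.0, 1.0, 2.0, 4.0]:
    x = alpha / 2.0                    # stationary point of L(., alpha)
    g = 1.0 - x                        # constraint value, want g <= 0
    print(f"alpha={alpha:4.1f}  x={x:4.1f}  "
          f"primal feasible: {str(g <= 1e-12):5}  "
          f"comp. slackness: {abs(alpha * g) < 1e-12}")
# Only alpha = 2 (x = 1) passes both extra checks: the KKT point.
```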
Related
- lagrangian — KKT conditions emerge from the Lagrangian framework
- support-vector-machine — KKT is what produces support-vector sparsity
- convex-function — KKT is necessary and sufficient only for convex problems; for non-convex it’s only necessary