For unconstrained convex optimisation, $\nabla f(x^*) = 0$ is necessary and sufficient. Add inequality constraints, and a single gradient condition isn't enough: the Karush–Kuhn–Tucker (KKT) conditions extend optimality to four conditions that the primal and dual solutions must jointly satisfy.

The Conditions

For the problem $\min_x f(x)$ subject to $g_i(x) \le 0$, $i = 1, \dots, m$ (with $f$ convex and each $g_i$ convex), a primal solution $x^*$ and dual multipliers $\alpha_1^*, \dots, \alpha_m^*$ are jointly optimal iff:

1. Stationarity. The Lagrangian's gradient w.r.t. $x$ vanishes: $\nabla_x L(x^*, \alpha^*) = \nabla f(x^*) + \sum_{i=1}^m \alpha_i^* \nabla g_i(x^*) = 0$.

2. Complementary slackness. For every $i$: $\alpha_i^* \, g_i(x^*) = 0$.

Either $\alpha_i^* = 0$ or $g_i(x^*) = 0$ (or both). Constraints with $\alpha_i^* > 0$ are active (binding with equality at the optimum); inactive constraints ($g_i(x^*) < 0$) have $\alpha_i^* = 0$.

3. Primal feasibility. $g_i(x^*) \le 0$ for all $i$.

4. Dual feasibility. $\alpha_i^* \ge 0$ for all $i$.
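
To make the four conditions concrete, here is a minimal numerical check. It is a sketch, not library code: `kkt_check` and the toy problem (minimise $x_1^2 + x_2^2$ subject to $x_1 \ge 1$) are illustrative assumptions.

```python
import numpy as np

def kkt_check(grad_f, gs, grad_gs, x_star, alphas, tol=1e-6):
    """Verify the four KKT conditions at a candidate pair (x_star, alphas).

    grad_f  : gradient of the objective f at x
    gs      : list of constraint functions, each g_i(x) <= 0
    grad_gs : list of gradients of the g_i
    """
    g_vals = np.array([g(x_star) for g in gs])
    # 1. Stationarity: grad f(x*) + sum_i alpha_i * grad g_i(x*) = 0
    stationarity = grad_f(x_star) + sum(
        a * gg(x_star) for a, gg in zip(alphas, grad_gs)
    )
    return {
        "stationarity": np.allclose(stationarity, 0.0, atol=tol),
        # 2. Complementary slackness: alpha_i * g_i(x*) = 0 for every i
        "complementary_slackness": np.allclose(alphas * g_vals, 0.0, atol=tol),
        # 3. Primal feasibility: g_i(x*) <= 0
        "primal_feasibility": bool(np.all(g_vals <= tol)),
        # 4. Dual feasibility: alpha_i >= 0
        "dual_feasibility": bool(np.all(alphas >= -tol)),
    }

# Toy problem: min x1^2 + x2^2  s.t.  g(x) = 1 - x1 <= 0.
# The constrained optimum is x* = (1, 0) with multiplier alpha* = 2.
print(kkt_check(
    grad_f=lambda x: 2 * x,
    gs=[lambda x: 1 - x[0]],
    grad_gs=[lambda x: np.array([-1.0, 0.0])],
    x_star=np.array([1.0, 0.0]),
    alphas=np.array([2.0]),
))  # every condition reports True
```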

Why Complementary Slackness Matters

This is the structural condition that makes SVMs sparse. In the hard-margin SVM the constraint for each training point $(x_i, y_i)$ is $y_i(w^\top x_i + b) \ge 1$ (i.e., the point is correctly classified, on or beyond the margin), which in the form above reads $g_i(w, b) = 1 - y_i(w^\top x_i + b) \le 0$. KKT says, for each training point, either:

  • $\alpha_i = 0$ — the point sits strictly outside the margin ($y_i(w^\top x_i + b) > 1$) and contributes nothing to the dual sum, or
  • $\alpha_i > 0$ — the point is exactly on the margin ($y_i(w^\top x_i + b) = 1$): a support vector.

So at the optimum, only support vectors have $\alpha_i > 0$. Every other training point is structurally invisible to the prediction function. This is the formal reason SVMs depend on a small subset of the data.
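
The sparsity is easy to observe empirically. The sketch below assumes scikit-learn; the two-blob dataset and the large-$C$ approximation of the hard margin are illustrative choices, not part of the note.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two linearly separable Gaussian blobs with labels in {-1, +1}.
X = np.vstack([rng.normal(-2.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.array([-1] * 100 + [+1] * 100)

# A very large C approximates the hard-margin problem in the text.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print(f"support vectors: {len(clf.support_)} of {len(X)} training points")

# dual_coef_ stores alpha_i * y_i for the support vectors only; every
# other point has alpha_i = 0 and drops out of the weight vector.
w_from_duals = clf.dual_coef_ @ clf.support_vectors_
print(np.allclose(w_from_duals, clf.coef_))  # True
```

The final check is exactly the stationarity identity $w = \sum_i \alpha_i y_i x_i$ derived in the next section, reconstructed from the support vectors alone.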

Stationarity in Practice (SVM Worked Example)

For the SVM Lagrangian $L(w, b, \alpha) = \tfrac{1}{2}\lVert w \rVert^2 - \sum_i \alpha_i \bigl[ y_i(w^\top x_i + b) - 1 \bigr]$:

  • $\partial L / \partial w = 0$ gives $w = \sum_i \alpha_i y_i x_i$.
  • $\partial L / \partial b = 0$ gives $\sum_i \alpha_i y_i = 0$.

Substituting these back into $L$ eliminates $w$ and $b$, producing the SVM dual problem in $\alpha$ alone. KKT stationarity is what does the elimination.
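
Carrying the substitution through makes the elimination explicit; this is the standard derivation, with $\sum_i \alpha_i y_i = 0$ killing the $b$ term:

$$
\begin{aligned}
L &= \tfrac{1}{2}\lVert w \rVert^2 - \sum_i \alpha_i \bigl[ y_i (w^\top x_i + b) - 1 \bigr] \\
  &= \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, x_i^\top x_j,
\end{aligned}
$$

to be maximised over $\alpha_i \ge 0$ (dual feasibility) subject to $\sum_i \alpha_i y_i = 0$ (the $b$ stationarity condition).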

Active Recall

  • lagrangian — KKT conditions emerge from the Lagrangian framework
  • support-vector-machine — KKT is what produces support-vector sparsity
  • convex-function — KKT is necessary and sufficient for convex problems (necessity needs a constraint qualification such as Slater's condition); for non-convex problems KKT is only necessary, never sufficient on its own