lagrangian

A trick for handling inequality constraints: instead of enforcing $f_{i} (x) \leq 0$ from outside the optimisation, fold a penalty $a_{i} f_{i} (x)$ into the objective, with $a_{i} \geq 0$ a Lagrange multiplier. The resulting Lagrangian $L (x, a) = F (x) + \sum_{i} a_{i} f_{i} (x)$ can be optimised by alternating min over $x$ and max over $a$ — and under the right conditions, swapping the order (“strong duality”) gives an equivalent but often easier problem.

The Setup: Constrained Convex Optimisation

We have a primal problem:

$\min_{x} F (x) \quad \text{subject to} \quad f_{i} (x) \leq 0, \quad i \in \{1, \dots, N\}$

We’d like to use unconstrained calculus ( $\nabla F = 0$ ), but the constraints get in the way. They forbid certain regions of $x$ -space, and the unconstrained minimum may sit in a forbidden region.

Lagrange Relaxation

Form the Lagrangian:

$L (x, a) = F (x) + \sum_{i = 1}^{N} a_{i} f_{i} (x), \quad a_{i} \geq 0$

The $a_{i}$ are Lagrange multipliers. They re-express the constraints as a penalty:

If a constraint is violated ( $f_{i} (x) > 0$ ), the term $a_{i} f_{i} (x)$ is positive — it inflates the objective, pushing the optimiser away from infeasible $x$ .
If a constraint is satisfied ( $f_{i} (x) \leq 0$ ), the term is $\leq 0$ — at best it helps the objective, at worst it does nothing.
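To make the two cases concrete, here is a minimal Python sketch. The helper name `lagrangian` and the toy problem (minimise $x^{2}$ subject to $1 - x \leq 0$ ) are illustrative assumptions, not part of the course material.

```python
def lagrangian(F, constraints, multipliers, x):
    """L(x, a) = F(x) + sum_i a_i * f_i(x), with every a_i >= 0."""
    if any(a < 0 for a in multipliers):
        raise ValueError("multipliers must be non-negative")
    return F(x) + sum(a * f(x) for a, f in zip(multipliers, constraints))


# Hypothetical toy problem: minimise x^2 subject to 1 - x <= 0 (i.e. x >= 1).
F = lambda x: x ** 2
f1 = lambda x: 1.0 - x

# Violated constraint: f1(0) = 1 > 0, so the penalty term is positive.
violated = lagrangian(F, [f1], [3.0], 0.0)

# Satisfied constraint: f1(2) = -1 <= 0, so the term is non-positive.
satisfied = lagrangian(F, [f1], [3.0], 2.0)
```

With $a_{1} = 3$ , the infeasible point $x = 0$ is pushed up from $F = 0$ to $L = 3$ , while the feasible point $x = 2$ drops from $F = 4$ to $L = 1$ .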

Minimax Primal Formulation

A naive Lagrangian relaxation has a problem: with fixed multipliers, the penalty for violation may be too small. The fix is to maximise over $a$ , then minimise over $x$ :

$\min_{x} \max_{a \geq 0} L (x, a)$

Reading the inner max:

If any constraint is violated ( $f_{i} (x) > 0$ ), the inner max can drive $a_{i} \to \infty$ , sending $L \to \infty$ . The outer min refuses to land there.
If all constraints are satisfied, every $a_{i} f_{i} (x) \leq 0$ , so the inner max is achieved when every term is zero: $a_{i} = 0$ wherever $f_{i} (x) < 0$ , and any $a_{i} \geq 0$ wherever $f_{i} (x) = 0$ . Either way $L (x, a) = F (x)$ .

So the minimax exactly reproduces the constrained primal — no information is lost.
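The same argument can be sketched in Python. Because $L$ is linear in each $a_{i}$ , the inner max can be evaluated analytically rather than by search; the function name `inner_max`, the toy problem, and the grid are assumptions made for illustration.

```python
import math


def inner_max(F, constraints, x):
    """sup over a >= 0 of F(x) + sum_i a_i * f_i(x), evaluated analytically.

    Each term a_i * f_i(x) is linear in a_i, so the supremum is +inf as soon
    as any f_i(x) > 0; otherwise it is reached with every a_i * f_i(x) = 0.
    """
    if any(f(x) > 0 for f in constraints):
        return math.inf
    return F(x)


# Hypothetical toy problem: minimise x^2 subject to 1 - x <= 0.
F = lambda x: x ** 2
constraints = [lambda x: 1.0 - x]

# Outer min by brute force over a coarse grid on [-3, 3].
grid = [i / 100 for i in range(-300, 301)]
x_best = min(grid, key=lambda x: inner_max(F, constraints, x))
```

The outer min skips every infeasible grid point (their value is infinite) and lands on the constrained minimiser $x = 1$ , even though the unconstrained minimiser of $x^{2}$ is $x = 0$ .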

The Dual Formulation

The minimax requires solving a constrained max problem inside an unconstrained min — still awkward. The dual swaps the order:

$\max_{a \geq 0} \min_{x} L (x, a)$

Now the inner min is unconstrained (no inequality constraints on $x$ alone), so we can attack it with $\nabla_{x} L = 0$ . This often yields a closed-form expression for $x^{*}$ in terms of $a$ , which we substitute back to get a problem in $a$ alone.
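As a small worked illustration (a hypothetical toy problem, not taken from the course), take $F (x) = x^{2}$ with the single constraint $f_{1} (x) = 1 - x \leq 0$ . The recipe then runs as follows:

$L (x, a) = x^{2} + a (1 - x)$

$\nabla_{x} L = 2 x - a = 0 \Rightarrow x^{*} (a) = a / 2$

$g (a) = \min_{x} L (x, a) = \frac{a^{2}}{4} + a \left( 1 - \frac{a}{2} \right) = a - \frac{a^{2}}{4}$

$\max_{a \geq 0} g (a) : \quad g' (a) = 1 - \frac{a}{2} = 0 \Rightarrow a^{*} = 2, \quad g (a^{*}) = 1, \quad x^{*} = 1$

The dual value $1$ matches the primal optimum $F (1) = 1$ , as expected for a convex problem with a strictly feasible point such as $x = 2$ .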

Weak Duality

Always true:

$\max_{a \geq 0} \min_{x} L (x, a) \leq \min_{x} \max_{a \geq 0} L (x, a)$

The dual gives a lower bound on the primal optimum. The gap between them is the duality gap.
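A grid-based Python sketch can show the inequality with a strict gap. Everything here is an illustrative assumption: the function name `primal_and_dual`, the non-convex toy problem (minimise $-x^{2}$ over $x \in [0, 2]$ subject to $x - 1 \leq 0$ ), and the cap $a \leq 100$ standing in for unbounded multipliers.

```python
def primal_and_dual(F, f, xs, a_grid):
    """Grid estimates of min_x max_a L and max_a min_x L for one constraint f(x) <= 0."""
    L = lambda x, a: F(x) + a * f(x)
    primal = min(max(L(x, a) for a in a_grid) for x in xs)
    dual = max(min(L(x, a) for x in xs) for a in a_grid)
    return primal, dual


# Hypothetical non-convex toy: minimise -x^2 over x in [0, 2] subject to x - 1 <= 0.
xs = [i / 100 for i in range(0, 201)]       # x restricted to [0, 2]
a_grid = [i / 10 for i in range(0, 1001)]   # a in [0, 100], a stand-in for a >= 0

primal, dual = primal_and_dual(lambda x: -x ** 2, lambda x: x - 1.0, xs, a_grid)
gap = primal - dual
```

For this toy problem the primal optimum is $-1$ (at $x = 1$ ) and the dual optimum is $-2$ (at $a = 2$ ), so the dual is a valid lower bound but the gap is $1$ rather than zero. The max-min versus min-max inequality holds for any function on any pair of sets, so it also holds exactly on the finite grids used here.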

Strong Duality

When the gap is zero, that is, when primal and dual have the same optimal value, we say strong duality holds. The following two conditions are jointly sufficient:

$F$ and all $f_{i}$ are convex ( $L$ is convex in $x$ for fixed $a$ , concave in $a$ for fixed $x$ ).
Slater’s condition: there exists at least one strictly feasible $x$ with $f_{i} (x) < 0$ for all $i$ .

Both hold for the SVM (for the hard-margin form, whenever the training data are separable) and for many other convex ML problems, so we can move freely between the primal and dual representations.

TIP — Saddle point picture

When strong duality holds, the optimum is a saddle point of $L$ : a minimum along the $x$ axis and a maximum along the $a$ axis. Whether you walk down then up (minimax) or up then down (maxmin), you arrive at the same point.
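The saddle-point property can be checked numerically on a small case. The sketch below assumes a hypothetical toy problem, minimise $x^{2}$ subject to $1 - x \leq 0$ , whose Lagrangian is $L (x, a) = x^{2} + a (1 - x)$ and whose saddle point works out to $x^{*} = 1$ , $a^{*} = 2$ ; the grids and tolerance are arbitrary choices.

```python
def L(x, a):
    """Lagrangian of the toy problem: minimise x^2 subject to 1 - x <= 0."""
    return x ** 2 + a * (1.0 - x)


x_star, a_star = 1.0, 2.0
xs = [i / 10 for i in range(-50, 51)]      # x in [-5, 5]
a_grid = [i / 10 for i in range(0, 101)]   # a in [0, 10]

# Minimum along x: no x does better than x_star when a is fixed at a_star.
is_min_in_x = all(L(x_star, a_star) <= L(x, a_star) + 1e-12 for x in xs)

# Maximum along a: no a does better than a_star when x is fixed at x_star.
is_max_in_a = all(L(x_star, a) <= L(x_star, a_star) + 1e-12 for a in a_grid)
```

Both checks pass because $L (x, 2) = (x - 1)^{2} + 1 \geq 1$ for every $x$ , while $L (1, a) = 1$ for every $a$ .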

Why Bother Going to the Dual?

Three reasons that show up in SVM:

The inner min has closed form. Setting $\nabla_{x} L = 0$ and solving eliminates $x$ , leaving a problem purely in $a$ .
The dual may have fewer variables. SVM primal has $D + 1$ unknowns ( $w, b$ where $D = \dim (ϕ)$ ); SVM dual has $N$ unknowns (one $a^{(n)}$ per training point). When $D ≫ N$ (high-dimensional embedding, modest training set), the dual is dramatically smaller.
The dual depends on the training inputs $x^{(n)}$ only through inner products $ϕ (x^{(i)})^{⊤} ϕ (x^{(j)})$ . This opens the door to the kernel trick: replace inner products with a kernel function and never compute $ϕ$ at all.
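The third point can be illustrated with a short Python sketch. The feature map `phi`, the degree-2 polynomial kernel, and the sample points are assumptions chosen for illustration; the only claim is the identity $(x^{⊤} z)^{2} = ϕ (x)^{⊤} ϕ (z)$ for this particular $ϕ$ .

```python
import math


def phi(x):
    """Explicit degree-2 feature map for a 2-D input (illustrative choice)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(u, v):
    """Plain inner product of two equal-length tuples."""
    return sum(ui * vi for ui, vi in zip(u, v))


def kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: k(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


points = [(1.0, 2.0), (3.0, -1.0), (0.5, 0.5)]

# Gram matrix computed through the explicit embedding ...
gram_explicit = [[dot(phi(x), phi(z)) for z in points] for x in points]

# ... and the same matrix computed without ever forming phi.
gram_kernel = [[kernel(x, z) for z in points] for x in points]
```

The two Gram matrices agree up to floating-point rounding, so any algorithm that only needs inner products can use `kernel` directly and skip the embedding.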

Worked Example: SVM Primal → Lagrangian → Minimax → Dual

The SVM constraint $y^{(n)} (w^{⊤} ϕ (x^{(n)}) + b) \geq 1$ is rewritten as $1 - y^{(n)} (w^{⊤} ϕ (x^{(n)}) + b) \leq 0$ , which slots straight into the Lagrangian template:
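Assuming the standard hard-margin objective $F (w) = \frac{1}{2} \|w\|^{2}$ (the objective is not stated in this note), the template would read:

$L (w, b, a) = \frac{1}{2} \|w\|^{2} + \sum_{n = 1}^{N} a^{(n)} \left[ 1 - y^{(n)} (w^{⊤} ϕ (x^{(n)}) + b) \right], \quad a^{(n)} \geq 0$

Here $(w, b)$ play the role of $x$ in the general template, and each training point contributes one constraint $f_{n}$ with its own multiplier $a^{(n)}$ .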

Setting up the minimax (and equivalently the dual maxmin) gives the saddle-point form. Its inner min yields a closed-form solution for $w^{*}$ and, from the derivative with respect to $b$ , the constraint $\sum_{n} a^{(n)} y^{(n)} = 0$ :
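As a sketch, again assuming the standard hard-margin objective $\frac{1}{2} \|w\|^{2}$ (not stated in this note), so that $L (w, b, a) = \frac{1}{2} \|w\|^{2} + \sum_{n} a^{(n)} [ 1 - y^{(n)} (w^{⊤} ϕ (x^{(n)}) + b) ]$ , the stationarity conditions are:

$\nabla_{w} L = w - \sum_{n} a^{(n)} y^{(n)} ϕ (x^{(n)}) = 0 \Rightarrow w^{*} = \sum_{n} a^{(n)} y^{(n)} ϕ (x^{(n)})$

$\frac{\partial L}{\partial b} = - \sum_{n} a^{(n)} y^{(n)} = 0 \Rightarrow \sum_{n} a^{(n)} y^{(n)} = 0$

Substituting $w^{*}$ back into $L$ removes $w$ and $b$ and leaves a problem in $a$ alone:

$\max_{a \geq 0} \sum_{n} a^{(n)} - \frac{1}{2} \sum_{n} \sum_{m} a^{(n)} a^{(m)} y^{(n)} y^{(m)} ϕ (x^{(n)})^{⊤} ϕ (x^{(m)}) \quad \text{subject to} \quad \sum_{n} a^{(n)} y^{(n)} = 0$

The training inputs enter only through the inner products $ϕ (x^{(n)})^{⊤} ϕ (x^{(m)})$ .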

Active Recall

Why does the inner $\max_{a}$ in the minimax formulation reproduce the original constraint exactly?

Because the multipliers $a_{i} \geq 0$ are unbounded from above. If a constraint is violated ( $f_{i} > 0$ ), the max drives $a_{i} \to \infty$ to make the penalty arbitrarily large, so the outer min refuses to land on any infeasible $x$ . If a constraint is satisfied ( $f_{i} \leq 0$ ), the max sets $a_{i} f_{i} = 0$ (taking $a_{i} = 0$ when $f_{i} < 0$ ) to avoid making $L$ smaller than $F$ . Either way, the minimax recovers the original primal: $\max_{a \geq 0} L = F$ on the feasible set, $\max_{a \geq 0} L = \infty$ outside it.

Strong duality fails for non-convex problems. What can the dual still tell us?

Weak duality holds unconditionally: the dual optimum is always a lower bound on the primal optimum. Even when there is a duality gap, the dual value is a certified bound, since no feasible primal point can do better than it. For non-convex problems where the primal is intractable, solving the dual can still bound how far a heuristic primal solution is from optimal.

Related

kkt-conditions: the optimality conditions that primal and dual solutions must jointly satisfy (necessary and sufficient for convex problems under Slater’s condition)
support-vector-machine: the canonical ML use case
kernel-trick: what becomes possible after going to the dual
quadratic-programming: the structural form SVM lives in (quadratic objective, linear constraints)
