A class of optimisation problems with a quadratic objective and linear (in)equality constraints. Convex QPs have global optima (unique when the objective is strictly convex) and efficient off-the-shelf solvers, which is why many ML problems work hard to be expressible in QP form, most notably the SVM primal.
Definition
A quadratic program has the form:

$$\min_{x \in \mathbb{R}^n} \; \tfrac{1}{2} x^\top Q x + c^\top x$$

subject to:

$$Ax \le b, \qquad Ex = d$$

where $Q \in \mathbb{R}^{n \times n}$ is symmetric, $c \in \mathbb{R}^n$, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and similarly for $E$ and $d$.
The QP is convex iff $Q$ is positive semi-definite. Convex QPs are the well-behaved case: a global optimum (assuming the feasible set is non-empty, and unique when $Q$ is positive definite), no local minima to worry about, and a rich library of polynomial-time solvers.
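To make the standard form concrete, here is a minimal sketch in CVXPY (one of the solvers listed below), assuming NumPy is available; the problem data is made up and the variable names mirror the notation above:

```python
import cvxpy as cp
import numpy as np

# Made-up problem data; any symmetric PSD Q makes this a convex QP.
n, m = 3, 4
rng = np.random.default_rng(0)
M = rng.standard_normal((n, n))
Q = M.T @ M                      # PSD (almost surely PD) by construction
c = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = np.ones(m)                   # x = 0 is feasible, so the QP is feasible

x = cp.Variable(n)
objective = cp.Minimize(0.5 * cp.quad_form(x, Q) + c @ x)
prob = cp.Problem(objective, [A @ x <= b])
prob.solve()

print(prob.status, prob.value)   # 'optimal' and the minimum objective value
print(x.value)                   # the minimiser
```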
Why QPs Matter
Three properties make convex QPs especially useful in ML:
- Global optima. Convexity guarantees that any local min is a global min. There is no “did we get stuck somewhere bad?” concern.
- Efficient solvers. Interior-point methods, active-set methods, and other algorithms solve QPs reliably in polynomial time. Implementations in CVXPY, CVXOPT, MOSEK, Gurobi, etc. accept the problem in standard form and return the optimum (see the sketch after this list).
- Sparse active sets. At the optimum, only some of the inequality constraints are active (binding with equality). For SVMs, the active constraints are exactly the support vectors — a small fraction of the training set.
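For instance, CVXOPT's `solvers.qp` consumes exactly the standard form, under its own naming ($P, q$ for the quadratic and linear terms, $Gx \le h$ for inequalities); a toy sketch with made-up data:

```python
import numpy as np
from cvxopt import matrix, solvers

solvers.options['show_progress'] = False

# min (1/2) x'Px + q'x  s.t.  Gx <= h   (CVXOPT's names for Q, c, A, b).
# Here: min x1^2 + x2^2 - 2*x1 - 5*x2  s.t.  0 <= x1 <= 2, 0 <= x2 <= 2.
P = matrix(np.array([[2.0, 0.0], [0.0, 2.0]]))
q = matrix([-2.0, -5.0])
G = matrix(np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]))
h = matrix([2.0, 2.0, 0.0, 0.0])

sol = solvers.qp(P, q, G, h)
print(sol['status'])               # 'optimal'
print(np.array(sol['x']).ravel())  # [1.0, 2.0]: only the x2 <= 2 bound is active
```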
The SVM Primal Is a QP
The hard-margin SVM formulation is:

$$\min_{w,\,b} \; \tfrac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i(w^\top x_i + b) \ge 1 \quad \text{for all } i$$

To match the standard QP form, let the unknown be $z = (w, b) \in \mathbb{R}^{d+1}$. Then:
- Objective: $\tfrac{1}{2} z^\top Q z$, where $Q = \begin{pmatrix} I_d & 0 \\ 0 & 0 \end{pmatrix}$ is the identity in the $w$ block and zero in the $b$ entry.
- Constraints: $y_i(w^\top x_i + b) \ge 1$ rewrites as $a_i^\top z \le -1$ with $a_i = -y_i(x_i, 1)$ for each $i$, a linear inequality in $z$.
$Q$ is positive semi-definite (the $w$ block is the identity, the $b$ entry is zero) and the constraints are linear, so the SVM primal is a convex QP.
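A minimal sketch of this primal in CVXPY, on made-up linearly separable toy data; `cp.sum_squares(w)` encodes $\|w\|^2 = z^\top Q z$ without building $Q$ explicitly:

```python
import cvxpy as cp
import numpy as np

# Toy linearly separable data; labels must be +/-1.
X = np.array([[2.0, 2.0], [2.5, 1.5], [-1.0, -1.0], [-1.5, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(X.shape[1])
b = cp.Variable()

# Quadratic objective (1/2)||w||^2, linear margin constraints: a convex QP.
prob = cp.Problem(
    cp.Minimize(0.5 * cp.sum_squares(w)),
    [cp.multiply(y, X @ w + b) >= 1],
)
prob.solve()
print(w.value, b.value)
```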
Convexity at a Glance
For $\min_x \; \tfrac{1}{2} x^\top Q x + c^\top x$:

| $Q$ | Objective shape | Optimisation difficulty |
|---|---|---|
| Positive definite | Strictly convex | Easy — unique global min |
| Positive semi-definite | Convex (possibly with flat directions) | Easy — global min, may not be unique |
| Indefinite (mixed eigenvalues) | Saddle-shaped | NP-hard in general |
| Negative definite | Strictly concave | Easy — unique global max (so flip signs to use a min solver) |
The SVM primal lives squarely in the “easy” zone. By contrast, generic non-convex quadratic programs (e.g., with $Q$ indefinite) are NP-hard, and ML problems that reduce to them, such as quadratic relaxations of discrete combinatorial problems, usually need approximation algorithms.
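The table's rows are determined by the spectrum of (the symmetric part of) $Q$, which is easy to check numerically; a small NumPy sketch (the function name and tolerance are illustrative):

```python
import numpy as np

def classify_quadratic(Q, tol=1e-10):
    """Classify the curvature of (1/2) x'Qx via the eigenvalues of its symmetric part."""
    Qs = 0.5 * (Q + Q.T)            # only the symmetric part contributes to x'Qx
    eig = np.linalg.eigvalsh(Qs)
    if np.all(eig > tol):
        return "positive definite: strictly convex"
    if np.all(eig >= -tol):
        return "positive semi-definite: convex, possibly flat directions"
    if np.all(eig < -tol):
        return "negative definite: strictly concave"
    return "indefinite: saddle-shaped, non-convex"

print(classify_quadratic(np.array([[2.0, 0.0], [0.0, 1.0]])))   # strictly convex
print(classify_quadratic(np.array([[1.0, 0.0], [0.0, 0.0]])))   # convex, flat direction (like the SVM Q)
print(classify_quadratic(np.array([[1.0, 0.0], [0.0, -1.0]])))  # indefinite
```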
Active Constraints and Support Vectors
At the optimum $x^*$ of a constrained problem, each inequality constraint $a_i^\top x \le b_i$ is either:
- Active (binding): $a_i^\top x^* = b_i$. The constraint is “pulling” on the solution.
- Inactive (slack): $a_i^\top x^* < b_i$. The constraint isn’t a barrier; the solution would be the same without it.
For SVMs, the constraint $y_i(w^\top x_i + b) \ge 1$ is active when $y_i(w^\top x_i + b) = 1$, i.e., when $x_i$ lies on the margin and is a support vector. The number of active constraints equals the number of support vectors, which is typically much smaller than $n$. This sparsity is what makes the SVM solution structurally informative.
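A sketch of this correspondence on the same kind of toy data as above (tolerances are arbitrary): after solving the primal, the binding constraints and the strictly positive dual multipliers pick out the same indices, the support vectors.

```python
import cvxpy as cp
import numpy as np

X = np.array([[2.0, 2.0], [2.5, 1.5], [-1.0, -1.0], [-1.5, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w, b = cp.Variable(X.shape[1]), cp.Variable()
prob = cp.Problem(
    cp.Minimize(0.5 * cp.sum_squares(w)),
    [cp.multiply(y, X @ w + b) >= 1],
)
prob.solve()

# Active constraints: margin exactly 1, up to solver tolerance.
margins = y * (X @ w.value + b.value)
active = np.isclose(margins, 1.0, atol=1e-4)

# Equivalently (by KKT, below): constraints with strictly positive multipliers.
duals = prob.constraints[0].dual_value
print("binding constraints:  ", np.where(active)[0])
print("positive multipliers: ", np.where(duals > 1e-6)[0])
```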
The Karush-Kuhn-Tucker (KKT) conditions characterise optimality of constrained QPs and are the formal vehicle for stating “support vectors are exactly the points with active constraints.” Treated in week 4 alongside the SVM dual.
Other QPs in ML
QPs appear in several other places besides SVMs:
- Lasso (linear regression with $\ell_1$ regularisation) is a QP after introducing auxiliary variables.
- Portfolio optimisation in quantitative finance.
- Constrained least-squares problems with bounds or equality constraints (sketched below).
- Trust-region subproblems inside many nonlinear optimisers.
- Quadratic discriminant analysis (some variants).
The pattern is general: any time you have a smooth quadratic objective and constraints describable by half-spaces, you have a QP.
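As one concrete instance, the bound-constrained least-squares item above is a QP because $\|Ax - b\|^2$ expands to $x^\top (A^\top A)x - 2b^\top A x + b^\top b$, a quadratic in $x$; a CVXPY sketch with made-up data:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

# Quadratic objective ||Ax - b||^2 plus box constraints 0 <= x <= 1: a convex QP.
x = cp.Variable(5)
prob = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b)), [x >= 0, x <= 1])
prob.solve()
print(x.value)
```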
Related
- support-vector-machine — primary user of QP in this module
- convex-function — convexity of the objective is what makes the QP tractable
- newton-raphson-method — Newton-Raphson on a quadratic objective converges in one step (related but different setting: unconstrained)
Active Recall
For a quadratic program $\min_x \tfrac{1}{2} x^\top Q x + c^\top x$ subject to $Ax \le b$, what condition on $Q$ guarantees a unique global minimum (assuming feasibility)?
$Q$ must be positive definite (all eigenvalues strictly positive). Positive semi-definite (some eigenvalues zero) gives convexity and so a global min, but the optimum can lie on a flat ridge: the value is unique but the location may not be. The SVM primal has $Q$ that is positive semi-definite (zero in the $b$ direction), so the optimal value of $\tfrac{1}{2}\|w\|^2$ is unique, though strictly speaking the proof of uniqueness of $(w^*, b^*)$ relies on the constraints fixing $b$.
The hard-margin SVM primal becomes infeasible when the data is not linearly separable, i.e., there is no $(w, b)$ satisfying all the constraints. Why doesn't this break the QP formulation, and what's the standard fix?
The QP machinery doesn’t break: it correctly reports infeasibility, which is a meaningful result. The fix is the soft-margin SVM: introduce non-negative slack variables $\xi_i \ge 0$, relax the constraint to $y_i(w^\top x_i + b) \ge 1 - \xi_i$, and add a penalty $C \sum_i \xi_i$ to the objective. The result is still a convex QP, now in $(w, b, \xi)$, that always has a feasible solution. $C$ trades margin width against misclassification cost.
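A sketch of the soft-margin QP in CVXPY, on deliberately non-separable toy data with a hypothetical $C = 1$:

```python
import cvxpy as cp
import numpy as np

# Labels interleave along the diagonal, so no separating hyperplane exists.
X = np.array([[-1.0, -1.0], [0.5, 0.5], [1.0, 1.0], [2.0, 2.0]])
y = np.array([-1.0, 1.0, -1.0, 1.0])
C = 1.0  # hypothetical penalty weight

w = cp.Variable(X.shape[1])
b = cp.Variable()
xi = cp.Variable(X.shape[0], nonneg=True)  # slack variables

prob = cp.Problem(
    cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)),
    [cp.multiply(y, X @ w + b) >= 1 - xi],
)
prob.solve()
print(prob.status, xi.value)  # feasible: the slacks absorb the violations
```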
Why are off-the-shelf QP solvers preferred over rolling your own gradient descent for the SVM problem?
Three reasons. First, the SVM problem has constraints, and gradient descent has no built-in mechanism for handling them — you’d need projection or penalty methods that introduce their own approximations and tuning. Second, QP solvers exploit problem structure (sparsity, convexity, KKT conditions) to converge much faster than gradient descent, often in tens rather than thousands of iterations. Third, they handle numerical conditioning issues that would require manual care in a custom implementation. For ML researchers, “this reduces to a QP” is essentially a closed problem — pass it to CVXPY and move on.