A function is convex if any line segment between two points on its graph lies above (or on) the graph. Convexity is the property that makes optimisation tractable — every local minimum is a global minimum.

Definition

A function $f$ is convex if, for all $x, y$ in its domain and $\lambda \in [0, 1]$:

$$f(\lambda x + (1 - \lambda) y) \;\le\; \lambda f(x) + (1 - \lambda) f(y)$$

Geometrically: the chord connecting any two points on the graph never dips below the graph.
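A quick numeric sanity check of the chord inequality, as a minimal sketch (assuming NumPy; the function $e^x$, the endpoints, and the $\lambda$ grid are arbitrary illustrative choices):

```python
import numpy as np

# Sample the chord inequality for f(x) = exp(x), a known convex function.
f = np.exp
x, y = -1.0, 2.0

for lam in np.linspace(0.0, 1.0, 11):
    lhs = f(lam * x + (1 - lam) * y)      # point on the graph
    rhs = lam * f(x) + (1 - lam) * f(y)   # point on the chord
    assert lhs <= rhs + 1e-12             # chord never dips below the graph

print("chord inequality holds at all sampled lambdas")
```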

Strict convexity replaces $\le$ with $<$ (for $x \neq y$ and $\lambda \in (0, 1)$), ruling out flat regions.

A concave function is just the negative of a convex one (chord lies below the graph). Maximising a concave function is the same problem as minimising a convex one.

Why Convexity Matters

The key consequence: for a convex function, every local minimum is a global minimum. If gradient descent converges to a critical point ($\nabla f(x) = 0$), that point is a global optimum: there is no risk of being trapped in a worse local minimum.

For strictly convex functions, the global minimum, if one exists, is also unique (no flat plateaus or multiple equally good optima).
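A minimal illustration (the quadratic, starting point, and step size are arbitrary illustrative choices):

```python
# Gradient descent on the strictly convex quadratic f(x) = (x - 3)**2.
# Every critical point of a convex function is a global minimum, so
# wherever the iterates settle, they settle at the optimum x* = 3.
def grad(x):
    return 2.0 * (x - 3.0)  # f'(x)

x = -10.0   # arbitrary starting point
lr = 0.1    # step size; small enough that the iteration contracts
for _ in range(200):
    x -= lr * grad(x)

print(x)  # ~3.0, the unique global minimum
```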

Tests for Convexity

For univariate, twice-differentiable $f$: $f$ is convex iff $f''(x) \ge 0$ everywhere. $f''(x) > 0$ everywhere implies strict convexity, though the converse fails: $x^4$ is strictly convex even though $f''(0) = 0$.
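A numeric version of this test, as a sketch (assuming NumPy; the finite-difference step and sample grid are arbitrary choices, and sampling is evidence rather than proof):

```python
import numpy as np

# Central finite-difference estimate of f''(x).
def second_derivative(f, x, h=1e-4):
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

xs = np.linspace(-3.0, 3.0, 61)

# e^x: f''(x) = e^x > 0 at every sampled point -> consistent with convexity.
print(all(second_derivative(np.exp, x) >= 0 for x in xs))          # True

# x^3: f''(x) = 6x changes sign at 0 -> not convex.
print(all(second_derivative(lambda x: x**3, x) >= 0 for x in xs))  # False
```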

For multivariate, twice-differentiable $f$: $f$ is convex iff its Hessian $\nabla^2 f(x)$ is positive semi-definite at every point ($v^\top \nabla^2 f(x)\, v \ge 0$ for all $v$). A Hessian that is positive definite everywhere (strict inequality for $v \neq 0$) implies strict convexity.
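In code, positive semi-definiteness is easiest to check through eigenvalues. A minimal sketch (assuming NumPy; the matrix $Q$ is an arbitrary example):

```python
import numpy as np

# For the quadratic f(x) = 0.5 * x^T Q x, the Hessian is Q at every point,
# so convexity reduces to checking that Q is positive semi-definite.
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])

eigvals = np.linalg.eigvalsh(Q)   # eigenvalues of the symmetric matrix Q
print(eigvals)                    # all >= 0 -> PSD -> f is convex
print(bool(np.all(eigvals > 0)))  # True here: positive definite -> strictly convex
```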

Subgradient test (for non-differentiable convex functions): at every point in the interior of the domain, a convex function has at least one subgradient: a vector $g$ such that the affine function $f(x_0) + g^\top (x - x_0)$ touches the graph at $x_0$ and lies below it everywhere. If $0$ is a subgradient at $x^*$, then $x^*$ is a global minimum.
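The canonical example is $f(x) = |x|$, where any $g \in [-1, 1]$ is a subgradient at $0$. A minimal subgradient-descent sketch (assuming NumPy; the starting point and $1/t$ step schedule are arbitrary but standard choices):

```python
import numpy as np

# Subgradient descent on f(x) = |x|, convex but not differentiable at 0.
# np.sign(x) is a valid subgradient everywhere (it returns 0 at x = 0,
# which is one admissible choice from [-1, 1]).
x = 5.0
for t in range(1, 500):
    g = np.sign(x)          # a subgradient of |x| at the current point
    x -= (1.0 / t) * g      # shrinking steps so the iterates settle down

print(x)  # close to 0, the global minimum
```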

Examples

| Function | Convex? | Notes |
|---|---|---|
| $x^2$ | Yes (strictly) | |
| $e^x$ | Yes (strictly) | |
| $-\log x$ | Yes (strictly), on $(0, \infty)$ | |
| $\lvert x \rvert$ | Yes | Convex but not differentiable at 0; subgradient exists |
| $\lVert x \rVert_2^2$ | Yes (strictly) | Bowl |
| $x^3$ | No | Curvature flips sign |

Convexity in Machine Learning

The optimisation problems we want to solve are convex whenever the loss and any regularisers are each convex and are combined with non-negative weights (the preservation rules below make this precise). Convex problems include (a numeric sanity check follows the list):

  • Linear regression with squared error.
  • Logistic regression with cross-entropy loss, convex in the weights $w$ (strictly convex once an L2 penalty is added).
  • Support Vector Machines with hinge loss + L2 regularisation.
  • Lasso (linear regression + L1 regularisation).
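A minimal sketch of that sanity check (assuming NumPy; the synthetic data, seed, and penalty weight are arbitrary choices, and sampling segments is evidence rather than proof):

```python
import numpy as np

# Spot-check that the L2-regularised logistic loss is convex in w by
# sampling the chord inequality along random segments in weight space.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.integers(0, 2, size=50)
lam = 0.1

def loss(w):
    z = X @ w
    # Cross-entropy written with logaddexp for numerical stability,
    # plus the L2 penalty lam * ||w||^2.
    return np.mean(np.logaddexp(0.0, z) - y * z) + lam * (w @ w)

for _ in range(1000):
    w1, w2 = rng.normal(size=3), rng.normal(size=3)
    t = rng.uniform()
    assert loss(t * w1 + (1 - t) * w2) <= t * loss(w1) + (1 - t) * loss(w2) + 1e-9

print("chord inequality held on all sampled segments")
```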

Non-convex problems include neural networks (the network’s output is a non-convex function of its weights), most clustering objectives (K-means is non-convex), and many latent-variable models.

When the problem is convex, gradient descent and Newton-Raphson are guaranteed to find the global optimum (under mild conditions). When it is not, optimisation becomes a heuristic search — gradient descent finds some local minimum, and we hope it’s a good one.

Operations That Preserve Convexity

  • Sum of convex functions is convex.
  • Maximum of convex functions is convex (e.g., $\max(0, x)$, the ReLU).
  • Composition with an affine function: $g(x) = f(Ax + b)$ is convex if $f$ is.
  • Non-negative scaling: $\alpha f$ is convex for $\alpha \ge 0$.

These rules let you build complex convex losses from simple convex pieces.
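As a worked instance, the lasso objective from the list above decomposes exactly into these rules (a sketch in standard notation):

```latex
% Lasso objective assembled from the preservation rules:
%  - w -> Xw - y is affine and z -> ||z||_2^2 is convex,
%    so ||Xw - y||_2^2 is convex (affine composition);
%  - ||w||_1 = sum_j |w_j| is a sum of convex functions;
%  - lambda >= 0 preserves convexity (non-negative scaling);
%  - the final sum of two convex terms is convex.
L(w) = \|Xw - y\|_2^2 + \lambda \|w\|_1, \qquad \lambda \ge 0
```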

Active Recall