A function $K(x, x')$ is a valid kernel iff it corresponds to an inner product $\langle \varphi(x), \varphi(x') \rangle$ in some feature space. Mercer’s condition is the test: for any finite set of points, the Gram matrix must be symmetric and positive semidefinite.

Statement

Given any finite set of points $x_1, \dots, x_n$, build the Gram (kernel) matrix $K \in \mathbb{R}^{n \times n}$:

$$K_{ij} = K(x_i, x_j)$$

Mercer’s condition requires that for every such finite set:

  1. Symmetric: $K_{ij} = K_{ji}$, i.e. $K(x_i, x_j) = K(x_j, x_i)$.
  2. Positive semidefinite: $c^\top K c = \sum_{i,j} c_i c_j K_{ij} \ge 0$ for all $c \in \mathbb{R}^n$.

If both hold, $K$ corresponds to the inner product $K(x, x') = \langle \varphi(x), \varphi(x') \rangle$ for some feature map $\varphi$ (which Mercer’s theorem constructs but doesn’t require us to compute).
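The statement above can be checked empirically on any one sample. A minimal sketch, assuming NumPy; `mercer_check` is an illustrative helper name, not from the note:

```python
import numpy as np

def mercer_check(kernel, points, tol=1e-10):
    """Test Mercer's condition on one finite sample: the Gram matrix
    must be symmetric and positive semidefinite."""
    K = np.array([[kernel(a, b) for b in points] for a in points])
    if not np.allclose(K, K.T):
        return False
    # eigvalsh is for symmetric matrices; allow tiny numerical negatives
    return bool(np.linalg.eigvalsh(K).min() >= -tol)

# The linear kernel is trivially valid: it *is* an inner product.
linear = lambda x, y: float(np.dot(x, y))
rng = np.random.default_rng(0)
pts = rng.normal(size=(20, 3))
print(mercer_check(linear, pts))  # True
```

Note the direction of the test: passing on one sample only fails to falsify the kernel; Mercer’s condition requires the check to hold for every finite sample.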

Why These Conditions

Inner products must be symmetric ($\langle u, v \rangle = \langle v, u \rangle$) and produce non-negative self-products ($\langle u, u \rangle \ge 0$). Mercer’s condition demands the kernel inherit those properties at every scale: for any sample of points, the implied geometry is consistent with being an inner product in a Hilbert space.

Algorithmically, positive semidefiniteness is what keeps the SVM dual a convex QP. If $K$ has a negative eigenvalue, the dual objective is no longer concave and the optimisation can wander to nonsense.
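A concrete failure case, assuming NumPy: the negative squared distance is symmetric but is not a valid kernel, and the negative eigenvalue shows up immediately:

```python
import numpy as np

# Hypothetical "kernel": negative squared distance. Its Gram matrix has
# zeros on the diagonal, so the trace is zero and any non-trivial sample
# forces at least one negative eigenvalue -- Mercer's condition fails,
# and an SVM dual built on it would not be a concave maximisation.
bad = lambda x, y: -float(np.sum((x - y) ** 2))

rng = np.random.default_rng(1)
pts = rng.normal(size=(10, 2))
K = np.array([[bad(a, b) for b in pts] for a in pts])
print(np.linalg.eigvalsh(K).min() < 0)  # True: a negative eigenvalue
```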

Kernel Composition Rules

Verifying Mercer’s condition from scratch is often hard. Instead, build kernels by composition: starting from validated kernels $k_1$ and $k_2$, the following are also valid:

| # | Rule | Description |
|---|------|-------------|
| 1 | $k(x, x') = c\,k_1(x, x')$, $c > 0$ | Scaling by a constant |
| 2 | $k(x, x') = f(x)\,k_1(x, x')\,f(x')$ | Multiplying by any function on each side |
| 3 | $k(x, x') = p(k_1(x, x'))$ | Polynomial of $k_1$ with non-negative coefficients |
| 4 | $k(x, x') = \exp(k_1(x, x'))$ | Exponential of a kernel |
| 5 | $k(x, x') = k_1(x, x') + k_2(x, x')$ | Sum of kernels |
| 6 | $k(x, x') = k_1(x, x') \cdot k_2(x, x')$ | Product of kernels |

The Gaussian kernel can be derived as a chain of these rules starting from the linear kernel (see gaussian-kernel for the proof).
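That chain can be traced numerically, assuming NumPy; the decomposition $\exp(-\gamma\|x - x'\|^2) = e^{-\gamma\|x\|^2} \exp(2\gamma\, x \cdot x')\, e^{-\gamma\|x'\|^2}$ is the identity the proof rests on, with $\gamma$ an assumed bandwidth parameter:

```python
import numpy as np

gamma = 0.5  # assumed bandwidth

def gaussian_via_rules(x, y):
    """Build the Gaussian kernel from the linear kernel by composition."""
    lin = float(np.dot(x, y))          # valid: the linear kernel
    expo = np.exp(2 * gamma * lin)     # rules 1 + 4: exp of a scaled kernel
    f = lambda z: np.exp(-gamma * float(np.dot(z, z)))
    return f(x) * expo * f(y)          # rule 2: f(x) k(x, x') f(x')

rng = np.random.default_rng(2)
pts = rng.normal(size=(15, 3))
K = np.array([[gaussian_via_rules(a, b) for b in pts] for a in pts])
direct = np.array([[np.exp(-gamma * np.sum((a - b) ** 2)) for b in pts]
                   for a in pts])
print(np.allclose(K, direct))  # True: same matrix as the direct formula
```

Because every step preserves validity, the resulting Gram matrix is positive semidefinite on any sample, which is exactly what the gaussian-kernel note proves symbolically.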

Active Recall

  • kernel-trick — Mercer’s condition is the gate that determines which functions can be used as kernels
  • gaussian-kernel — proof of validity uses the composition rules
  • polynomial-kernel — derived from a polynomial of the linear kernel (rule 3)