An $N \times M$ matrix $\Phi$ whose $n$-th row is the basis-function vector $\phi(x_n)^\top$ evaluated at training input $x_n$. Compactly encodes "all training inputs through all basis functions" so that $\hat{\mathbf{y}} = \Phi \mathbf{w}$ is a single matrix-vector product, and the OLS solution becomes $\mathbf{w}^* = (\Phi^\top \Phi)^{-1} \Phi^\top \mathbf{y}$.
Construction
Given training inputs $x_1, \dots, x_N$ and basis functions $\phi_0, \dots, \phi_{M-1}$:

$$
\Phi =
\begin{pmatrix}
\phi_0(x_1) & \phi_1(x_1) & \cdots & \phi_{M-1}(x_1) \\
\phi_0(x_2) & \phi_1(x_2) & \cdots & \phi_{M-1}(x_2) \\
\vdots & \vdots & \ddots & \vdots \\
\phi_0(x_N) & \phi_1(x_N) & \cdots & \phi_{M-1}(x_N)
\end{pmatrix}
$$

By convention $\phi_0(x) = 1$, so the first column is all ones — pairing with the intercept weight $w_0$.
Dimensions. $\Phi$ is $N \times M$:
- $N$ = number of training examples (rows)
- $M$ = number of basis functions including the intercept (columns)
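A minimal numpy sketch of this construction (the helper name `design_matrix` is illustrative, not from any library): each basis function fills one column, each training input fills one row.

```python
import numpy as np

def design_matrix(x, basis_fns):
    """Stack phi_j(x_n) into an N x M matrix: one row per input, one column per basis function."""
    return np.column_stack([phi(x) for phi in basis_fns])

# Quadratic basis: phi_0 = 1 (intercept), phi_1 = x, phi_2 = x^2
x = np.array([1.0, 2.0, 3.0])
Phi = design_matrix(x, [np.ones_like, lambda x: x, lambda x: x**2])
print(Phi.shape)  # (3, 3): N = 3 examples, M = 3 basis functions
```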
Why It’s Useful
The model $\hat{y}_n = \mathbf{w}^\top \phi(x_n)$ for all $n = 1, \dots, N$ becomes a single matrix product:

$$\hat{\mathbf{y}} = \Phi \mathbf{w}$$
The OLS objective becomes $E(\mathbf{w}) = \tfrac{1}{2} \lVert \mathbf{y} - \Phi \mathbf{w} \rVert^2$, which differentiates to the normal equation $\Phi^\top \Phi \mathbf{w} = \Phi^\top \mathbf{y}$ in one line.
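The normal-equation solve can be sketched in a few lines of numpy; the toy data here (a noiseless line with intercept 2 and slope 3) is illustrative only:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 20)
y = 2.0 + 3.0 * x                              # noiseless line, so OLS recovers it exactly
Phi = np.column_stack([np.ones_like(x), x])    # N x 2 design matrix: [1, x]

# Normal equation: Phi^T Phi w = Phi^T y
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
print(w)  # ~ [2., 3.]
```

In practice `np.linalg.lstsq(Phi, y)` is preferred over forming $\Phi^\top \Phi$ explicitly, since it avoids squaring the condition number.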
Whatever the basis functions are — polynomial, Gaussian, sigmoidal, custom — the math is the same. The design matrix decouples what features you use from how you fit.
Examples
Polynomial basis (degree $d$, single input $x$):

$$
\Phi =
\begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^d \\
1 & x_2 & x_2^2 & \cdots & x_2^d \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_N & x_N^2 & \cdots & x_N^d
\end{pmatrix}
$$

This is a Vandermonde matrix — a recurring object in interpolation theory.
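numpy builds this matrix directly via `np.vander`; with `increasing=True` the columns are ordered $1, x, x^2, \dots, x^d$ to match the intercept-first convention:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
d = 2
Phi = np.vander(x, d + 1, increasing=True)  # columns: 1, x, x^2
print(Phi)
# [[1. 1. 1.]
#  [1. 2. 4.]
#  [1. 3. 9.]]
```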
Gaussian RBF basis ($M-1$ centres $\mu_1, \dots, \mu_{M-1}$, width $s$):

$$\Phi_{nj} = \exp\!\left(-\frac{(x_n - \mu_j)^2}{2s^2}\right), \qquad \Phi_{n0} = 1$$
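A sketch of the RBF design matrix using numpy broadcasting (the centres and width here are arbitrary example values): subtracting a `(M-1,)` centre vector from an `(N, 1)` input column yields the full $N \times (M-1)$ grid of differences at once.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 5)       # N = 5 training inputs
mu = np.array([0.25, 0.5, 0.75])   # M - 1 = 3 centres (example values)
s = 0.2                            # shared width (example value)

# Broadcasting: (N, 1) - (M-1,) -> N x (M-1) grid of x_n - mu_j
rbf = np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2 * s**2))
Phi = np.column_stack([np.ones_like(x), rbf])  # prepend the intercept column
print(Phi.shape)  # (5, 4)
```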
Multi-input ($x_n \in \mathbb{R}^D$, plain linear basis): $\Phi = [\mathbf{1} \;\; X]$, an $N \times (D+1)$ matrix whose $n$-th row is $(1, x_{n1}, \dots, x_{nD})$.
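For the multi-input linear case the design matrix is just the raw data matrix with a ones column prepended; a one-line sketch (the data values are arbitrary):

```python
import numpy as np

X = np.arange(12.0).reshape(4, 3)            # N = 4 examples, D = 3 features
Phi = np.column_stack([np.ones(len(X)), X])  # N x (D + 1): [1 | X]
print(Phi.shape)  # (4, 4)
```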
Practical Notes
- The first column is always ones. This is the dummy basis function $\phi_0(x) = 1$ that pairs with the intercept $w_0$. Forgetting it is a common bug — fits are forced through the origin.
- Standardise inputs first. Polynomial and Gaussian bases on raw inputs can have wildly different column magnitudes, making $\Phi^\top \Phi$ ill-conditioned.
- Tall vs wide. $\Phi$ is "tall" when $N > M$ (more examples than parameters) — the standard setting where OLS works. "Wide" ($M > N$) makes $\Phi^\top \Phi$ rank-deficient and OLS degenerates.
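The conditioning point can be checked numerically. A sketch comparing the Gram matrix of a quadratic basis on raw large-magnitude inputs against the same basis on standardised inputs (the input range is an arbitrary example):

```python
import numpy as np

x = np.linspace(0.0, 1000.0, 50)             # raw inputs with large magnitudes
Phi_raw = np.vander(x, 3, increasing=True)   # columns 1, x, x^2 span ~1 to ~1e6

# Standardise first: zero mean, unit variance, then expand
z = (x - x.mean()) / x.std()
Phi_std = np.vander(z, 3, increasing=True)

cond_raw = np.linalg.cond(Phi_raw.T @ Phi_raw)
cond_std = np.linalg.cond(Phi_std.T @ Phi_std)
print(cond_raw > cond_std)  # True: the raw Gram matrix is far worse conditioned
```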
Connections
- ordinary-least-squares — uses $\Phi$ in the normal equation $\mathbf{w}^* = (\Phi^\top \Phi)^{-1} \Phi^\top \mathbf{y}$.
- linear-regression — the model whose predictions are $\hat{\mathbf{y}} = \Phi \mathbf{w}$.
- non-linear-transformation — the basis-expansion idea; the design matrix is the "$\phi$-space training set".