A learning paradigm where a model is trained on input–output pairs and evaluated on its ability to predict outputs for unseen inputs.

Definition

Supervised learning is the task of learning a function g: X → Y from a set of training examples D = {(x_1, y_1), …, (x_N, y_N)}, where each x_i is an input and y_i is the corresponding output (label). The goal is for g to approximate an unknown target function f well enough to generalize to new data drawn from the same distribution.

The Learning Framework

Every supervised learning problem has five components:

  1. Unknown target function f: X → Y — the true relationship between inputs and outputs. We never see f directly; we only observe noisy samples from it.
  2. Training data D = {(x_1, y_1), …, (x_N, y_N)} — a finite set of pairs drawn i.i.d. from an unknown joint distribution P(x, y).
  3. Hypothesis set H — the family of candidate functions the algorithm is allowed to consider (e.g., all linear functions, all polynomials of degree d). This is the assumption we bring to the table about what f might look like.
  4. Learning algorithm A — the procedure that searches H for the hypothesis that best fits the training data.
  5. Final hypothesis g — the output of A; our best approximation of f.

The choice of H is critical. Too small and f may lie outside it; too large and the algorithm may overfit. This tension runs through the entire module.
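The five components can be made concrete with a minimal sketch. This assumes a 1-D regression setup with a linear hypothesis set and closed-form least squares as the learning algorithm; all names (`target_f`, the noise level, the data range) are illustrative, not part of the original notes.

```python
import random

# 1. Unknown target function f (in practice we never see this).
def target_f(x):
    return 3.0 * x + 1.0

# 2. Training data D: noisy (x, y) samples drawn from the target.
random.seed(0)
data = [(x, target_f(x) + random.gauss(0, 0.1))
        for x in [i / 10 for i in range(20)]]

# 3. Hypothesis set H: all lines h(x) = a*x + b, parameterized by (a, b).
# 4. Learning algorithm A: least squares over H (closed-form for a line).
n = len(data)
sx = sum(x for x, _ in data)
sy = sum(y for _, y in data)
sxx = sum(x * x for x, _ in data)
sxy = sum(x * y for x, y in data)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n

# 5. Final hypothesis g: our approximation of f.
def g(x):
    return a * x + b
```

With only mild noise, the recovered slope and intercept land close to the true values (3 and 1), illustrating generalization from a finite sample.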

Input and Output Spaces

The input space X is typically d-dimensional (X ⊆ ℝ^d). Each dimension (feature) can be:

  • Numeric — e.g., age, salary (already real-valued).
  • Ordinal — e.g., expertise (ordered categories, often mapped to integers such as 1 < 2 < 3).
  • Categorical — e.g., car brand (no natural ordering). Typically encoded via one-hot encoding: each category becomes a binary dimension.
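One-hot encoding can be sketched in a few lines. The category values here ("audi", "bmw", "vw") are made up for illustration:

```python
def one_hot(value, categories):
    """Encode a categorical value as a list of 0/1 indicators,
    one binary dimension per category."""
    return [1 if value == c else 0 for c in categories]

brands = ["audi", "bmw", "vw"]
print(one_hot("bmw", brands))  # → [0, 1, 0]
```

Note that exactly one dimension is hot per known value, and no artificial ordering is imposed on the categories.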

The output space Y determines the task type:

  • Regression: Y = ℝ (predict a continuous value, e.g., house price).
  • Classification: Y is a finite set of categories. Binary classification (Y = {−1, +1}) is the most common; multi-class (|Y| = K > 2) extends naturally.
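The two task types differ only in what the learned function returns. A toy sketch (both models and their coefficients are hypothetical):

```python
# Regression: Y = R — the model outputs a continuous value.
def predict_price(area_sqm):
    # Hypothetical linear model: price grows with floor area.
    return 2000.0 * area_sqm + 50000.0

# Binary classification: Y = {-1, +1} — the model outputs one of two labels,
# here by thresholding a score.
def classify_spam(score):
    return +1 if score > 0.5 else -1

print(predict_price(80.0))   # a real number
print(classify_spam(0.9))    # one of two labels
```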

Noisy Targets

In practice the training data rarely comes from a deterministic f. Instead, y is drawn from a conditional distribution P(y | x), meaning the same input can map to different outputs. This noise is not a bug — it reflects genuine uncertainty in the real world. The target distribution P(y | x) subsumes the deterministic case (where P(y | x) is a point mass on f(x)).
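A noisy target can be simulated directly. This sketch assumes a Gaussian noise model around an illustrative mean function f(x) = 2x:

```python
import random

random.seed(42)

def sample_y(x, noise_sd=0.5):
    """Draw y from P(y | x): Gaussian noise around the mean f(x) = 2x."""
    f_x = 2.0 * x
    return random.gauss(f_x, noise_sd)

# Repeated queries at the same input give different labels:
ys = [sample_y(1.0) for _ in range(5)]
print(ys)  # five different values scattered around f(1.0) = 2.0

# The deterministic case is recovered as noise_sd -> 0
# (P(y | x) collapses to a point mass on f(x)):
print(sample_y(1.0, noise_sd=0.0))  # exactly 2.0
```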

Other Learning Paradigms

Supervised learning is one of three major paradigms:

| Paradigm | Data | Goal |
| --- | --- | --- |
| Supervised | Labeled pairs (x, y) | Learn f: X → Y |
| Unsupervised | Inputs only (no labels) | Find structure (clusters, density) |
| Reinforcement | States, actions, rewards | Learn policy that maximizes cumulative reward |

Active Recall