A smooth, S-shaped function that maps any real number to the interval $(0, 1)$, serving as the bridge between unbounded linear scores and valid probabilities.
Definition
The sigmoid (logistic) function is defined as:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

Equivalently: $\sigma(z) = \dfrac{e^{z}}{1 + e^{z}}$.
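The definition translates directly into code. A minimal sketch (the `sigmoid` name and the two-branch form are my own; the branch simply avoids overflow in `exp` when $z$ is a large negative number, using the equivalent form above):

```python
import math

def sigmoid(z: float) -> float:
    """Sigmoid sigma(z) = 1 / (1 + e^{-z}), computed in a numerically safe way."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))   # e^{-z} cannot overflow when z >= 0
    ez = math.exp(z)                         # e^{z} cannot overflow when z < 0
    return ez / (1.0 + ez)                   # equivalent form e^z / (1 + e^z)

print(sigmoid(0.0))   # 0.5
```

The naive one-line version works for moderate inputs; the branch only matters for extreme $z$, where a single `math.exp(-z)` would raise `OverflowError`.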

Key Properties
| Property | Statement |
|---|---|
| Range | $(0, 1)$ |
| Midpoint | $\sigma(0) = \frac{1}{2}$ |
| Symmetry | $\sigma(-z) = 1 - \sigma(z)$ |
| Monotonicity | Strictly increasing for all $z \in \mathbb{R}$ |
| Limits | $\lim_{z \to \infty} \sigma(z) = 1$, $\lim_{z \to -\infty} \sigma(z) = 0$ |
| Derivative | $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$ |
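The derivative identity is easy to verify numerically. A small sketch (helper name is mine) comparing the closed form $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ against a central finite difference:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Compare the closed form sigma'(z) = sigma(z) * (1 - sigma(z))
# with a central finite difference at a few sample points.
h = 1e-6
for z in [-2.0, 0.0, 1.5]:
    numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2.0 * h)
    analytic = sigmoid(z) * (1.0 - sigmoid(z))
    print(abs(numeric - analytic) < 1e-8)   # True at every sample point
```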
The symmetry property is particularly useful. In logistic-regression, it means:

$$P(y = 0 \mid x) = 1 - \sigma(w^{\top}x + b) = \sigma(-(w^{\top}x + b))$$

so both class probabilities can be expressed through the same function.
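The symmetry identity can be confirmed numerically with a short sketch (function name is mine):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# sigma(-z) and 1 - sigma(z) should agree for any z,
# up to floating-point rounding.
for z in [-3.0, -0.5, 0.0, 1.7, 4.2]:
    assert abs(sigmoid(-z) - (1.0 - sigmoid(z))) < 1e-12
```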
Role in Logistic Regression
In logistic-regression, the linear combination $z = w^{\top}x + b$ can be any real number, but we need a probability in $(0, 1)$. The sigmoid provides exactly this mapping:

$$P(y = 1 \mid x) = \sigma(w^{\top}x + b) = \frac{1}{1 + e^{-(w^{\top}x + b)}}$$
The sigmoid is the inverse of the logit function: if $p = \sigma(z)$, then $z = \operatorname{logit}(p) = \log\dfrac{p}{1 - p}$.
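The inverse relationship can be seen concretely with a round-trip sketch (the `logit` helper is my own naming):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Inverse of the sigmoid: the log-odds of probability p."""
    return math.log(p / (1.0 - p))

# Applying logit after sigmoid recovers the original score.
z = 1.25
p = sigmoid(z)
print(round(logit(p), 10))   # 1.25
```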
Shape and Behaviour
The S-shape means the function is steepest around $z = 0$ (where it equals $\frac{1}{2}$) and flattens out at the extremes — saturating toward 0 and 1. Practically:
- $|z| \gtrsim 6$: $\sigma(z)$ is effectively 0 or 1 (e.g. $\sigma(6) \approx 0.9975$).
- $|z| \lesssim 1$: $\sigma(z)$ is roughly linear, centred on $\frac{1}{2}$ (near 0, $\sigma(z) \approx \frac{1}{2} + \frac{z}{4}$).
This saturation is important: points far from the decision boundary (large $|z|$) get near-certain probabilities, while points near the boundary get probabilities close to $\frac{1}{2}$.
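The saturation can be tabulated directly; a brief sketch (by symmetry, $\sigma(-z) = 1 - \sigma(z)$ gives the negative half):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# sigma(z) climbs quickly toward 1 and then flattens out.
for z in [0, 1, 2, 4, 6, 10]:
    print(f"sigma({z:2d}) = {sigmoid(z):.6f}")
```

Running this shows the rapid approach to 1: already past 0.98 at $z = 4$ and past 0.997 at $z = 6$.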
Related
- logistic-regression — primary user of the sigmoid in this module
- decision boundary — the locus where $\sigma(w^{\top}x + b) = \frac{1}{2}$, i.e. where $w^{\top}x + b = 0$
Active Recall
Prove that $\sigma(-z) = 1 - \sigma(z)$.
$1 - \sigma(z) = 1 - \dfrac{1}{1 + e^{-z}} = \dfrac{e^{-z}}{1 + e^{-z}}$. Multiply numerator and denominator by $e^{z}$: $\dfrac{1}{e^{z} + 1} = \sigma(-z)$.
If $w^{\top}x + b = 0$, what probability does the sigmoid assign, and what does this mean for classification?
$\sigma(0) = \frac{1}{2}$. The model is maximally uncertain — it assigns equal probability to both classes. This point lies exactly on the decision boundary.
Why does the sigmoid saturate (flatten) for large $|z|$, and what is the practical consequence for classification confidence?
As $z \to \infty$, $e^{-z} \to 0$, so $\sigma(z) \to 1$; as $z \to -\infty$, $e^{-z} \to \infty$, so $\sigma(z) \to 0$. The function approaches but never reaches 0 or 1. Practically, points with large $|z|$ — far from the decision boundary — get near-certain class probabilities, while points near the boundary (small $|z|$) stay close to $\frac{1}{2}$.