Changelog
Append-only log of wiki ingests.
2026-04-20
- Initialized vault structure. Created raw/ and wiki/ folders, index.md, and CHANGELOG.md.
- Ingested week 1. Created: weeks/week-01.md, concepts/supervised-learning.md, concepts/logistic-regression.md, concepts/sigmoid-function.md, concepts/generalization.md, concepts/decision-boundary.md, concepts/discriminative-vs-generative-models.md (stub). Updated index.md.
2026-04-26
- Ingested week 2. Created: weeks/week-02.md, concepts/maximum-likelihood-estimation.md, concepts/cross-entropy-loss.md, concepts/gradient-descent.md, concepts/convex-function.md, concepts/taylor-polynomial.md, concepts/hessian-matrix.md, concepts/newton-raphson-method.md, concepts/iteratively-reweighted-least-squares.md. Updated index.md.
- Enhanced taylor-polynomial.md. Added “Why polynomials?” motivation, “Coefficients as Independent Derivative Controls” with the cascade explanation, and a “Convergence and Radius of Convergence” section with three new active-recall questions.
- Removed empty gradient-descent.md stub at vault root. It was shadowing the populated wiki/concepts/gradient-descent.md in Obsidian’s wikilink resolution.
- Ingested week 3. Created: weeks/week-03.md, concepts/non-linear-transformation.md, concepts/support-vector-machine.md, concepts/margin.md, concepts/quadratic-programming.md. Updated index.md.
- Cross-referenced Notion exports (raw/p1.zip, raw/p2.zip). Enhanced generalization.md with the “Closeness vs Attainability” two-pillars framing and marble-jar metaphor; added Strengths and Limitations section to logistic-regression.md; added “MLE is the source of standard loss functions” table (Bernoulli → cross-entropy, Gaussian → MSE, etc.) to maximum-likelihood-estimation.md; added the “linear in $\mathcal{Z}$-space, non-linear in original” reframing to non-linear-transformation.md.
- Ingested week 4. Created: weeks/week-04.md, concepts/lagrangian.md, concepts/kkt-conditions.md, concepts/kernel-trick.md, concepts/mercers-condition.md, concepts/gaussian-kernel.md, concepts/polynomial-kernel.md. Updated index.md. Covers SVM dualisation, KKT and complementary-slackness as the structural source of support-vector sparsity, the kernel trick, Mercer’s condition and composition rules, and the polynomial / Gaussian kernels.
2026-05-04
- Ingested week 11. Created: weeks/week-11.md, concepts/validation.md, concepts/cross-validation.md, concepts/learning-principles.md. Updated index.md. Two-part week: (1) Validation as the second cure for overfitting (the first being regularisation from week 10). Hold out a validation set $\mathcal{D}_{\text{val}}$ of size $K$, train on the rest, evaluate on $\mathcal{D}_{\text{val}}$. $E_{\text{val}}$ is an unbiased estimate of $E_{\text{out}}$ with variance at most $\frac{1}{4K}$ for binary classification (Bernoulli per-example error, $p(1-p) \le \frac{1}{4}$). The $K$ trade-off: large $K$ → low variance but the hypothesis is trained on too few examples; small $K$ → a hypothesis close to the full-data one but a noisy estimate. Rule of thumb $K = N/5$. After model selection picks the best candidate, retrain it on the full data set. Validation generalisation bound: the penalty is only $O\left(\sqrt{\ln M / K}\right)$ because the $M$ models being compared form a small finite candidate list. Validation vs regularisation: both estimate the same overfit penalty equation $E_{\text{out}} = E_{\text{in}} + \text{overfit penalty}$; regularisation estimates the penalty term, validation estimates $E_{\text{out}}$ directly. Cross-validation removes the trade-off by using all data: LOOCV ($K = 1$, almost unbiased for $E_{\text{out}}$, $N$ trainings, with a closed form for linear regression via the hat matrix), V-fold ($V = 10$ practical default). (2) Three learning principles closing the module’s theory: Occam’s Razor: simpler hypothesis sets (small $|\mathcal{H}|$, small VC dimension) generalise better; “better” means lower $E_{\text{out}}$, not aesthetic elegance. Sampling Bias: if the training and test distributions differ, the i.i.d. assumption fails and the VC bound is silent; no algorithm rescues a model trained on the wrong distribution. Data Snooping: any test-set influence on any decision (preprocessing, feature engineering, hyperparameter selection) contaminates it; the reported test error is then biased downward. Discipline: lock the test set in the safe, evaluate once at the end. Lec 2’s recap of weeks 7–10 (probabilistic ML, VC vs bias-variance, OLS = MLE, MAP = ridge) was not duplicated, as it is already covered by existing concept pages.
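The V-fold procedure summarised in this entry can be sketched as follows. This is a minimal illustration, not the module's code: ridge regression as the model being selected, the helper name `v_fold_cv_error`, the synthetic data, and the candidate list are all assumptions made for the example.

```python
import numpy as np

def v_fold_cv_error(X, y, lam, V=10, seed=0):
    """Squared validation error of ridge regression, averaged over V folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))          # shuffle once, then split into V folds
    folds = np.array_split(idx, V)
    d = X.shape[1]
    errors = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)    # every index not in the held-out fold
        # Ridge fit on the training part only (hypothetical model choice).
        w = np.linalg.solve(X[train].T @ X[train] + lam * np.eye(d),
                            X[train].T @ y[train])
        errors.append(np.mean((X[fold] @ w - y[fold]) ** 2))
    return float(np.mean(errors))

# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 0.0, -1.0, 0.5, 0.0]) + 0.1 * rng.normal(size=100)

# A small finite candidate list, as in validation-based model selection.
candidates = [0.01, 0.1, 1.0, 10.0, 1000.0]
cv_errors = {lam: v_fold_cv_error(X, y, lam) for lam in candidates}
best_lam = min(cv_errors, key=cv_errors.get)
```

After `best_lam` is chosen, the selected model would be retrained on all of the data, matching the retrain step noted in the entry.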
2026-05-04
- Generated flashcards for weeks 1–6. Created wiki/exam-prep/flashcards/ and added one Markdown file per week (week-01.md through week-06.md) following the Anki obsidian-to-anki callout format (> [!question]- ...). Cards are atomic, derived from week pages and concept pages, and grouped under ## subtopic headings. Week 6 (revision) is a synthesis file with cross-week comparison cards rather than fresh-content cards. Total ~70 cards across the six weeks.
- Documented flashcard-creation conventions in .obsidian/CLAUDE.md under a new ### Flashcards section in the Core Operations area: output path, frontmatter shape, atomic-card rules, voice/notation conventions, target card count per week, and append-to-CHANGELOG step.
2026-05-04
- Ingested week 10. Created: weeks/week-10.md, concepts/overfitting.md, concepts/regularization.md, concepts/lasso-regression.md. Updated index.md. Two-part week: (1) Overfitting: low $E_{\text{in}}$ but high $E_{\text{out}}$. Six common causes (model complexity, data scarcity, too many epochs, no regularisation, high-variance features, poor data processing). The “two-learners” experiment: even when both learners know the target is degree-10, $\mathcal{H}_2$ beats $\mathcal{H}_{10}$ on test error if $N$ is small; the right hypothesis class isn’t enough, it needs data to constrain it. Stochastic noise (label randomness) and deterministic noise (target complexity beyond $\mathcal{H}$) are operationally equivalent: both encourage overfitting. MLE is structurally unable to prevent this because it has no preference for simpler hypotheses. (2) Regularisation as the structural cure. Built up from a hard constraint, to a looser sparsity constraint, to the soft constraint $\mathbf{w}^\top \mathbf{w} \le C$. The Lagrangian transforms constrained → unconstrained: the augmented error adds a $\lambda$-weighted penalty on $\mathbf{w}^\top \mathbf{w}$ to $E_{\text{in}}$, equivalent for paired $(C, \lambda)$. The closed-form solution recovers ridge regression from week 8 by a different derivation route (constrained optimisation rather than Bayesian MAP). Effective VC dimension is smaller than nominal because the algorithm only navigates within the ball. L1 (lasso) replaces the round constraint ball with a corner-bearing diamond, producing sparse solutions; it corresponds to a Laplace prior in the Bayesian view. Choosing $\lambda$ depends on the noise level and target complexity, which are not observable, hence cross-validation (next week).
2026-05-04
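- Ingested week 9. Created: weeks/week-09.md, concepts/dichotomy.md, concepts/growth-function.md, concepts/break-point.md, concepts/vc-dimension.md, concepts/bias-variance-decomposition.md, concepts/learning-curve.md. Updated index.md. Two-part week: (1) Replacing $M$ in the generalisation bound with a finite quantity. Dichotomies count distinct labellings on specific inputs (at most $2^N$). The growth function $m_{\mathcal{H}}(N)$ takes the max over inputs. A break point, the smallest $k$ with $m_{\mathcal{H}}(k) < 2^k$, forces $m_{\mathcal{H}}(N)$ to grow polynomially: $m_{\mathcal{H}}(N) \le \sum_{i=0}^{k-1} \binom{N}{i}$. The VC dimension $d_{\text{VC}}$ equals the largest shatterable $N$, equivalently break point minus one. For the perceptron in $\mathbb{R}^d$: $d_{\text{VC}} = d + 1$ (number of free parameters). The VC bound contracts as $N \to \infty$ for finite $d_{\text{VC}}$. Sample complexity: theory says $N \approx 10{,}000\, d_{\text{VC}}$, practice says $N \approx 10\, d_{\text{VC}}$. SVMs achieve generalisation by margin-restricting to fat hyperplanes: the VC dimension is bounded in terms of the margin, independent of the input dimension. (2) Bias–variance as the average-case complement to VC’s worst case. Decomposition $\mathbb{E}_{\mathcal{D}}[E_{\text{out}}] = \text{bias} + \text{variance}$ for squared loss. Bias = how far the average hypothesis $\bar{g}$ is from $f$; variance = how much $g^{(\mathcal{D})}$ fluctuates around $\bar{g}$. Optimal model complexity depends on $N$: the example shows the constant model beats the line with very few points but loses with more. Learning curves visualise the decomposition; for linear regression, exact form $E_{\text{in}} = \sigma^2\left(1 - \frac{d+1}{N}\right)$, $E_{\text{out}} = \sigma^2\left(1 + \frac{d+1}{N}\right)$, gap decays as $O(1/N)$.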
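To make the polynomial-growth claim concrete, the sketch below evaluates the standard bound $\sum_{i=0}^{k-1} \binom{N}{i}$ for a break point $k$ and compares it with $2^N$. The function name `growth_bound` is hypothetical, and the choice $k = 4$ (the 2D perceptron, whose VC dimension is 3) is an assumption made for the example.

```python
from math import comb

def growth_bound(N, k):
    """Upper bound on the growth function for break point k: sum_{i<k} C(N, i)."""
    return sum(comb(N, i) for i in range(k))

# 2D perceptron: VC dimension 3, so break point k = 4 (assumed example).
k = 4
for N in (3, 4, 10, 20):
    # Below the break point the bound equals 2^N; beyond it, it grows like N^(k-1).
    print(N, growth_bound(N, k), 2 ** N)
```

At $N = 3$ the bound still equals $2^3 = 8$; from $N = 4$ onwards it falls behind $2^N$, which is what lets the VC bound shrink as $N$ grows.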
2026-04-28
- Ingested week 8. Created: weeks/week-08.md, concepts/bayesian-linear-regression.md, concepts/ridge-regression.md, concepts/hoeffding-inequality.md, concepts/generalization-bound.md. Updated index.md. Two-part week: (1) Bayesian regression: Gaussian prior + Gaussian likelihood → closed-form Gaussian posterior; the MAP estimate is exactly ridge regression, so L2 regularisation is the negative log of a Gaussian prior (regularisation = prior). (2) Learning theory: Hoeffding’s inequality bounds $|E_{\text{in}} - E_{\text{out}}|$ with probability at least $1 - \delta$; a union bound over $M$ hypotheses gives the generalization bound $E_{\text{out}}(g) \le E_{\text{in}}(g) + \sqrt{\frac{1}{2N}\ln\frac{2M}{\delta}}$. The two opposing forces ($M$ small for a tight bound vs $M$ large for a good fit) formalise the bias-variance trade-off. Sets up VC dimension via the dichotomy/effective-$M$ idea.
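The “MAP = ridge” identity can be checked numerically. The sketch below assumes a zero-mean isotropic Gaussian prior with standard deviation `tau` and Gaussian noise with standard deviation `sigma`; both names, the synthetic data, and the mapping $\lambda = \sigma^2 / \tau^2$ are assumptions of this example, not notation taken from the concept pages.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 50, 3
X = rng.normal(size=(N, d))
sigma = 0.3   # noise standard deviation (assumed known)
tau = 1.5     # prior standard deviation (assumed)
y = X @ np.array([1.0, -2.0, 0.5]) + sigma * rng.normal(size=N)

# Bayesian route: prior w ~ N(0, tau^2 I), likelihood y ~ N(Xw, sigma^2 I).
# The posterior is Gaussian; its mean is also its mode, i.e. the MAP estimate.
posterior_precision = np.eye(d) / tau**2 + X.T @ X / sigma**2
w_map = np.linalg.solve(posterior_precision, X.T @ y / sigma**2)

# Ridge route: minimise ||Xw - y||^2 + lam * ||w||^2 with lam = sigma^2 / tau^2.
lam = sigma**2 / tau**2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

print(np.allclose(w_map, w_ridge))
```

Multiplying the posterior-mean equation through by $\sigma^2$ gives the ridge normal equation directly, which is why the two solutions agree.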
2026-04-27
- Ingested week 7. Created: weeks/week-07.md, concepts/linear-regression.md, concepts/ordinary-least-squares.md, concepts/design-matrix.md, concepts/gaussian-distribution.md, concepts/bayes-law.md. Updated index.md. Covers the pivot from classification to regression, OLS via the normal equation $\mathbf{w} = (X^\top X)^{-1} X^\top \mathbf{y}$, basis expansion (linear in parameters, non-linear in inputs), and the probabilistic interpretation that justifies squared error: under additive Gaussian noise, MLE for $\mathbf{w}$ is identical to OLS. Frames this as the regression analog of week 2’s logistic-regression-as-MLE result: the noise model determines the loss.
- Reorganised image assets. Created raw/images/ and renamed all 22 dated Screenshot 2026-04-26 at *.pm.png files to descriptive content-based names (e.g., svm-intro.png, kernel-trick-input-feature-space.png, soft-margin-c-effect.png). Updated all 22 references across the wiki. Obsidian’s ![[name]] syntax resolves by basename, so paths didn’t need changing, just filenames.
- Added visuals to weekly synthesis pages. Embedded soft-margin motivation, slack-geometric, primal-comparison, and $C$-effect figures across week-05.md, slack-variables.md, and soft-margin-svm.md. Cross-referenced the SVM primal→dual chain, kernel-trick input→feature-space, and Gaussian bell-curve into week-04.md (same images already used in concept pages; Obsidian shares a single asset across all references).
- Added cross-week synthesis pages. Created topics/optimization-algorithms.md (compares GD, Newton-Raphson/IRLS, SMO: when each applies, cost, role of curvature/constraints) and topics/classification-approaches.md (compares LogReg, hard-margin SVM, soft-margin SVM: same hypothesis form, different fitting criteria, hinge vs cross-entropy as the structural source of SVM sparsity). Motivated by the week-6 revision lecture, which presents these groupings as the module’s organising structure. Updated index.md.
- Ingested week 5. Created: weeks/week-05.md, concepts/soft-margin-svm.md, concepts/slack-variables.md, concepts/sequential-minimal-optimization.md. Updated index.md. Covers the soft-margin relaxation (slack variables, hyperparameter $C$, the box constraint $0 \le \alpha_n \le C$), the three-way support-vector partition from KKT (non-SV / margin-SV / bound-SV), and the SMO decomposition algorithm, including why two multipliers is the minimum updateable subset, the analytic update rule, clipping to feasibility, and pair-selection heuristics.
- Embedded slide screenshots. Added the SVM intro and non-linear-margin pictures to support-vector-machine.md; the distance-to-hyperplane and max-margin pictures to margin.md; the input/feature-space cartoon to kernel-trick.md; the bell-curve and decision-boundary plots to gaussian-kernel.md; the SVM primal→dual derivation chain to lagrangian.md; the differential-curvature ellipses, difficult-topology plot, and steepest-descent paraboloid to gradient-descent.md; the linear-classifier+sigmoid figure to logistic-regression.md; and the log-likelihood-to-cross-entropy derivation to cross-entropy-loss.md.