Changelog
Append-only log of wiki ingests.
2026-04-20
- Initialized vault structure. Created raw/ and wiki/ folders, index.md, and CHANGELOG.md.
- Ingested week 01. Sources: w01-slides.pdf (109 pp), w01-l01-transcript.txt, w01-l02-transcript.txt, w01-l03-transcript.txt, w01-lab-01.pdf, w01-lab-01-solutions.pdf, w01-lab-02-solutions.pdf, w01-lab-02-solutions-pt2.pdf.
  - Created: wiki/weeks/week-01.md
  - Created: wiki/concepts/regular-expressions.md
  - Created: wiki/concepts/eliza.md
  - Created: wiki/concepts/type-and-token.md
  - Created: wiki/concepts/corpora.md
  - Created: wiki/concepts/tokenization.md
  - Created: wiki/concepts/subword-tokenization.md
  - Created: wiki/concepts/byte-pair-encoding.md
  - Created: wiki/concepts/text-normalization.md
  - Created: wiki/concepts/edit-distance.md
  - Updated: wiki/index.md (week-01 gloss and concept links)
- Ingested week 02. Sources: w02-slides.pdf (75 pp), 3 lecture transcripts (lectures 4–6), NLP_Lab_2-1-1.pdf, NLP_Lab_2_2.pdf, NLP_Worksheet_2_Solutions-1-1.pdf, NLP_Worksheet_2_1_Solutions-1.pdf.
  - Created: wiki/weeks/week-02.md
  - Created: wiki/concepts/n-gram-language-models.md
  - Created: wiki/concepts/perplexity.md
  - Created: wiki/concepts/evaluation-methodology.md
  - Created: wiki/concepts/smoothing.md
  - Created: wiki/concepts/interpolation-and-backoff.md
  - Updated: wiki/index.md (week-02 gloss and concept links)
  - Renamed 7 raw files to follow naming convention (w02-l04-transcript.txt, etc.)
2026-04-25
- Consolidated week 4 from updated slide deck. Cross-referenced raw/week-04/w04-slides-updated.pdf (118 pp) against existing notes. Changes:
  - tf-idf.md: corrected the tf formula to match course convention (the previous formula came from the original deck; the updated deck changed it). Updated the Shakespeare tf-idf worked table with recomputed values. Rewrote the walkthrough recall question to use the new formula.
  - pmi.md: expanded the PPMI section significantly. Added a full matrix-based PPMI worked computation (cherry/strawberry/digital/information, with count(w), count(c), joint and marginal probability tables, and the full PPMI output matrix), plus a new Weighting PMI section covering the context-probability smoothing trick (the same one that word2vec’s negative sampling uses) and add-one smoothing. Three reasons are now given for why negative PMI is problematic (including reliability on small corpora and human calibration).
  - Added 5 new MCQ-derived active-recall Q&As across lexical-semantics.md, vector-semantics.md, cosine-similarity.md, tf-idf.md, pmi.md, and word2vec.md, each with the full MCQ statement and reasoning for which options are correct.
  - week-04.md: updated the tf-idf formula in the narrative, expanded the PMI paragraph to mention PPMI’s reliability rationale and the smoothing, and added 2 new MCQ recall callouts (TF-IDF = 0.70, PPMI = 4.0).
- Ingested week 04. Sources: w04-slides.pdf (98 pp), 2 lecture transcripts (lectures 10 and 12; the lecture 11 transcript, which covered tf-idf, was not available), NLP_Worksheet_4.pdf, NLP_Worksheet_4_Solutions-1.pdf.
  - Created: wiki/weeks/week-04.md
  - Created: wiki/concepts/lexical-semantics.md
  - Created: wiki/concepts/vector-semantics.md
  - Created: wiki/concepts/tf-idf.md
  - Created: wiki/concepts/pmi.md
  - Created: wiki/concepts/cosine-similarity.md
  - Created: wiki/concepts/word2vec.md
  - Updated: wiki/index.md (week-04 gloss and concept links)
  - Renamed 4 raw files to follow naming convention (w04-l10-transcript.txt, w04-l12-transcript.txt, w04-worksheet.pdf, w04-worksheet-solutions.pdf).
2026-04-24
- Ingested week 03. Sources: w03-slides.pdf (81 pp), 3 lecture transcripts (lectures 7–9), NLP_Lab_3.pdf, NLP_Lab_3_1.pdf, NLP_Worksheet_3_Solutions-1.pdf, NLP_Worksheet_3_1_Solutions-1.pdf.
  - Created: wiki/weeks/week-03.md
  - Created: wiki/concepts/text-classification.md
  - Created: wiki/concepts/bag-of-words.md
  - Created: wiki/concepts/bayes-rule.md
  - Created: wiki/concepts/naive-bayes.md
  - Created: wiki/concepts/sentiment-analysis.md
  - Created: wiki/concepts/classification-evaluation.md
  - Created: wiki/concepts/harms-in-classification.md
  - Updated: wiki/index.md (week-03 gloss and concept links)
  - Renamed 7 raw files to follow naming convention (w03-l07-transcript.txt, w03-lab-01.pdf, w03-lab-02-solutions.pdf, etc.)
- Extracted cross-cutting concept maximum-likelihood-estimation. MLE was referenced by n-gram-language-models, naive-bayes, and smoothing but had no standalone page. Created one covering the principle, the categorical closed-form derivation sketch, the zero-probability problem, and the MLE/MAP/Bayesian spectrum. Retrofitted links from the three referring pages. Added to the week-02 concept line in wiki/index.md (MLE is first taught there).
- Consolidated week 3 from updated slide deck. Cross-referenced raw/week-03/3 NB and Sentiment Classification - Consolidation-1.pdf (117 pp) against existing notes. Added missing content:
  - sentiment-analysis.md: Scherer’s Typology of Affective States (emotion / mood / interpersonal stance / attitudes / personality traits), which frames sentiment as the detection of attitudes specifically.
  - classification-evaluation.md: expanded the bootstrap section into a full Statistical Significance Testing treatment: effect size, null/alternative hypotheses, p-value definition, parametric vs non-parametric tests, the full paired bootstrap algorithm (after Berg-Kirkpatrick et al., 2012), and the shift correction. Added a new Devsets and k-fold Cross-Validation section.
  - harms-in-classification.md: added Model Cards (Mitchell et al., 2019), a five-field documentation standard that makes biases visible without fixing them.
  - week-03.md: tightened the statistical significance and model-cards threads to match the expanded concept pages.
  - Added 5 new active-recall Q&As across the three updated concept pages (devset vs cross-validation, null/alternative hypotheses + p-value, shift, paired vs unpaired tests, model card structure).