NLP Module Wiki
Map of the module’s weekly themes and key concepts.
Weeks
- week 01 — Text as Data: regex, tokenization, BPE, text normalization, edit distance · concepts: regular expressions, ELIZA, type and token, corpora, tokenization, subword tokenization, byte-pair encoding, text normalization, edit distance
- week 02 — Counting Words: n-gram LMs, perplexity, smoothing, interpolation & backoff · concepts: n-gram language models, maximum likelihood estimation, perplexity, evaluation methodology, smoothing, interpolation and backoff
- week 03 — Text Classification: Naive Bayes, sentiment analysis, precision/recall/F1, harms in classification · concepts: text classification, bag of words, Bayes' rule, Naive Bayes, sentiment analysis, classification evaluation, harms in classification
- week 04 — Vector Semantics: distributional hypothesis, tf-idf, PMI, cosine, word2vec/SGNS, analogies, embedding bias · concepts: lexical semantics, vector semantics, cosine similarity, tf-idf, PMI, word2vec
- week 05 —
- week 06 —
- week 07 —
- week 08 —
- week 09 —
- week 10 —
- week 11 —
- week 12 —
Cross-Week Topics
To be added once patterns that recur across weeks become clear.
Exam Prep
- flashcards
- past papers
- revision guide