Sentiment analysis is the text classification task of labelling a document as positive, negative, or neutral — and the canonical worked example for Naive Bayes text classification.

Task Definition

Input: a document (product review, tweet, movie review, earnings call transcript). Output: a sentiment label — typically positive/negative, sometimes with neutral or a 1–5 star scale.

Sentiment is the prototypical example in the course because it exposes every weakness of bag-of-words classification — and the mitigations motivate several techniques (binary NB, negation handling, lexicon features).

Where Sentiment Fits: Scherer’s Typology of Affective States

Sentiment analysis is the detection of attitudes, which is one of five categories in Scherer’s typology of affective states. The course focuses narrowly on attitudes; the other categories are mentioned to explain what sentiment analysis is not.

  • Emotion: brief, organically synchronized evaluation of a major event. Examples: angry, sad, joyful, fearful, ashamed, proud, elated.
  • Mood: diffuse, non-caused, low-intensity, long-duration change in subjective feeling. Examples: cheerful, gloomy, irritable, listless, depressed, buoyant.
  • Interpersonal stance: affective stance toward another person in a specific interaction. Examples: friendly, flirtatious, distant, cold, warm, supportive, contemptuous.
  • Attitudes: enduring, affectively coloured beliefs or dispositions toward objects or persons. Examples: liking, loving, hating, valuing, desiring.
  • Personality traits: stable personality dispositions and typical behaviour tendencies. Examples: nervous, anxious, reckless, morose, hostile, jealous.

The simple task this chapter focuses on is: “is the attitude of this text positive or negative?” General affect classification — emotion detection, mood tracking, stance analysis — is a richer and harder problem covered in later chapters.

Worked Example: Small Sentiment Classifier

From the slides. Training set (5 documents):

Class   Document
−       just plain boring
−       entirely predictable and lacks energy
−       no surprises and very few laughs
+       very powerful
+       the most fun film of the summer

Test: “predictable with no fun”.

Prior: P(−) = 3/5, P(+) = 2/5. Drop "with" (it's not in the training vocabulary; see handling unknown words).

With Laplace (add-1) smoothing and vocabulary size |V| = 20 (the negative class has 14 tokens, the positive class 9):

P(predictable | −) = (1 + 1)/(14 + 20) = 2/34        P(predictable | +) = (0 + 1)/(9 + 20) = 1/29

And similarly for no and fun: P(no | −) = 2/34, P(no | +) = 1/29, P(fun | −) = 1/34, P(fun | +) = 2/29. Final scores:

P(−) · P("predictable no fun" | −) = 3/5 × (2 × 2 × 1)/34³ ≈ 6.1 × 10⁻⁵
P(+) · P("predictable no fun" | +) = 2/5 × (1 × 1 × 2)/29³ ≈ 3.2 × 10⁻⁵

Negative wins. The test document gets labelled negative — even though fun points positive, the two negative-leaning words (predictable, no) combined with the stronger negative prior dominate.
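The whole calculation fits in a few lines. This is a minimal sketch of multinomial NB in log space on the toy corpus above, not a reference implementation:

```python
from collections import Counter
from math import log

# Toy corpus from the worked example; labels are "-" and "+".
train = [
    ("-", "just plain boring"),
    ("-", "entirely predictable and lacks energy"),
    ("-", "no surprises and very few laughs"),
    ("+", "very powerful"),
    ("+", "the most fun film of the summer"),
]

vocab = {w for _, doc in train for w in doc.split()}   # |V| = 20
n_docs = Counter(c for c, _ in train)
counts = {c: Counter() for c in ("-", "+")}
for c, doc in train:
    counts[c].update(doc.split())

def log_posterior(doc, c):
    """log P(c) + sum of log P(w|c), add-1 smoothing; unknown words dropped."""
    total = sum(counts[c].values())
    score = log(n_docs[c] / len(train))
    for w in doc.split():
        if w in vocab:                      # drop out-of-vocabulary words
            score += log((counts[c][w] + 1) / (total + len(vocab)))
    return score

test = "predictable with no fun"
pred = max(("-", "+"), key=lambda c: log_posterior(test, c))   # "-"
```

Working in log space avoids underflow when multiplying many small probabilities; the argmax is unchanged.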

Why Bag of Words Struggles with Sentiment

Occurrence beats frequency. Seeing the word fantastic once tells you a lot about sentiment; seeing it five times tells you little more. Raw counts weight repeated words too heavily. This is why binary multinomial NB — clipping counts at 1 per document — often beats plain multinomial NB on sentiment.

Negation flips polarity but not bag identity. “I like this movie” and “I don’t like this movie” are almost identical bags; the word don't is the only difference, and treating it as a single independent feature can’t capture that it inverts the polarity of like. Mitigations:

  • NOT_ prefixing (Das & Chen 2001; Pang, Lee, Vaithyanathan 2002): add NOT_ to every word between a negation and the next punctuation. “didn’t like this movie, but I” becomes “didn’t NOT_like NOT_this NOT_movie but I”. The model learns that NOT_like is a negative-class feature even though like is positive.

Irony, sarcasm, domain shift — bag-of-words has no hope for these. Modern work handles them with contextual embeddings; NB is a baseline that gets you most of the way on straightforward reviews.

Handling Negation

The baseline method (simple and effective for NB):

didn't like this movie , but I
↓
didn't NOT_like NOT_this NOT_movie but I

Rule: after a negation word (not, n't, no, never), prefix NOT_ to every subsequent word until the next punctuation mark. Treat NOT_X as a new vocabulary entry during training. This gives the classifier a chance to learn that NOT_like and NOT_good are negative-class features.

Approximate and over-aggressive in complex sentences, but it captures most of the signal for the same cost as tokenization.
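The rule above can be sketched as a token-stream pass; the negation word list and punctuation set here are illustrative choices, not a fixed standard:

```python
import re

# Illustrative negation triggers; real systems use a longer list.
NEGATIONS = {"not", "no", "never"}

def mark_negation(tokens):
    """Prefix NOT_ to every token after a negation word, up to the next punctuation."""
    out, negating = [], False
    for tok in tokens:
        if re.fullmatch(r"[.,;:!?]", tok):    # punctuation ends the negation scope
            negating = False
            out.append(tok)
        elif negating:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            if tok.lower() in NEGATIONS or tok.lower().endswith("n't"):
                negating = True
    return out
```

On the running example, `mark_negation("didn't like this movie , but I".split())` yields `["didn't", "NOT_like", "NOT_this", "NOT_movie", ",", "but", "I"]`, so NOT_like becomes its own vocabulary entry.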

Sentiment Lexicons

When labelled training data is scarce, pre-built word lists with polarity labels (lexicons) can supply extra signal.

MPQA Subjectivity Cues Lexicon

  • Source: Wilson, Wiebe, Hoffmann (2005); Riloff & Wiebe (2003)
  • URL: https://mpqa.cs.pitt.edu/lexicons/subj_lexicon/
  • Size: 6,885 words from 8,221 lemmas, annotated for intensity (strong/weak)
  • Breakdown: 2,718 positive, 4,912 negative
  • Positive examples: admirable, beautiful, confident, dazzling, ecstatic, favor, glee, great
  • Negative examples: awful, bad, bias, catastrophe, cheat, deny, envious, foul, harsh, hate

The General Inquirer

  • Source: Stone, Dunphy, Smith, Ogilvie (1966)
  • URL: http://www.wjh.harvard.edu/~inquirer
  • Categories: Positiv (1915 words), Negativ (2291 words); Strong/Weak; Active/Passive; Overstated/Understated; Pleasure, Pain, Virtue, Vice, Motivation, Cognitive Orientation
  • Free for research use.

How to Use Lexicons in Classification

As a feature: add “token occurs in positive lexicon” and “token occurs in negative lexicon” as two extra features that increment every time a matching word appears. Now all positive words (good, great, beautiful, wonderful, …) count as a single dense feature, and similarly for negative.
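A minimal sketch of the two dense features; the tiny word lists stand in for a real lexicon such as MPQA:

```python
# Illustrative stand-ins for a real polarity lexicon.
POS_LEXICON = {"good", "great", "beautiful", "wonderful", "admirable"}
NEG_LEXICON = {"awful", "bad", "hate", "harsh", "catastrophe"}

def lexicon_features(tokens):
    """Two counts that increment once for every token found in either lexicon."""
    return {
        "n_pos_lexicon": sum(t in POS_LEXICON for t in tokens),
        "n_neg_lexicon": sum(t in NEG_LEXICON for t in tokens),
    }
```

For example, `lexicon_features("a great film with beautiful but harsh scenes".split())` gives `{"n_pos_lexicon": 2, "n_neg_lexicon": 1}`: every positive-lexicon word feeds the same dense feature regardless of which word it is.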

Using just these two lexicon features is worse than using all the word features — individual words carry more information. But lexicon features help in two specific cases:

  1. Sparse training data — when you have few labelled examples, each individual word has a noisy likelihood estimate; aggregating across the lexicon gives a more stable signal.
  2. Domain shift — when test data is unlike training data, individual word counts may not transfer, but the positive-lexicon aggregate still fires on seen and unseen positive words alike.

The lexicon contains only isolated word labels — it does not provide sentence-level examples, so you can’t use it to augment labelled training data directly. Use it as a feature source, not as extra supervision.

Active Recall