Chilton, Jiang, and Posner Determine Who Uses a Bigger Vocabulary, Jay Z or Scalia?
What kind of writers are the justices of the Supreme Court? Scalia is known for writing trenchantly, Roberts and Kagan for their wit, and Kennedy for a garrulous and sentimental style. But what do you find if you rigorously analyze the complexity of their vocabularies?
New methods for rigorous textual analysis make it possible to answer this question. In his blog, Matt Daniels, a data scientist, compared the lyrics of 85 rap artists using a method called token analysis. A computer program counts the number of unique words that appear in a text of a given size. For example, the sentence “The dog chased the cat and then the cat chased the dog” has 12 words, six of which are unique. By this measure, writers who use more unique words in a sample of the same length write more complexly, with a larger vocabulary. Daniels’ method counts slight variations as unique words: pimps, pimp, pimping, and pimpin’ each count separately. Even so, a rapper who used pimp over and over would receive a lower score than one who alternated pimp with, say, whoremonger.
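The counting step can be illustrated with a minimal Python sketch. It assumes a simple tokenizer that lowercases the text and treats any run of letters or straight apostrophes as one word; Daniels’ actual tokenization rules are not described here, so `tokenize`, `unique_word_count`, and `sample_size` are hypothetical names chosen for illustration. The sketch reproduces the example sentence’s count and shows why inflected forms score as separate words.

```python
import re
from typing import List, Optional


def tokenize(text: str) -> List[str]:
    """Lowercase text and split it into word tokens.

    Assumption (hypothetical rule): a token is a run of letters or straight
    apostrophes, so "The" and "the" match, while "pimp" and "pimpin'" stay distinct.
    """
    return re.findall(r"[a-z']+", text.lower())


def unique_word_count(text: str, sample_size: Optional[int] = None) -> int:
    """Count distinct tokens, optionally using only the first `sample_size` tokens."""
    tokens = tokenize(text)
    if sample_size is not None:
        # Comparisons are only fair at a fixed text size, so refuse short texts.
        if len(tokens) < sample_size:
            raise ValueError("text is shorter than the requested sample size")
        tokens = tokens[:sample_size]
    return len(set(tokens))


if __name__ == "__main__":
    sentence = "The dog chased the cat and then the cat chased the dog"
    print(len(tokenize(sentence)))                                # 12
    print(unique_word_count(sentence))                            # 6
    print(unique_word_count("pimps pimp pimping pimpin'"))        # 4
    print(unique_word_count("pimp pimp pimp pimp"))               # 1
    print(unique_word_count("pimp whoremonger pimp whoremonger")) # 2
```

The optional `sample_size` truncates every text to the same number of words before counting, because longer texts naturally accumulate more unique words; ranking writers only makes sense when each is measured on a sample of equal length.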
Token analysis can yield surprising insights. It turns out that the relatively unknown rappers Dr. Octagon and CunninLynguists use more complex vocabularies than Shakespeare.* And as in so many areas of life, there is a trade-off between artistic merit and commercial success. Daniels quotes Jay Z, who places below the mean, explaining that “I dumbed down for my audience to double my dollars.”