“Some are born great, some achieve greatness, and some have greatness thrust upon them,” quoth William Shakespeare. Or did he? Some people question whether Shakespeare really wrote the works that bear his name, or whether he even existed at all. They speculate that “Shakespeare” was a pseudonym for another writer, or for a group of writers. Proposed candidates for the real Shakespeare include other famous playwrights, politicians and even some prominent women. Could it be true that the greatest writer in the English language was as fictional as his plays?
Most Shakespeare scholars dismiss these theories based on historical and biographical evidence. But there is another way to test whether Shakespeare’s famous lines were actually written by someone else. Linguistics, the study of language, can tell us a great deal about the way we speak and write by examining syntax, grammar, semantics and vocabulary. And in the late 1800s, a Polish philosopher named Wincenty Lutosławski formalized a method known as stylometry, applying this knowledge to investigate questions of literary authorship.
So how does stylometry work? The idea is that each writer’s style has certain characteristics that remain fairly consistent across their individual works. Examples of such characteristics include average sentence length, the arrangement of words, and even how often a particular word occurs. Let’s look at the use of the word “thee” and visualize it as a dimension, or axis. Each of Shakespeare’s works can be placed on that axis, like a data point, according to how often that word appears in it, usually counted per thousand words so that long and short works can be compared fairly.
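Turning a work into a point on the “thee” axis takes only a few lines of code. The sketch below is a minimal illustration, assuming the text is available as a plain string; it uses a deliberately naive tokenizer (lowercase runs of letters and apostrophes), and the function name `word_rate` is our own, not part of any stylometry toolkit.

```python
import re

def word_rate(text: str, word: str, per: int = 1000) -> float:
    """How often `word` appears in `text`, per `per` words (case-insensitive)."""
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return per * tokens.count(word.lower()) / len(tokens)

# Each work becomes one data point on the "thee" axis.
works = {
    "Sonnet 18 (opening)": "Shall I compare thee to a summer's day? "
                           "Thou art more lovely and more temperate",
}
for title, text in works.items():
    print(title, round(word_rate(text, "thee"), 1))
```

Counting per thousand words rather than raw occurrences keeps a long play from looking more “thee”-heavy simply because it is long.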
In statistics, how widely these points are spread out is measured by the variance, which gives us an expected range for our data. But this is only a single characteristic. Each additional characteristic adds another axis, so Shakespeare’s works actually sit in a very high-dimensional space. With a dimensionality-reduction technique called Principal Component Analysis, we can boil that space down to a few principal components: the combined directions along which Shakespeare’s works vary the most, which together capture most of their variance.
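For readers who want to see the machinery, here is a minimal from-scratch sketch of Principal Component Analysis in plain Python. It assumes each work has already been turned into a row of feature values (for example, the rates of several words), centers the data, builds the covariance matrix, and then finds components one at a time by power iteration with deflation. That is just one way to compute PCA, picked so the example needs no libraries; a real study would more likely use a numerical library, and all function names here are hypothetical.

```python
import math

def mean_center(rows):
    """Subtract each feature's mean; return the centered rows and the means."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows], means

def covariance(centered):
    """Sample covariance matrix (d x d) of centered rows; needs at least two works."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenpair(m, iters=1000):
    """Power iteration: dominant eigenvector and eigenvalue of a symmetric matrix."""
    d = len(m)
    v = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-zero starting vector
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # no variance left to explain
            return v, 0.0
        v = [x / norm for x in w]
    eigval = sum(v[i] * m[i][j] * v[j] for i in range(d) for j in range(d))
    return v, eigval

def principal_components(rows, k):
    """Up to k principal components (unit vectors) and the variance along each."""
    centered, means = mean_center(rows)
    m = covariance(centered)
    d = len(m)
    comps = []
    for _ in range(k):
        v, var = top_eigenpair(m)
        if var <= 1e-12:
            break
        comps.append((v, var))
        # Deflation: remove the variance this component already accounts for.
        m = [[m[i][j] - var * v[i] * v[j] for j in range(d)] for i in range(d)]
    return means, comps

def project(row, means, comps):
    """A work's coordinates (scores) along each principal component."""
    return [sum((x - mu) * c for x, mu, c in zip(row, means, v))
            for v, _ in comps]
```

As a toy check, three made-up works whose two features rise in lockstep, rows [1, 2], [2, 4] and [3, 6], yield a single component pointing along that shared direction with a variance of 5; the second component carries no variance and is dropped. The variance along each component is what sets the “expected range” for that direction.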
We can then test the works of our candidates against those principal components. For example, if enough works by Francis Bacon fell within the Shakespearean variance, that would be pretty strong evidence that Bacon and Shakespeare were actually the same person. What did the results show? Well, the stylometrists who have carried out this kind of analysis concluded that Shakespeare is none other than Shakespeare. The Bard is the Bard. The pretenders’ works just don’t match Shakespeare’s signature style. However, our intrepid statisticians did find some compelling evidence of collaborations.
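To make “fall within the Shakespearean variance” concrete, here is one possible test, sketched in Python. It assumes each candidate work has already been projected onto Shakespeare’s principal components, so its scores are measured from the Shakespearean average, and that the variance along each component is known. A work counts as “within” if it lies no more than two standard deviations from that average. Both the distance measure (a Mahalanobis distance in component space) and the threshold of 2.0 are our own illustrative assumptions, not the choices of any particular study, and the function names are hypothetical.

```python
import math

def standardized_distance(scores, variances):
    """Distance from the Shakespearean average in standard-deviation units.
    Principal components are uncorrelated, so each axis is scaled separately."""
    return math.sqrt(sum(s * s / v for s, v in zip(scores, variances)))

def fraction_within(candidate_scores, variances, threshold=2.0):
    """Share of a candidate's works lying within `threshold` standard deviations."""
    if not candidate_scores:
        return 0.0
    inside = sum(1 for s in candidate_scores
                 if standardized_distance(s, variances) <= threshold)
    return inside / len(candidate_scores)

# Made-up scores for three works by a hypothetical candidate,
# on two components whose Shakespearean variances are 4.0 and 1.0.
print(fraction_within([[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]], [4.0, 1.0]))
```

Here two of the three made-up works land inside the range. How large that fraction must be to count as “enough” is a judgment the code leaves to the analyst.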
For instance, one recent study concluded that Shakespeare worked with the playwright Christopher Marlowe on “Henry VI,” parts one and two.
Shakespeare’s identity is only one of many questions stylometry can address. It can help us determine when a work was written, whether an ancient text is a forgery, whether a student has committed plagiarism, or whether that email you just received is high priority or spam.
And does the timeless poetry of Shakespeare’s lines just boil down to numbers and statistics? Not quite. Stylometric analysis may reveal what makes Shakespeare’s works structurally distinct, but it cannot capture the beauty of the sentiments and emotions they express, or why they affect us the way they do. At least, not yet.