Text Analysis

For my class presentation I chose to focus on the topic of Text Analysis and Visualisation with my fellow student Owen. While Owen tackled the visualisation aspect of the week, I focused on the Text Analysis. Of course there’s a reason the two topics were paired together in the first place and therefore our presentations had their fair share of overlap.

Though much discussed in recent years, text analysis has actually existed in some form for centuries. For example, as early as the thirteenth century, Dominican monks produced a concordance of the accepted Bible of the time in order to study the position and frequency of common phrases in it. In 1887, T. C. Mendenhall studied word length in Shakespeare.

Most directly related to current text analysis practices is the life’s work of Father Roberto Busa, who teamed up with IBM in 1949 to further his analysis of the works of St Thomas Aquinas.

As more and more corpora were digitised in the later 20th century, one question presented itself over and over: what next? Of course there was now a lasting, searchable archive, but what else could be done with these newly digital corpora? Text analysis seems to be the answer. And though people with a passing familiarity with the discipline may regard it as something of a party trick, there’s actually a lot more going on than meets the eye.

One of the first things I noticed when delving into the topic of text analysis was the diversity of work already being produced in the field. One’s first associations with the topic would likely be searching Google Books for word trends (a controversial subject in itself, as I soon found out) or making a word frequency chart for some cursory study of an English text. However, studies of far greater depth and ingenuity are popping up every day in the field of text analysis.
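To make the word frequency idea concrete, here is a minimal sketch in Python of how such a count might be produced. The function name `word_frequencies` and the sample sentence are my own illustrative choices, and the tokenising rule (lowercase runs of letters and apostrophes) is a simplifying assumption rather than a standard.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=5):
    """Return the top_n most common words in text as (word, count) pairs.

    Assumption: a "word" is any run of letters or apostrophes,
    compared case-insensitively. Real projects usually tokenise
    more carefully than this.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)


if __name__ == "__main__":
    # Hypothetical sample text, purely for illustration.
    sample = "The cat sat on the mat. The dog sat too."
    for word, count in word_frequencies(sample, top_n=2):
        print(word, count)
```

The resulting pairs are what a word frequency chart would plot; anything more interesting starts from deciding which words to count and which texts to compare.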

One example of how far text analysis is already being taken is the study Canadian researchers ran on the works of British novelist Agatha Christie, which posited that her shrinking vocabulary towards the end of her life may be linked to her possibly suffering from Alzheimer’s disease. Ian Lancashire, Professor of English at the University of Toronto, determined that in Christie’s 73rd novel her vocabulary dropped by 20% and her use of indefinite words spiked.

When combined with other research, as great text analysis usually is, these findings make a compelling argument that Christie was beginning to develop Alzheimer’s at the time. It gave Christie fans a whole new level upon which to appreciate the novel, entitled Elephants Can Remember and concerning a female novelist who is losing her memory.
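For a rough sense of what measuring vocabulary size and indefinite-word use could look like, here is a small Python sketch. It is not the method used in the Christie study; the word list `INDEFINITE_WORDS`, the function names and the fixed-size window are all assumptions of mine for illustration.

```python
import re

# Hypothetical list of "indefinite" words, chosen for illustration only.
INDEFINITE_WORDS = {"thing", "something", "anything", "nothing"}


def tokenize(text):
    """Split text into lowercase words (runs of letters or apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def vocabulary_profile(text, window=50000):
    """Summarise the first `window` words of a text.

    Comparing texts over the same number of words matters because
    longer texts naturally contain more distinct words.
    """
    words = tokenize(text)[:window]
    if not words:
        return {"tokens": 0, "types": 0, "indefinite_rate": 0.0}
    indefinite = sum(1 for w in words if w in INDEFINITE_WORDS)
    return {
        "tokens": len(words),                     # words examined
        "types": len(set(words)),                 # distinct words among them
        "indefinite_rate": indefinite / len(words),
    }


if __name__ == "__main__":
    # Made-up sentences standing in for an early and a late novel.
    early = "The detective examined the curious silver dagger"
    late = "The thing was something like the other thing"
    print(vocabulary_profile(early))
    print(vocabulary_profile(late))
```

Run over two whole novels with the same window, a drop in `types` and a rise in `indefinite_rate` would be the sort of signal described above, though on its own it proves nothing without the other research.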

This example is especially good because it not only shows those in the field just how far this technology can be pushed but also provides a great means of showing its potential uses to members of the public. Since I began getting involved with the world of digital humanities (DH) I have found myself having to explain to people time and again just what it means. Being able to use a project like this as a touchstone for text analysis is fantastic, not just for the whole world of possibilities it opens up, but for showing what kind of inventive things are going on right now.

Most casual dalliances with text analysis among my own peer group seem to have been based around one of those websites that tell you which famous author you most write like (although for a while there it looked like it was telling everyone they wrote like David Foster Wallace, which would have been a pretty funny joke in itself). Though an easy-to-use and somewhat engaging idea, this site is easily gamed and paints text analysis as a very limited field.

I was glad, then, to see that an even more simplistic but far, far more powerful and striking form of text analysis is beginning to gain popularity. Nohomophobes.com presents a live-updating total of all the homophobic slurs used on Twitter today, in the last week or since the site began. It also lets you scroll down and view the individual tweets. This is not only an accessible and important form of text analysis: by combining close reading with the “distant reading” normally associated with text analysis, it also sidesteps much of the criticism of the discipline, i.e. that it “can’t uncover the true scope and nature of” the corpus it is being used in conjunction with.

Even the most enthusiastic responses to a text analysis project always come with a long list of caveats. There is nothing wrong with this per se, but it would be nice to read one piece about the discipline that acknowledges that almost everyone in the field is painfully aware of the dangers it poses to context and of the pressing need to use it only in conjunction with more traditional study.

Progress is never easy, especially when text analysis is being decried in some circles as the end of traditional English scholarship. Even so, over the course of my research into the topic I found more than ample evidence that it is being carefully and cautiously used to answer new and exciting kinds of research questions that haven’t been feasible until now.
