Contributed by Geremy Carnes, Lindenwood University
Written for the Cleveland Teaching Collaborative

Text analysis is one of the oldest forms of humanistic practice, but new digital tools can perform text analyses on a scale that would be impossible for a human being to achieve. These computationally enabled forms of text analysis cannot replace traditional forms of practice, but they can complement them, revealing patterns within a single text or across vast corpora of texts that lead to new insights.

Spotlight Tool: NGram

One of the simplest yet most powerful ways to get started with computationally enabled text analysis is the Google NGram Viewer. The NGram Viewer is a free tool that allows you to plot the usage of particular words over time, drawing on more than 8 million books spanning five centuries. The sheer size of its corpus makes the NGram Viewer the best way to quickly explore language change over time in the classroom.


Learning Outcomes

Using the Google NGram Viewer can be a component of many traditional text analysis assignments. It can also be used by itself in low-stakes exploratory assignments. Assignments involving the NGram Viewer support outcomes for many types of courses, especially literature and history courses:

  • Students reflect on the processes of language change through consideration of statistically-based visual evidence.
  • Students examine the relationship between language change and historical developments/events.
  • Students support their analysis of a text with evidence about historical word usage frequency.

Resources

To get started with the Google NGram Viewer, simply go to https://books.google.com/ngrams and start searching. Enter the words you want to graph (separated by commas) and the date range you want to examine, and the NGram Viewer will graph those words’ frequency within Google’s corpus over that time period. If you spend a half hour reading the help documentation, more advanced searches and graphs become possible, such as searching for words used as a particular part of speech or adding the results for multiple words together in a single graph line.
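For instructors who want to hand students a ready-made link rather than have them type a search, the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `/ngrams/graph` address and the `content`, `year_start`, `year_end`, and `smoothing` query parameters are taken from the URLs the NGram Viewer displays after a search and may change, and the helper names `with_pos`, `summed`, and `ngram_url` are hypothetical. The `_VERB`-style part-of-speech tags and the `+` operator follow the syntax described in the tool's help documentation.

```python
from urllib.parse import urlencode

# Assumption: this is the address the NGram Viewer shows after a search is run.
BASE_URL = "https://books.google.com/ngrams/graph"


def with_pos(word, tag):
    """Restrict a word to one part of speech, e.g. 'tackle_VERB'."""
    return f"{word}_{tag}"


def summed(*words):
    """Add several words together into one graph line, e.g. '(color + colour)'."""
    return "(" + " + ".join(words) + ")"


def ngram_url(terms, year_start, year_end, smoothing=3):
    """Build a link for a comma-separated search over a date range.

    The parameter names are assumptions based on observed NGram Viewer URLs.
    """
    params = {
        "content": ",".join(terms),  # words to graph, separated by commas
        "year_start": year_start,    # first year of the date range
        "year_end": year_end,        # last year of the date range
        "smoothing": smoothing,      # how many neighboring years to average
    }
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # A basic search: two words over three centuries.
    print(ngram_url(["liberty", "freedom"], 1700, 2000))
    # A more advanced search: one word as a verb, and two spellings combined.
    print(ngram_url([with_pos("tackle", "VERB"), summed("color", "colour")], 1800, 2000))
```

Pasting either printed link into a browser should open the corresponding graph, provided the tool still accepts these parameters.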

The NGram Viewer does have some limitations you should be mindful of. The OCR (Optical Character Recognition) used to build Google’s corpus isn’t perfect, especially with texts from earlier centuries. (When running searches prior to 1800, remember that many long S’s have been recorded as F’s, so a word like “best” may appear in the corpus as “beft.”) There are also valid concerns about the representativeness of the corpus’s contents. However, while these limitations are important to keep in mind when using the NGram Viewer to make scholarly arguments, they are unlikely to present a problem for classroom usage; indeed, discussing these limitations can help students think more critically about the nature of text analysis corpora.
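The long S problem can also be turned into a small classroom exercise. The sketch below guesses the misread spelling of a word so that students can search for both forms side by side. It rests on a deliberately simplified assumption, namely that every lowercase "s" except one at the very end of a word was printed as a long S and then misread as "f"; actual printing conventions were more varied, and the function names are hypothetical.

```python
def ocr_long_s_variant(word):
    """Guess how a word looks if each long S was misread as 'f'.

    Simplifying assumption: every lowercase 's' except a word-final one
    was printed as a long S. Real printing practice had more exceptions.
    """
    if len(word) < 2:
        return word
    body, last = word[:-1], word[-1]
    # Only the non-final letters are candidates for the long S.
    return body.replace("s", "f") + last


def search_terms(word):
    """Return the word plus its guessed OCR variant, if the two differ."""
    variant = ocr_long_s_variant(word)
    return [word] if variant == word else [word, variant]


if __name__ == "__main__":
    for word in ["best", "possess", "case", "is"]:
        # Join with commas, the separator the NGram Viewer search box expects.
        print(",".join(search_terms(word)))
```

Graphing a pair such as "best" and "beft" before 1800 gives students a visible trace of the OCR error itself, though some variants (such as "cafe" for "case") are also real words and need to be read with care.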

For an example of the kind of creative assignment to which Google NGram lends itself well, check out how Katherine D. Harris had her students use the tool in an exploratory manner when reading A Clockwork Orange.

Other Free and Accessible Text Analysis Tools:  
