The Google Books Ngram Viewer is one of the easiest and most immediately rewarding ways to begin humanities work that has a digital component. Created with the Cultural Observatory at Harvard, the Ngram Viewer provides statistical data on occurrences of words or strings of words in the Google Books corpus (currently over 30 million texts).
Occurrences (percentage) of above words in US English, 1840-2008
Above is an example of a Google Ngram from one of my conference presentations on the influence of Dante’s Inferno in US visual culture. I used this data as background for my analytic discussion of the clustering of Inferno adaptations (there was a surge in the early 20th century, and again in the early 21st century). The graph shows, for instance, increased use of the word “Dante” at the turn of the 20th century, and an uptick in the word “hell” after 2000.
The data comes without context and does not cover the entire corpus of the written word, but it does reveal some interesting trends. It is also not unlike earlier statistical text analysis that was done without the aid of computers or the large data sets currently available.
Because this data is openly available, a number of other tools have been built to work with it.
Bookworm was developed by Culturomics and offers an interface based on the digitized texts at OpenLibrary.org, which include full title, author, and other bibliographic information for each book. This enables users to query specific bodies of work, but it draws on a smaller data set.
Culturomics has also provided open source Python code to retrieve data from Google Ngram in TSV (tab-separated values) format. Google provides the complete data set for personal use, so you don’t overrun their servers if you need to pull a lot of data.
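For anyone curious what working with that kind of tab-separated data might look like, here is a minimal sketch in Python. It is not the Culturomics script. The function name `yearly_counts` and the column layout (word, year, number of occurrences, number of volumes) are my own assumptions for illustration, so check the layout of the file you actually download before relying on it. The sample rows are invented, not real Ngram counts.

```python
import csv
import io
from collections import defaultdict


def yearly_counts(tsv_text, word):
    """Sum occurrences per year for one word, ignoring capitalization.

    Assumed (hypothetical) column layout per row:
    word <TAB> year <TAB> occurrences <TAB> volumes
    """
    totals = defaultdict(int)
    # QUOTE_NONE keeps stray quotation marks in the text from confusing the parser.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        ngram, year, occurrences = row[0], row[1], row[2]
        if ngram.lower() == word.lower():
            totals[int(year)] += int(occurrences)
    # Return the years in chronological order.
    return dict(sorted(totals.items()))


# Invented sample rows, only to show the shape of the data.
sample = (
    "Dante\t1900\t120\t45\n"
    "Dante\t1901\t98\t40\n"
    "hell\t1900\t300\t80\n"
    "dante\t1900\t5\t3\n"
)

print(yearly_counts(sample, "Dante"))
```

Note that these are raw counts. The percentages shown in the Ngram Viewer also depend on the total number of words published in each year, which this sketch does not attempt to reproduce.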
The BYU Google Books Viewer also provides an interface to Google Books that accepts longer strings of words and links to the books from which the data is pulled.
When I presented the above graph at the conference, I was actually met with some hostility, even though the graph served only as a visual aside to my actual analysis. In the Boston Review, Claude S. Fischer provides background on the relevant issues, and of course there is no perfect data set. However, it is increasingly important to try to situate the humanities within a larger technological environment. Google Ngrams are a good first step for those who do not want to venture too far into the digital humanities right away, and they also offer options for those who want to go further.
Additionally, this can be a fun tool to use in the humanities classroom to get students thinking about word usage over time. This can easily turn into a discussion board assignment, as was done here.
Ultimately, the Ngram visual representation provides the humanities scholar with an additional tool to approach text and the history of words.
