First World War Poetry Digital Archive

Brief History of Text Analysis

For many years literary and linguistic scholars have been interested in the ability of computers to search through and order large quantities of text. Before the computer, searching the complete works of an author (for example) had to be done by hand, page by page; once the text is in machine-searchable form it can be searched and analysed very quickly. Literary scholars are predominantly interested in how an author or authors use key words or phrases, how often they do so, when words came into common usage, and so on. We tend to term this ‘quantitative analysis’, i.e. analysis that focuses on frequency and position. A small, hedged sketch of what this looks like in practice follows below.
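To make ‘frequency and position’ concrete, here is a minimal sketch in Python of how one might count word frequencies and build a simple concordance for a key word. The filename poems.txt and the key word ‘dawn’ are placeholders for illustration only; this is a generic example of the approach, not a tool belonging to the Archive.

    from collections import Counter
    import re

    def tokenise(text):
        """Split text into lowercase word tokens."""
        return re.findall(r"[a-z']+", text.lower())

    def word_frequencies(tokens):
        """Count how often each word occurs (frequency)."""
        return Counter(tokens)

    def concordance(tokens, key_word, context=5):
        """List each occurrence of key_word with a few words of context (position)."""
        hits = []
        for i, token in enumerate(tokens):
            if token == key_word:
                snippet = " ".join(tokens[max(0, i - context):i + context + 1])
                hits.append((i, snippet))
        return hits

    if __name__ == "__main__":
        # 'poems.txt' stands in for any machine-readable text.
        with open("poems.txt", encoding="utf-8") as f:
            tokens = tokenise(f.read())

        freqs = word_frequencies(tokens)
        print("Ten most frequent words:", freqs.most_common(10))

        # Show every position of a chosen key word with surrounding context.
        for position, snippet in concordance(tokens, "dawn"):
            print(f"token {position}: {snippet}")

Running this over a digitised collection would immediately answer the kinds of questions described above: which words an author favours, how often they appear, and where in the text they occur.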

Such practices actually pre-date computers by many centuries. For example, in the late thirteenth century Dominican monks produced a concordance of the Vulgate (the accepted Bible of the time), which allowed them to look at common phrases and see where they occurred. In 1887, T. C. Mendenhall began to look at word-length in Shakespeare and Bacon to try to draw conclusions about authorship; in 1949 Father Roberto Busa began working with IBM to analyse the works of St Thomas Aquinas; and in 1964 F. Mosteller and D. Wallace used text analysis to try to establish the authorship of the disputed Federalist Papers.

More recently, computer-aided text analysis has been used for such things as identifying forged police confessions and detecting plagiarism.

For more information on the history of text analysis see D. I. Holmes, ‘The Evolution of Stylometry in Humanities Scholarship’, Literary & Linguistic Computing 13.3 (1998), pp. 111-117.