Visualizing the Transcribe Bentham corpus

Start: Dec 06, 2016 05:30 PM
End: Dec 06, 2016 06:30 PM

Location: G31, Foster Court, UCL, Malet Place, London, WC1E 7JG

UCLDH seminar series

How can we gain an overview of the 17,000 pages of Bentham's manuscripts made available by Transcribe Bentham? Methods that provide an overview of the corpus may help domain experts find the areas of the corpus relevant to their research. In this work we have applied computational techniques to visualize the corpus, providing a general view of its content.

First, a lexical extraction was performed to choose the terms used to model the corpus. Then, term clusters were created based on the similarity between the terms' contexts, and visualized as corpus maps. The maps provide an overview of the corpus as a whole, as well as of the terms that are more prominent in different periods of the corpus. The issue of evaluating these corpus maps will also be discussed.
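As a rough illustration of the first two steps (term selection, then clustering terms by the similarity of their contexts), here is a minimal sketch in Python. It is not the method used in this work: the toy pages, the frequency cut-off, the co-occurrence window, the cosine measure, the greedy grouping and every function name are assumptions chosen for brevity, and the map visualization itself is left out.

```python
import math
from collections import Counter, defaultdict

# Toy sentences standing in for transcribed pages (invented for illustration).
PAGES = [
    "the prison inspector watches the prisoner",
    "the prison governor watches the prisoner",
    "the tax collector gathers the revenue",
    "the tax officer gathers the revenue",
]
STOPWORDS = {"the"}


def extract_terms(pages, min_count=2):
    """Keep non-stopword tokens that occur at least min_count times."""
    counts = Counter(
        tok for page in pages for tok in page.split() if tok not in STOPWORDS
    )
    return sorted(t for t, c in counts.items() if c >= min_count)


def context_vectors(pages, terms, window=2):
    """Count the non-stopword tokens found within `window` positions of each term."""
    vectors = defaultdict(Counter)
    for page in pages:
        toks = [t for t in page.split() if t not in STOPWORDS]
        for i, tok in enumerate(toks):
            if tok not in terms:
                continue
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if j != i:
                    vectors[tok][toks[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(terms, vectors, threshold=0.2):
    """Greedy grouping: a term joins the first cluster that already holds
    a member whose context similarity reaches the threshold."""
    clusters = []
    for term in terms:
        for group in clusters:
            if any(cosine(vectors[term], vectors[m]) >= threshold for m in group):
                group.append(term)
                break
        else:
            clusters.append([term])
    return clusters


terms = extract_terms(PAGES)
clusters = cluster(terms, context_vectors(PAGES, terms))
print(clusters)
```

On this toy input the terms fall into two groups, one about prisons and one about taxation. A real corpus would need proper tokenization, a more careful term extraction step and a more robust clustering algorithm.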

Pablo Ruiz is a PhD student in Natural Language Processing for Digital Humanities at the École Normale Supérieure in Paris.