High-end computing to enhance arts and humanities research
20 July 2006
Researchers in the arts and humanities will soon be able to mine 19th century census data at the click of a mouse, thanks to a pioneering high-end computing project led by Dr Melissa Terras of UCL School of Library, Archive and Information Studies.
Dr Terras has secured access from The National Archives and genealogy service Ancestry.co.uk to recently digitised census data covering 1841 to 1901. With the help of colleagues in UCL Research Computing, she is developing modelling techniques using powerful computing grids that will allow academics to interrogate and match the records in a fraction of the time it takes to analyse them manually.
"I've been interested in e-Science, or grid computing, programmes for a few years, but no applications have been developed for the arts and humanities," says Dr Terras. "Once the idea of the census occurred to me, I could see numerous uses: historians could examine child mortality, social mobility or even map how geography changes over the years, given the fluctuation in the definition of neighbourhoods and counties."
It's an idea that also appeals to the Arts and Humanities Research Council, which in May awarded Dr Terras one of only six awards to fund a pilot project investigating the benefits and pitfalls of exploiting e-Science for a humanities-focused audience. At an international workshop held at UCL last month, historians and information experts confirmed that little has been done to date to explore this use of the technology but that the potential is vast.
However, interrogating data for the arts and humanities communities presents its own set of challenges. "Arts and humanities datasets tend to be smaller than scientific ones, and often involve lots of different types of data," explains Dr Terras. "The information is usually manually entered, which can make it 'dirty', that is, susceptible to errors or omissions."
The problems posed by dirty information, such as variants in spelling, make Dr Terras's project attractive to physicists, who are keen to devise ways of teaching computers to think outside their precise boxes and recognise matches in so-called 'fuzzy' data.
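To give a flavour of what matching 'fuzzy' data can mean in practice, here is a minimal sketch in Python. It is an illustration only, not the technique being developed at UCL: it uses the standard library's difflib to score how alike two names are, and treats two records as a likely match when the names are similar and the birth years are close. The record fields (name, birth_year), the sample people and both thresholds are assumptions made for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Score two strings from 0.0 (nothing in common) to 1.0 (identical),
    ignoring case and leading or trailing spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def likely_same_person(rec_a: dict, rec_b: dict,
                       name_threshold: float = 0.8,
                       max_year_gap: int = 2) -> bool:
    """Treat two records as a probable match if the names are close
    and the recorded birth years differ by only a few years.
    Both thresholds are illustrative guesses, not tuned values."""
    names_close = similarity(rec_a["name"], rec_b["name"]) >= name_threshold
    years_close = abs(rec_a["birth_year"] - rec_b["birth_year"]) <= max_year_gap
    return names_close and years_close


# Hypothetical records, as they might appear in two different census years.
record_1841 = {"name": "Elizabeth Smyth", "birth_year": 1830}
record_1851 = {"name": "Elizabeth Smith", "birth_year": 1831}
unrelated = {"name": "John Brown", "birth_year": 1830}

print(likely_same_person(record_1841, record_1851))  # spelling variant
print(likely_same_person(record_1841, unrelated))    # different person
```

An exact comparison would treat "Smyth" and "Smith" as different people. A similarity score lets the researcher decide how much variation to tolerate, at the cost of occasional false matches that still need checking by hand.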
The pilot is the tip of the iceberg. The proportion of the records from The National Archives that are available digitally and in the public domain is growing all the time, as amateur historians are taking matters into their own hands. Ancestry also own digitised census records from around the world, and if the project is a success it may be extended into the international arena. Furthermore, the techniques developed by UCL will be able to mine any dataset, so future additional holdings - census records from other countries, for instance - could be cross-searched with ease. These developments will depend on the outcome of a further, larger funding round this autumn. For the time being, Dr Terras is keen to work with arts and humanities specialists who think e-Science could bring an extra dimension to their research.
To find out more, contact Dr Melissa Terras.