Transcribing Brunel’s illegible handwriting using AI
20 August 2019
The handwriting of one of Britain’s most prolific engineers, Isambard Kingdom Brunel, is to be decoded with new automated transcription software developed by a consortium of institutions including UCL and the University of Innsbruck.
For years a team of researchers has been trying to decipher Brunel’s diaries and documents by hand, but the work has proved extremely time-consuming.
Now the documents are being processed with Transkribus, software that learns the specific quirks of an author’s handwriting by comparing it against examples that have already been transcribed. The research and development of the software is described in a new case study published in the Journal of Documentation.
“There are millions upon millions of pages of handwritten documents in archives around the world. Transkribus has the potential to revolutionise the way archivists and researchers read, transcribe, process and mine historical documents,” said co-author of the study, Professor Philip Schofield (Bentham Project, UCL Laws).
For Brunel’s handwriting, the software needs a minimum of 15,000 manually transcribed words before it can make sense of new pages. The resulting model will then be applied to thousands more pages. Its reliability continues to improve as more training data is provided, and Transkribus can currently read 65% of Brunel’s words.
The research forms part of a large-scale international initiative known as READ (Recognition and Enrichment of Archival Documentation), funded under the EU Horizon 2020 programme. READ aims to make archival material more accessible by developing new technologies.
The Transkribus platform was developed by the University of Innsbruck and incorporates cutting-edge technologies developed by computer scientists from across Europe. It is designed to meet the specific challenge of automating the indexing, searching and full transcription of historic handwritten manuscripts written in dozens of languages and dating from the medieval period to the present day.
UCL led the testing of the software, trialling it on thousands of papers written by the British philosopher Jeremy Bentham and held in the university’s archive.
Professor Philip Schofield, Director of the Bentham Project at UCL, said: “The Bentham Project, which runs the iconic Transcribe Bentham scholarly crowdsourcing initiative, where volunteers transcribe Bentham’s handwriting, has been actively involved in disseminating Transkribus by organising dozens of events with archivists and researchers. We have also been using the Bentham Papers to test the capabilities of the platform.”
One of the strongest Handwritten Text Recognition (HTR) models produced with Transkribus was trained on transcripts created by volunteers for Transcribe Bentham. The Bentham Project used these transcripts in its experiments with Transkribus to build a model capable of automatically deciphering Bentham’s often challenging handwriting.
The Bentham model has been trained on over 50,000 words from papers written by Bentham and his secretaries, and Transkribus can now produce automated transcripts of Bentham’s manuscripts that are 95% accurate, with accuracy depending on the complexity of the layout and the legibility of the page in question.
Through their involvement in READ, Bentham Project researchers have also worked with colleagues at the Polytechnic University of Valencia to develop a keyword spotting tool covering all 90,000 digitised manuscript pages of the Bentham Papers held by UCL and the British Library.
Future projects include working with colleagues at the University of Toronto to improve the platform’s ability to transcribe late medieval Latin documents.
Links
- Journal of Documentation article
- UCL Bentham Project
- Transcribe Bentham
- Professor Philip Schofield’s academic profile
Image
- Credit: BRSGB-2016.06001. Accepted under the Cultural Gifts Scheme by HM Government from Clive Richards OBE DL and allocated to the SS Great Britain Trust, 2017
Media contact
Natasha Downes
tel: +44 20 3108 3844
E: n.downes [at] ucl.ac.uk