Consortium including Bentham Project receives European Commission’s Horizon Impact Award for 2020
23 September 2020
UCL’s Bentham Project was part of a European consortium that has received a major award in recognition of its contribution to transforming access to the world’s archival knowledge using artificial intelligence.
The READ (Recognition and Enrichment of Archival Documents) consortium, of which the Bentham Project at UCL Faculty of Laws was a partner, has been named as one of the winners of the European Commission’s Horizon Impact Award for 2020. The Award, which this year received 225 nominations across all disciplines, recognises and celebrates EU-funded research projects whose results have created significant societal impact across Europe and beyond. The award ceremony took place on 23 September 2020 during the European Research and Innovation Days event.
READ (2016–19), funded by the Horizon 2020 programme, set out to transform the way in which the public and scholars access the world’s archival knowledge, using machine learning and artificial intelligence. Building upon the work of the tranScriptorium programme (2013–15), in which the Bentham Project was also a partner, the READ team delivered Transkribus, a comprehensive, freely available platform for the automated recognition, transcription, and searching of historical documents, which incorporates Handwritten Text Recognition (HTR), Keyword Spotting, and other cutting-edge technologies. Transkribus has a growing user base: over 40,000 individuals have registered an account with the platform, thousands of whom use it on a weekly basis. Using either an off-the-shelf HTR model or one that has received a certain amount of training, Transkribus can produce automated transcripts of handwritten manuscripts in a variety of scripts and languages, dating from centuries ago to the present day.
Critical to testing the HTR and associated technologies incorporated in Transkribus were UCL’s Bentham Papers, a vast collection of manuscripts containing features that any effective HTR platform needs to contend with, such as difficult handwriting, pages written in more than one hand, skewed writing, and crossings-out. In addition, transcripts produced by volunteers for the Bentham Project’s award-winning crowdsourced transcription initiative, Transcribe Bentham, were integral to training and testing robust HTR models. Early experiments resulted in an HTR model able to recognise correctly around 82% of all the characters on a fairly straightforward Bentham manuscript; by the end of the READ programme, subsequent experiments had produced models capable of recognising at least 95% of characters on a straightforward Bentham manuscript, and 91% on the most complex. Dr Louise Seaward of the Bentham Project led a major dissemination campaign to promote the use of Transkribus, organising and presenting at over sixty workshops, seminars, conferences, hackathons, and public talks, cumulatively attended by over a thousand people.
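For readers curious how a character-level figure such as the 82% or 95% quoted above might be arrived at, the following is a minimal sketch. It assumes the common convention of comparing an automated transcript against a human-made reference using edit distance; the article does not say exactly how the READ team measured accuracy, and the function names and sample strings here are purely illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Illustrative metric (an assumption, not READ's documented method):
    1 minus the edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return 1.0 - levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: a volunteer transcript versus an automated one.
reference = "the greatest happiness"
hypothesis = "the greatest hapiness"
print(f"{character_accuracy(reference, hypothesis):.1%}")
```

Under this convention, a score of 95% would mean roughly one character in twenty needs correcting to match the reference transcript.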
The Bentham Project also worked with colleagues at the Pattern Recognition and Human Language Technology Research Center (Universitat Politècnica de València) to produce the Bentham Papers Indexing and Search engine. Based on pattern recognition, this Keyword Spotting engine allows the user to search for words and phrases in the almost 100,000 pages of the Bentham Papers without the need for them to have been transcribed—a proof-of-concept of a potentially transformative technology for further widening access to historic manuscripts, and a major resource in its own right.
The work of the READ project is now continued by the READ CO-OP, a European Co-operative Society that sustains and further develops the Transkribus platform and whose subscribers include the British Library and the respective National Archives of Finland, Luxembourg, Norway, and Sweden.