UCL Centre for Digital Humanities


Information Extraction and Semantic Annotation

19 October 2016, 5:30 pm–6:30 pm


Event Information

UCL Centre for Digital Humanities
Gower Street
United Kingdom

Archaeological reports contain a great deal of information that conveys facts and findings in different ways. This information is highly relevant to the research and analysis of archaeological evidence, but at the same time it can hinder the accurate indexing of documents. Information Extraction, as a Natural Language Processing technique, can unlock and surface such information by analysing a textual input and producing a structured textual output that is suitable for further manipulation. In this process, Semantic Annotation links ontological definitions to natural language text by providing class information for textual instances. Described as a mediator platform between concepts and their worded representations, semantic annotation as metadata can automate the identification of concepts and their relationships in documents. It is proposed as a mechanism for connecting natural language and formal conceptual structures to enable new information access methods and to enhance existing ones. The annotation process enriches documents and enables access on the basis of a conceptual structure. This aids information retrieval from heterogeneous data sources, empowering users to search across resources for entities and relations instead of words.

The seminar will present the semantic annotation system OPTIMA, which performs the tasks of Named Entity Recognition, Relation Extraction, Negation Detection, and Word-Sense Disambiguation over archaeological excavation reports (grey literature). The system employs rule-based Information Extraction techniques to deliver interoperable semantic abstractions (semantic annotations) with respect to the CIDOC Conceptual Reference Model (CRM) and relevant Cultural Heritage thesauri.
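As a rough illustration of what rule-based annotation with negation detection can look like, the following is a minimal Python sketch. It is not OPTIMA's implementation: the gazetteer entries, the negation cues, the three-token window, and the names `GAZETTEER`, `NEGATION_CUES`, `Annotation` and `annotate` are all assumptions made for this example, and the mapping of terms to CIDOC CRM classes is illustrative only.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical gazetteer: surface term -> CIDOC CRM class label.
# The mapping is illustrative, not taken from OPTIMA or any thesaurus.
GAZETTEER = {
    "pottery": "E19 Physical Object",
    "coin": "E19 Physical Object",
    "flint": "E57 Material",
    "bronze": "E57 Material",
}

# Hypothetical negation cues; a real rule set would be considerably richer.
NEGATION_CUES = ("no", "not", "without", "absence of")


@dataclass
class Annotation:
    text: str        # matched surface form
    start: int       # character offset where the match begins
    end: int         # character offset where the match ends
    crm_class: str   # class label assigned from the gazetteer
    negated: bool    # True if a negation cue occurs shortly before the match


def annotate(sentence: str, window: int = 3) -> List[Annotation]:
    """Tag gazetteer terms and flag those preceded by a negation cue
    within the last `window` word tokens before the match."""
    annotations = []
    for term, crm_class in GAZETTEER.items():
        # Match the term (with an optional plural "s") as a whole word.
        pattern = rf"\b{re.escape(term)}s?\b"
        for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            preceding = sentence[:match.start()].lower()
            context = " ".join(re.findall(r"[a-z]+", preceding)[-window:])
            negated = any(
                re.search(rf"\b{re.escape(cue)}\b", context)
                for cue in NEGATION_CUES
            )
            annotations.append(
                Annotation(match.group(0), match.start(), match.end(),
                           crm_class, negated)
            )
    # Return annotations in document order.
    return sorted(annotations, key=lambda a: a.start)


if __name__ == "__main__":
    for a in annotate("No pottery was recovered, but a bronze coin was found."):
        print(a)
```

Each annotation carries a class label and character offsets, so a search could target entities of a given class rather than matching words, and the `negated` flag lets a statement such as "no pottery was recovered" be kept apart from a positive finding.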


Dr. Andreas Vlachidis is a Research Associate at the UCL Department of Information Studies. He currently contributes to the cultural heritage data modelling and semantic enrichment aims of the EU Horizon 2020 CROSSCULT project. He holds a PhD in Semantic Indexing of Archaeological Grey Literature, and he is a certified text analyst of the General Architecture for Text Engineering (GATE), a fellow of the Higher Education Academy (FHEA) and a member of the British Computer Society (BCS). In the past, as a member of the Hypermedia Research Group (USW), he worked with Prof. Douglas Tudhope on the AHRC-funded project STAR and on the EU FP7-funded project Ariadne. He has also received a grant from the Welsh Government to develop a suite of open source natural language processing modules for the Welsh language, and he worked with Prof. Hamish Fyfe in the Digital R&D Fund for the Arts in Wales and in the Creative Wales Exchange Network, providing research and managerial support to knowledge exchange activities.

All are welcome, and there will be drinks and discussion after the talk. Please note that registration is required.