UCL Institute of Health Informatics


Recent developments and research challenges in data linkage

19 September 2016, 1:00 pm–2:00 pm

Event Information

Open to



UCL Institute of Health Informatics

Speaker: Professor Peter Christen, ANU College of Engineering and Computer Science (CECS), The Australian National University

Techniques for linking and integrating data from different sources are becoming increasingly important in many application areas, including health, census, taxation, immigration, social welfare, crime and fraud detection, the assembly of national security intelligence, business, bibliometrics, and the social sciences.

In today's Big Data era, data linkage (also known as entity resolution, duplicate detection, and data matching) faces not only computational challenges, due to the growing size and complexity of data collections, but also operational challenges, as many applications move from static environments to real-time processing and analysis of potentially very large and dynamically changing data streams that require records to be linked in real time. Additionally, with growing public concern about the use of sensitive data, privacy and confidentiality often need to be considered when personal information is linked and shared between organisations.

In this talk, Professor Christen will present a short introduction to data linkage, highlight recent developments in advanced data linkage techniques and methods (with an emphasis on work conducted in the computer science domain), and discuss future research challenges and directions.

Peter Christen is a professor at the ANU College of Engineering and Computer Science (CECS) at The Australian National University. He received a Diploma in Computer Science Engineering from ETH Zurich in 1995 and a PhD in Computer Science from the University of Basel in 1999 (both in Switzerland). His research in data mining and data matching has so far resulted in over 140 publications, including the book Data Matching, published by Springer in 2012. He is also the main developer of Febrl (Freely Extensible Biomedical Record Linkage), an open source data cleaning, deduplication and record linkage system.