Dawes Centre for Future Crime at UCL

Textwash

6 November 2024

Research summary

In this project, UCL researchers developed software that removes personally identifying and sensitive information from text data. Many stakeholders (e.g. the police, healthcare providers, tech companies) are keen to share text data with researchers, for example to evaluate or build information extraction approaches, but a key impediment to date has been the sensitive nature of the data. Individuals have the right to privacy, so text data must be anonymized before it can be shared. Textwash was developed to do this automatically while retaining the data's usefulness for secondary computational analyses. The outcome of this project is the first empirically validated and transparent anonymization software that puts all control in the hands of its users.

Textwash was supported by a proof-of-concept grant from SAGE.

Key findings

The Textwash project has seen significant advancements in its underlying algorithms and is growing into a multi-language project with additional funding from the Dutch Research Council (NWO). Textwash was also piloted as a key resource for the US National Archive of Criminal Justice Data in support of its mission to make research data widely available.

Lead Investigator(s)
  • Dr Bennett Kleinberg, UCL Security and Crime Science 
  • Dr Toby Davies, UCL Security and Crime Science 
  • Maximilian Mozes, UCL Security and Crime Science 
Outputs