Centre for Doctoral Training in AI-enabled Healthcare


Simulated datasets for health data access and research

28 February 2020, 12:30 pm–1:30 pm


Join us on 28 February to hear Paul Clarke (Health Data Insight) speak about 'Simulated datasets for health data access and research'. This seminar will be the fifth in our Future of AI in Healthcare CDT Seminar Series.

Event Information




Craig Smith – UKRI Centre for Doctoral Training in AI-enabled Healthcare


Institute of Health Informatics
222 Euston Road

Simulated datasets are a useful and innovative approach to facilitating safe access to patient data for research. By aggregating cohorts and sampling fields at random, we can create realistic models of health databases that can be used to develop queries and explore the structure of the data. These generative algorithms come with privacy guarantees, so the resulting simulated datasets can be used widely. In this talk I will discuss some of the methods used to generate simulated datasets. There will also be a practical case study in which simulated datasets serve as a model for the data held by Public Health England’s National Cancer Registration and Analysis Service, to facilitate innovative research in industry.
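
As a rough illustration of the general idea of aggregating cohorts and sampling fields at random, a minimal Python sketch might look like the following. It is not the method presented in the talk or used to build any real dataset. The function names, field names and toy records are all hypothetical, and this naive version carries no privacy guarantee of its own.

```python
import random
from collections import Counter, defaultdict


def fit_marginals(records, cohort_key, fields):
    """Aggregate records by cohort and count the values seen for each field."""
    sizes = Counter()
    counts = defaultdict(lambda: {f: Counter() for f in fields})
    for rec in records:
        cohort = rec[cohort_key]
        sizes[cohort] += 1
        for f in fields:
            counts[cohort][f][rec[f]] += 1
    return sizes, counts


def simulate(sizes, counts, cohort_key, n, seed=0):
    """Draw n simulated rows: pick a cohort, then sample each field
    independently from that cohort's observed value frequencies."""
    rng = random.Random(seed)
    cohorts = list(sizes)
    cohort_weights = [sizes[c] for c in cohorts]
    rows = []
    for _ in range(n):
        cohort = rng.choices(cohorts, weights=cohort_weights)[0]
        row = {cohort_key: cohort}
        for field, counter in counts[cohort].items():
            values = list(counter)
            weights = [counter[v] for v in values]
            row[field] = rng.choices(values, weights=weights)[0]
        rows.append(row)
    return rows


# Hypothetical toy records, invented purely for illustration.
toy = [
    {"age_band": "60-69", "site": "lung", "stage": "2"},
    {"age_band": "60-69", "site": "lung", "stage": "3"},
    {"age_band": "60-69", "site": "breast", "stage": "1"},
    {"age_band": "70-79", "site": "colon", "stage": "2"},
    {"age_band": "70-79", "site": "lung", "stage": "4"},
]

sizes, counts = fit_marginals(toy, "age_band", ["site", "stage"])
synthetic = simulate(sizes, counts, "age_band", n=5, seed=42)
```

Because each field is sampled independently within a cohort, a simulated row need not correspond to any single real record, but correlations between fields inside a cohort are lost. Preserving those correlations and bounding disclosure risk would need more careful methods than this sketch.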

About the Speaker

Paul Clarke

Mathematician at Health Data Insight CIC

Paul Clarke is a mathematician at Health Data Insight CIC with a special interest in simulated datasets. He is currently working to expand and improve the Simulacrum dataset and to explore and compare methods for creating simulated datasets.
