Simulated datasets for health data access and research
28 February 2020, 12:30 pm–1:30 pm

Join us on 28 February to hear Paul Clarke (Health Data Insight) speak about 'Simulated datasets for health data access and research'. This seminar will be the fifth in our Future of AI in Healthcare CDT Seminar Series.
Event Information
Open to
- All
Organiser
- Craig Smith – UKRI Centre for Doctoral Training in AI-enabled Healthcare, 020 3549 5035
Location
- LG:01, Institute of Health Informatics, 222 Euston Road, London NW1 2DA
Simulated datasets are a useful and innovative way to provide safe access to patient data for research. By aggregating cohorts and sampling fields at random, we can create realistic models of health databases that researchers can use to develop queries and explore the structure of the data. The generative algorithms used come with privacy guarantees, so the resulting simulated datasets can be shared widely. In this talk I will discuss some of the methods used to generate simulated datasets. There will also be a practical case study of using simulated datasets as a model of the data held by Public Health England's National Cancer Registration and Analysis Service to facilitate innovative research in industry.
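The idea of aggregating a cohort and then sampling each field at random can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not the method used by Health Data Insight or the Simulacrum: the function names `aggregate_marginals` and `simulate` and the toy records are hypothetical, and sampling fields independently like this does not by itself provide any formal privacy guarantee.

```python
import random
from collections import Counter


def aggregate_marginals(records, fields):
    """Aggregation step: count how often each value occurs in each field."""
    return {field: Counter(record[field] for record in records) for field in fields}


def simulate(marginals, n_rows, seed=None):
    """Sampling step: draw every field independently, weighted by its counts."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        row = {}
        for field, counts in marginals.items():
            values = list(counts)
            weights = [counts[value] for value in values]
            row[field] = rng.choices(values, weights=weights, k=1)[0]
        rows.append(row)
    return rows


# Hypothetical toy cohort; these are made-up records, not real patient data.
toy_records = [
    {"sex": "F", "site": "breast", "stage": "2"},
    {"sex": "F", "site": "lung", "stage": "3"},
    {"sex": "M", "site": "lung", "stage": "4"},
    {"sex": "M", "site": "colon", "stage": "1"},
    {"sex": "F", "site": "breast", "stage": "1"},
]

marginals = aggregate_marginals(toy_records, ["sex", "site", "stage"])
simulated = simulate(marginals, n_rows=10, seed=0)
for row in simulated:
    print(row)
```

Because each field is drawn independently, the simulated rows keep each field's overall distribution but lose the correlations between fields (for example between tumour site and stage), which is one reason more sophisticated generative methods are used in practice.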
About the Speaker
Paul Clarke
Mathematician at Health Data Insight CIC
Paul Clarke is a mathematician at Health Data Insight CIC with a special interest in simulated datasets. His current work focuses on expanding and improving the Simulacrum dataset and on exploring and comparing methods for creating simulated datasets.