UCL Great Ormond Street Institute of Child Health


Surveillance of rare diseases and risk prediction using administrative data

Supervisors: Professor Bianca De Stavola, Dr James Doidge, Professor Ruth Gilbert

Currently, disease registries that require active data collection are used to monitor the incidence of specific rare diseases for service planning, as early warning systems, and as a tool for research. However, population-wide registries can be difficult to implement and expensive to maintain. Digitisation of healthcare records and increases in computing power have led to significant interest in understanding how linked administrative data can support service provision and research. Population-level administrative data may provide a cost-effective alternative to disease registries, offering useful information about demographic characteristics, long-term outcomes and utilisation of services. Using administrative data, however, requires that individuals can be classified in terms of their disease or condition status. The principal means of identifying disease status in hospital records is to use diagnostic codes, but these can be fraught with error. Other clinical or service data may improve disease identification, but research is needed to understand how.
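As a rough illustration of what error in diagnostic codes can mean in practice, the sketch below compares a code-based disease flag with registry membership and reports sensitivity and positive predictive value. It treats the registry as the reference standard, which is itself an assumption, and the function name, record layout and counts are all invented for illustration rather than drawn from any real dataset.

```python
def validate_code_flag(records):
    """Compare a diagnostic-code flag with a reference registry.

    `records` is an iterable of (has_code, in_registry) boolean pairs,
    one per individual. Both fields are hypothetical.
    """
    tp = fp = fn = tn = 0
    for has_code, in_registry in records:
        if has_code and in_registry:
            tp += 1  # coded and confirmed by the registry
        elif has_code:
            fp += 1  # coded but not in the registry
        elif in_registry:
            fn += 1  # in the registry but never coded
        else:
            tn += 1  # neither coded nor registered

    # Guard against empty denominators, which are plausible for rare diseases.
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    ppv = tp / (tp + fp) if (tp + fp) else None
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "sensitivity": sensitivity, "ppv": ppv}


# Invented toy population of 1,000 individuals (not real data).
toy_records = (
    [(True, True)] * 8        # correctly coded cases
    + [(False, True)] * 2     # registry cases missed by coding
    + [(True, False)] * 4     # coded individuals absent from the registry
    + [(False, False)] * 986  # everyone else
)

result = validate_code_flag(toy_records)
print(result)
```

With these made-up counts, the code flag finds 8 of the 10 registry cases, but only 8 of the 12 coded individuals are confirmed cases. Because the disease is rare, a handful of miscoded records is enough to lower the positive predictive value noticeably.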

The problem of identifying disease status is methodologically similar to the problem of predicting disease outcomes; both involve classification of an unknown variable. There are many statistical and computing techniques available for addressing classification problems but their application to rare disease surveillance and clinical risk prediction requires an understanding of the natural history of the disease, its clinical management and the administrative procedures for which the data are created. The student undertaking this research project will become an expert on the use of high-value UK administrative datasets and a set of specific rare diseases, in order to develop classification procedures that are best suited to these applications.

Specific aims are to be agreed with the student, to align with their individual interests and objectives. The core research objectives are: (R1) to explore ways that administrative data can be used to monitor the incidence of rare diseases at a population level, (R2) to explore ways that administrative data can be used to predict risk of poor outcomes at an individual level, and (R3) to identify ways that administrative data may complement or replace active data collection by disease registries. The core learning objectives are: (L1) to gain expertise in the use of high-value UK health-related administrative and registry datasets for research, (L2) to gain expertise in a set of specific rare diseases, (L3) to gain expertise in data science approaches to healthcare research, and (L4) to be able to identify useful administrative data sources and navigate the ethical principles, legal protections and bureaucratic procedures that govern access to them.

Linked datasets to which the supervisors already have access (in particular Hospital Episode Statistics for England, Office for National Statistics mortality data, the National Pupil Database and the National Down Syndrome Cytogenetic Register) will provide a foundation for the research, and the student will be supported in accessing additional datasets depending on their individual research interests. There will be scope to work with clinical disease experts and registries within GOSH. The use of existing datasets will allow the student to progress in their candidature regardless of the time taken to negotiate access to additional datasets. The student will spend their first year learning about relevant diseases and datasets, methods for data linkage and analysis of linked health data, and statistical and machine learning approaches to classification, as well as identifying additional datasets of interest and applying for access to them. The second year will focus on analysing data and developing skills in the specific areas in which the student would like to gain expertise. In the final year the student will finalise analyses, prepare manuscripts for peer-reviewed journals, and adapt them for inclusion as chapters in their thesis.

With its focus on the analytic methods underpinning health service research, this project sits at the interface of computer and clinical sciences. In this way, the student can expect to make significant contributions to the design, delivery and evaluation of healthcare services.