UCL Institute of Health Informatics


GSK-UCL Phenomics

Defining human disease at scale using health data


Defining human diseases using rich phenotyping information from multiple sources (e.g., genomic, clinical and imaging data) can improve our understanding of disease mechanisms and support the development of novel, more effective therapeutic agents.

The UCL GSK Phenomics Hub is a partnership between the UCL Institute of Health Informatics (PIs Denaxas, Torralbo and Fitzpatrick) and GSK Genomic Sciences to deliver state-of-the-art phenotyping infrastructure derived from electronic health records (EHR) across three major contemporary biobanks: the UK Biobank, Genes and Health, and Our Future Health (upon release).

Aims of the Hub

The partnership builds on the Group’s previous research on computational approaches for defining and evaluating phenotyping algorithms using electronic health records (EHR).

Working closely with the team at GSK, the Phenomics Hub is now focussing on ‘deeper’ phenotype development and validation. This work will incorporate novel data elements, such as biomarker and prescription data, to define disease progression and severity phenotypes across multiple EHR-linked biobanks.

Our collaboration also aims to create opportunities for bidirectional knowledge exchange to drive disease severity and progression phenotyping projects.


Our novel phenotyping approach and published set of resources can be used for research to benefit areas including:

  • Drug development: accurately define relationships between therapeutic targets and disease endpoints; identify different underlying mechanisms of disease to enable the development of new molecules that target specific subpopulations sharing a common aetiology.
  • Personalized medicine: avoid prescribing medications that are not beneficial or that may cause adverse outcomes; target patients with therapies that will benefit them.
  • Randomized trials: inform inclusion/exclusion criteria; make trials safer and more effective; improve outcome ascertainment in clinical trials of new medicines to inform sample size calculations and patient recruitment strategies.

For more information about the Hub, contact Natalie Fitzpatrick (n.fitzpatrick@ucl.ac.uk).


Phenomics Hub Team at Institute of Health Informatics

Our team includes experienced health data scientists, software engineers and project management staff:

Prof Spiros Denaxas, Principal Investigator. Professor of Biomedical Informatics

Dr Ana Torralbo, Co-Principal Investigator. Senior Research Fellow

Natalie Fitzpatrick, Co-Principal Investigator. Research Programme Manager

Dr Chris Tomlinson, Health Data Scientist

Cai Ytsma, Health Data Scientist

Dr Natalie Zelenka, Senior Research Fellow

Sanjay Nair, Project Coordinator


Outputs related to this collaboration


Carrasco-Zanini J, Pietzner M, Davitte J, Surendran P, Croteau-Chonka DC, Robins C, Torralbo A, Tomlinson C, Fitzpatrick NK, Ytsma C, Kanno T, Gade S, Freitag D, Ziebell F, Denaxas S, Betts JC, Wareham NJ, Hemingway H, Scott RA, Langenberg C. Proteomic prediction of common and rare diseases. Under review: New England Journal of Medicine; published as a preprint at medRxiv: https://doi.org/10.1101/2023.07.18.23292811

Chung SC, Providencia R, Sofat R, Pujades-Rodriguez M, Torralbo A, Fatemifar G, Fitzpatrick NK, Taylor J, Li K, Dale C, Rossor M, Acosta-Mena D, Whittaker J, Denaxas S. Incidence, morbidity, mortality and disparities in dementia: A population linked electronic health records study of 4.3 million individuals. Alzheimers Dement. 2023 Jan;19(1):123-135. doi: 10.1002/alz.12635. Epub 2022 Mar 15. PMID: 35290719; PMCID: PMC10078672.

Chung SC, Sofat R, Acosta-Mena D, Taylor JA, Lambiase PD, Casas JP, Providencia R. Atrial fibrillation epidemiology, disparity and healthcare contacts: a population-wide study of 5.6 million individuals. Lancet Reg Health Eur. 2021 Aug;7:100157. doi: 10.1016/j.lanepe.2021.100157. PMID: 34405204; PMCID: PMC8351189.

Denaxas S, Shah AD, Mateen BA, Kuan V, Quint JK, Fitzpatrick N, Torralbo A, Fatemifar G, Hemingway H. A semi-supervised approach for rapidly creating clinical biomarker phenotypes in the UK Biobank using different primary care EHR and clinical terminology systems. JAMIA Open. 2020 Dec 5;3(4):545-556. doi: 10.1093/jamiaopen/ooaa047. PMID: 33619467; PMCID: PMC7717266.

Denaxas S, Gonzalez-Izquierdo A, Direk K, Fitzpatrick NK, Fatemifar G, Banerjee A, Dobson RJB, Howe LJ, Kuan V, Lumbers RT, Pasea L, Patel RS, Shah AD, Hingorani AD, Sudlow C, Hemingway H. UK phenomics platform for developing and validating electronic health record phenotypes: CALIBER. J Am Med Inform Assoc. 2019 Dec 1;26(12):1545-1559. doi: 10.1093/jamia/ocz105. PMID: 31329239; PMCID: PMC6857510.

Kuan V, Denaxas S, Gonzalez-Izquierdo A, Direk K, Bhatti O, Husain S, Sutaria S, Hingorani M, Nitsch D, Parisinos CA, Lumbers RT, Mathur R, Sofat R, Casas JP, Wong ICK, Hemingway H, Hingorani AD. A chronological map of 308 physical and mental health conditions from 4 million individuals in the English National Health Service. Lancet Digit Health. 2019 May 20;1(2):e63-e77. doi: 10.1016/S2589-7500(19)30012-3. PMID: 31650125; PMCID: PMC6798263.

Conference proceedings

Davitte J, Croteau-Chonka DC, Gade S, Ziebell F, Surendran P, Wang Q, Bowker N, Ehm M, Torralbo A, Denaxas S, Fitzpatrick N, Ytsma C, Betts J, Scott R, Robins C. Integration of Phenome-Wide Time-To-Event Modeling with Genetic Colocalization Results for 2,941 Plasma Proteins and 310 Diseases in 44,896 UK Biobank Participants. American Society of Human Genetics 2023; Washington (accepted).

Torralbo A, Ytsma C, Fitzpatrick NK, Tomlinson C, Denaxas S. Defining and redefining human disease at scale in the UK Biobank: a framework for disease phenotyping algorithm development and evaluation. American Medical Informatics Association 2023; New Orleans (accepted).