Centre for Doctoral Training in AI-enabled Healthcare


Machine Learning for high-throughput Phenotyping and Comorbidity Mapping in EHR data

Project Summary 

Electronic Health Records (EHR) are data generated when patients interact with the healthcare system and contain information on diagnoses, symptoms, procedures, prescriptions, and tests. Identifying patients with a specific disease, along with its onset and progression (a process called phenotyping), is challenging and time-consuming because EHR data are high-dimensional and noisy. A patient typically has tens of thousands of data points over multiple years, and there is significant variation in how diseases manifest and progress. Delivering personalised treatment relies on accurately identifying diseases and their subtypes as well as clusters of comorbidities. Machine learning has the potential to identify disease phenotypes and subtypes by finding non-obvious patterns in complex cross-sectional and longitudinal EHR data.

The candidate will work with world-leading machine learning and translational bioinformatics experts at BenevolentAI to develop and evaluate machine learning approaches (e.g. sparse latent factor models, nonnegative tensor factorization, semi-supervised anchor learning) for identifying neurological diseases, disease subtypes and clusters in multi-modal EHR data. The project will use large-scale contemporary data sources (UK Biobank: 500,000 middle-aged adults with extensive genotyping/phenotyping; and CALIBER: 15M primary care patients with hospital EHR) and potentially industry-sponsored and proprietary data resources. The project will also consider approaches that use external information sources, such as published medical literature, to create more concise and interpretable disease phenotypes.

Whilst the student will participate in the CDT programme and all its activities, they are expected to spend a proportion of their time working with BenevolentAI at its base, a few minutes' walk from the IHI in Tottenham Court Road.

Funding Availability and Award

This full-time PhD studentship is funded by BenevolentAI. Funding covers university course fees, an annual maintenance stipend (roughly £19,077, tax free) and research expenses.

Residency Requirements

Studentships are open to all UK applicants. Applicants are also eligible for a studentship if they have been ordinarily resident in the UK for three years prior to the start of the studentship grant. For instance, an applicant applying for a studentship that starts in October 2019 must have resided in the UK since October 2016.

Please note: if the applicant is from an EU country, these three years may include time spent studying; however, if the applicant is from outside the EU (international), these three years cannot include time spent studying at a Higher Education institution.

Person specification

Essential criteria

  • Minimum of a 2:1 BSc in a biomedical, statistics or computing discipline and/or a Master’s degree in computational statistics, machine learning, statistical genetics or another quantitative discipline (preferably with a merit or distinction)
  • Experience in quantitative biomedical data analysis and statistics
  • Ability to organise and prioritise workload
  • Ability to work as part of a team


  • Experience in the use of statistical software such as R
  • Excellent verbal and written communication skills (ranging from informal 1:1 discussion to formal presentations)
  • Experience in analysing electronic health records and/or routine datasets