Centre for Doctoral Training in AI-enabled Healthcare


Combining Mendelian randomisation and machine learning for drug target validation in complex disease

Mendelian randomisation (MR) is an established statistical method for causal inference that uses data on human genetic variation to estimate causal effects. MR has been applied successfully to demonstrate the causal contribution of risk factors (e.g. body mass index, blood cholesterol levels, interleukin-6 signalling) to complex diseases such as type 2 diabetes, coronary heart disease and inflammatory bowel disease. A specialised application of MR is the identification and validation of potential drug targets. In this setting, common genetic variants in or near the gene encoding a drug target are used as proxies for pharmacological modulation of that target. Under the MR paradigm, the association of these variants with disease can be used to infer a causal role for the target in the disease of interest. By extension, the same variants can model the potential consequences of modulating that target with a drug. MR analyses, including several published by researchers at UCL, have been shown to recapitulate the known effects of established drug targets and to provide valuable insights into the utility of novel targets.
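To make the MR logic concrete, the sketch below computes two standard summary-statistic estimators: the single-variant Wald ratio (the variant's effect on the outcome divided by its effect on the exposure) and the fixed-effect inverse-variance weighted (IVW) combination across variants. In the drug-target setting, the exposure would typically be a biomarker of target activity and the outcome a disease. This is a minimal, illustrative sketch, not the method this project will use: the function names and input numbers are hypothetical, the variants are assumed independent (no linkage disequilibrium), and uncertainty in the variant-exposure effects is ignored.

```python
import math
from typing import Sequence, Tuple


def wald_ratio(beta_exposure: float, beta_outcome: float,
               se_outcome: float) -> Tuple[float, float]:
    """Causal estimate from one variant: outcome effect / exposure effect.

    The standard error is the first-order approximation, which ignores
    uncertainty in beta_exposure.
    """
    if beta_exposure == 0:
        raise ValueError("variant has no association with the exposure")
    return beta_outcome / beta_exposure, se_outcome / abs(beta_exposure)


def ivw_estimate(beta_exposure: Sequence[float],
                 beta_outcome: Sequence[float],
                 se_outcome: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance weighted (IVW) causal estimate.

    Equivalent to a weighted mean of the per-variant Wald ratios with
    weights beta_exposure**2 / se_outcome**2. Assumes (hypothetically)
    that the variants are independent, i.e. not in linkage disequilibrium.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("input sequences must have equal length")
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    if denominator == 0:
        raise ValueError("no variant is associated with the exposure")
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical summary statistics for three independent variants
# (illustrative numbers only, not taken from any real GWAS).
bx = [0.10, 0.20, 0.05]     # variant -> biomarker (exposure) effects
by = [0.05, 0.10, 0.025]    # variant -> disease (outcome) effects
se_y = [0.01, 0.02, 0.01]   # standard errors of the outcome effects

estimate, se = ivw_estimate(bx, by, se_y)
print(f"IVW estimate: {estimate:.3f} (SE {se:.3f})")
```

Every hypothetical variant here has a Wald ratio of 0.5, so the IVW estimate is also 0.5. With correlated variants drawn from a single target gene, the weighting would need to be extended to account for linkage disequilibrium.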

Applications of MR to date have largely been restricted to single targets or small numbers of targets, and have considered comparatively limited combinations of target-encoding genes, biomarkers and disease outcomes. Given the power of the approach, scaling it up to large numbers of targets and diseases would substantially increase its utility. Machine learning applied to genome-wide data could also help address some known limitations of MR, such as genetic pleiotropy, and broaden the range of questions it can answer. A major increase in scale brings methodological challenges that are best addressed by formulating meaningful hypotheses and testing them with real and simulated data. This PhD project is organised around three core questions:

  1. Can machine learning and advanced data science be used to optimise the construction of genetic instruments for use in MR for drug target validation?
  2. Can machine learning methods optimise the application of MR to drug target validation by detecting and accounting for genetic pleiotropy?
  3. Can a combination of machine learning and MR methods be used to prioritise drug targets and to elucidate the molecular and physiological pathways involved in a target's efficacy and safety?
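Question 2 concerns pleiotropy, where a variant influences the outcome through pathways other than the exposure of interest. One established, non-machine-learning way to detect directional pleiotropy from summary statistics is MR-Egger regression: variants are oriented so that every exposure effect is positive, and the outcome effects are then regressed on the exposure effects by weighted least squares with an intercept. A non-zero intercept suggests directional pleiotropy, and the slope is a causal estimate that is robust to it under the InSIDE (Instrument Strength Independent of Direct Effect) assumption. The sketch below is shown only as a conventional baseline, not as this project's method; it returns point estimates only (no standard errors or test of the intercept), and the function name and input numbers are hypothetical.

```python
from typing import Sequence, Tuple


def mr_egger(beta_exposure: Sequence[float],
             beta_outcome: Sequence[float],
             se_outcome: Sequence[float]) -> Tuple[float, float]:
    """MR-Egger point estimates, returned as (intercept, slope).

    Weighted least squares of outcome effects on exposure effects with
    weights 1 / se_outcome**2, after orienting each variant so that its
    exposure effect is non-negative.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("input sequences must have equal length")
    if len(beta_exposure) < 3:
        raise ValueError("MR-Egger needs at least three variants")

    # Orient variants: flip both effects when the exposure effect is negative.
    xs, ys, ws = [], [], []
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        sign = 1.0 if bx >= 0 else -1.0
        xs.append(sign * bx)
        ys.append(sign * by)
        ws.append(1.0 / se ** 2)

    # Weighted sums for the closed-form weighted least-squares solution.
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))

    det = sw * sxx - sx ** 2
    if det == 0:
        raise ValueError("exposure effects are identical; slope is undefined")
    slope = (sw * sxy - sx * sy) / det
    intercept = (sy - slope * sx) / sw
    return intercept, slope


# Hypothetical data following outcome = 0.02 + 0.5 * exposure, with one
# variant reported with both effects sign-flipped (illustrative only).
bx = [0.10, 0.20, -0.05, 0.15]
by = [0.07, 0.12, -0.045, 0.095]
se_y = [0.01, 0.02, 0.01, 0.015]

intercept, slope = mr_egger(bx, by, se_y)
print(f"MR-Egger intercept: {intercept:.3f}, slope: {slope:.3f}")
```

Because the hypothetical points lie exactly on a line with intercept 0.02 and slope 0.5, the regression recovers those values; the orientation step ensures the sign-flipped variant does not distort the intercept.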

These questions will be addressed using large volumes of summary statistics from published genome-wide association studies (GWAS) and individual-level data from population studies (e.g. UK Biobank).


Funding Availability and Award

This full-time PhD studentship is funded by BenevolentAI. Funding covers university course fees, an annual maintenance stipend and research expenses.

Person specification

Essential criteria

  • Minimum of a 2:1 BSc in a biomedical subject, statistics or computing and/or a Master’s degree in computational statistics, machine learning, statistical genetics or another quantitative discipline (preferably with a merit or distinction)
  • Experience in quantitative biomedical data analysis and statistics
  • Ability to organise and prioritise workload
  • Ability to work as part of a team


  • Experience using programming languages such as R or Python
  • Excellent verbal and written communication skills (ranging from informal 1:1 discussion to formal presentations)
  • Experience in analysing electronic health records and/or routinely collected datasets
  • Experience in analysing human genetic datasets