UCL Institute of Health Informatics


Applied Genomics for Data Scientists

Human genetic data are increasingly available in health care. This introductory course combines key theoretical background on human genetics with a focus on practical applications of computational methods. Working with synthetic UK Biobank data, we will cover the principles of human genetics, genome-wide association analysis, the development and application of polygenic risk scores, and a practical introduction to the interpretation of genetic associations.

We will work with synthetic data closely resembling the UK Biobank dataset, a widely used population-based dataset covering a range of phenotypes and assessments. The course therefore provides a good primer for anyone planning to work with this or similar population datasets. We also provide a practical guide to the tools and data resources available for interpreting genome-wide association data and for applying them in causal inference analysis. This two-day intensive practical course gives data scientists the opportunity to learn how to integrate genetic data into their work through hands-on experience of genomic data analysis and computational genetics.

Learning Objectives

  • Understand the basic principles of human genetics
  • Implement basic Linux commands and scripting in Bash
  • Understand genetic data handling, file formats, and quality control
  • Understand the format of UK Biobank phenotype data and apply restructuring and QC for analysis
  • Apply logistic and linear regression for genetic association studies using PLINK
  • Generate polygenic risk scores and evaluate their application for prediction
  • Implement and evaluate clinical interpretation of genetic associations 
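
As a flavour of the polygenic risk score objective above, here is a minimal sketch in Python. It assumes the common additive formulation, in which a score is the sum of a person's effect-allele dosages (0, 1 or 2) weighted by per-variant effect sizes. The function name, the effect sizes and the genotypes are all hypothetical and chosen for illustration; they are not taken from the course materials or from any real study.

```python
def polygenic_risk_score(dosages, weights):
    """Return the weighted sum of effect-allele dosages.

    dosages: effect-allele counts per variant (0, 1 or 2)
    weights: per-variant effect sizes, in the same variant order
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same variants")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical effect sizes for three variants (illustrative only).
weights = [0.5, -0.25, 1.0]

# Hypothetical genotypes: effect-allele dosages per person.
genotypes = {
    "person_a": [0, 1, 2],
    "person_b": [2, 2, 0],
}

scores = {
    person: polygenic_risk_score(dosages, weights)
    for person, dosages in genotypes.items()
}

for person, score in scores.items():
    print(person, score)
```

In practice, scores of this kind are usually computed with dedicated tools over many thousands of variants; the sketch only shows the underlying arithmetic.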

Course Team

Dr Johan Thygesen (Co-Lead)

Johan is a lecturer in Health Data Science at the Institute of Health Informatics. His main research interest lies in the genetic analysis of complex disorders, with a specific focus on rare variants and on using electronic health records to expand phenotype information. He has worked mainly in the field of mental health and psychiatric disorders, seeking to establish causes, to explore ways of improving treatment, and to understand disease trajectories from electronic health records.

Dr Tom Lumbers

Dr Tom Lumbers is an Honorary Consultant Cardiologist at Barts Heart Centre, HDR UK Fellow at University College London, and Visiting Scientist at the Broad Institute of Harvard and MIT. He received his Ph.D. in Molecular Biology at Imperial College London and subsequently completed post-doctoral training in Genetic Epidemiology at University College London. Tom’s research focuses on defining the genetic architecture of heart failure and left ventricular dysfunction to generate insights into causal factors and molecular disease mechanisms. He co-founded the HERMES Consortium, an international collaboration in heart failure genetics. He is developing tools to deliver scalable and validated disease phenotypes from real-world data, enabling large-scale genetic analysis of disease subtypes. He co-leads the phenotype working group at BigData@Heart, an EU public-private consortium (www.bigdata-heart.eu).