UCL Institute of Health Informatics


Principles of Health Data Science

This module introduces students to the main principles of health data science and provides them with an overview of the main research areas where these are applied. Through a combination of lectures, invited speakers, computer practicals and group work, students will become familiar with clinical terminologies, data linkage, translational bioinformatics and phenotyping electronic health records for research.
Additionally, a series of lectures will introduce data-driven methods for processing health data, such as machine learning, medical imaging and natural language processing.

Module code


UCL credits


Course Length

9 Weeks

Face to Face Dates

Oct: 02; 09; 16; 23; 30. Nov: 13; 20; 27. Dec: 04; 11. Dec: 18.

Assessment Dates

6th January 2020

Module organiser

Dr Spiros Denaxas. Please direct queries to courses-IHI@ucl.ac.uk

Learning Outcomes

  1. Outline the main types of EHR data, and how controlled clinical terminologies are used to record healthcare information;
  2. Summarise the main advantages and limitations of creating and evaluating electronic health record phenotypes;
  3. Explain and practice the foundations of data visualisation;
  4. Summarise the importance of medical imaging and how data from medical images can be visualised and processed;
  5. Outline and apply the principles of natural language processing for health data, and explain the foundations of artificial intelligence in healthcare;
  6. Understand the fundamental concepts of epidemiology, e.g. measures of disease frequency, effect or association, and impact;
  7. Explain the concepts of confounding, effect modification and bias in the context of epidemiological analyses;
  8. Outline how to measure the incidence and prevalence of disease;
  9. Explain the main types of observational studies (cohort, cross-sectional, ecological).


  • Controlled Clinical Terminologies
  • Phenotyping of Electronic Health Records
  • Medical Imaging
  • Data Linkage
  • Biomedical Data Standards
  • Machine Learning I: Natural Language Processing
  • Machine Learning II: Supervised Learning
  • Machine Learning III: Neural Networks & Artificial Intelligence
  • Machine Learning IV: Unsupervised Learning

Teaching and learning methods

This 15-credit module lasts for 10 weeks and should represent roughly 150 hours of learning time. It will use a mixture of lectures, problem classes, seminars and computer practicals. Private reading is expected, and materials will be made available via Moodle, along with some online activities.


Final Assessment by examination