UCL Institute of Health Informatics


Data Methods for Health Research

This module covers contemporary computational tools and approaches for managing health and biomedical data in a research context. Students will be introduced to current technologies for cleaning, transforming and querying biomedical data, such as Python (including data analysis libraries such as numpy and pandas), relational database management systems such as MySQL, and the SQL query language. Through a set of hands-on practicals and tutorials, students will apply this knowledge to contemporary electronic health record and health-related datasets, and gain practical experience of common scientific programming approaches and best practices in version control, data cleaning, curation and metadata.

Module code


UCL credits


Course Length

9 Weeks

Face to Face Dates

Term 1, Teaching weeks 6-10 & 12-16, Tuesday 9:00 - 15:30

Assessment Dates

Term 1, Dates TBC

Module organiser

Dr Spiros Denaxas. Please direct queries to courses-IHI@ucl.ac.uk


  • Scientific Programming in Python I
  • Scientific Programming in Python II
  • Scientific Programming in Python III
  • Reproducible Science
  • Relational databases I
  • Relational databases II
  • Relational databases III
  • Evidence Synthesis
  • Data visualization
  • Mobile Health

Teaching and learning methods

This 15-credit module lasts for 10 weeks and represents roughly 150 hours of learning time. It uses a mixture of lectures, problem classes, seminars and computer practicals. Private reading is expected, and materials will be made available via Moodle, along with some online activities.


In this assignment you will be provided with a real-world biomedical dataset and a set of documents describing how the data were generated and what they contain. You are asked to explore, clean and summarize the dataset and to answer a set of questions in a clear, reproducible manner using Python.
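
As a rough illustration of what an explore, clean and summarize workflow might look like, the sketch below uses only the Python standard library. It is not the assignment dataset or a model answer: the column names (patient_id, age, systolic_bp), the inline records and the helper functions are all hypothetical, invented for this example.

```python
import csv
import io
from statistics import mean

# Hypothetical records standing in for a real dataset (all values invented).
RAW_CSV = """patient_id,age,systolic_bp
p001,54,132
p002,,141
p003,61,not recorded
p001,54,132
p004,47,118
"""


def to_number(text):
    """Return text as a float, or None when it is blank or non-numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def load_and_clean(source):
    """Read CSV rows, drop exact duplicate rows and convert numeric fields."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(source):
        key = tuple(row.items())
        if key in seen:
            continue  # skip a row identical to one already kept
        seen.add(key)
        cleaned.append({
            "patient_id": row["patient_id"],
            "age": to_number(row["age"]),
            "systolic_bp": to_number(row["systolic_bp"]),
        })
    return cleaned


def summarize(rows, column):
    """Count present and missing values and compute the mean of a column."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "n": len(values),
        "missing": len(rows) - len(values),
        "mean": mean(values) if values else None,
    }


records = load_and_clean(io.StringIO(RAW_CSV))
bp_summary = summarize(records, "systolic_bp")
age_summary = summarize(records, "age")
print(bp_summary)
print(age_summary)
```

Keeping loading, cleaning and summarizing in separate functions, and reporting how many values were missing rather than silently dropping them, is one way to make each step easy to rerun and check. In practice, a library such as pandas would likely be used for the same steps on a larger dataset.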