UCL Institute of Health Informatics


Data Lab

At the intersection of clinical research and computer science, the Data Lab develops data-driven computational methods and tools for using electronic health records in biomedical research. The Lab develops novel methods, tools and scripts, and prepares electronic health record data for research.

The Data Lab team, made up of experienced data scientists and statisticians, has extensive expertise in curating and analysing large, linked health and administrative datasets. The team is involved in every stage of a project, from initial advice and the development of disease definitions and classifications through to the extraction and release of data into the data safe haven.

The team 

Head of the Data Lab team

Professor Spiros Denaxas is Professor in Biomedical Informatics and Deputy Director of the Institute of Health Informatics (IHI), co-lead of the UCLH BRC HIGODS theme, co-lead of the HDR UK Actionable Analytics theme, and head of a multidisciplinary research lab (http://denaxaslab.org) that develops methods and tools for using national structured electronic health record (EHR) data in translational research.

Data scientists 

Dr Arturo Gonzalez-Izquierdo, Senior Data Scientist
Arturo currently leads the data workflow for the CALIBER research platform.
He has a background in Applied Statistics, Epidemiology and Public Health (MSc) and Biostatistics (PhD). His expertise ranges from disease epidemiology and healthcare utilisation in observational studies to data methods for clinical phenotyping using electronic health data at national and international levels.

Dr Michail Katsoulis, Statistician
Michalis is a medical statistician and senior research fellow at the UCL Institute of Health Informatics.
He holds a BHF Immediate Postdoctoral Basic Science Research Fellowship (Feb 2018 - Jan 2022) entitled “Weight change and the onset and progression of cardiovascular diseases in large scale electronic health records”. 
He is particularly interested in causal inference. He aims to bring together epidemiology and statistical methodology, and to highlight the public health benefits of analysing observational data appropriately.


Dr Ana Torralbo, Research Fellow in Health Data Science
Ana has a PhD in Cognitive Psychology and worked as a Marie Curie Research Fellow in Cognitive Neuroscience at UCL. She has an MSc in Data Science for Research in Health and Biomedicine and is interested in phenotyping methods using linked electronic health records. She is a Research Fellow in Health Data Science at the Institute of Health Informatics at UCL, where she works on an industry-funded project to create and develop phenotyping algorithms in the UK Biobank.

Mr Muhammad Qummer Ul Arfeen (Arfeen), Data Manager
Arfeen is the Data Manager for IHI-UCL. He is responsible for coordinating databases and for storing, organising, securing and providing access to information. Arfeen is passionate about making data management more efficient and effective. He manages the ongoing operation of the EHR databases and analyses large volumes of clinical data to identify trends and assess the quality of submitted data. He works mostly with a support team and clinical research groups to ensure data integrity and data exchange. Arfeen is highly proficient with the computer programs and applications that make raw data more useful to the department and research groups.


Dr Yulei Fan, CALIBER Data Manager
Yulei has five years' industry experience at RioMed developing intelligent digital healthcare information management systems and working on healthcare data collection and exchange, two years' experience analysing and visualising genomic variation in Escherichia coli and Salmonella enterica at the Medical School of the University of Warwick, and five years' experience teaching computer science at Xiamen University. He also spent two years in computing at Middlesex University London researching the modelling of CAs as neuropsychological phenomena and for practical applications. His PhD research, in computing at Leeds Beckett University, was on biological image segmentation and the 3D/4D visualisation and reconstruction of mouse embryo models for gene mapping. His areas of interest are data science with machine learning, and software development.

Cai Ytsma, Health data scientist
Cai holds an MSc in Health Data Science from UCL and is building on her dissertation work, which analysed prescribing trends after COVID-19 infection in the UK Biobank. As a research associate, she contributes to the development and analysis of phenotyping algorithms in the UK Biobank and beyond.
She is also an expert in geochemical quantification with laser-induced breakdown spectroscopy (LIBS) and works as a spectroscopy data science consultant for industry, academia and national planetary organisations.


Systems architects

Dr Vaclav Papez, Research Associate in Health Data Systems
Vaclav has worked in health informatics and neuroinformatics for more than seven years, the last four of them as a research associate at the Institute of Health Informatics at UCL. Previously, he was a junior researcher at the Department of Computer Science and Engineering of the University of West Bohemia in Pilsen, Czech Republic, where he obtained a PhD with a thesis on an archetype-based approach to modelling electroencephalographic/event-related potential data and metadata.
With a professional background as a computer scientist and software engineer, he is primarily interested in data models, database technologies (both relational and non-relational) and linked data/semantic web.

Project management and facilitation

Ms Natalie Fitzpatrick, Data Science Facilitator 
Natalie has more than 20 years’ experience managing large research programmes involving linked electronic health record (EHR) data.  She is responsible for facilitating research collaborations for CALIBER including governance and access to data.  Natalie co-leads the UCL Institute of Health Informatics (IHI) Phenomics Group and is programme manager for the Health Data Research (HDR) UK Phenomics Implementation Projects to develop the HDR UK CALIBER Phenotype Portal, an open resource for EHR users to share their methods and tools, and build the UK's natural language processing (NLP) 

PhD students

Mr Andre Vauvelle
Andre is a PhD student on the AI Enabled Healthcare Systems CDT and is sponsored by BenevolentAI. While working with the NHS and healthcare data at startups and in consultancy, he identified many healthcare problems that could benefit from further academic study. His research focuses on developing new machine learning methods and tools for computational phenotyping with structured EHR data.

Mr Albert Henry
Albert Henry is a PhD student registered with the BHF 4-year PhD in Cardiovascular Biomedicine programme at the UCL Institute of Cardiovascular Science. He is a fully trained clinician (general practice) from Indonesia and holds an MSc in Health and Biomedical Data Science from the UCL Institute of Health Informatics. His PhD research focuses on the genetics of heart failure and heart failure subtypes, using large-scale genomics, molecular profiling, clinical assessments and electronic health record data. Outside his research, he co-leads the IHI Code Club, a volunteer-led initiative that promotes reproducibility and good coding practice across research communities.

Ms Nonie Alexander
Nonie is a PhD student using clustering methods on EHR data to find hidden subtypes of heterogeneous diseases. She has also worked on projects investigating unfair bias in health data.

Visiting researchers

Dr Maria Pikoula, Clinical Data Scientist
Maria is a data scientist by training. She enjoys working in the multidisciplinary IHI environment with clinicians, epidemiologists, geneticists and statisticians, applying both traditional and machine learning methods to research questions using electronic health records.

In 2018, Maria was awarded the Joseph Footit British Lung Foundation Grant for COPD research. Using data and tools from the CALIBER resource, she is currently investigating airway disease subtypes, with the aim of improving and personalising care for people living with COPD, asthma and bronchiectasis.

Maria has now taken a break from full-time research to pursue a graduate-entry medicine course.

Dr Ghazaleh Fatemifar, Senior Research Fellow
Ghazaleh is a genetic epidemiologist at the Institute of Health Informatics. She holds an American Heart Association Research Fellowship in Health Data Science, which focuses on using machine learning algorithms to identify and validate clinically meaningful subtypes of heart failure using linked EHR and genetics data in the UK Biobank.


Research administrator

Cécile Brémont
Cécile is a Research Administrator on secondment at the Institute of Health Informatics. She was previously the administrator at the Centre for Critical Heritage Studies at the Institute of Archaeology, and also worked at the Thomas Coram Research Unit (Institute of Education) supporting the ERC-funded Families and Food in Hard Times study. Before joining UCL, Cécile supported FP7-EC-funded projects at City, University of London and LSHTM.