The Primary Care Database User Group Meetings are open to all with an interest in electronic health record research. They are held in the Research Department of Primary Care and Population Health, located at the Royal Free Hospital. Directions can be found on our contact page. To sign up to the Primary Care Database User Group mailing list, please register on JISCMail.

Primary Care Database User Group Meetings

Using ethnicity data in primary care databases

Monday 7th September, 3.30-5pm

Rohini Mathur (LSHTM): Quality, completeness and usability of ethnicity data in electronic health records

The social determinants of health are widely accepted as being integral to our understanding of why disease and mortality affect different population groups unequally. The concept of ethnicity is a vital tool with which to explore these differences, as it can provide valuable information about shared exposures for individuals with similar geographic origin, culture, language, and beliefs about and access to health services. This presentation will discuss the social construction of ethnicity as a variable and the ways in which it is recorded in the Clinical Practice Research Datalink (CPRD) and the Hospital Episode Statistics (HES). The completeness of ethnicity recording in each database, concordance between the two, and comparisons with the census will be discussed. Examples of how ethnicity data are used to inform best practice in the NHS and a brief overview of my research into ethnicity and diabetic outcomes will be presented at the end.

Tra Pham (UCL): Ethnicity recording in primary care: multiple imputation of missing data in ethnicity recording using The Health Improvement Network (THIN) database

Ethnicity is an important factor to be considered in many epidemiological studies because of its association with inequality in disease prevalence and the utilisation of healthcare. Ethnicity recording has been incorporated in primary care electronic health records, and is therefore available in a number of large UK primary care databases such as The Health Improvement Network (THIN). However, because primary care data are routinely collected for clinical purposes, much of the information that is relevant for research, including ethnicity, is often missing. A popular approach is to use multiple imputation, but standard multiple imputation does not give plausible estimates of the ethnicity distribution in THIN compared to the general UK population. Census data can, however, be used to form weights for multiple imputation such that the correct ethnicity distribution is recovered. I will describe how the method of weighted multiple imputation of missing data is implemented using Stata's mi impute suite, note some issues, and introduce a new procedure to implement the method for multiple incomplete variables which require different imputation weights. Finally, I will give an example showing how the method works when ethnicity is used as an explanatory variable in a cohort study.
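As a rough illustration of the general idea of calibrating imputed values to a census distribution, the Python sketch below works out the probabilities from which missing ethnicity values would need to be drawn so that observed and imputed records together match a target distribution in expectation. This is not the method presented in the talk, which uses weights within Stata's mi impute suite and conditions on other variables. The function names, group labels and all numbers are hypothetical and chosen only for the example.

```python
import random
from collections import Counter


def calibrated_missing_probs(observed, n_missing, census):
    """Probabilities for imputing missing records so that observed plus
    imputed counts match the census shares in expectation.

    observed  -- dict of category -> count among records with a value
    n_missing -- number of records with no recorded category
    census    -- dict of category -> target proportion (sums to 1)
    """
    n_total = sum(observed.values()) + n_missing
    # How many more records each category needs to reach its census share.
    # Categories already over-represented get a shortfall of zero.
    shortfall = {
        cat: max(share * n_total - observed.get(cat, 0), 0.0)
        for cat, share in census.items()
    }
    total = sum(shortfall.values())
    if total == 0:
        raise ValueError("no category falls short of its census share")
    return {cat: value / total for cat, value in shortfall.items()}


def impute_once(observed, n_missing, census, rng):
    """Draw one set of imputed categories and return their counts."""
    probs = calibrated_missing_probs(observed, n_missing, census)
    cats = list(probs)
    draws = rng.choices(cats, weights=[probs[c] for c in cats], k=n_missing)
    return Counter(draws)


# Hypothetical inputs: invented counts and target shares, not real data.
observed_counts = {"group_a": 500, "group_b": 20, "group_c": 20}
target_shares = {"group_a": 0.86, "group_b": 0.08, "group_c": 0.06}
missing_records = 460

probs = calibrated_missing_probs(observed_counts, missing_records, target_shares)
imputed = impute_once(observed_counts, missing_records, target_shares,
                      random.Random(0))
print(probs)
print(imputed)
```

Repeating impute_once with different random seeds would give several imputed datasets, as in multiple imputation. A full analysis would also condition the draws on other patient characteristics, which this marginal sketch ignores.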

Previous seminars

1st July 2015

Will Dixon from the University of Manchester gave a presentation on challenging the assumptions of data preparation and risk attribution in pharmacoepidemiology.

19th May 2015

Krishnan Bhaskaran from LSHTM presented work using primary care data from the Clinical Practice Research Datalink to study the association between body mass index and a wide range of cancers (Bhaskaran et al, Lancet 2014).

16th October 2014

Ruth Blackburn presented work using THIN to explore cardiovascular risk screening and statin prescribing to individuals with severe mental illness.

17th September 2014

Yonas Weldeselassie from the Open University gave a presentation on the self-controlled case series (SCCS) method with smooth risk functions. More information on SCCS methodology.

1st July 2014

In a talk titled "Doctor, doctor, how can I be sure this medication is safe?", Irene Petersen discussed how electronic health records can and can't be used to examine the safety and effectiveness of prescribed medicine, in the light of the recent debate about the safety of commonly used medications such as statins, hypnotics, and antidepressants.

26th March 2014

Liz Sampson and Rebecca Lodwick presented a study on "Health outcomes and health service use of cohabitees living with terminally ill patients with cancer, chronic obstructive airways disease, and dementia", and Michael King and Louise Marston presented a study on "Mortality and Medical Care after Bereavement: A General Practice Cohort Study".

11th December 2013

Gillian Hall, who has been involved in primary care database research from the very beginning, talked about the history of these databases, as well as guidelines and good practice when using them.

30th October 2013

Hedvig Nordeng from the University of Oslo talked about pharmacoepidemiological studies on medication use and safety during pregnancy.

22nd May 2013

Daniel Prieto-Alhambra gave a short introduction to the Catalan SIDIAP primary care database.

9th April 2013

Anoop Shah presented his Freetext Matching Algorithm (FMA), a program which can convert free text entered by clinicians into relevant Read codes. The program, and the paper describing the program, are available free and open access on BioMed Central.

5th February 2013

Our first PDUG meeting of 2013 featured a short presentation by Myriam Alexander on the exploration of multiple measurements of cardiovascular risk factors in THIN. This was followed by Laura Shallcross, who presented some results of her PhD project on skin infections in primary care.

4th December 2012

Katie Harron from the UCL Institute of Child Health gave a talk on data linkage and what can go wrong when linking datasets.  

16th October 2012

The meeting was about the development of reporting guidelines for electronic health records (RECORD). More information is available from the RECORD website.

19th September 2012

We started the new academic year with a very special user group meeting featuring Tarek Hammad, deputy director for the Division of Epidemiology at the US Food and Drug Administration (FDA). He talked about the role of epidemiology in drug safety from a regulatory perspective.

9th July 2012

Jordana Peake, from the UCL Institute for Women's Health, presented on the use of patient identifiable information for data linkage for research into neural tube defects in ethnic communities. Jenny Woodman, from the UCL Institute of Child Health, told us about the role of the GP for children with (possible) abuse and neglect: a mixed methods study.

14th June 2012

This user group meeting focussed on time series analyses. Lisa Szatkowski and Tessa Langley from the UK Centre for Tobacco Control Studies and University of Nottingham presented their work in a talk titled "Evaluating tobacco control policies using time series analysis: examples and reflections". Some of their work is detailed in a recent paper using structural vector autoregression analysis.

28th March 2012

We had three speakers, all of whom have substantial experience with analysing HES data: Pia Hardelid (ICH, UCL), Nick Freemantle (PCPH, UCL) and Ruth Gilbert (ICH, UCL). Pia provided us with an overview of HES data and how it is organised, Nick told us about a study he was involved in that used HES data to assess whether weekend hospitalisation is associated with an increased risk of death, and finally, Ruth shared her expertise on validating codes in HES.

7th November 2011

Ruth Brauer from the London School of Hygiene and Tropical Medicine (LSHTM) told us all about her work on antipsychotic agents and myocardial infarction. She compared the results from a case-control study with those from a self-controlled case series design.

28th September 2011

Cathy Welch told us about a two-stage method to identify outliers in electronic databases. The slides from this presentation can be found here. The paper discussed is in press and will be added to our publications list once it is available.

18th July 2011

Gareth James, a member of our very own THIN team, gave a talk about the prevalence and patterns of long-term prophylactic antibiotic use in COPD, and how to use multiple imputation to account for missing data in antibiotic prescribing.

28th March 2011

Beatrix De La Iglesia from the University of East Anglia talked about the work for their recent publication in Heart: "Performance of the ASSIGN cardiovascular disease risk score on a UK cohort of patients from general practice".

Page last modified on 09 apr 15 15:53 by Rebecca K Lodwick