New review on statistical and machine learning tools to advance equity in genomic research
15 May 2025
Review led by Dr Brieuc Lehmann and PhD student Leandra Brauninger (both UCL Statistical Science) explores how the analytical methods used to process and interpret genomic data play a critical role in promoting health equity.

The project, which was funded by Genomics England, also highlights how current statistical and machine learning tools can perpetuate or help correct biases, and calls for more equitable methods to ensure genomic research benefits all populations fairly. The findings of the review were published in Nature Reviews Genetics today.
Despite repeated calls to improve the representativeness of genetic datasets, the proportion of genome-wide association studies (GWAS) conducted in individuals of European genetic ancestries has been increasing. The lack of diversity in genomic data is compounded by factors related to the sociopolitical system in which genomic research takes place, including the under-representation of genomic scientists from diverse backgrounds and concerns from historically under-represented groups over data privacy and misuse.
Combined, these factors can limit the utility of genetic insights in achieving health equity, defined by the World Health Organization as “the absence of unfair, avoidable or remediable differences among groups of people”, particularly as the use of genetic data for clinical decision-making expands in healthcare systems.
The review explores how bias can enter each step of genomic data analysis, from research design and data acquisition through data preparation to model development and evaluation. Growing appreciation of the impact of existing biases has prompted the development of new statistical techniques to understand, quantify and correct for imperfect data and models.
For example, given the current lack of diversity in genomic datasets, methods to boost power for statistical inference or prediction in under-represented groups can provide large benefits in terms of equity. The paper highlights three strategies to boost power, along with specific methodological techniques: including more individuals, including more traits, and leveraging non-genetic data. The paper also explores how statistical methods can reduce bias, assess genetic variation, and identify disparities in existing analysis pipelines.
Methods development in genomics and genomic medicine is increasingly directed at addressing equity, but many challenges remain. The review explores further issues related to categorisation, genomic references, data sharing and understanding the role of social and environmental effects.
Lead author Dr Brieuc Lehmann commented: “The role of statistical and machine learning methods in advancing equity in genomics research is often underappreciated. This review – the culmination of a multi-year, multidisciplinary collaboration – aims to help genomic researchers recognise and address potential biases introduced by methodological choices, while also fostering innovation in analytic tools that promote equity.”
Links
- Review published in Nature Reviews Genetics
- Blog post in Data Science for Health Equity
- Genomics England
- Dr Brieuc Lehmann’s academic profile
- Leandra Brauninger’s academic website
- UCL Statistical Science
Media contact
Ingrida Bertasiute
i.bertasiute [at] ucl.ac.uk