Congratulations to Dr Sam Morris
1 March 2022
Sam Morris was awarded his PhD (as part of the BBSRC LiDO DTP) for his thesis on "Harnessing haplotype sharing information from low coverage sequencing and sparsely genotyped data".
The celebrations begin...
Many thanks to his examiners: Toomas Kivisild from the Department of Human Genetics at KU Leuven and Hernan Burbano from GEE CLOE at UCL. Sam is currently a research assistant at Oxford University, working as an analyst on the China Kadoorie Biobank project. Well done, Dr Morris, and we hope to see you around a lot in the future!
Thesis abstract: The proliferation of DNA from ancient remains is revolutionising the understanding of past population structure and demography. However, these data are often sparse (<1x coverage), making it challenging to extract reliable haplotype information from them, i.e. to model associations among linked SNPs. While this may not be essential when inferring population sub-structure involving genetically diverged groups, larger cohorts of samples from geographically proximate regions are emerging. In such cases, subtle genetic differences may be captured with haplotype information, demonstrated in analyses of cohorts containing geographically nearby present-day individuals.
Among such haplotype-based methods, Chromopainter has been used extensively in ancient DNA papers that analyse higher coverage (>1x) samples. Lower coverage data can be imputed to provide the dense SNPs that haplotype-based techniques require. However, imputation typically uses modern reference panels, and may be sensitive to effects that obscure fine-scale ancestry signals. Although efforts have been made, there has yet to be a detailed characterisation of the effect of imputation bias on Chromopainter analysis of a range of available aDNA populations for different coverages, and whether more power can be gained with new techniques.
In this thesis I propose modifications to the Chromopainter method to assist extracting haplotype information from low-coverage samples. These include accounting for allele probabilities, retaining SNPs adhering to specific criteria, and upweighting the likelihood contribution of genomic regions containing higher coverage SNPs. I explore these in the context of detecting subtle population structure, including measuring the loss in haplotype information from subsetting data to SNPs that overlap among SNP arrays.
I use these insights to analyse two datasets of newly sequenced ancient samples from Bavaria and Slavic-speaking regions. I also explore the genetic ancestry of ethnic minorities in UK Biobank and the effects of imputation bias in haplotype-based inference when jointly analysing individuals genotyped on different SNP arrays.