Understanding genetic ancestry testing
Genetic ancestry testing is the use of DNA information to make inferences about someone’s "deep" ancestry, hundreds or thousands of years into the past. Genetic genealogy, on the other hand, combines DNA testing with genealogical and historical records, and typically uses large databases to identify matches, or direct comparisons to test for expected matches. There is some overlap between the two, but genetic genealogy is generally more reliable because it draws on this additional information: the information about your ancestry available from DNA alone is limited, as we try to explain here.
There are three main types of genetic ancestry test:
A Y-chromosome DNA (Y-DNA) test provides information about your male line ancestry only, which in most cultures corresponds with the inheritance of surnames. Only males carry a Y-chromosome, but a female can learn about her father line, for example, through her father or brother. Among the tests currently available there is much variety in the amount of information provided. The markers tested are of two types: STRs (short tandem repeats) and SNPs (single nucleotide polymorphisms). These markers have different mutation rates and so give information at different time depths. The information you receive depends on which and how many markers of each type are tested.
SNP testing is used for deep ancestry purposes to provide information about your haplogroup, which tells you which branch of the Y-DNA tree you belong to. Y-STR tests are used for genetic genealogy purposes within surname projects to test hypotheses about patrilineal relationships and to investigate questions about surname origins. These tests also provide very limited information about your deep ancestry by giving you a predicted haplogroup assignment.
If two people have the same Y-DNA haplogroup, it means that they will usually share a common patrilineal ancestor more recently than two people from different haplogroups, but that common ancestor may still have been a long time ago. That time can be estimated, but such estimates are not precise with current standard tests, although comprehensive sequencing of the Y-chromosome is becoming available and will give improved precision.
The haplogroup information is often accompanied by a story about the origin of your ancestors, including a map of the world with arrows indicating ancestral migrations. Hundreds of thousands of men from around the world have now had their Y-DNA tested, and we have a very good idea of the distribution of the different haplogroups in the present-day population. It is, however, difficult to be confident about where these haplogroups originated and how they spread; many different histories could explain their current distribution. Sometimes a company will associate a haplogroup with, for example, Viking, Norman or Saxon ancestry, but such associations are at best speculative and should be treated with caution. Just as today most haplogroups are shared among many populations, so would it have been for past populations. Furthermore, those past populations would have been genetically diverse, and different from the modern populations in their regions of origin.
The father line is just one lineage in your family tree, and as you go further back in time it represents a rapidly diminishing proportion of your total ancestry. For example, you have 64 great-great-great-great grandparents, and a man shares his Y-chromosome with just one of these 64 ancestors.
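The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration: the function names are invented here, and the calculation counts positions in the family tree, so it ignores the fact that the same person can appear more than once in a real pedigree.

```python
def ancestors_in_generation(generations_back: int) -> int:
    """Number of ancestor positions in the family tree this many generations back."""
    if generations_back < 0:
        raise ValueError("generations_back must be non-negative")
    # Every person has two parents, so the count doubles with each generation.
    return 2 ** generations_back


def single_line_share(generations_back: int) -> float:
    """Fraction of those positions occupied by one lineage (e.g. the father line)."""
    return 1 / ancestors_in_generation(generations_back)


# Parents are 1 generation back, grandparents 2, and
# great-great-great-great grandparents 6.
for generations in (1, 2, 6, 10):
    total = ancestors_in_generation(generations)
    share = single_line_share(generations)
    print(f"{generations:>2} generations back: {total:>5} ancestors, "
          f"father line = {share:.2%} of them")
```

The same calculation applies to the mother line traced by mtDNA.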
A mitochondrial DNA (mtDNA) test provides information about your female line ancestry only. Mitochondrial DNA is passed on by a mother to both her male and female children, but only females can pass their mtDNA on to the next generation (males are dead ends for mtDNA). This test, like the Y-DNA test, provides information about one specific lineage: your mother, your mother’s mother, your mother’s mother’s mother, and so on back in time. Again the amount of information provided varies among tests, but the mtDNA sequence is short (just 16,569 DNA "letters"), so sequencing the whole mtDNA genome is already relatively inexpensive.
An mtDNA test can be used for genealogical purposes to test a hypothesis about recent female line ancestry (perhaps arising from genealogical research) or to look for matches in a genetic genealogy database. The mtDNA test also provides a haplogroup assignment which may, like the Y-DNA haplogroup, be accompanied by a story and perhaps a "migration" map. We know a lot about the present-day distribution of the mtDNA haplogroups, but it is again much more difficult to make inferences about the more distant past. The mtDNA mutation rate is relatively high, although there is considerable uncertainty about the precise rate. The probability of a mutation occurring in the whole mtDNA genome in one generation (i.e. transmission from mother to child) is estimated at between 1% and 3%. Therefore the time gap between mutations in an mtDNA sequence can be 100 generations or more, and so common mtDNA ancestors cannot be dated accurately even with full mtDNA genome data: if you share a full mtDNA sequence with someone, your common matrilineal ancestor could be 1 or 50 generations ago. For example, it is common for participants in genetic genealogy databases to have exact full sequence matches with people with ancestry from a number of different countries.
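A rough calculation shows why an exact match says so little about dates. The sketch below assumes a deliberately simple model that is not part of any testing company's method: each mother-to-child transmission independently carries the same chance of a mutation somewhere in the mtDNA genome, and complications such as back-mutation are ignored. The function names are illustrative.

```python
def expected_generations_between_mutations(p_per_generation: float) -> float:
    """Mean wait, in generations, for a mutation on a single maternal line."""
    return 1 / p_per_generation


def prob_identical(generations_to_ancestor: int, p_per_generation: float) -> float:
    """Chance that two people still carry identical full mtDNA sequences.

    Two people whose common matrilineal ancestor lived g generations ago are
    separated by 2 * g transmissions, none of which may carry a mutation.
    """
    transmissions = 2 * generations_to_ancestor
    return (1 - p_per_generation) ** transmissions


# The 1% to 3% range quoted above for the whole mtDNA genome.
for p in (0.01, 0.03):
    wait = expected_generations_between_mutations(p)
    print(f"mutation probability {p:.0%}: about {wait:.0f} generations between mutations")
    for g in (1, 10, 50):
        chance = prob_identical(g, p)
        print(f"  common ancestor {g:>2} generations ago: "
              f"P(exact match) = {chance:.2f}")
```

Under these assumptions an exact match remains quite possible across many generations, particularly at the lower end of the rate range, so a match on its own cannot separate a recent common ancestor from a distant one.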
As with the Y-chromosome, as you go further back in time your mtDNA represents a rapidly diminishing proportion of your total ancestry.
An autosomal DNA test provides information from the great majority of your DNA (the autosomes are the chromosomes other than the X, Y and mtDNA, and contain most of your DNA sequences and genes). Although full genome sequencing is not far away, it remains unaffordable for most, and autosomal DNA tests usually examine up to around 1 million genetic markers (SNPs) spread across the genome. One million may sound like a lot, but there are over 3 billion DNA letters in the human genome, so the markers cover only a small fraction of it, although the most informative sites are chosen. The markers give information about all your ancestors in recent generations, but once you go beyond about 10 generations back into the past (roughly 300 years) only a small fraction of your ancestors have contributed directly to your DNA: so even if William Shakespeare were your ancestor (born ~450 years ago), you almost certainly inherited no DNA from him. This can be a bit confusing: you did inherit almost all your DNA from ancestors alive at that time, but there are very many of them (perhaps 10 thousand or more), and you only actually inherited your DNA from a few hundred of them, a small fraction. The others are "pedigree ancestors" but not "DNA ancestors": you could have inherited DNA from them, but did not because of the randomness in the 50% transmission of DNA from parent to child.
The uniparental Y and mtDNA are exceptions: you inherited them from all your patrilineal and matrilineal ancestors respectively (the former only if you are male), and so in a sense they can provide a link with very remote ancestors. But they represent only a small fraction of your ancestry, and allow only limited inferences about time depth.
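The difference between pedigree ancestors and DNA ancestors can be illustrated with a back-of-envelope model. Everything specific in it is an assumption brought in for illustration and is not taken from this article: it uses a commonly quoted rule of thumb that the autosomal DNA received from one parent traces back, k generations ago, to roughly 22 + 33(k − 1) segments (22 autosomes plus about 33 crossovers per generation), treats those segments as scattered at random among the ancestors on that side, and ignores the overlap between lines of descent found in real family trees.

```python
import math

AUTOSOMES = 22               # chromosome pairs other than X and Y
CROSSOVERS_PER_MEIOSIS = 33  # assumed average; a rule-of-thumb value


def pedigree_ancestors(k: int) -> int:
    """Ancestor positions in the family tree k generations back."""
    return 2 ** k


def expected_dna_ancestors(k: int) -> float:
    """Rough expected number of ancestors k generations back who passed on any DNA."""
    if k < 1:
        raise ValueError("k must be at least 1")
    slots_per_side = 2 ** (k - 1)  # ancestors on the mother's (or father's) side
    segments_per_side = AUTOSOMES + CROSSOVERS_PER_MEIOSIS * (k - 1)
    # Poisson-style approximation: chance that an ancestor receives at least one
    # segment when the segments are spread at random over the available slots.
    p_contributes = 1 - math.exp(-segments_per_side / slots_per_side)
    return 2 * slots_per_side * p_contributes


for k in (5, 10, 15):
    pedigree = pedigree_ancestors(k)
    dna = expected_dna_ancestors(k)
    print(f"{k:>2} generations back: {pedigree:>6} pedigree ancestors, "
          f"roughly {dna:,.0f} DNA ancestors ({dna / pedigree:.0%})")
```

The exact numbers depend heavily on the assumptions, but the pattern matches the text: the number of pedigree ancestors doubles every generation while the number of inherited segments grows only slowly, so beyond about 10 generations most pedigree ancestors contribute no DNA at all.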
Autosomal DNA tests can be used to identify individuals with whom you share one or more common ancestors up to a handful of generations in the past. This is done by looking for large chunks of DNA that you both share, indicating recent shared inheritance. Sometimes it happens that a large chunk of DNA is conserved in two individuals from a common ancestor more than 10 generations in the past, but this is rare: the great majority of common ancestors at that time depth will not be identified from the DNA of their descendants today. Although sharing one or more large chunks of DNA makes it almost certain that the two of you had at least one recent common ancestor, dating the ancestor(s) is imprecise, particularly beyond about 4 generations ago. Also the tests have no ability to distinguish certain relationships: for example, using DNA alone the half-sibling relationship cannot be distinguished from the grandparent-grandchild relationship, and in the latter case we can't tell from the DNA which is the grandparent and which is the grandchild. Algorithms that predict specific relationships are rarely precise beyond 1st degree, but they can identify more distant relationships approximately, with good accuracy out to about 2nd cousin, and the precise relationship may then be confirmed using additional information.
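The reason some relationships cannot be told apart is that they produce the same expected amount of shared DNA. The sketch below tabulates the standard expected autosomal sharing for a few relationships and groups together those with the same value. It is a simplification: real matching tools also look at the number and length of shared segments, and actual sharing varies around these averages. The table and function names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

# Expected fraction of autosomal DNA shared through recent common ancestry.
EXPECTED_SHARE = {
    "parent-child": Fraction(1, 2),
    "half-siblings": Fraction(1, 4),
    "grandparent-grandchild": Fraction(1, 4),
    "first cousins": Fraction(1, 8),
    "second cousins": Fraction(1, 32),
}


def group_by_share(table: dict[str, Fraction]) -> dict[Fraction, list[str]]:
    """Group relationships that have the same expected amount of shared DNA."""
    groups: dict[Fraction, list[str]] = defaultdict(list)
    for relationship, share in table.items():
        groups[share].append(relationship)
    return dict(groups)


for share, relationships in sorted(group_by_share(EXPECTED_SHARE).items(), reverse=True):
    print(f"{float(share):7.3%}: {', '.join(relationships)}")
```

Half-siblings and grandparent-grandchild pairs fall into the same group, which is why additional information such as ages or documentary records is needed to decide between them.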
Autosomal tests also provide information about an individual's "ethnicity" by identifying sections of the DNA that best match reference databases of modern populations with geographical or ethnic labels. Ethnicity tests are better called biogeographical ancestry tests or admixture tests (your "ethnicity" is a social category that may not accurately reflect your ancestry). However, the reference populations used for comparison purposes are limited, the ethnic labels applied to them may be questionable, and they were collected in different ways for different purposes: they rarely represent true random samples from a population (e.g. because the "population" itself may not be precisely defined: populations usually overlap and blend with other populations). The ability to distinguish between populations within continents is often poor with the current resolution of markers and databases. Human genetic variation usually changes smoothly with geographical distance: as you travel from Dakar to Vladivostok you can observe continual change in gene variant frequencies; there is a big genetic difference between the start and end cities, but there are no sharp genetic boundaries along the way.
Ethnic/geographical assignments have some validity at a large scale. For example, in Latin Americans it is usually possible to distinguish with confidence sections of an individual's genome that are of sub-Saharan African, European and Native American origin. However, testing companies will often assign national labels to genetic clusters, whereas gene variant frequencies tend to change smoothly across borders. Thus, French people may be assigned a large percentage of "British" ancestry. Normandy and Kent are genetically similar, as you would expect from history and geography, so it is not easy to distinguish English from French based on DNA alone. Given high quality genomic databases it would be possible to assign an individual to a region of origin with a reasonable degree of accuracy (human provenancing), but this is beyond what genetic testing companies currently have available, in terms of both the number of genetic markers and the size and annotation of their databases.
As a result of the random inheritance of DNA, close relatives can often be assigned markedly different ethnicity percentages. This may be correct. For example, if you have three grandparents from Africa and one from Asia, you and your brother or sister may receive very different proportions of Asian DNA even though you share the same parents. However, such differences may also reflect inadequacies in the databases used, or the methods of inference applied.
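The grandparent example can be explored with a small simulation. It is a toy model under stated assumptions: the genome a parent passes on is treated as a fixed number of independent blocks (the figure of 50 is hypothetical, chosen only to be of the same order as the number of chromosomes plus crossovers), and each block comes from one or other of that parent's own parents with equal probability.

```python
import random

BLOCKS_PER_PARENT = 50  # hypothetical number of independently inherited blocks


def child_asian_fraction(rng: random.Random, blocks: int = BLOCKS_PER_PARENT) -> float:
    """Asian share of a child's genome when exactly one grandparent is Asian."""
    # The parent with one Asian and one African parent passes on, block by block,
    # either the Asian-derived or the African-derived copy with equal probability.
    asian_blocks = sum(rng.random() < 0.5 for _ in range(blocks))
    # The other parent supplies the remaining half of the child's genome,
    # none of which is Asian in this example.
    return 0.5 * asian_blocks / blocks


rng = random.Random(2024)  # fixed seed so that repeated runs give the same output
siblings = [child_asian_fraction(rng) for _ in range(4)]
print("Asian share for four siblings:", ", ".join(f"{s:.1%}" for s in siblings))
```

On average each child receives 25% of their DNA from the Asian grandparent, but individual siblings scatter around that figure, so two siblings can correctly be given different percentages before any limitations of the reference databases come into play.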
It is also common to find that people get very different percentages from different testing companies. This is partly because each company uses different databases and the individuals within them are categorised in different ways: there is no "correct" way to categorise human beings. Each company also uses its own algorithms to make the estimates, and the target time depth varies from company to company but is often not explicitly stated. The estimates will also change over time as additional reference populations are added and as the algorithms are adjusted or improved.
The Sense About Science guide "Sense about genetic ancestry testing" highlights the limitations of genetic ancestry testing.
Debbie Kennett's blog post for Sense About Science, "Sense about genealogical DNA testing", provides an overview of the legitimate uses of DNA testing for genetic genealogy purposes.
A list of related articles can be found on the International Society of Genetic Genealogy (ISOGG) website. We particularly recommend the following:
- The Guardian Notes and Theories blog by Mark Thomas.
- "Selling Roots" by Elliot Aguilar in The New Enquiry.
- Royal et al. (2010) "white paper" published by the American Society of Human Genetics Ancestry and Ancestry Testing Task Force. These authors say, for example, that:
  - "... moving from [an] inference of common ancestry to the conclusion that the match implies something about the biogeographical ancestry of both individuals can be problematic."
  - "... any quantitative claims about ancestry should have an easily interpreted assessment of confidence or accuracy associated with them"
  - "... whenever formal inferences about population history have been attempted with uniparental systems, the statistical power is generally low. Claims of connections, therefore, between specific uniparental lineages and historical figures or historical migrations of peoples are merely speculative".
- American Society of Human Genetics Ancestry Testing Statement, 2008
- Bandelt et al. BioEssays (2008).
- Bolnick et al. Science (2007). These authors say:
  - "... when an allele or haplotype is most common in one population, companies often assume it to be diagnostic of that population. This can be problematic ..."
  - "Many genetic ancestry tests also claim to tell consumers where their ancestral lineage originated and the social group to which their ancestors belonged. However, ..."
  - "the tests ... promote the popular [mis]understanding that race is rooted in one’s DNA"
  - "market pressures can lead to conflicts of interest".
- "Beware the gene genies" by Martin Richards in The Guardian 21/2/03.
  - "Lavish but questionable promises have been made to those who want to trace their genetic ancestry".
The following lectures are a useful introduction to DNA ancestry testing.
Ancestry testing using DNA: The pros and cons. Public lecture by Prof Mark G. Thomas at WDYTYA Live at Birmingham's NEC in April 2015. Also available on YouTube.
DNA for Beginners. Public lecture by Debbie Kennett at Genetic Genealogy Ireland 2014 at Dublin's RDS in October 2014. Also available on YouTube.