Mapping the human genome 2
More about linkage
Human linkage maps
The human genome is so large that it took a long time to find any evidence of genetic linkage. It was not until the 1950s that the first autosomal linkage groups were discovered, and these all involved the polymorphic blood groups. One of the first was between one form of hereditary elliptocytosis (an anaemia caused by malformed erythrocytes) and the rhesus blood group. This study was important in showing that linkage information could prove the existence of more than one form of the same disease: in some families elliptocytosis was clearly linked to rhesus, but in others it was not. This implies the existence of at least two genes which, when mutant, can cause the same disease (genetic heterogeneity).
Relationship to the Human Genome Project
It was not until the discovery of extensive inherited variation in DNA sequence, which could be traced in families using Southern blotting or PCR, that human linkage studies really took off.
- It became possible to construct detailed genetic maps of the human genome. These have provided a crucial skeleton of genetic landmarks onto which the human genome project is fitting the DNA clones which will be sequenced.
- The detailed genetic map has made it almost a matter of routine to position a genetic disease gene on the genome map. This is the first stage in positional cloning, the normal route these days to identifying a disease gene.
Types of genetic 'marker'
Any genetic variation can be studied and it is not necessary that the objects being measured in a linkage study are genes. Any polymorphic piece of DNA can be studied. Examples include:
The use of Lod scores
What makes a cross informative?
- At least one parent must be doubly heterozygous, i.e. heterozygous at both of the loci. One such parent is best; two is not quite so good.
The first thing to say here is that I do not expect you to be able to calculate a Lod score. However, you must have a general understanding of what the lod score is because it is an important and often quoted statistic.
Suppose that we wish to decide whether two genes / markers are linked.
- We examine a number of families which are informative for the loci in question.
- For each family we must find the genotype of each parent and of each of the offspring.
- We must try to deduce whether each child has the parental arrangement of alleles or is a recombinant. This is greatly helped if we know how the alleles at the loci which we are considering entered the parents, i.e. if we know the phase.
- Now for the tricky part: we must decide which is more likely, that we have observed the particular progeny in that family because the genes are linked and tend to be inherited together, or that we have observed those progeny through the chance (Mendelian) independent assortment of alleles at the two genes. For this we use a statistic known as the logarithm of the odds or Lod score.
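- The Lod score, Z, is the logarithm (to base 10) of a relative likelihood: the likelihood of the observed progeny if the loci are linked with recombination fraction θ (theta), divided by the likelihood of the same progeny if the loci are unlinked (θ = 0.5). That is, Z(θ) = log10 [ L(θ) / L(0.5) ].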
- We can calculate this relative likelihood under the assumption that the two genes are tightly linked with no recombination occurring between them at all, i.e. with the recombination fraction θ (theta) = 0. Or we can calculate the relative likelihood for any other value of θ that we care to choose. The value of θ for which Z reaches a maximum positive value is then our best estimate of the distance between the two loci. The value of Zmax is a measure of our confidence in the result. If Zmax > 3 then we conventionally take the linkage to have been proved. Conversely, if Z < -2 we take the linkage to have been disproved for that value of θ.
- The data are often displayed in the form of a table of Z calculated for a range of theta from 0 to 0.5. e.g.
Notice in this example that Z has sunk to minus infinity at a recombination fraction θ = 0. This will be the case if there has been at least one recombination event between the genes: they cannot then be zero distance apart. It is often helpful to display the data graphically as shown in the red curve on the right. The maximum value of Z occurs at a recombination fraction of 0.1 and it is greater than 3. Therefore we can feel confident that the two genes are indeed linked, and our best estimate for the genetic distance between them is θ = 0.1 (equivalent to 10 cM). Compare this with the blue curve plotted for two unlinked genes. In this case Z is negative for every value of θ below 0.5, and the best estimate of θ is 0.5, i.e. unlinked.
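The reasoning above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: every meiosis is phase known, so each gamete can be scored directly as parental or recombinant, and the likelihood of R recombinants in N meioses is taken to be θ^R (1 - θ)^(N - R). The gametes listed are hypothetical and are not the data behind the table or the curves in the lecture; the function names are invented for the example.

```python
import math


def count_recombinants(parental_haplotypes, gametes):
    """Count gametes that do not match either parental haplotype."""
    parental = set(parental_haplotypes)
    return sum(1 for gamete in gametes if gamete not in parental)


def lod_score(recombinants, meioses, theta):
    """Z(theta) = log10[L(theta) / L(0.5)] for phase-known meioses.

    Intended for theta between 0 and 0.5 inclusive.
    """
    non_recombinants = meioses - recombinants
    log_unlinked = meioses * math.log10(0.5)
    if theta == 0.0:
        # A single recombinant is impossible if theta is exactly 0.
        if recombinants > 0:
            return float("-inf")
        return -log_unlinked
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1.0 - theta))
    return log_linked - log_unlinked


# Hypothetical family data: the doubly heterozygous parent carries
# haplotypes AB and ab; 18 gametes are parental, 2 are recombinant.
gametes = ["AB"] * 9 + ["ab"] * 9 + ["Ab", "aB"]
r = count_recombinants(("AB", "ab"), gametes)
n = len(gametes)

thetas = [0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5]
table = {theta: lod_score(r, n, theta) for theta in thetas}
best_theta = max(table, key=table.get)

for theta in thetas:
    print(f"theta = {theta:4.2f}   Z = {table[theta]:6.2f}")
print("best estimate of theta:", best_theta)
```

With these made-up numbers Z is minus infinity at θ = 0 (there are recombinants), peaks at θ = 0.1 with a value of about 3.2, and returns to zero at θ = 0.5, which is the same general shape as the linked curve described above. Real pedigrees with unknown phase or missing genotypes need a more elaborate likelihood than this one.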
The map of the human genome
Many thousands of genetic markers have now been mapped so that there is a minimum of about one marker per centiMorgan and in some regions of the genome there are as many as ten markers per centiMorgan. On the left are two maps covering the same small region of chromosome 9 (a region in which I have a particular interest because it includes the gene TSC1 which can mutate to cause the disease "tuberous sclerosis"). The map on the right is a genetic map made by considering the inheritance of many markers (with names such as D9S66) in about 60 large families. The map on the left is not a genetic map. It is a physical map based on a large number of overlapping pieces of cloned DNA (currently being sequenced) and so it is an accurate reconstruction of the actual DNA sequence of that region of chromosome 9. As is to be hoped, the genetic map gives the same order of markers and approximately the same relative distances between them. There are some differences: the gene ABO is not resolved from marker locus D9S150 on the genetic map whereas, in physical reality, there is a good-sized gap. Differences like this occur because the rate of genetic recombination is not absolutely even throughout the genome; some areas are hot spots and others are areas of reduced recombination. This is reflected in the genetic distances between pairs of markers. The total genetic map length of the human genome is about 3,000 cM and, by a lucky coincidence, the total genome length is about 3,000 million basepairs. So on average, 1 cM is equivalent to 1 Mb (Mb = million basepairs). In this region of chromosome 9 there is an elevated recombination rate compared to the genome average and so here 1 cM ≈ 300 kb.
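The arithmetic relating genetic and physical distance can be made explicit. The short Python sketch below uses only the round figures quoted above (about 3,000 cM, about 3,000 million basepairs, and roughly 300 kb per cM in this part of chromosome 9); the function names are invented for the illustration, and the conversion treats the recombination rate as constant within a region, which is only an approximation.

```python
GENOME_MAP_LENGTH_CM = 3000               # approximate total genetic map length
GENOME_PHYSICAL_LENGTH_BP = 3_000_000_000  # approximate total genome length


def average_bp_per_cm(map_length_cm, physical_length_bp):
    """Average number of basepairs corresponding to one centiMorgan."""
    return physical_length_bp / map_length_cm


def physical_distance_bp(genetic_distance_cm, bp_per_cm):
    """Convert a genetic distance to basepairs at a given local rate."""
    return genetic_distance_cm * bp_per_cm


genome_average = average_bp_per_cm(GENOME_MAP_LENGTH_CM,
                                   GENOME_PHYSICAL_LENGTH_BP)
local_rate = 300_000  # bp per cM quoted for this region of chromosome 9

# How many times more recombination per basepair than the genome average.
fold_elevation = genome_average / local_rate

print("genome average bp per cM:", genome_average)
print("2 cM in this region, in bp:", physical_distance_bp(2, local_rate))
print("fold elevation in recombination rate:", round(fold_elevation, 2))
```

On these figures the region recombines a little over three times as often per basepair as the genome as a whole, so a 2 cM interval here covers only about 600 kb rather than the 2 Mb the genome average would suggest.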
Diagnosis of genetic disease through linkage analysis
The identification of linkage to a marker locus is very often the first step on the way to cloning a disease gene (see the next section). However, it also immediately provides diagnostic opportunities, even before the disease gene itself has been identified and often when absolutely nothing is known about the nature of the underlying genetic defect. In this pedigree (which you will recognise as being the phase-known pedigree above with added children) an autosomal dominant mutation is present. The disease gene has not been identified but it is known to be closely linked to DNA marker polymorphisms Aa and Bb and to map between them. Clearly, in this family the mutant disease gene is present on the chromosome which happens to have alleles A and B on it. [Remember that Aa and Bb have nothing whatsoever to do with the disease; they are simply two bits of the genome which are polymorphic and which are within about a million basepairs of the disease gene.] The unborn baby can be tested to see which alleles it has inherited from its father at the two marker loci. If it has inherited A and B then it almost certainly has also inherited the disease mutation.
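The phrase "almost certainly" can be given a rough number. The Python sketch below assumes that the disease gene lies between the two markers, that crossovers in the two intervals occur independently (no interference), and that each marker is about 1 cM from the disease gene, which is what "about a million basepairs" would mean at the genome-average rate. These values and the function name are assumptions made for the illustration, not figures from the pedigree.

```python
def probability_mutation_inherited(theta_a, theta_b):
    """P(child carries the mutation | child inherited both A and B).

    theta_a: recombination fraction between marker A and the disease gene.
    theta_b: recombination fraction between the disease gene and marker B.
    Assumes the disease gene lies between the markers and that crossovers
    in the two intervals are independent.
    """
    # A and B travel with the mutation if neither interval recombines.
    no_crossover = (1.0 - theta_a) * (1.0 - theta_b)
    # A and B travel WITHOUT the mutation only after a double crossover.
    double_crossover = theta_a * theta_b
    return no_crossover / (no_crossover + double_crossover)


# Assumed distances: roughly 1 cM on either side of the disease gene.
risk = probability_mutation_inherited(0.01, 0.01)
print(f"probability the mutation was inherited: {risk:.6f}")
```

With flanking markers this close, a wrong prediction requires a double crossover, so the chance of error is of the order of one in ten thousand. This is why having a marker on each side of the disease gene is much more reliable for diagnosis than a single linked marker.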
The identification of 'positional candidate' genes
As well as the TSC1 success story above, there have been many other examples of successful identification of disease genes based solely, or almost solely, on positional information. The classic example is the cystic fibrosis gene CFTR, which is discussed at greater length in a subsequent lecture.
The identification of a disease gene proceeds by about 5 steps.
- First, families are "collected" and the clinical investigation is repeated to ensure that there are no misdiagnoses of unaffected individuals. DNA samples are prepared (usually from a 10 ml blood specimen).
- When there are enough families to give a chance of a significant positive Lod score, about 200 polymorphic marker loci spread throughout the genome are amplified by PCR from the DNA. That should ensure that at least one locus is within about 7.5 cM of the disease gene.
- When that initial linkage is found, many more loci from the same region are also tested in the families and recombinants are used to define the genetic interval within which the disease gene must lie. In the TSC1 example above, the gene was initially defined as being between D9S149 and D9S114. Later this interval was reduced to between D9S2127 and D9S150 because individuals were found who had inherited only part of the chromosome due to genetic recombination in the formation of the gamete from their affected (and heterozygous because TSC1 is a dominant mutation) parent.
- Most areas of the human genome have now been cloned. Clones covering the minimum area are identified and used to find all the genes in the area.
- Mutations in these genes are sought in the affected patients. Hopefully this should lead to that "Eureka" moment.
- As more and more of the genome is sequenced, several of these steps may be bypassed. It may be possible to jump directly from the definition of the genetic interval within which the gene must lie to the mutation screen.
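The figure of 7.5 cM in the genome scan step follows from simple arithmetic, sketched here in Python. The calculation assumes that the markers are spread perfectly evenly over a map of about 3,000 cM (the total quoted earlier), which real marker sets only approximate; the function names are invented for the illustration.

```python
import math


def max_distance_to_nearest_marker_cm(map_length_cm, n_markers):
    """Worst-case distance from any point to its nearest marker,
    assuming the markers are evenly spaced along the map."""
    spacing = map_length_cm / n_markers
    return spacing / 2


def markers_needed(map_length_cm, max_distance_cm):
    """Smallest number of evenly spaced markers that keeps every point
    within max_distance_cm of a marker."""
    return math.ceil(map_length_cm / (2 * max_distance_cm))


worst_case = max_distance_to_nearest_marker_cm(3000, 200)
print("worst-case distance with 200 markers (cM):", worst_case)
print("markers needed for 5 cM coverage:", markers_needed(3000, 5))
```

Two hundred markers over 3,000 cM are about 15 cM apart, so a disease gene lying exactly midway between two of them is 7.5 cM from each. Tightening that guarantee to 5 cM would take about 300 markers on the same assumptions.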
The topics include:
- Genetic Linkage
- Genetic maps
- Measurement in human studies
- lod scores
- 'positional candidate' genes
- relationship to the human genome project
Recommended reading:
- Mange and Mange Chapter 9 and you could re-read parts of chapter 13.
- Lewis Chapter 5 (pp94 - 99) gives information about linkage. It's one area that this otherwise excellent book is rather weak on. Chapter 21 for the Human genome project.
- Mueller and Young Chapter 7 (pp99 - 101) (It's here but rather condensed)
- Thompson McInnes and Willard Chapter 8 The core material is on pp178 - 186 but the whole chapter is valuable background reading.
- Connor and Ferguson-Smith Chapter 9 The core material is on pp82 - 86 but the whole chapter is valuable background reading. Also read chapter 19, p200, figures 19.3, 19.4 and 19.6 but these might be difficult to understand unless you were keeping well up with the lecture.