These are the last three lectures of the Biology B200 course, given by Dr Jonathan Wolfe.
Chapters 3, 4, 5 and 15 of Human genetics: the molecular revolution by Edwin McConkey, pub. Jones and Bartlett, 1993 are recommended reading.
An excellent (though slightly out of date) web site to visit for relevant information is The US Department of Energy's Primer of Molecular Genetics. (The Department of Energy is a big mover in the Genome World.)
Molecular Biology and the Human Genome Project
Introduction
Cloning
cDNA clones
Genomic clones
Mapping
Introduction
Top Down
Bottom Up
Physical maps
Somatic cell hybrids
Cytogenetic mapping
Large insert clones (YACs and BACs) and Sequence Tagged Sites (STSs)
Genetic maps
Family studies
Radiation hybrid mapping
Sequencing
The last thirty years have seen the study of the human genome grow from a minority academic interest into a mega-industry. It is no longer solely an academic province: it has become an object of study for the major pharmaceutical companies. SmithKline Beecham, for example, in the US and UK, is very active in the field of human genome research. In addition, research has moved out of small laboratories and into a few heavily funded genome factories (for want of a better word). The principal UK effort is at the Sanger Centre in Cambridgeshire.
At the same time, and particularly in the past 5 years, the tools of the professional genomist have become very much more automated. They are still based, however, on the techniques which we have studied in B200: DNA-DNA hybridisation, the Polymerase Chain Reaction and the dideoxy chain termination method of DNA sequencing. The cloned pieces of DNA used, about which you have learned earlier in this course, would also be familiar to those who began DNA cloning in the late 1970s and early 1980s, although some large insert clones are relatively novel.
Cytogenetics still plays a role in genome mapping and has been revolutionised by the use of DNA-DNA hybridisation with fluorescent probes (or rather with fluorescent antibodies to chemical tags on DNA hybridisation probes).
In these lectures we will concentrate on the application of molecular biological techniques to the enormous problems imposed by the scale of the various genome projects. The human genome is big. It consists of approximately 3 x 10^9 bp of DNA. So far the longest region of continuous completed sequence is no more than about 10^6 bp. Despite this, we are still on schedule for completion in about 2005.
Sequencing a big genome doesn't just happen. It has to be carefully planned. The human genome project has proceeded in three stages: first, the construction of detailed maps of what I will call 'landmarks'; next, the complete cloning of the genome in well characterised clones which include all the landmarks; and finally, the sequencing and annotation. Of all these the mapping has taken longest. The sequencing will take surprisingly little time, but the annotation of the sequence is still to happen and is likely to take much longer than anything else.
Before anything else can happen, material for study must be sought. This has always involved DNA cloning. We begin therefore with a discussion of the types of clones which may be involved.
All clones come from libraries. Essentially a library is one of two kinds, cDNA or genomic, and it is vital to remember the distinction between them.
Remember that cDNA means complementary DNA. It has been copied directly from mRNA. Because that RNA will have been processed in its journey from gene to cytoplasm (to test tube) it will not contain introns. Nor will it contain any sequences from upstream or downstream of the genes. Every cDNA library is made from a tissue source and will only contain a representation of the sequences transcribed in that tissue, e.g. do not expect to isolate beta globin cDNA from a skin fibroblast cDNA library. Even though cDNA is made from mRNA it still contains repetitive sequences: 5%-10% of human transcripts contain an interspersed repetitive element such as an Alu repeat in the 3' UTR.
Most libraries contain cDNA clones in the same relative abundance as were their corresponding mRNAs in the tissue of origin. e.g. liver cDNA libraries are a rich source of serum albumin cDNA clones. Some libraries have been normalised, i.e. an attempt has been made to equalise the frequencies with which all types of cDNA are found within the library (if they were to be found at all in the mRNA source).
cDNA clones may be made for various purposes
From the point of view of mapping and sequencing the genome, only the latter two classes of clone are relevant.
Genomic clones are designed to include as much genomic DNA as possible in order to minimise the number of clones which must be isolated. Over the years vector systems have evolved. The first generation of genomic libraries was built in vectors based on lambda; later libraries used plasmid-phage hybrid vectors such as cosmids. Recently yeast artificial chromosomes (YACs) have been popular, but they are now gradually being replaced by bacterial systems based on the origins of replication of either the phage P1 or the F element (PACs and BACs).
| Vector | Maximum insert size | Approx. no. of clones required in library | Advantages | Disadvantages |
| lambda | 20 kb | 5 x 10^5 | easy to construct libraries; relatively stable inserts | many clones required; hard to prepare DNA from clones |
| cosmid | 45 kb | 2 x 10^5 | easy to construct libraries; easy to prepare DNA from clones | not always stable |
| YAC | 1 Mb | 10^4 | few clones required | very prone to rearrangement; difficult to construct |
| BAC | > 500 kb | 5 x 10^4 | few clones required; very stable | single copy origin of replication, therefore harder to prepare DNA |
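The clone counts in the table are of the size one would expect from a standard library coverage calculation. The lecture does not say how they were derived, so what follows is only a minimal sketch, assuming the Clarke and Carbon formula N = ln(1 - P) / ln(1 - f), where f is the insert size divided by the genome size and P is the chance that any given sequence is present in the library. The function name, the choice of P = 0.95 and the use of the maximum insert size for every clone are illustrative assumptions.

```python
import math

GENOME_SIZE_BP = 3e9  # approximate human genome size quoted in these lectures


def clones_required(insert_bp, p_include=0.95, genome_bp=GENOME_SIZE_BP):
    """Clones needed so that any given sequence is in the library with probability p_include."""
    fraction = insert_bp / genome_bp  # share of the genome carried by a single clone
    return math.ceil(math.log(1 - p_include) / math.log(1 - fraction))


for vector, insert_bp in [("lambda", 20_000), ("cosmid", 45_000), ("YAC", 1_000_000)]:
    print(f"{vector}: about {clones_required(insert_bp):.1e} clones")
```

Under these assumptions the results come out at roughly 4.5 x 10^5 for lambda, 2 x 10^5 for cosmids and 9 x 10^3 for YACs, which is in line with the approximate figures in the table.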
Another innovation has been the use of gridded and chromosome specific libraries. In a gridded library every clone has its own unique address where it is to be found in a well in a microtitre tray. This has huge advantages over ungridded, amplified libraries for our ability to exchange information about clones. Chromosome specific libraries have been made by flow sorting individual metaphase chromosomes using a machine originally designed to sort different populations of cells.
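The idea of a unique address for every clone can be sketched in a few lines. This is an illustration only: it assumes 96-well trays (rows A-H, columns 1-12) filled row by row, and the function names are hypothetical. Real gridded libraries may use other tray formats and naming schemes.

```python
import string

ROWS = string.ascii_uppercase[:8]  # rows A-H of an assumed 96-well tray
COLS = 12                          # columns 1-12
WELLS_PER_PLATE = len(ROWS) * COLS


def clone_address(index):
    """Map a zero-based clone index to (plate number, well label), filling each plate row by row."""
    plate, offset = divmod(index, WELLS_PER_PLATE)
    row, col = divmod(offset, COLS)
    return plate + 1, f"{ROWS[row]}{col + 1}"


def clone_index(plate, well):
    """Inverse of clone_address: recover the clone index from its plate and well."""
    row = ROWS.index(well[0])
    col = int(well[1:]) - 1
    return (plate - 1) * WELLS_PER_PLATE + row * COLS + col
```

Because the mapping is one to one, two laboratories holding copies of the same gridded library can refer to the same clone simply by quoting its plate and well.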
To map the human genome has required the combined efforts of very many laboratories and the use of many techniques.
Broadly, maps can be divided into two kinds: those based on direct physical evidence and those based on inference from patterns of inheritance, i.e. physical and genetic maps. A more useful distinction is perhaps into the classes of top down and bottom up maps.
Physical maps can be constructed from the top down or from the bottom up; genetic maps are of the top down variety.
Both approaches can be combined: landmarks developed as part of a top down map can be used as seeds for the development of a bottom up map.
If a human cell in culture is fused to a rodent cell, a hybrid cell line can be formed containing chromosomes from both species. In these circumstances, for reasons which we do not know, the human chromosomes tend to be lost at random until a stable karyotype is formed containing a small number of human chromosomes in a more or less intact rodent background. A panel of different hybrid cell lines can be constructed, each with a different complement of human chromosomes. By asking in which hybrid lines a DNA locus is present (by PCR or by Southern blotting), and by correlating this with the karyotypes of the hybrid lines, it is possible to deduce mapping information. This is relatively large scale information (What galaxy am I in?), but by the use of hybrids containing translocation or deletion chromosomes it can be made more specific. This technique is more useful than you might think because, since the 1960s when somatic cell genetics was introduced, an enormous variety of cell lines containing an enormous number of rearranged chromosomes has been produced and frozen away.
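The correlation step can be sketched as a concordance count: for each human chromosome, count the hybrids in which the locus is present without the chromosome, or the chromosome is present without the locus. The panel, the locus results and the function names below are all hypothetical, made up purely for illustration.

```python
# Hypothetical panel: which human chromosomes each hybrid line retains.
PANEL = {
    "hybrid_1": {"1", "7", "13"},
    "hybrid_2": {"7", "X"},
    "hybrid_3": {"1", "13", "21"},
    "hybrid_4": {"13", "X"},
    "hybrid_5": {"1", "21"},
}

# Hypothetical PCR or Southern blot result for one locus in each hybrid.
LOCUS_PRESENT = {
    "hybrid_1": True,
    "hybrid_2": False,
    "hybrid_3": True,
    "hybrid_4": True,
    "hybrid_5": False,
}


def discordance_by_chromosome(panel, locus_present):
    """Count, for each chromosome, the hybrids where locus and chromosome disagree."""
    chromosomes = set().union(*panel.values())
    return {
        chrom: sum(
            (chrom in retained) != locus_present[line]
            for line, retained in panel.items()
        )
        for chrom in chromosomes
    }


def assign_chromosome(panel, locus_present):
    """Return the chromosomes with the fewest discordant hybrids, and that count."""
    counts = discordance_by_chromosome(panel, locus_present)
    best = min(counts.values())
    return sorted(c for c, n in counts.items() if n == best), best
```

In this made-up panel only chromosome 13 is present in exactly the hybrids that carry the locus, so `assign_chromosome(PANEL, LOCUS_PRESENT)` points to chromosome 13 with no discordant hybrids. A panel is informative only if every chromosome has a distinct pattern of presence and absence across the hybrids.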
Cytogenetic mapping means directly looking at chromosomes, usually by the technique of Fluorescence In Situ Hybridisation (FISH). See, for example, the interesting web site from the University of Wisconsin cytogenetics lab, which has lots of examples.
A probe is made from a genomic (or sometimes even a cDNA) clone by using the Nick Translation reaction to incorporate a biotinylated nucleotide into DNA. Metaphase chromosome spreads are made by conventional techniques, the spread chromosomes are treated to denature the DNA of the chromosomes and the probe is allowed to hybridise to the chromosomes. Later, the site(s) of hybridisation is found by using streptavidin and antibodies conjugated to a fluorescent dye. In this way the site of origin of any clone can be found in the genome. This is equivalent, perhaps, to seeing the position of a large city on the Earth's globe in the top down map analogy above.
Here is one example from the Wisconsin site, which shows a cosmid hybridising specifically to a site on chromosome 13. Remember that the chromosomes are replicated ready to divide and that each therefore consists of two chromatids. The probe has hybridised to all four chromatids.
The technique also has many diagnostic uses (which is what is actually shown in the Wisconsin site above).
We are not restricted to using normal chromosomes. Sometimes it is useful to use chromosomes from a cell line or a patient with a rearrangement such as a translocation or deletion. In this way the clones can be mapped with respect to the cytogenetic breakpoint(s) involved and vice versa. See for example the Elastin Williams Syndrome probe on the Wisconsin page. See the Elastin page at On Line Mendelian Inheritance in Man (OMIM) if you want to know more about the syndrome.
FISH is not restricted to intact chromosomes. It also works well on interphase nuclei, where it is used to visualise the order of closely spaced probes (from 100 to 500 kb apart) on a chromosome and to measure the distances between them.
Most excitingly, FISH can be used on stretched extended DNA fibres where it has a resolution of less than 5 kb.
Mapping with large insert clones and Sequence Tagged Sites (STSs) is a bottom up technique. Thousands of short fragments of DNA of various types have been sequenced to form STSs.

All VNTRs have multiple alleles which is a good thing for genetic mapping, since it makes it more likely that you will be able to distinguish between both alleles in any person whom you wish to study.
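The advantage of multiple alleles can be put into numbers. As a minimal sketch, assuming random mating (Hardy-Weinberg proportions), the chance that a person carries two distinguishable alleles is the expected heterozygosity, 1 minus the sum of the squared allele frequencies. The allele frequencies below are invented for illustration.

```python
def expected_heterozygosity(allele_frequencies):
    """Chance that a random person carries two different alleles, assuming random mating."""
    total = sum(allele_frequencies)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_frequencies)


two_allele_marker = [0.9, 0.1]  # hypothetical marker with one common and one rare allele
ten_allele_vntr = [0.1] * 10    # hypothetical VNTR with ten equally common alleles

print(expected_heterozygosity(two_allele_marker))
print(expected_heterozygosity(ten_allele_vntr))
```

With these invented frequencies the two-allele marker is informative in about 18% of people, the ten-allele VNTR in about 90%.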
Historically, genetic maps based on inherited variation in DNA markers, in enzyme electrophoretic mobility, in blood groups etc. have been very important in setting up the framework maps on which all the clones contributing to the physical maps have been hung. Now that this job has been completed, genetic maps still have a role. They are the only way in which we can position genetic variation which has a phenotype but for which we have no molecular data. This is still true for many genetic diseases. As an example I cannot do better than refer you to the recent cloning (in which I had a hand) of the gene TSC1, which is responsible for the genetic disease Tuberous sclerosis. Before this gene was cloned we had almost no idea of its role in the cell. Nonetheless, the gene was identified solely on the basis of its position in the genome.
I have included radiation hybrid mapping in a section of its own: it is a physical technique, but the mathematics are analogous to those used for genetic mapping.
In this technique a panel of somatic cell hybrids is made exactly as described above, except that before the human cell line is fused to the rodent cell line it is irradiated with a dose of X-rays sufficient to fragment the human chromosomes. Each colony that grows out will contain random fragments of human DNA. The closer together two loci are in the human genome, the more likely they are to lie on the same DNA fragment and so to be retained in the same hybrids. The distance between loci is estimated statistically by methods which bear a close resemblance to the maths of genetic mapping.
Last October a paper which mapped 16,000 genes (actually ESTs) with respect to 1,000 microsatellite genetic markers was published in Science. It is well worth reading (Schuler et al. 1996).
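The statistical idea can be illustrated with a simple two-point estimate. This is a sketch under stated assumptions, not the method used by any particular mapping group: it assumes that when a break falls between two loci the two fragments are retained independently, with retention probabilities estimated from the panel itself. The chance of a discordant hybrid is then theta x (rA + rB - 2 rA rB), where theta is the breakage frequency, and the distance is -ln(1 - theta), expressed here in centiRays. The function name and the data layout are hypothetical.

```python
import math


def rh_two_point(retained_a, retained_b):
    """Estimate breakage frequency and distance between two loci from retention patterns.

    retained_a and retained_b are lists of booleans, one entry per hybrid.
    """
    if len(retained_a) != len(retained_b) or not retained_a:
        raise ValueError("need the same non-empty set of hybrids for both loci")
    n = len(retained_a)
    r_a = sum(retained_a) / n  # retention frequency of locus A
    r_b = sum(retained_b) / n  # retention frequency of locus B
    discordant = sum(a != b for a, b in zip(retained_a, retained_b)) / n
    # With a break between the loci they are retained independently, so
    # P(discordant) = theta * (r_a + r_b - 2 * r_a * r_b).
    denom = r_a + r_b - 2 * r_a * r_b
    if denom == 0:
        raise ValueError("both loci retained in all or in none of the hybrids")
    theta = min(discordant / denom, 1.0)
    distance_cr = float("inf") if theta >= 1.0 else -100 * math.log(1 - theta)
    return theta, distance_cr
```

Two loci with identical retention patterns give theta = 0. As the patterns diverge, theta approaches 1 and the loci behave as unlinked. Like recombination fractions, these distances depend on the experiment, here on the X-ray dose, and real maps are built with multipoint likelihood methods, not with pairwise estimates alone.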
Sequencing is the least interesting part of the whole business. As yet, all the big sequencing centres rely on massively automated dideoxy sequencing reactions. 'Nuff said.