Molecular Biology and the Human Genome Project


These are the last three lectures of the Biology B200 course, given by Dr Jonathan Wolfe.

Chapters 3, 4, 5 and 15 of Human genetics: the molecular revolution by Edwin McConkey, pub. Jones and Bartlett, 1993 are recommended reading.

An excellent (though slightly out of date) web site to visit for relevant information is The US Department of Energy's Primer of Molecular Genetics. (The Department of Energy is a big mover in the Genome World.)


Table of Contents

Molecular Biology and the Human Genome Project
    Introduction
    Cloning
        cDNA clones
        Genomic clones
    Mapping
        Introduction
            Top Down
            Bottom Up
        Physical maps
            Somatic cell hybrids
            Cytogenetic mapping
            Large insert clones (YACs and BACs) and Sequence Tagged Sites (STSs)
        Genetic maps
            Family studies
        Radiation hybrid mapping
    Sequencing

Introduction

The last thirty years have seen the study of the human genome grow from a minority academic interest into a mega-industry. During this time it has ceased to be solely an academic province and has become an object of study for the major pharmaceutical companies; SmithKline Beecham, for example, is very active in human genome research in both the US and the UK. In addition, research has moved out of small laboratories and into a few heavily funded genome factories (for want of a better word). The principal UK effort is at the Sanger Centre in Cambridgeshire.

At the same time, but particularly in the past 5 years, the tools of the professional genomist have become very much more automated. However, they are still based on the techniques which we have studied in B200: DNA-DNA hybridisation, the Polymerase Chain Reaction and the dideoxy chain termination method of DNA sequencing. The cloned pieces of DNA used, about which you have learned earlier in this course, would also be familiar to those who began DNA cloning in the late 1970s and early 1980s. However, some large insert clones are relatively novel.

Cytogenetics still plays a role in genome mapping and has been revolutionised by the use of DNA-DNA hybridisation with fluorescent probes (or rather with fluorescent antibodies to chemical tags on DNA hybridisation probes).

In these lectures we will concentrate on the application of molecular biological techniques to the enormous problems imposed by the scale of the various genome projects. The human genome is big. It consists of approximately 3 x 10^9 bp of DNA. So far the longest region of continuous completed sequence is no more than about 10^6 bp. Despite this, we are still on schedule for completion in about 2005.

Sequencing a big genome doesn't just happen. It has to be carefully planned. The human genome project has proceeded in three stages: first, the construction of detailed maps of what I will call 'landmarks'; next, the complete cloning of the genome in well characterised clones which include all the landmarks; and finally, the sequencing and annotation. Of all these, the mapping has taken longest. The sequencing will take surprisingly little time, but the annotation of the sequence is still to happen and is likely to take much longer than anything else.

Before anything else can happen, material for study must be sought. This has always involved DNA cloning. We begin therefore with a discussion of the types of clones which may be involved.


Cloning

All clones come from libraries. Essentially a library is one of two kinds, cDNA or genomic, and it is vital to remember the distinction between them.

cDNA clones

Remember that cDNA means complementary DNA. It has been copied directly from mRNA. Because that RNA will have been processed in its journey from gene to cytoplasm (to test tube) it will not contain introns. Nor will it contain any sequences from upstream or downstream of the genes. Every cDNA library is made from a tissue source, and it will only contain a representation of the sequences transcribed in that tissue; e.g. do not expect to isolate beta globin cDNA from a skin fibroblast cDNA library. Even though cDNA is made from mRNA it still contains repetitive sequences: 5%-10% of human transcripts contain an interspersed repetitive element such as an Alu repeat in the 3' UTR.

Most libraries contain cDNA clones in the same relative abundance as were their corresponding mRNAs in the tissue of origin. e.g. liver cDNA libraries are a rich source of serum albumin cDNA clones. Some libraries have been normalised, i.e. an attempt has been made to equalise the frequencies with which all types of cDNA are found within the library (if they were to be found at all in the mRNA source).

cDNA clones may be made for various purposes

From the point of view of mapping and sequencing the genome, only the latter two classes of clone are relevant.


Genomic clones

Genomic clones are designed to include as much genomic DNA as possible in order to minimise the number of clones that must be isolated. Over the years vector systems have evolved. The first generation of genomic libraries was built in vectors based on lambda; later libraries used plasmid-phage hybrid vectors such as cosmids. Recently yeast artificial chromosomes (YACs) have been popular but are now gradually being replaced by bacterial systems based on either the phage P1 or the F element origins of replication (PACs and BACs).


Cloning Vectors
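Vector: lambda
    Maximum insert size: 20 kb
    Approx. no. of clones required in library: 5 x 10^5
    Advantages: easy to construct libraries; relatively stable inserts
    Disadvantages: many clones required; hard to prepare DNA from clones

Vector: cosmid
    Maximum insert size: 45 kb
    Approx. no. of clones required in library: 2 x 10^5
    Advantages: easy to construct libraries; easy to prepare DNA from clones
    Disadvantages: not always stable

Vector: YAC
    Maximum insert size: 1 Mb
    Approx. no. of clones required in library: 10^4
    Advantages: few clones required
    Disadvantages: very prone to rearrangement; difficult to construct

Vector: BAC
    Maximum insert size: > 500 kb
    Approx. no. of clones required in library: 5 x 10^4
    Advantages: few clones required; very stable
    Disadvantages: single copy origin of replication, therefore harder to prepare DNA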
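
The clone counts listed above for lambda, cosmid and YAC libraries are roughly what one obtains from the standard library-size estimate N = ln(1 - P) / ln(1 - f), where f is the fraction of the genome carried by one insert and P is the desired probability that any given sequence is present. These notes do not say how the figures were derived, so the formula, the choice of P = 0.95 and the function name below are assumptions made purely for illustration. A minimal sketch in Python:

```python
import math

GENOME_SIZE_BP = 3e9  # approximate human genome size quoted in the Introduction


def clones_required(insert_bp, probability=0.95, genome_bp=GENOME_SIZE_BP):
    """Estimate the clones needed for a given sequence to be in the library.

    Uses N = ln(1 - P) / ln(1 - f), with f = insert size / genome size.
    The default P = 0.95 is an assumption, not a figure from the lectures.
    """
    fraction = insert_bp / genome_bp
    # log1p(-f) computes ln(1 - f) accurately when f is very small
    return math.ceil(math.log(1.0 - probability) / math.log1p(-fraction))


if __name__ == "__main__":
    for vector, insert_bp in [("lambda", 20e3), ("cosmid", 45e3), ("YAC", 1e6)]:
        print(f"{vector:7s} about {clones_required(insert_bp):.1e} clones")
```

Raising P, or allowing for inserts smaller than the vector's maximum, increases the number of clones required.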

Another innovation has been the use of gridded and chromosome specific libraries. In a gridded library every clone has its own unique address where it is to be found in a well in a microtitre tray. This has huge advantages over ungridded, amplified libraries for our ability to exchange information about clones. Chromosome specific libraries have been made by flow sorting individual metaphase chromosomes using a machine originally designed to sort different populations of cells.
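
To make the idea of a clone address concrete, here is a small sketch that converts between a running clone number and a plate-and-well address. It assumes 96-well trays (rows A-H, columns 1-12) filled row by row; the tray format, the filling order and the function names are all hypothetical, and real libraries use their own conventions.

```python
import string

ROWS = string.ascii_uppercase[:8]   # rows A-H of an assumed 96-well tray
COLUMNS = 12                        # columns 1-12
WELLS_PER_PLATE = len(ROWS) * COLUMNS


def clone_address(clone_index):
    """Map a 0-based clone index to (plate number, well label)."""
    plate, offset = divmod(clone_index, WELLS_PER_PLATE)
    row, column = divmod(offset, COLUMNS)
    return plate + 1, f"{ROWS[row]}{column + 1}"


def clone_index(plate, well):
    """Inverse of clone_address: recover the 0-based clone index."""
    row = ROWS.index(well[0])
    column = int(well[1:]) - 1
    return (plate - 1) * WELLS_PER_PLATE + row * COLUMNS + column


if __name__ == "__main__":
    print(clone_address(0))        # first well of the first plate
    print(clone_index(2, "A1"))    # first well of the second plate
```

Because the address is fixed, two laboratories holding copies of the same gridded library can refer to the same clone simply by quoting its plate and well.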



Mapping

Introduction

Mapping the human genome has required the combined efforts of very many laboratories and the use of many techniques.

Broadly, maps can be divided into two kinds: those based on direct physical evidence and those based on inference from patterns of inheritance, i.e. physical and genetic maps. A more useful distinction is perhaps between top down and bottom up maps.

Physical maps can be constructed from the top down or from the bottom up; genetic maps are of the top down variety.

The two approaches can be combined: landmarks developed as part of a top down map can be used as seeds for the development of a bottom up map.


Physical maps

Many different mapping techniques have been used.


Genetic maps

Genetic maps are based on the tendency of genes on the same chromosome to be inherited together. If genes are far apart there will be a greater likelihood of a genetic recombination event occurring between them. If they are close together there will be less likelihood of this happening. Genetic distances are measured in centiMorgans: 1 cM is the distance between two genetic markers such that they will recombine in 1% of meioses. In fact, it is not necessary that the objects being measured are genes. Any polymorphic piece of DNA can be studied. Examples include

Family studies

Historically, genetic maps based on the use of inherited variation in DNA markers, in enzyme electrophoretic variation and in blood groups etc. have been very important in setting up the framework maps on which have been hung all the clones which have contributed to the physical maps. Now that this job has been completed, genetic maps still have a role. They are the only way in which we can position genetic variation which has a phenotype but for which we have no molecular data. This is still true for many genetic diseases. As an example I cannot do better than refer you to the recent cloning (in which I had a hand) of the gene TSC1, which is responsible for the genetic disease tuberous sclerosis. Before this gene was cloned we had almost no idea of its role in the cell. Nonetheless, the gene was identified solely on the basis of its position in the genome.
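
In a family study the genetic distance between two markers is estimated by counting the meioses in which they recombine. The sketch below shows that arithmetic, together with the LOD score conventionally used to judge the evidence for linkage. It assumes fully informative, phase-known meioses, and the function names and example counts are invented for illustration rather than taken from any real pedigree.

```python
import math


def recombination_fraction(recombinants, meioses):
    """Observed fraction of meioses in which the two markers recombined."""
    return recombinants / meioses


def centimorgans(theta):
    """Small-distance approximation: 1% recombination corresponds to 1 cM."""
    return 100.0 * theta


def lod_score(recombinants, meioses, theta):
    """log10 likelihood ratio: linkage at `theta` versus no linkage (0.5)."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be greater than 0 and at most 0.5")
    non_recombinants = meioses - recombinants
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1.0 - theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


if __name__ == "__main__":
    r, n = 2, 40  # hypothetical counts: 2 recombinants in 40 meioses
    theta = recombination_fraction(r, n)
    print(f"theta = {theta:.3f}, about {centimorgans(theta):.1f} cM")
    print(f"LOD at theta = {lod_score(r, n, theta):.2f}")
```

Over larger distances double crossovers make the simple 1% = 1 cM relationship underestimate the true distance, which is why mapping functions are used in practice.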


Radiation hybrid mapping

I have included radiation hybrid mapping in a section of its own: it is a physical technique, but the mathematics are analogous to those used for genetic mapping.

In this technique a panel of somatic cell hybrids is made exactly as described above, except that before the human cell line is fused to the rodent cell line it is irradiated with a dose of X-rays sufficient to fragment the human chromosomes. Each colony that grows out will contain random fragments of human DNA. The closer together two loci are in the human genome, the more likely they are to be retained in the same hybrids, because they are more likely to lie on the same DNA fragment. The distance between loci is estimated statistically by methods which closely resemble the maths of genetic mapping.
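
The statistical idea can be sketched with a simple two-point calculation. Each marker is scored as retained (1) or absent (0) in every hybrid of the panel; the fraction of hybrids in which the two markers disagree, divided by the disagreement expected if a break always separated them, estimates the breakage frequency. The estimator shown is one commonly used form and is offered here as an assumption, not as the method used in any particular study; the retention scores are made up.

```python
import math


def breakage_frequency(marker_a, marker_b):
    """Two-point estimate of the breakage frequency between two markers.

    marker_a, marker_b: sequences of 0/1 retention scores, one per hybrid.
    """
    if len(marker_a) != len(marker_b):
        raise ValueError("both markers must be scored on the same panel")
    total = len(marker_a)
    discordant = sum(a != b for a, b in zip(marker_a, marker_b))
    retention_a = sum(marker_a) / total
    retention_b = sum(marker_b) / total
    # Chance that the two markers disagree if they are always separated
    expected = retention_a + retention_b - 2.0 * retention_a * retention_b
    if expected == 0.0:
        raise ValueError("markers retained in none or all of the hybrids")
    return discordant / (total * expected)


def centirays(theta):
    """Convert a breakage frequency (below 1) to a distance in centiRays."""
    return -100.0 * math.log(1.0 - theta)


if __name__ == "__main__":
    a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]  # hypothetical scores for marker A
    b = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]  # hypothetical scores for marker B
    theta = breakage_frequency(a, b)
    print(f"theta = {theta:.3f}, distance = {centirays(theta):.1f} cR")
```

As with recombination fractions, the estimate saturates for distant markers, and the distance depends on the radiation dose used to make the panel.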

Last October a paper which mapped 16,000 genes (actually ESTs) with respect to 1,000 microsatellite genetic markers was published in Science (Schuler et al. 1996). It is well worth reading.



Sequencing

The least interesting part of the whole business. As yet, all the big sequencing centres rely on massively automated dideoxy sequencing reactions. 'Nuff said.
