Reminder: an excellent (though slightly out of date) web site to visit for relevant information is The US Department of Energy's Primer of Molecular Genetics. Another really well thought out and presented site is the MIT Biology Hypertextbook, and in particular, the molecular genetics chapter.
| Gel matrix | Optimum size range of the molecules to be separated |
| --- | --- |
| Agarose | 0.5 - 20 kb |
| Polyacrylamide | 10 - 1000 bp (DNA), 0 - 100 kDa (proteins) |
You will find many variations on this basic theme in the course of this lecture series.
This is the general term given to the isolation of a specific fragment of target (e.g. human) DNA by propagating that fragment in a microorganism (usually a bacterium but sometimes a yeast). Using standard microbiological techniques, a single microorganism can be isolated and grown in culture carrying within its genome one small fragment of human DNA. The human DNA can readily be purified from the bacterium in gram quantities if necessary.
Almost all clones come from libraries. Essentially, a library is one of two kinds, cDNA or genomic, and it is vital to remember the distinction between them.
Remember that cDNA means complementary DNA. It has been copied directly from mRNA. Because that RNA will have been processed on its journey from gene to cytoplasm (to test tube), it will not contain introns. Nor will it contain any sequences from upstream or downstream of the gene. Every cDNA library is made from a tissue source and will only contain a representation of the sequences transcribed in that tissue. For example, do not expect to isolate beta globin cDNA from a skin fibroblast cDNA library. Even though cDNA is made from mRNA, it still contains repetitive sequences: 5%-10% of human transcripts contain an interspersed repetitive element such as an Alu repeat in the 3´ UTR.
Most libraries contain cDNA clones in the same relative abundance as were their corresponding mRNAs in the tissue of origin. e.g. liver cDNA libraries are a rich source of serum albumin cDNA clones. Some libraries have been normalised, i.e. an attempt has been made to equalise the frequencies with which all types of cDNA are found within the library (if they were to be found at all in the mRNA source).
cDNA clones may be made for various purposes
From the point of view of mapping and sequencing the genome, the latter two classes of clone are most relevant.
Genomic clones are designed to include as much genomic DNA as possible in order to minimise the number of clones that need to be isolated. Over the years vector systems have evolved. The first generation of genomic libraries was built in vectors based on lambda; later libraries used plasmid-phage hybrid vectors such as cosmids. Recently yeast artificial chromosomes (YACs) have been popular, but they are now gradually being replaced by bacterial systems based on the origins of replication of either the phage P1 or the F element (PACs and BACs respectively).
| Vector | Maximum insert size | Approx. no. of clones required in library | Advantages | Disadvantages |
| --- | --- | --- | --- | --- |
| lambda | 20 kb | 5 x 10^5 | easy to construct libraries; relatively stable inserts | many clones required; hard to prepare DNA from clones |
| cosmid | 45 kb | 2 x 10^5 | easy to construct libraries; easy to prepare DNA from clones | not always stable |
| YAC | 1 Mb | 10^4 | few clones required | very prone to rearrangement; difficult to construct |
| PAC | ~120 kb | 10^5 | fewer clones required than for cosmids; stable | single copy origin of replication, therefore harder to prepare DNA |
| BAC | > 500 kb | 5 x 10^4 | few clones required; very stable | single copy origin of replication, therefore harder to prepare DNA |
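To get a feel for where clone numbers of this order come from, here is a minimal sketch using the standard Clarke and Carbon estimate, N = ln(1 - P) / ln(1 - f), where f is the fraction of the genome carried by one insert and P is the chance that any given sequence is present in the library. The genome size (3 x 10^9 bp) and P = 0.95 are assumptions made for illustration and are not taken from the table, and the function name is hypothetical.

```python
import math


def clones_required(insert_bp, genome_bp=3.0e9, probability=0.95):
    """Estimate the library size needed so that a given sequence is present
    with the stated probability (Clarke and Carbon estimate).

    genome_bp and probability are illustrative assumptions.
    """
    fraction = insert_bp / genome_bp  # share of the genome in one insert
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))


# Insert sizes taken from the vector table, in base pairs.
for vector, insert_bp in [("lambda", 20_000), ("cosmid", 45_000), ("YAC", 1_000_000)]:
    print(vector, clones_required(insert_bp))
```

Under these assumptions the estimates for lambda, cosmids and YACs come out at the same order of magnitude as the figures in the table. A different choice of P or of average insert size would shift them.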
Another innovation has been the use of gridded and chromosome specific libraries. In a gridded library every clone has its own unique address where it is to be found in a well in a microtitre tray. This has huge advantages over ungridded, amplified libraries for our ability to exchange information about clones. Chromosome specific libraries have been made by flow sorting individual metaphase chromosomes using a machine originally designed to sort different populations of cells.
Before PCR and cheap fast sequencing changed our view of the universe that is genetics, the Southern Blot was a universal workhorse. There was not an experiment in molecular genetics which did not at some stage employ a Southern Blot. It is still a useful tool and you need to know about it so that you can interpret historical data.
Named after its inventor, Prof. Ed Southern, the blot is a fast way of analysing a small number of DNA fragments which may be present in a complex mixture. For instance, the sickle cell mutation is a point mutation in the beta globin gene which changes a single amino acid in the beta globin polypeptide. In homozygotes, in conditions of low oxygen concentration, the mutant globin polymerises, forcing the cell into a bizarre shape.
Suppose that we wish to ask whether the sickle cell mutation is present in an individual and the only material which we have available is a DNA sample. We could employ a Southern Blot:
In the case of the sickle cell mutation, the single base change involved, as well as causing a missense mutation in the beta globin gene, also destroys a restriction site for the enzyme MstII. As a consequence the size of the restriction fragment containing the 5´ end of the gene is altered from 1.15 kb to 1.35 kb. See the figure below where MstII sites are shown as arrows.
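The effect of losing a restriction site can be sketched in a few lines of code. This is a toy illustration, not the real beta globin sequence: the locus is filler DNA built so that the fragment sizes match those quoted above. It assumes the MstII recognition sequence CCTNAGG, cut as CC^TNAGG, and that the sickle cell change turns CCTGAGG into CCTGTGG; neither detail is given in the text above. Note also that a real Southern blot shows only the fragments that hybridise to the probe, whereas this sketch lists every fragment.

```python
import re

MSTII_SITE = re.compile(r"CCT[ACGT]AGG")  # assumed recognition sequence CCTNAGG
CUT_OFFSET = 2  # assumed cut position within the site: CC^TNAGG


def digest(sequence):
    """Return the fragment lengths from cutting a linear sequence at every site."""
    cuts = [match.start() + CUT_OFFSET for match in MSTII_SITE.finditer(sequence)]
    edges = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(edges, edges[1:])]


def toy_locus(middle_site):
    """Build a hypothetical locus with three site positions spaced 1150 and 200 bp apart."""
    filler = "A"  # filler that can never create an extra site
    return (filler * 100 + "CCTGAGG" + filler * 1143
            + middle_site + filler * 193 + "CCTCAGG" + filler * 100)


normal = toy_locus("CCTGAGG")  # middle site intact
sickle = toy_locus("CCTGTGG")  # middle site destroyed by the A to T change

print(digest(normal))  # includes a 1150 bp and a 200 bp fragment
print(digest(sickle))  # the two merge into a single 1350 bp fragment
```

When the middle site disappears, the 1.15 kb fragment and its 0.2 kb neighbour are no longer separated, which is why the band detected on the blot moves to 1.35 kb.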
The polymerase chain reaction (PCR) has revolutionised molecular genetics by doing away with the need to clone DNA in many circumstances where it used to be necessary. It is so powerful that it can produce microgram amounts (that's a lot!) of DNA starting from just a single molecule. This has applications in forensic science, in archaeology (Neandertal mitochondrial DNA amplified from ancient bones by PCR was recently sequenced), and in medicine, where, for example, it enables antenatal DNA tests to be performed in just a few hours' work and large populations to be screened for particular mutations very quickly and cheaply.

Normally about 30 cycles of amplification are carried out. If the efficiency of the reaction were 100%, that would represent approximately a 10^9-fold amplification (2^30 is about 1.07 x 10^9). In fact, although early rounds are highly efficient, the later rounds are much less so because of depletion of nucleotide triphosphates and primers in the reaction and gradual destruction of the DNA polymerase.
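The arithmetic can be checked with a short sketch. It assumes the simple model in which each cycle multiplies the number of copies by (1 + efficiency), so that perfect efficiency means doubling. The function names and the efficiency values used for the "tailing off" example are invented purely for illustration.

```python
def fold_amplification(cycles, efficiency=1.0):
    """Fold amplification after a number of cycles at constant efficiency.

    efficiency = 1.0 means every template is copied in every cycle.
    """
    return (1.0 + efficiency) ** cycles


def fold_amplification_variable(efficiencies):
    """Fold amplification when each cycle has its own efficiency."""
    total = 1.0
    for efficiency in efficiencies:
        total *= 1.0 + efficiency
    return total


print(fold_amplification(30))  # 2**30 = 1073741824, i.e. about 10^9

# Hypothetical run: 20 efficient cycles followed by 10 at half efficiency.
print(fold_amplification_variable([1.0] * 20 + [0.5] * 10))
```

Even a modest drop in efficiency during the late cycles reduces the final yield well below the theoretical 10^9-fold.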
At the heart of the so-called "new genetics" is our ability to sequence DNA rapidly and cheaply. With knowledge of sequence comes knowledge of gene structure and very often the beginning of an understanding of gene function. The simplest method of sequencing DNA was invented by Dr. Fred Sanger, and for it he was awarded his second Nobel Prize. In the UK our major national Human Genome Project DNA sequencing centre, the Sanger Centre in Cambridgeshire, is named after him. You can visit its home page here if you wish.
The Sanger method is also known as the "dideoxy chain termination method".
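The logic behind the name can be sketched as follows. In each of four reactions, synthesis of the new strand stops wherever the corresponding dideoxynucleotide happens to be incorporated, so each reaction yields a set of fragments ending in one particular base. Sorting all the fragments by length and reading off the terminal base gives the sequence. This is a simplified model written for illustration: fragment lengths are counted from the end of the primer, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def synthesised_strand(template_3_to_5):
    """New strand (5' to 3') copied from a template written 3' to 5'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


def termination_lanes(new_strand):
    """For each dideoxy reaction, list the lengths of the terminated fragments."""
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)  # a ddNTP of this base stops synthesis here
    return lanes


def read_gel(lanes):
    """Read the sequence from the shortest fragment to the longest."""
    base_at_length = {length: base
                      for base, lengths in lanes.items()
                      for length in lengths}
    return "".join(base_at_length[length] for length in sorted(base_at_length))


template = "TACGGT"  # hypothetical template, written 3' to 5'
lanes = termination_lanes(synthesised_strand(template))
print(lanes)
print(read_gel(lanes))
```

The sequence read from the gel is that of the newly synthesised strand, which is the complement of the template.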


PCR and sequencing together have made possible the creation of useful landmarks in the genome. These are several thousand short fragments of known DNA sequence whose presence in any DNA sample can be tested by PCR. They are known as STSs (Sequence Tagged Sites). If an STS is part of a transcribed sequence it is known as an EST (Expressed Sequence Tag). Hundreds of thousands of ESTs have been created and can be accessed by computer. One major attempt to classify them all is known as UniGene.
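Testing for an STS amounts to asking whether a pair of primers will amplify a product of the expected size from a sample. A minimal "in silico" sketch of that idea is given below. It assumes exact primer matches and searches only one strand of the sample; the sequences and function names are made up for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def sts_product_size(sample, forward_primer, reverse_primer):
    """Return the size of the product the primers would amplify, or None.

    Both primers are written 5' to 3'. Only exact matches on the given
    strand of the sample are considered.
    """
    start = sample.find(forward_primer)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = sample.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return end + len(reverse_site) - start


# Hypothetical sample and primers.
sample = "TTT" + "GATTACA" + "CCCCC" + "AGGTCA" + "TT"
print(sts_product_size(sample, "GATTACA", "TGACCT"))  # product spans both primer sites
print(sts_product_size("TTTTTTTT", "GATTACA", "TGACCT"))  # STS absent, so None
```

A sample is scored as positive for the STS when a product of the expected size is obtained.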
The topics include:
Reading: