A certain amount of background knowledge is assumed here. I hope that all of you, having taken the first-year course BIOLB31A (and most of you BIOLB100 as well), will be familiar, at least in outline, with the "Central Dogma" of molecular biology, that "DNA makes RNA makes Protein"; that you know what those three molecules are; and that you understand, in principle, how DNA is replicated. If you have any difficulty with this, read the introductory material in almost any textbook of human genetics before the next lecture. In this lecture I want to discuss the organisation of human DNA, both in terms of the microscopically observable structures, the chromosomes, and in terms of its sequence content.
We know far more about the structure of the human genome than about any other vertebrate genome. In its size and in the arrangement of its sequence it seems to be similar to other vertebrate genomes. The Human Genome Project is making astonishing progress, and by the year 2005 we should know more than 95% of its nucleotide sequence. One of the main objectives of this course is for you to become familiar with what is known and with the evidence on which this knowledge is based.
The human genome is diploid; in other words, it contains two copies of every DNA sequence. (The alert will already have spotted that this is not strictly true, but let's talk about females here, OK! And we'll forget about mitochondria too for a while, if you don't mind!) A haploid genome, which contains just one copy of everything, comprises about 3 × 10^9 base pairs of DNA. In physical terms this is a surprising amount: each diploid cell nucleus contains about two metres of DNA.
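The "two metres" figure follows from simple arithmetic. Here is a minimal sketch, assuming the standard B-form DNA rise of about 0.34 nm per base pair (a value not given in these notes) and counting both haploid copies in a diploid nucleus:

```python
# 3 x 10^9 bp per haploid genome is the figure from the text.
# 0.34 nm rise per base pair is the standard B-DNA value, assumed here.
HAPLOID_BP = 3e9
RISE_PER_BP_M = 0.34e-9

def dna_length_m(bp: float) -> float:
    """Length in metres of a fully stretched double helix of `bp` base pairs."""
    return bp * RISE_PER_BP_M

diploid_bp = 2 * HAPLOID_BP
print(f"Diploid nucleus: {dna_length_m(diploid_bp):.2f} m of DNA")  # 2.04 m
```

The result, about 2 m, is the length of the DNA if it were stretched out end to end; in the nucleus it is, of course, highly compacted.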
In common with all other organisms, human DNA is organised into chromosomes which serve to manoeuvre DNA through the difficulties of cell division where something like two metres of DNA has to be moved through a distance of only about 100 µm and be separated from a similar amount of DNA moving the other way.
Humans have 23 pairs of chromosomes. Twenty-two pairs, the autosomes, are the same in both sexes and are numbered from 1 to 22 in order of diminishing size. The remaining pair, the sex chromosomes, consists either of two X chromosomes (in females) or of an X and the much smaller Y chromosome (in males). One complete set of chromosomes, i.e. autosomes 1-22 and one sex chromosome, is known as a haploid set. Cells that contain two complete sets (i.e. most cells, except mature germ cells) are diploid. Each chromosome contains its own unique DNA sequence. Consequently, when the chromosomal DNA and its associated histone and non-histone proteins are at their most densely packed (i.e. at mitotic metaphase), each chromosome adopts a shape slightly different from that of any other chromosome.
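The counting of haploid and diploid sets can be made concrete with a small data structure. This is a sketch under simple assumptions: chromosomes are represented only by their names, and the summary string imitates the familiar "46,XX" / "46,XY" shorthand used in karyotype descriptions. The function names are hypothetical, chosen for illustration.

```python
from collections import Counter

# Autosomes are numbered 1-22 in order of diminishing size.
AUTOSOMES = [str(n) for n in range(1, 23)]

def haploid_set(sex_chromosome: str) -> list[str]:
    """One complete set: autosomes 1-22 plus a single sex chromosome."""
    return AUTOSOMES + [sex_chromosome]

def diploid_set(sex_pair: tuple[str, str]) -> list[str]:
    """Two complete haploid sets, one carrying each sex chromosome."""
    first, second = sex_pair
    return haploid_set(first) + haploid_set(second)

def summary(chromosomes: list[str]) -> str:
    """Chromosome count plus sex chromosomes, in the style '46,XX'."""
    counts = Counter(chromosomes)
    return f"{len(chromosomes)},{'X' * counts['X']}{'Y' * counts['Y']}"

print(summary(diploid_set(("X", "X"))))  # 46,XX (female)
print(summary(diploid_set(("X", "Y"))))  # 46,XY (male)
```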
Cytogeneticists study chromosomes microscopically. Cells are treated with a drug which prevents the spindle fibres forming. The chromosomes continue to be prepared for mitosis, but because the spindle has not formed, division remains blocked and cells accumulate at metaphase. The cells can then be "fixed", i.e. treated with a chemical that denatures proteins so that the structures are preserved, and then burst open on a microscope slide. After gentle treatment with a very small amount of proteinase, the chromosomes are stained. Each chromosome reveals a characteristic pattern of alternating dark and light bands which reflects in some way its underlying architecture. On the right, for example, is the ideogram which represents the characteristic banding pattern of chromosome 9.
The "chromosome spread" can be photographed, and the individual chromosomes cut out, paired and displayed as shown below.
The sum of all the chromosome information is known as a karyotype.
All human chromosomes have two arms: the short arm is referred to as the p arm and the long arm as the q arm. The position of the primary constriction (another name for the centromere) determines whether a chromosome is metacentric (two substantial arms) or acrocentric (one very tiny arm and one that contains almost all the DNA). The acrocentric chromosomes are numbers 13, 14, 15, 21 and 22. At each end of the chromosome is a telomere, a structure that overcomes the problem of replicating DNA right to the end of a linear molecule. The short arms of acrocentric chromosomes sometimes hardly seem to be attached; they can be linked to the rest of the chromosome by a short stalk known as a secondary constriction. The tiny short arm, bobbing about at a distance, is known as a satellite.
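The metacentric/acrocentric distinction depends on where the centromere sits along the chromosome. The sketch below expresses this as the fraction of the chromosome's length that lies in the p arm. The arm lengths and the 0.125 cutoff are hypothetical values chosen for illustration, and real cytogenetic classifications use more categories than the two mentioned here.

```python
# HYPOTHETICAL cutoff for this sketch: a chromosome whose p arm holds less
# than this fraction of its total length is called acrocentric.
ACROCENTRIC_CUTOFF = 0.125

def short_arm_fraction(p_length: float, q_length: float) -> float:
    """Fraction of the total chromosome length in the short (p) arm."""
    if p_length > q_length:
        raise ValueError("the p arm is, by definition, the shorter arm")
    return p_length / (p_length + q_length)

def classify(p_length: float, q_length: float) -> str:
    """Two-way classification by centromere position, as in the text."""
    if short_arm_fraction(p_length, q_length) < ACROCENTRIC_CUTOFF:
        return "acrocentric"
    return "metacentric"

print(classify(48, 52))  # two substantial arms: metacentric
print(classify(3, 97))   # one very tiny arm: acrocentric
```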
Ultimately, we will answer the question of how human DNA is organised by knowing the complete nucleotide sequence of the human genome. Even then, however, we will still want to consider the organisation of the sequence in terms of the relationships of its parts to one another and the relative abundance of short sequences within it. The words we use to discuss the relative abundance and organisation of different DNA sequences come from two types of experiment: DNA reassociation (C0t) analysis and buoyant density centrifugation.
If we experiment with DNAs of different complexities, ranging from polyA + polyU (complexity = 1 bp) to E. coli DNA (complexity = 4.6 × 10^6 bp), we see (in the figure below) that although the curves are similarly shaped, their positions on the C0t axis (C0t is the product of the initial DNA concentration, C0, and the reannealing time, t) are characteristic of the complexity of the DNA. This is because, as the number of different sequences making up the mixture increases, the concentration of any single sequence falls, so each sequence takes longer to find its complementary strand. The C0t at which half the DNA has reannealed (C0t½) is therefore proportional to the complexity of the DNA (see the top scale).
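For an ideal single component, reannealing follows second-order kinetics: writing x = C0t / C0t½, the fraction reannealed is x / (1 + x). The sketch below assumes this ideal behaviour and assumes a proportionality constant between complexity and C0t½ (`COT_HALF_PER_BP`, a hypothetical placeholder, not a measured value); only ratios between DNAs are meaningful here.

```python
import math

# HYPOTHETICAL scale: C0t1/2 (mol nucleotide x s / L) per base pair of
# complexity. Only ratios between different DNAs are meaningful.
COT_HALF_PER_BP = 1e-6

def cot_half(complexity_bp: float) -> float:
    """C0t at which half the DNA has reannealed, proportional to complexity."""
    return COT_HALF_PER_BP * complexity_bp

def fraction_reannealed(cot: float, complexity_bp: float) -> float:
    """Ideal second-order kinetics: f = x / (1 + x), with x = C0t / C0t1/2."""
    x = cot / cot_half(complexity_bp)
    return x / (1.0 + x)

for name, complexity in [("polyA + polyU", 1.0), ("E. coli", 4.6e6)]:
    print(f"{name}: C0t1/2 = {cot_half(complexity):.3g}")

# Every curve has the same shape: going from 10% (x = 1/9) to 90% (x = 9)
# reannealed always takes an 81-fold increase in C0t, about two log units.
print(f"10%-90% span: {math.log10(81):.2f} log units of C0t")
```

This is why the curves in the figure are "similarly shaped" but shifted along a logarithmic C0t axis: changing the complexity rescales x and so slides the whole sigmoid sideways without changing its width.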
If the same experiment is carried out using DNA purified from a complex eukaryote, such as human, then we do not see a simple sigmoidal curve. Instead we see a curve which is the sum of the reannealing curves of many different components, each reannealing at its own characteristic C0t.
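A curve made up of several components can be modelled as a weighted sum of ideal second-order curves, one per component. In the sketch below the three components, their weights and their C0t½ values are hypothetical placeholders chosen only to show how the shape arises; they are not measurements of human DNA.

```python
from dataclasses import dataclass

@dataclass
class Component:
    name: str
    weight: float    # fraction of the total DNA in this component
    cot_half: float  # C0t at which this component alone is half reannealed

def component_fraction(cot: float, comp: Component) -> float:
    """Ideal second-order curve for one component: x / (1 + x), x = C0t / C0t1/2."""
    x = cot / comp.cot_half
    return x / (1.0 + x)

def total_reannealed(cot: float, components: list[Component]) -> float:
    """Observed curve = weighted sum of the independent component curves."""
    return sum(c.weight * component_fraction(cot, c) for c in components)

# HYPOTHETICAL components: illustrative values only, not human data.
EXAMPLE_GENOME = [
    Component("highly repetitive", 0.10, 1e-3),
    Component("moderately repetitive", 0.30, 1e-1),
    Component("single copy", 0.60, 1e3),
]

for cot in (1e-4, 1e-2, 1.0, 1e2, 1e4):
    print(f"C0t = {cot:g}: {total_reannealed(cot, EXAMPLE_GENOME):.2f} reannealed")

# How far has each component reannealed by C0t = 1?
for c in EXAMPLE_GENOME:
    print(f"{c.name}: {component_fraction(1.0, c):.3f}")
```

With these illustrative values, the repetitive components are almost completely double stranded by C0t = 1 while the single-copy component has barely started to reanneal, because a sequence present in many copies is at a much higher concentration and finds its partner sooner.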
You will come across the term "human C0t 1 DNA". This is human DNA which has been denatured and allowed to reanneal to a C0t value of 1. The double-stranded component is then separated from the single-stranded component and is supplied commercially. Because repeated sequences are present at high concentration and so reanneal at low C0t, it contains most of the human repetitive DNA but very little "single copy" DNA.
This is another old experiment which has left its mark on the language of molecular genetics. If a concentrated solution of caesium chloride is centrifuged at a very high g force, a concentration (and therefore density) gradient is established from the top to the bottom of the centrifuge tube. If sheared DNA has previously been mixed with the caesium chloride then, at equilibrium, each DNA molecule comes to rest at the position in the tube where the density of the surrounding solution exactly equals its own buoyant density. If this experiment is carried out with mouse DNA, and the concentration profile of the DNA in the density gradient is then analysed, a result like this is obtained.
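Why should different DNA fractions band at different positions? The buoyant density of double-stranded DNA in caesium chloride rises with its G+C content; a widely used empirical relation is ρ = 1.660 + 0.098 × (G+C fraction) g/cm³. This relation is not stated in the notes above, and the two sequences below are hypothetical, chosen only to contrast base compositions. The sketch shows that a fraction whose base composition differs from the bulk of the genome will come to rest at a different density.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G + C among the A, C, G and T bases of a DNA sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at)

def buoyant_density(seq: str) -> float:
    """Approximate buoyant density in CsCl (g/cm^3) from base composition,
    using the empirical relation rho = 1.660 + 0.098 * (G+C fraction)."""
    return 1.660 + 0.098 * gc_fraction(seq)

# HYPOTHETICAL sequences, used only to contrast base compositions.
mixed_sequence = "ATGCGTACGATTACGGCAAT"
at_rich_repeat = "AAATTTATATTTAAATATGA"

for name, seq in [("mixed", mixed_sequence), ("AT-rich", at_rich_repeat)]:
    print(f"{name}: GC = {gc_fraction(seq):.2f}, "
          f"density = {buoyant_density(seq):.3f} g/cm^3")
```

The AT-rich sequence comes out lighter, so in a gradient it would form its own band on the low-density side of the main band.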
|Throughout the euchromatin|
|Throughout the euchromatin (perhaps the Y chromosome has more than its fair share?)|
|Scattered in single copies throughout the entire genome|
|Clustered in heterochromatin, as satellite DNA but also interspersed throughout the genome for instance as Alu repeats.|
Genes are scattered throughout the genome. Current estimates (based on EST sequencing, see later in the course) of the number of distinct human genes seem to be in the range of 60,000 to 70,000. The diagram below shows the features of a "typical" human gene.
|CpG island||Many genes have a GC-rich region upstream of (and often including) the first exon. Unlike the rest of the genome, it has no deficit of the dinucleotide CpG.|
|Exons||These are the parts of the gene which will remain in the mRNA after the primary transcript is processed.|
|Introns||These are the parts of the gene that are removed from the primary transcript by splicing.|
|translated region||This is the region of the gene which codes for protein. It is shaded grey in the diagram.|
|5'UTR||The region of exon 1 between the cap site and the start of translation at the first AUG codon.|
|3'UTR||The region of the final exon between the translation stop codon and the polyA addition site at the end of the transcript.|
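The features in the table can be tied together with a toy gene model. The sketch below uses an invented 35 bp gene (a hypothetical sequence, not a real human gene) with two exons and one intron. It splices out the intron, then splits the spliced transcript into 5'UTR, translated region and 3'UTR, assuming translation starts at the first ATG (AUG in the mRNA) and stops at the first in-frame stop codon. It works on the DNA sense strand, so codons are written with T rather than U. It also computes the observed/expected CpG ratio, one common way of expressing whether a sequence shows the genome-wide CpG deficit: a ratio near 1 means no deficit.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # sense-strand spelling of UAA, UAG, UGA

def splice(gene: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive); intron sequence is discarded."""
    return "".join(gene[start:end] for start, end in exons)

def split_transcript(mrna: str) -> tuple[str, str, str]:
    """Return (5'UTR, translated region, 3'UTR) of a spliced transcript.

    Assumes translation starts at the first ATG and ends at the first
    in-frame stop codon; the stop codon is counted in the translated region.
    """
    start = mrna.find("ATG")
    if start == -1:
        raise ValueError("no ATG start codon")
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3
            return mrna[:start], mrna[start:end], mrna[end:]
    raise ValueError("no in-frame stop codon")

def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG: (CpG count x length) / (C count x G count)."""
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)

# HYPOTHETICAL toy gene: exon 1, one intron (starting GT, ending AG), exon 2.
exon1 = "GGCACGATGGCC"
intron = "GTAAGTTTTTCAG"
exon2 = "AAATAAGCTT"
gene = exon1 + intron + exon2
exons = [(0, len(exon1)), (len(exon1) + len(intron), len(gene))]

utr5, cds, utr3 = split_transcript(splice(gene, exons))
print(utr5, cds, utr3)              # GGCACG ATGGCCAAATAA GCTT
print(round(cpg_obs_exp(exon1), 2))  # 0.6
```

Real genes are, of course, far longer, usually have many more exons, and their UTRs and start sites are defined experimentally rather than by this simple first-ATG rule.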