The Structure of the Human Genome

A certain amount of background knowledge is assumed here. I hope that all of you, having taken the first-year course BIOLB31A (and most of you BIOLB100 as well), will be familiar, at least in outline, with the "Central Dogma" of molecular biology, that "DNA makes RNA makes Protein"; that you know what those three molecules are; and that you understand, in principle, how DNA is replicated. If you have any difficulty with this, you should read the introductory material in almost any textbook of human genetics before the next lecture. In this lecture I want to discuss the organisation of human DNA, both in terms of the microscopically observable structures, the chromosomes, and in terms of its sequence content.

We know far more about the structure of the human genome than about any other vertebrate genome. In its size and in the arrangement of its sequence it seems to be similar to other vertebrate genomes. The Human Genome Project is making astonishing progress, and by the year 2005 we should know more than 95% of its nucleotide sequence. One of the main objectives of this course is for you to become familiar with what is known and with the evidence on which this knowledge is based.

How much DNA is there?

The human genome is diploid; in other words, it contains two copies of every DNA sequence. (The alert reader will already have spotted that this is not strictly true, but let's talk about females here, OK? And we'll forget about mitochondria too for a while, if you don't mind!) A haploid genome, which contains just one copy of everything, comprises about 3 × 10⁹ base pairs of DNA. In physical terms this is a surprising amount: each cell nucleus contains about two metres of DNA.
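The two-metre figure follows from simple arithmetic: a diploid nucleus holds about 2 × 3 × 10⁹ base pairs, and each base pair adds a fixed length to the double helix. The sketch below assumes B-form DNA with a rise of about 0.34 nm per base pair (a standard value, but not one given in this lecture); the function name is a hypothetical helper for illustration.

```python
# Rough estimate of the total length of DNA in one diploid human nucleus.
# Assumption: B-form DNA with a rise of ~0.34 nm per base pair.
HAPLOID_BP = 3e9          # ~3 x 10^9 bp per haploid genome
PLOIDY = 2                # a diploid nucleus carries two copies
RISE_PER_BP_M = 0.34e-9   # metres per base pair in B-DNA


def dna_length_m(haploid_bp: float = HAPLOID_BP,
                 ploidy: int = PLOIDY,
                 rise_m: float = RISE_PER_BP_M) -> float:
    """Length in metres if all the nuclear DNA were stretched end to end."""
    return haploid_bp * ploidy * rise_m


if __name__ == "__main__":
    print(f"{dna_length_m():.2f} m")
```

Run as a script it prints roughly 2.04 m, consistent with "about two metres".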

How is the DNA organised on a macroscale?

Chromosomes

In common with all other organisms, humans have their DNA organised into chromosomes, which serve to manoeuvre the DNA through the difficulties of cell division, when about two metres of DNA has to be moved through a distance of only about 100 µm and be separated from a similar amount of DNA moving the other way.

Humans have 23 pairs of chromosomes. Twenty-two pairs, the autosomes, are the same in either sex and are numbered from 1 to 22, roughly in order of diminishing size. The remaining pair, the sex chromosomes, is either a pair of X chromosomes (in females) or an X and the very much smaller Y chromosome (in males). One complete set of chromosomes, i.e. autosomes 1-22 and a sex chromosome, is known as a haploid set. Cells which contain two complete sets (i.e. most cells except mature germ cells) are diploid. Each chromosome contains its own unique sequence of DNA. Consequently, when the chromosomal DNA and its associated histone and non-histone proteins are at their most densely packed (i.e. at mitotic metaphase), each chromosome adopts a shape which is slightly different from that of any other chromosome.
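The bookkeeping of haploid and diploid sets can be made concrete with a small data structure. This is a minimal sketch, assuming chromosomes are represented simply as name strings ("1" to "22", "X", "Y"); `haploid_set` and `diploid_karyotype` are hypothetical helpers, not part of any genetics library.

```python
from collections import Counter

# Hypothetical teaching helpers: chromosomes are plain name strings.
AUTOSOMES = [str(n) for n in range(1, 23)]  # autosomes "1" to "22"


def haploid_set(sex_chromosome: str) -> list[str]:
    """One complete set: autosomes 1-22 plus a single sex chromosome."""
    if sex_chromosome not in ("X", "Y"):
        raise ValueError("sex chromosome must be 'X' or 'Y'")
    return AUTOSOMES + [sex_chromosome]


def diploid_karyotype(sex: str) -> list[str]:
    """Two complete sets: XX in a female, XY in a male."""
    if sex == "female":
        return haploid_set("X") + haploid_set("X")
    if sex == "male":
        return haploid_set("X") + haploid_set("Y")
    raise ValueError("sex must be 'female' or 'male'")


if __name__ == "__main__":
    for sex in ("female", "male"):
        karyotype = diploid_karyotype(sex)
        # Count only the sex chromosomes to show the XX / XY difference.
        sex_chroms = Counter(c for c in karyotype if c in ("X", "Y"))
        print(sex, len(karyotype), dict(sex_chroms))
```

A haploid set has 23 members and either diploid karyotype has 46; only the sex-chromosome composition differs between the sexes.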

Cytogeneticists study chromosomes microscopically. Cells are treated with a drug which prevents the spindle fibres from forming. The chromosomes continue to be prepared for mitosis, but because the spindle has not formed, division remains blocked and cells accumulate at metaphase. The cells can then be "fixed", i.e. treated with a chemical which denatures proteins so that the structures are preserved, and then burst open on a microscope slide. After gentle treatment with a very small amount of proteinase, the chromosomes are stained. Each chromosome reveals a characteristic pattern of alternating dark and light bands which reflects, in some way, its underlying architecture. On the right, for example, is the ideogram representing the characteristic banding pattern of chromosome 9.

The "chromosome spread" can be photographed, and the individual chromosomes cut out, paired and arranged as shown below.

The sum of all the chromosome information is known as a karyotype.

All human chromosomes have two arms: the short arm is referred to as the p arm and the long arm as the q arm. The position of the primary constriction (another name for the centromere) defines whether a chromosome is metacentric (two substantial arms) or acrocentric (one very tiny arm and one which contains almost all the DNA). The acrocentric autosomes are chromosomes 13, 14, 15, 21 and 22. At each end of the chromosome is a telomere, a structure which overcomes the problem of replicating DNA right to the end of a linear molecule. The short arms of acrocentric chromosomes sometimes hardly seem to be attached; they can be linked to the rest of the chromosome by a short stalk known as a secondary constriction. The tiny short arm, bobbing about at a distance, is known as a satellite.
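The naming rules above can be expressed as two tiny functions. This is a minimal sketch of the lecture's two-way split (metacentric vs acrocentric), restricted to autosomes 1-22 because the lecture's list of acrocentric chromosomes names only autosomes. Full cytogenetic classification uses more categories, so treat `centromere_type` and `label_arms` as hypothetical teaching helpers; arm lengths are arbitrary numbers in any consistent unit.

```python
# Two-way classification used in this lecture, restricted to autosomes.
ACROCENTRIC_AUTOSOMES = frozenset({13, 14, 15, 21, 22})


def centromere_type(autosome: int) -> str:
    """Return 'acrocentric' or 'metacentric' for an autosome numbered 1-22."""
    if not 1 <= autosome <= 22:
        raise ValueError("expected an autosome number from 1 to 22")
    return "acrocentric" if autosome in ACROCENTRIC_AUTOSOMES else "metacentric"


def label_arms(arm_a: float, arm_b: float) -> dict[str, float]:
    """Name the two arms: p is the shorter arm, q the longer one."""
    short_arm, long_arm = sorted((arm_a, arm_b))
    return {"p": short_arm, "q": long_arm}


if __name__ == "__main__":
    print([n for n in range(1, 23) if centromere_type(n) == "acrocentric"])
    print(label_arms(50.0, 12.0))
```

Whatever order the two arm lengths are given in, the shorter one is labelled p and the longer one q.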

How is the DNA organised on a microscale?

Ultimately, we will answer this question by knowing the complete nucleotide sequence of the human genome. Even then, however, we will still want to consider how that sequence is organised: how its parts relate to one another and how abundant particular short sequences are. The terms we use to describe the relative abundance and organisation of different DNA sequences come from two types of experiment.

Summary

Sequence content of the human genome

| Class | Copy number | Distribution | Percentage of the human genome |
|---|---|---|---|
| "Single copy sequence" | 1 | Throughout the euchromatin | 50% |
| Low copy number repeat | 2-20 | Throughout the euchromatin (perhaps the Y chromosome has more than its fair share?) | 10% |
| Moderately repeated DNA | approx. 500 | Scattered in single copies throughout the entire genome | 25% |
| Highly repeated DNA | 10,000-500,000 | Clustered in heterochromatin as satellite DNA, but also interspersed throughout the genome, for instance as Alu repeats | 15% |
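Applied to a haploid genome of about 3 × 10⁹ base pairs, the percentages in the summary table translate into absolute amounts of DNA. The sketch below simply multiplies through; the percentages are the table's approximate figures, so the results are order-of-magnitude estimates only, and `megabases_per_class` is a hypothetical helper.

```python
HAPLOID_BP = 3e9  # approximate haploid genome size in base pairs

# (class, percentage of the genome) as given in the summary table
SEQUENCE_CLASSES = [
    ("Single copy sequence", 50),
    ("Low copy number repeat", 10),
    ("Moderately repeated DNA", 25),
    ("Highly repeated DNA", 15),
]


def megabases_per_class(genome_bp: float = HAPLOID_BP) -> dict[str, float]:
    """Convert each class's percentage into megabases (Mb) of DNA."""
    total = sum(pct for _, pct in SEQUENCE_CLASSES)
    if total != 100:
        raise ValueError(f"percentages sum to {total}, not 100")
    return {name: genome_bp * pct / 100 / 1e6 for name, pct in SEQUENCE_CLASSES}


if __name__ == "__main__":
    for name, mb in megabases_per_class().items():
        print(f"{name:<25} ~{mb:,.0f} Mb")
```

The sum check confirms that the four classes account for the whole genome; for instance, the highly repeated fraction comes to roughly 450 Mb.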

Genes

Genes are scattered throughout the genome. Current estimates (based on EST sequencing, see later in the course) of the number of distinct human genes seem to be in the range of 60,000 to 70,000. The diagram below shows the features of a "typical" human gene.

Diagram of a typical gene

Gene features

| Feature | Description |
|---|---|
| CpG island | Many genes have a GC-rich region upstream of (and often including) the first exon. Unlike the rest of the genome, it has no deficit of the dinucleotide CpG. |
| Exons | The parts of the gene which remain in the mRNA after the primary transcript is processed. |
| Introns | The parts of the gene which are removed from the primary transcript by splicing. |
| Translated region | The region of the gene which codes for protein. It is shaded grey in the diagram. |
| 5'UTR (5' untranslated region) | The region of exon 1 between the cap site and the start of translation at the first AUG codon. |
| 3'UTR (3' untranslated region) | The region of the final exon between the translation stop codon and the polyA addition site at the end of the transcript. |
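The features in the gene table can be illustrated on toy sequences. This sketch makes several assumptions not stated in the lecture: CpG richness is measured with the commonly used observed/expected CpG ratio; exon coordinates are 0-based and end-exclusive; translation starts at the first AUG of the mRNA and runs in frame to the first stop codon. All sequences, coordinates and function names (`cpg_obs_exp`, `splice`, `split_mrna`) are hypothetical, for illustration only.

```python
# Toy sketches of the gene features; sequences and coordinates are made up.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def cpg_obs_exp(dna: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count).

    Bulk genomic DNA shows a marked CpG deficit (ratio well below 1),
    whereas a CpG island does not.
    """
    s = dna.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    cpg = s.count("CG")  # "CG" cannot overlap itself, so this count is exact
    return cpg * len(s) / (c * g)


def splice(primary: str, exons: list[tuple[int, int]]) -> str:
    """Keep the exons (0-based, end-exclusive) and drop the introns between them."""
    return "".join(primary[start:end] for start, end in sorted(exons))


def split_mrna(mrna: str) -> tuple[str, str, str]:
    """Split an mRNA into (5'UTR, translated region, 3'UTR).

    Assumes translation starts at the first AUG and runs in frame to the
    first stop codon, which is included in the translated region.
    """
    start = mrna.find("AUG")
    if start == -1:
        raise ValueError("no AUG start codon")
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3
            return mrna[:start], mrna[start:end], mrna[end:]
    raise ValueError("no in-frame stop codon")


if __name__ == "__main__":
    exon1, intron, exon2 = "GCCAUGGCU", "GUAAGUCCAG", "GCAUAAGCC"
    primary = exon1 + intron + exon2          # toy primary transcript
    mrna = splice(primary, [(0, 9), (19, 28)])
    print(mrna)
    print(split_mrna(mrna))
    print(cpg_obs_exp("CGCGCGCG"), cpg_obs_exp("CAGT"))
```

In the toy example, splicing removes the 10-nucleotide intron, and the resulting mRNA splits into a short 5'UTR, a translated region from AUG to the UAA stop codon, and a short 3'UTR.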

Self Assessment Questions

  1. Which of the following classes of sequence are likely to be found in an mRNA, and which are unlikely?
  2. Which of them are likely to be found within the sequence of a gene?


