BIOL 2007 EVOLUTIONARY GENETICS

Become a bioinformatician in 7 easy steps!

Use ClustalX for Windows to align DNA sequences and draw a phylogenetic tree
First

NB: Allow half an hour to an hour for this exercise. If you're not from BIOL2007 at UCL, that's fine: you are welcome to try it too.  Let me know if this works!

You will need some software from the internet.

1) Obtain the ClustalX 1.83 programme for Windows from ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalX/clustalx1.83.zip (other versions are available for Macs and UNIX computers).

2) Install it by copying the contents of the *.zip file into a suitable folder, such as

C:\Documents and Settings\jim\Clustal\

... on your own computer.

Then

3) Obtain some DNA sequences and align them to one another, or use an alignment already created.

a) I have already aligned some great ape sequences, but you could use unaligned sequences straight from EMBL, and try to align them yourself using ClustalX (it doesn't always work perfectly, and you will probably have to do some manual editing in a different programme, such as BioEdit for Windows).  I have also cut off some of the bases in the highly variable "control region" of the mtDNA, which did not align between species.

Here is the file I put together from EMBL sequences for the complete mitochondrial genome of humans, chimps, gorillas, and orang-utans: HumChGoMito3.fas or, if that doesn't work, HumChGoMito3.txt.  It is really just a text file (in FASTA format); save it to disk, and you can then open it in NotePad or MS-Word to have a look.
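
If you are curious what "FASTA format" means in practice, here is a minimal sketch in Python (not part of the ClustalX exercise) that reads FASTA text and reports the length of each sequence. The function name read_fasta and the two short sequences are made up for illustration; they are not taken from HumChGoMito3.fas.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line: the name is the first word after ">"
            words = line[1:].split()
            name = words[0] if words else "unnamed"
            records[name] = []
        elif name is not None:
            # A sequence line: belongs to the most recent header
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


# Toy, made-up sequences (NOT real mtDNA), wrapped over two lines each
example = """>seq_one
ACGTACGT
ACGT
>seq_two
ACGTACGA
ACGT
"""

seqs = read_fasta(example)
for seq_name, seq in seqs.items():
    print(seq_name, len(seq))
```

In an alignment file every sequence should come out the same length, because gaps ("-") pad the shorter ones.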

b) Or you can find sequences for a particular gene or organism from the nucleotide sequence databases GenBank (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide) or EMBL (http://www.ebi.ac.uk/embl/).  For instance, try searching for "heliconius melpomene ddc", and you should find some sequences that I entered from the butterfly Heliconius melpomene for the gene dopa decarboxylase (Ddc).  Or you can search using a reference number, e.g. AY437780. Then you can align them in ClustalX. However, aligning raw sequences yourself is more difficult than starting from a ready-made alignment, and the result usually needs some manual editing.
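
Searching by reference number can also be done from a script rather than through the web page. The sketch below is an assumption on my part, not something this exercise requires: it uses NCBI's E-utilities "efetch" service, which is not mentioned above, and the helper names efetch_url and fetch_fasta are hypothetical. Check NCBI's current documentation before relying on it.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed service address (NCBI E-utilities); not part of the original exercise
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession):
    """Build a URL asking for one nucleotide record as FASTA text."""
    params = {
        "db": "nuccore",      # the nucleotide database
        "id": accession,      # reference (accession) number
        "rettype": "fasta",   # ask for FASTA format
        "retmode": "text",    # as plain text
    }
    return EFETCH + "?" + urlencode(params)


def fetch_fasta(accession):
    """Download the record (needs an internet connection)."""
    with urlopen(efetch_url(accession)) as response:
        return response.read().decode("utf-8")


# Only builds and prints the URL; call fetch_fasta("AY437780") to download
print(efetch_url("AY437780"))
```

You could save the downloaded text as a *.fas file and open it in ClustalX in the same way as the great ape file.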

4) Open ClustalX by opening the folder where you stored the programmes, and clicking the icon labelled "ClustalX.exe" or "Clustal". From within ClustalX, open the sequence file. You should now see the alignment in colour.

5) Calculate a "neighbour-joining" tree. (Other programmes will allow you to estimate phylogenies using other distance-based, maximum likelihood, or Bayesian methods.) Save the tree in "Phylip" format (*.ph). This is just a text file (again, you can open it in Word or NotePad to check; it contains lots of numbers and brackets).
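
Neighbour-joining starts from a table of genetic distances between every pair of sequences. As a rough illustration, the Python sketch below computes one of the simplest possible distances, the proportion of aligned sites that differ (often called the p-distance), ignoring sites with a gap. This is an assumption for illustration only: ClustalX's own calculation may differ in details such as gap handling and corrections for multiple substitutions. The three short sequences are made up.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned, gap-free sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip sites where either sequence has a gap
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no gap-free sites to compare")
    return differing / compared


# Toy, made-up aligned sequences (NOT real mtDNA)
toy = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGAACGTTT",
}

# Distance for every pair of sequences
distances = {
    (n1, n2): p_distance(toy[n1], toy[n2])
    for n1, n2 in combinations(toy, 2)
}
for (n1, n2), d in distances.items():
    print(n1, n2, d)
```

The tree-building step then joins the closest pairs first, which is why the most similar sequences end up as neighbours on the tree.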

6) You can now open and view the tree using another programme, "njplotWIN95.exe", which came in the ClustalX zip file you downloaded.  Having done this, you should see a nice tree showing how closely human mitochondrial DNA (mtDNA) is related to that of chimps etc.  Tick the "show branch lengths" checkbox to display the length of each branch, i.e. the fraction of the approximately 16,500 base pair DNA sequence that differs along that branch.

7) Well done! You are now a bioinformatician.  Essentially, all DNA (and protein) sequence manipulation uses these kinds of operations.  As well as aligning sequences and drawing phylogenetic trees, you can do many more things, such as study rates of sequence evolution, test whether the sequence evolution conforms to neutral or selected modes, or use programmes that predict 3-D protein structure.


Now test your understanding of what you have done:

A neighbour-joining tree represents the relationships of the different species in terms of genetic distance. Using the tree, you can answer the following questions.

Read off from the tree: How divergent is the mtDNA of humans and chimps? How divergent are chimps and gorillas? Humans and gorillas? Use as your approximate measure of divergence the % difference at their mtDNA bases: add together the lengths of all the branches on the path connecting the two species on the tree, then multiply by 100 to turn the fraction into a percentage. Remember to tick the box for "branch lengths" so that you can see the values for each branch on the tree.
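
If you would like to check your arithmetic, the adding-up of branch lengths can be sketched in Python. This is only an illustration under stated assumptions: the helper names (parse_newick, path_to_root, leaf_distance) are my own, the parser handles only simple bracketed trees like those in a *.ph file, and the tree below uses placeholder taxa A to D with made-up branch lengths, so it does NOT give the answers for the great apes.

```python
def parse_newick(text):
    """Return {node: (parent, branch_length)} for a simple bracketed tree."""
    text = "".join(text.split()).rstrip(";")  # drop whitespace and final ";"
    parents = {}
    pos = 0
    counter = 0

    def read_label():
        # Read "name:length" (either part may be missing) up to , ( or )
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        label, _, length = text[start:pos].partition(":")
        return label, float(length) if length else 0.0

    def read_node(parent):
        nonlocal pos, counter
        if text[pos] == "(":
            counter += 1
            node = f"node{counter}"  # internal nodes get made-up names
            pos += 1                 # skip "("
            while True:
                read_node(node)
                if text[pos] == ",":
                    pos += 1
                else:
                    break
            pos += 1                 # skip ")"
            _, length = read_label() # any internal label is ignored
        else:
            node, length = read_label()
        parents[node] = (parent, length)

    read_node(None)
    return parents


def path_to_root(parents, leaf):
    """Distance from the leaf to each node met on the way up to the root."""
    dist = {}
    node, total = leaf, 0.0
    while node is not None:
        dist[node] = total
        parent, length = parents[node]
        total += length
        node = parent
    return dist


def leaf_distance(parents, a, b):
    """Sum of branch lengths on the path joining leaves a and b."""
    up_a = path_to_root(parents, a)
    up_b = path_to_root(parents, b)
    # First node shared by both upward paths is where the two paths meet
    meeting = next(n for n in up_a if n in up_b)
    return up_a[meeting] + up_b[meeting]


# Placeholder taxa and MADE-UP branch lengths, for illustration only
tree = parse_newick("(D:0.30,(C:0.20,(A:0.10,B:0.15):0.05):0.02);")
for pair in [("A", "B"), ("A", "C"), ("A", "D")]:
    print(pair, f"{100 * leaf_distance(tree, *pair):.1f}%")
```

For A and B the path is just their two branches (0.10 + 0.15); for A and C it also includes the internal branch between them (0.10 + 0.05 + 0.20). The same reasoning applies when you read the values off your own tree.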

Jim Mallet 15 February 2008

