UCL Cancer Institute

The MHC Haplotype Project



The MHC Haplotype Consortium:
Stephan Beck
Stephen Sawcer
John Todd
John Trowsdale
John Elliott
Pieter de Jong
Roger Horton

The MHC Haplotype Project

The MHC Haplotype Project was conducted at the Sanger Institute between 2000 and 2006 and offers a framework and resource for association studies of all MHC-linked diseases. It provides the genomic sequences and gene annotation of eight different HLA-homozygous typing haplotypes (listed below), the variations between them (see DATA) and their ancestral relationships.

The table below lists the eight cell lines used in the project along with their HLA haplotypes and alleles. Links are given to the Sanger Institute Vega database from which sequences and gene annotation can be downloaded, and to the same regions in the UCSC browser for those who prefer that approach.

The sequence from PGF is now incorporated in the reference sequence for chromosome 6. For the other seven haplotype sequences, links are given to the corresponding GRC entries (accessions with the prefix GL). These GRC haplotype contigs, called "alternate loci", are constructed so that they begin with additional anchor sequence derived from the reference. FASTA sequence obtained from these links will therefore differ from that obtained from Vega.
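If you need a GRC alternate-locus sequence to line up base for base with the Vega sequence of the same haplotype, one approach is to find where the Vega sequence begins inside the GRC contig and discard the leading anchor. The Python sketch below assumes that the first `k` bases of the Vega sequence occur exactly once in the GRC sequence and are identical in both; the function names, the default `k` and the single-record FASTA reader are illustrative choices, not part of either resource.

```python
def read_fasta(path: str) -> str:
    """Concatenate the sequence lines of a single-record FASTA file."""
    with open(path) as fh:
        return "".join(line.strip() for line in fh if not line.startswith(">"))


def strip_leading_anchor(grc_seq: str, vega_seq: str, k: int = 50) -> str:
    """Return grc_seq trimmed so that it starts where vega_seq starts.

    Assumption: the first k bases of the Vega sequence appear exactly once
    in the GRC alternate-locus sequence. The search is case-insensitive
    (soft-masked bases are lowercase), but the returned slice keeps the
    original case of grc_seq.
    """
    probe = vega_seq[:k].upper()
    grc = grc_seq.upper()
    offset = grc.find(probe)
    if offset == -1:
        raise ValueError("start of the Vega sequence not found in the GRC sequence")
    if grc.find(probe, offset + 1) != -1:
        raise ValueError("probe occurs more than once; try a larger k")
    # Everything before the offset is the anchor sequence taken from the reference.
    return grc_seq[offset:]
```

For example, `strip_leading_anchor(read_fasta("cox_grc.fa"), read_fasta("cox_vega.fa"))` would return the GRC sequence without its anchor (the file names are hypothetical). If the two sequences differ within the first `k` bases, the probe will not be found and a different probe position would be needed.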

Cell line  Haplotype      HLA-A        HLA-B        HLA-C        HLA-DRB1     HLA-DQB1   HLA-DPB1   Links
PGF        A3-B7-DR15     03:01:01:01  07:02:01     07:02:01:03  15:01:01:01  06:02:01   04:01      Vega  UCSC
COX        A1-B8-DR3      01:01:01:01  08:01:01     07:01:01     03:01:01:01  02:01      03:01      Vega  UCSC
APD        A1-B60-DR13    01:01:01:01  40:01:01     06:02:01:01  13:01:01     06:03:01   04:02      Vega  UCSC
DBB        A2-B57-DR7     02:01:01:01  57:01:01     06:02        07:01:01     03:03:02   04:01:01   Vega  UCSC
MANN       A29-B44-DR7    29:02:01     44:03:01     16:01        07:01:01:01  02:02      02:01:02   Vega  UCSC
SSTO       A32-B44-DR4    32:01:01     44:02:01:01  05:01:01:02  04:03:01     03:05:01   04:01:01   Vega  UCSC
QBL        A26-B18-DR3    26:01:01     18:01:01     05:01:01:01  03:01:01:02  02:01:01   02:02      Vega  UCSC
MCF        A2-B62-DR4     02:01        15:01:01:01  03:04:01:01  04:01        03:01      04:02      Vega  UCSC
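To compare haplotypes programmatically, the allele typings can be held in a small lookup table. The sketch below transcribes the eight cell lines, assuming the six allele columns are HLA-A, -B, -C, -DRB1, -DQB1 and -DPB1 in that order, and matches allele names field by field on their colon-separated parts, so that a query for "04:01" matches both 04:01 and 04:01:01 but never 04:02. The names `TYPING` and `carriers` are hypothetical.

```python
# Alleles transcribed from the cell-line table. The column order
# (HLA-A, -B, -C, -DRB1, -DQB1, -DPB1) is an assumption about that table.
LOCI = ("A", "B", "C", "DRB1", "DQB1", "DPB1")

TYPING = {
    "PGF":  ("03:01:01:01", "07:02:01", "07:02:01:03", "15:01:01:01", "06:02:01", "04:01"),
    "COX":  ("01:01:01:01", "08:01:01", "07:01:01", "03:01:01:01", "02:01", "03:01"),
    "APD":  ("01:01:01:01", "40:01:01", "06:02:01:01", "13:01:01", "06:03:01", "04:02"),
    "DBB":  ("02:01:01:01", "57:01:01", "06:02", "07:01:01", "03:03:02", "04:01:01"),
    "MANN": ("29:02:01", "44:03:01", "16:01", "07:01:01:01", "02:02", "02:01:02"),
    "SSTO": ("32:01:01", "44:02:01:01", "05:01:01:02", "04:03:01", "03:05:01", "04:01:01"),
    "QBL":  ("26:01:01", "18:01:01", "05:01:01:01", "03:01:01:02", "02:01:01", "02:02"),
    "MCF":  ("02:01", "15:01:01:01", "03:04:01:01", "04:01", "03:01", "04:02"),
}


def carriers(locus: str, prefix: str) -> list[str]:
    """Return the cell lines whose allele at `locus` begins with the
    colon-separated fields of `prefix` (e.g. "03" or "04:01")."""
    idx = LOCI.index(locus)
    wanted = prefix.split(":")
    return sorted(
        cell_line
        for cell_line, alleles in TYPING.items()
        # Compare whole fields, so "04:01" cannot match "04:010" or "04:02".
        if alleles[idx].split(":")[: len(wanted)] == wanted
    )
```

For instance, `carriers("DRB1", "03")` returns `["COX", "QBL"]`, the two DR3 haplotypes in the table.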

MHC Haplotype Project Data

The data from the MHC Haplotype Project are available in a single BED (Browser Extensible Data) format file for viewing in the UCSC Genome Browser.

Download the file from HERE (right-click and select 'Save Link As...').

Follow the UCSC instructions for loading BED file data as a custom track. The data are colour-coded by haplotype and are initially displayed as a 1 Mb window at the centre of the MHC. You can then adjust the coordinates and zoom in to your region of interest.

When zoomed in, you should change the display of the Sanger_MHC custom track from "dense" to "full". You may also want to unhide the "SNPs (131)" track in the "Variation and Repeats" group and the "Vega genes" track in the "Genes and Gene Prediction Tracks" group.
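If you only need one region, a smaller custom track can be cut from the BED file before uploading it. This Python sketch assumes the file uses standard BED columns (chrom, chromStart, chromEnd, name, ... , itemRgb) with 0-based, half-open coordinates. It writes a UCSC `browser position` line followed by a `track` line with `itemRgb="On"`, so the per-haplotype colours are used, and `visibility=full`, so the track opens in "full" rather than "dense" mode. The track name, function names and example records are hypothetical.

```python
import io
from typing import Iterable, Iterator, TextIO


def records_in_region(lines: Iterable[str], chrom: str, start: int, end: int) -> Iterator[str]:
    """Yield BED records overlapping the half-open interval [start, end) on chrom.

    Existing track/browser lines, comments and blank lines are skipped.
    """
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        rec_chrom, rec_start, rec_end = fields[0], int(fields[1]), int(fields[2])
        if rec_chrom == chrom and rec_start < end and rec_end > start:
            yield line.rstrip("\n")


def write_subset_track(lines: Iterable[str], chrom: str, start: int, end: int, out: TextIO) -> int:
    """Write a custom track containing only records in the region; return how many were written."""
    # In a UCSC custom track file, browser lines come before the track line.
    # The browser position is 1-based and inclusive, hence start + 1.
    out.write(f"browser position {chrom}:{start + 1}-{end}\n")
    out.write('track name="Sanger_MHC_subset" description="MHC haplotype variations (subset)" '
              'itemRgb="On" visibility=full\n')
    count = 0
    for rec in records_in_region(lines, chrom, start, end):
        out.write(rec + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Made-up records; the real file's contents are not reproduced here.
    demo = [
        "chr6\t100\t101\tCOX:AL000001.1_5_AG\t0\t+\t100\t101\t255,0,0\n",
        "chr6\t500\t503\tPGF:AL000002.1_9_d3ACG\t0\t+\t500\t503\t0,0,255\n",
    ]
    buf = io.StringIO()
    write_subset_track(demo, "chr6", 50, 200, buf)
    print(buf.getvalue(), end="")
```

With real data, the source would be the downloaded BED file opened for reading and the output a new file that is then uploaded as a custom track in place of the full file.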

Although the project data were originally issued in coordinates of a previous release of the human genome, they have been converted with the UCSC LiftOver tool for use with the February 2009 GRCh37/hg19 assembly.
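LiftOver converts positions using a chain file that describes aligned blocks between two assemblies: a position inside an aligned block is shifted by that block's offset, and a position falling in a gap cannot be converted. The Python sketch below illustrates only that core idea, assuming a single chromosome, forward strand and hypothetical block values; it is not the LiftOver implementation, which also handles strand, multiple chains and partial interval mapping.

```python
from bisect import bisect_right
from typing import Callable, Optional, Sequence, Tuple

# (old_start, new_start, size): an ungapped aligned block, 0-based coordinates.
Block = Tuple[int, int, int]


def make_lifter(blocks: Sequence[Block]) -> Callable[[int], Optional[int]]:
    """Build a position mapper from non-overlapping aligned blocks (a simplified chain)."""
    ordered = sorted(blocks)
    starts = [b[0] for b in ordered]

    def lift(pos: int) -> Optional[int]:
        i = bisect_right(starts, pos) - 1  # last block starting at or before pos
        if i < 0:
            return None  # before the first block
        old_start, new_start, size = ordered[i]
        if pos < old_start + size:
            return new_start + (pos - old_start)  # inside the block: apply its offset
        return None  # in a gap between blocks: no mapping
    return lift
```

With the hypothetical blocks `[(0, 1000, 100), (150, 1100, 50)]`, position 10 maps to 1010, position 160 maps to 1110, and position 120 falls in a gap and is reported as unmapped.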

The convention used for naming variations is:

[haplotype name]:[BAC sequence SV number]_[base position in SV]_[variation]
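A name following this convention can be split into its four parts with a regular expression. The sketch assumes that the haplotype name contains no ":", that the SV field contains no "_", and that the base position is a plain integer; everything after the second "_" is returned unparsed as the variation. The `VariationName` type and the example names are hypothetical.

```python
import re
from typing import NamedTuple


class VariationName(NamedTuple):
    haplotype: str
    sv: str        # BAC sequence SV field, kept as text
    position: int  # base position within that sequence
    variation: str


# haplotype ":" sv "_" position "_" variation (the variation may itself contain "_").
NAME_RE = re.compile(
    r"^(?P<haplotype>[^:]+):(?P<sv>[^_]+)_(?P<position>\d+)_(?P<variation>.+)$"
)


def parse_name(name: str) -> VariationName:
    """Split a variation name into haplotype, SV, position and variation."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a recognised variation name: {name!r}")
    return VariationName(m["haplotype"], m["sv"], int(m["position"]), m["variation"])
```

For the made-up name `COX:AL000001.1_12345_AG`, this gives haplotype `COX`, SV `AL000001.1`, position `12345` and variation `AG`.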

For single nucleotide polymorphisms, "variation" consists of two letters: first the base in the reference sequence, then the base in the other haplotype. Insertions and deletions are identified by "_i" and "_d" respectively, followed by their length and, if it is 12 bases or fewer, their base sequence. For longer sequences an "X" value is given, which refers to a look-up table HERE.
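The variation field can then be classified as an SNP, insertion or deletion. Because the exact punctuation of the indel codes is not spelled out here, this Python sketch is deliberately lenient: it assumes an optional "_" before and after the "i" or "d", a decimal length, and then either the inserted or deleted bases or an "X" look-up key. Those layout assumptions, and the `Variation` type, are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Variation(NamedTuple):
    kind: str                # "snp", "insertion" or "deletion"
    ref_base: Optional[str]  # SNPs only: base in the reference sequence
    alt_base: Optional[str]  # SNPs only: base in the other haplotype
    length: Optional[int]    # indels only
    sequence: Optional[str]  # indels only: bases (12 or fewer) or an "X..." look-up key


SNP_RE = re.compile(r"^([ACGTN])([ACGTN])$")
# Assumed indel layout: [_]i|d[_]<length>[_][bases | X-key]
INDEL_RE = re.compile(r"^_?([id])_?(\d+)_?([ACGTNacgtn]+|X\w*)?$")


def decode_variation(code: str) -> Variation:
    """Classify a variation field as an SNP, insertion or deletion."""
    m = SNP_RE.match(code)
    if m:
        return Variation("snp", m[1], m[2], None, None)
    m = INDEL_RE.match(code)
    if m:
        kind = "insertion" if m[1] == "i" else "deletion"
        # m[3] is None when neither bases nor an X key follow the length.
        return Variation(kind, None, None, int(m[2]), m[3])
    raise ValueError(f"unrecognised variation code: {code!r}")
```

Under these assumptions, `decode_variation("AG")` describes an A in the reference replaced by G in the other haplotype, and a code such as `d20X7` would be a 20-base deletion whose sequence must be looked up under key `X7`.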


Horton et al. 2008. Immunogenetics 60(1):1-18.
Traherne et al. 2006. PLoS Genet. 2(1):e9.
Stewart et al. 2004. Genome Res. 14(6):1176-87.
Allcock et al. 2002. Tissue Antigens 59(6):520-1.