MHC Haplotype Project

The MHC Haplotype Project was conducted between 2000 and 2006 at the Sanger Institute and offers a framework and resource for association studies of all MHC-linked diseases. It provides the genomic sequences and gene annotation of eight different HLA-homozygous typing haplotypes (listed below), their resulting variations (see Data below) and ancestral relationships.

The table below lists the eight cell lines used in the project along with their HLA haplotypes and alleles. Links are given to the Sanger Institute Vega database from which sequences and gene annotation can be downloaded, and to the same regions in the UCSC browser for those who prefer that approach. 

The sequence from PGF is now incorporated in the reference sequence for chromosome 6. For the other seven haplotype sequences, links are given to GRC entries (numbers with GL prefixes). These GRC haplotype contigs, called "alternate loci", are constructed so that they begin with additional anchor sequence derived from the reference. The FASTA sequence obtained from these links will therefore differ from that obtained from Vega.

Cell line  Haplotype    HLA-A        HLA-B        HLA-C        HLA-DRB1     HLA-DQB1  HLA-DPB1  Links
PGF        A3-B7-DR15   03:01:01:01  07:02:01     07:02:01:03  15:01:01:01  06:02:01  04:01     Vega UCSC
COX        A1-B8-DR3    01:01:01:01  08:01:01     07:01:01     03:01:01:01  02:01     03:01     Vega UCSC
APB        A1-B60-DR13  01:01:01:01  40:01:01     06:02:01:01  13:01:01     06:03:01  04:02     Vega UCSC
DBB        A2-B57-DR7   02:01:01:01  57:01:01     06:02        07:01:01     03:03:02  04:01:01  Vega UCSC
MANN       A29-B44-DR7  29:02:01     44:03:01     16:01        07:01:01:01  02:02     02:01:02  Vega UCSC
SSTO       A32-B44-DR4  32:01:01     44:02:01:01  05:01:01:02  04:03:01     03:05:01  04:01:01  Vega UCSC
QBL        A26-B18-DR3  26:01:01     18:01:01     05:01:01:01  03:01:01:02  02:01:01  02:02     Vega UCSC
MCF        A2-B62-DR4   02:01        15:01:01:01  03:04:01:01  04:01        03:01     04:02     Vega UCSC

MHC Haplotype Project Data

The data from the MHC Haplotype Project are available as a .txt file for viewing in the UCSC Genome Browser.

View the file

Follow the UCSC instructions for loading BED file data as a custom track. The data are colour-coded by haplotype and will initially be displayed showing 1 Mb in the centre of the MHC. You are then free to adjust the co-ordinates and to zoom in to your region of interest.

When zoomed in you should change the display of the Sanger_MHC custom track from "dense" to "full". You may also want to unhide "Variation and Repeats SNPs (131)" and "Genes and Gene Prediction Tracks Vega genes". 

Although the project data were originally issued in the co-ordinates of a previous release of the Human Genome, they have been converted using the UCSC tool LiftOver for use with the February 2009 GRCh37/hg19 Assembly.
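
For readers who want to inspect the data file outside the browser, the following is a minimal Python sketch of reading BED lines and selecting the features that overlap a region. It assumes the file follows the standard BED layout (chromosome, 0-based start, end, name in the first four columns) and that the name column holds the variation name; the function names and the sample lines in the usage note are illustrative only and are not taken from the project file.

```python
from collections import Counter
from typing import Iterable, Iterator, List, NamedTuple


class BedFeature(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive (BED convention)
    name: str


def read_bed(lines: Iterable[str]) -> Iterator[BedFeature]:
    """Yield features from BED lines, skipping header and blank lines."""
    for line in lines:
        line = line.strip()
        # "track" and "browser" lines configure the display; they are not data.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # assumption: every data line has at least a name column
        yield BedFeature(fields[0], int(fields[1]), int(fields[2]), fields[3])


def in_region(features: Iterable[BedFeature], chrom: str,
              start: int, end: int) -> List[BedFeature]:
    """Return the features overlapping the half-open interval [start, end)."""
    return [f for f in features
            if f.chrom == chrom and f.start < end and f.end > start]


def count_by_haplotype(features: Iterable[BedFeature]) -> Counter:
    """Count features per haplotype, taking the text before ':' in the name."""
    return Counter(f.name.split(":", 1)[0] for f in features)
```

With a file saved locally, usage might look like `features = list(read_bed(open("mhc_data.txt")))` followed by `in_region(features, "chr6", start, end)`, where the file name and the co-ordinates are placeholders to be replaced with your own.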

The convention used for naming variations is: 

[haplotype name]:[BAC sequence SV number]_[base position in SV]_[variation] 

For single nucleotide polymorphisms, "variation" consists of two letters: first the base in the reference sequence, and second the base in the other haplotype. Insertions and deletions are identified by "_i" and "_d" respectively, followed by the numerical value of their length and, if this is 12 bases or fewer, their base sequence. For longer sequences an "X" value is given, which refers to a look-up table.
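
The naming convention can be illustrated with a small Python sketch that splits a variation name into its parts. This is an interpretation, not a specification: it assumes the BAC sequence SV number contains no underscore, that SNP bases are written as upper-case A, C, G or T, and that the insertion or deletion marker may appear with or without its leading underscore. The text after the marker (length, then sequence or "X" value) is returned unparsed because its exact layout is not spelled out here. The accession used in the usage note is made up for illustration.

```python
import re
from typing import NamedTuple


class Variation(NamedTuple):
    haplotype: str   # e.g. "COX"
    sv_number: str   # BAC sequence SV number
    position: int    # base position within the SV
    kind: str        # "snp", "insertion" or "deletion"
    detail: str      # SNP: the two bases; indel: raw text after the marker


# Assumed layout: [haplotype]:[SV number]_[position]_[variation]
NAME_RE = re.compile(r"^(?P<hap>[^:]+):(?P<sv>[^_]+)_(?P<pos>\d+)_(?P<var>.+)$")


def parse_variation_name(name: str) -> Variation:
    """Split a variation name into its components (sketch, see assumptions)."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised variation name: {name!r}")
    # Tolerate a doubled underscore before the indel marker ("__i" / "__d").
    var = match.group("var").lstrip("_")
    if re.fullmatch(r"[ACGT]{2}", var):
        kind, detail = "snp", var
    elif var.startswith("i"):
        kind, detail = "insertion", var[1:]
    elif var.startswith("d"):
        kind, detail = "deletion", var[1:]
    else:
        raise ValueError(f"unrecognised variation field: {var!r}")
    return Variation(match.group("hap"), match.group("sv"),
                     int(match.group("pos")), kind, detail)
```

For example, `parse_variation_name("COX:AL000000.1_12345_AG")` would be read as a SNP in the COX haplotype at position 12345 of the hypothetical sequence AL000000.1, with A in the reference and G in COX.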