The MHC Haplotype Project was conducted between 2000 and 2006 at the Sanger Institute and offers a framework and resource for association studies of all MHC-linked diseases. It provides the genomic sequences and gene annotation of eight different HLA-homozygous typing haplotypes (listed below), their resulting variations (see Data below) and ancestral relationships.
The table below lists the eight cell lines used in the project along with their HLA haplotypes and alleles. Links are given to the Sanger Institute Vega database from which sequences and gene annotation can be downloaded, and to the same regions in the UCSC browser for those who prefer that approach.
The sequence from PGF is now incorporated in the reference sequence for chromosome 6. For the other seven haplotype sequences, links are given to GRC entries (numbers with GL prefixes). These GRC haplotype contigs, called "alternate loci", are constructed so that they begin with additional anchor sequence derived from the reference. The FASTA sequence obtained from these links will, therefore, differ from that obtained from Vega.
MHC Haplotype Project Data
The data from the MHC Haplotype Project are available as a .txt file for viewing in the UCSC Genome Browser.
Follow the UCSC instructions for loading BED file data as a custom track. The data are colour-coded by haplotype and will initially be displayed showing 1 Mb in the centre of the MHC. You are then free to adjust the co-ordinates and to zoom in to your region of interest.
When zoomed in you should change the display of the Sanger_MHC custom track from "dense" to "full". You may also want to unhide "Variation and Repeats SNPs (131)" and "Genes and Gene Prediction Tracks Vega genes".
Although the project data were originally issued in the co-ordinates of a previous release of the Human Genome, they have been converted using the UCSC tool LiftOver for use with the February 2009 GRCh37/hg19 Assembly.
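For those who want to inspect the file outside the browser, the following is a minimal Python sketch of reading BED-style custom track data and selecting the features that overlap a region. It relies only on the general BED convention (chromosome, 0-based start, exclusive end, optional name) and skips "track" and "browser" header lines. The function names and the sample lines are hypothetical and are not taken from the project file, whose exact columns should be checked before use.

```python
from typing import Iterable, Iterator, List, NamedTuple


class BedFeature(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)
    name: str   # empty string if the file has no name column


def read_bed(lines: Iterable[str]) -> Iterator[BedFeature]:
    """Yield features from BED-style lines, skipping header and blank lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # not a valid BED record
        name = fields[3] if len(fields) > 3 else ""
        yield BedFeature(fields[0], int(fields[1]), int(fields[2]), name)


def in_region(features: Iterable[BedFeature], chrom: str,
              start: int, end: int) -> List[BedFeature]:
    """Return features overlapping the half-open interval [start, end)."""
    return [f for f in features
            if f.chrom == chrom and f.start < end and f.end > start]


# Made-up sample lines, for illustration only.
sample = [
    'track name=Sanger_MHC itemRgb="On"',
    "browser position chr6:31000000-32000000",
    "chr6\t31000100\t31000101\texample_1\t0\t+\t31000100\t31000101\t255,0,0",
    "chr6\t33000000\t33000005\texample_2",
]
features = list(read_bed(sample))
selected = in_region(features, "chr6", 31000000, 32000000)
```

With a real file, the list of sample lines would be replaced by an open file handle, for example read_bed(open(path)), where path is wherever the downloaded .txt file was saved.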
The convention used for naming variations is:
[haplotype name]:[BAC sequence SV number]_[base position in SV]_[variation]
For single nucleotide polymorphisms, "variation" consists of two letters: first the base in the reference sequence, and second the base in the other haplotype. Insertions and deletions are identified by "_i" and "_d" respectively, followed by the numerical value of their length and, if this is 12 bases or fewer, their base sequence. For longer sequences an "X" value is given which refers to a look-up table.
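The naming convention can be illustrated with a short Python sketch that splits a variation name into its parts. The convention above does not spell out every separator, so the sketch makes assumptions: the SV number contains no underscore, the "i" or "d" marker may be preceded by one or two underscores, and whatever follows the indel length (a base sequence or an "X" value) is returned uninterpreted. The function name and the example identifiers are invented for illustration and are not real project entries.

```python
import re
from typing import NamedTuple, Optional

# [haplotype name]:[BAC sequence SV number]_[base position in SV]_[variation]
# Assumption: the SV number itself contains no underscore.
_NAME_RE = re.compile(
    r"^(?P<haplotype>[^:]+):(?P<sv>[^_]+)_(?P<pos>\d+)_+(?P<variation>.+)$"
)
# Assumption: indels look like i<length><detail> or d<length><detail>,
# with optional underscores between the parts.
_INDEL_RE = re.compile(r"^(?P<kind>[id])_?(?P<length>\d+)_?(?P<detail>.*)$")


class Variation(NamedTuple):
    haplotype: str
    sv: str
    position: int
    kind: str                 # "snp", "insertion" or "deletion"
    ref_base: Optional[str]   # SNPs only: base in the reference sequence
    alt_base: Optional[str]   # SNPs only: base in the other haplotype
    length: Optional[int]     # indels only
    detail: Optional[str]     # indels only: base sequence or "X" value, as written


def parse_variation_name(name: str) -> Variation:
    """Split a variation name into its components (hypothetical helper)."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised variation name: {name!r}")
    hap, sv, pos = m["haplotype"], m["sv"], int(m["pos"])
    var = m["variation"]

    # SNP: exactly two bases, reference first, other haplotype second.
    if re.fullmatch(r"[ACGT]{2}", var):
        return Variation(hap, sv, pos, "snp", var[0], var[1], None, None)

    indel = _INDEL_RE.match(var)
    if indel is None:
        raise ValueError(f"unrecognised variation field: {var!r}")
    kind = "insertion" if indel["kind"] == "i" else "deletion"
    detail = indel["detail"] or None
    return Variation(hap, sv, pos, kind, None, None, int(indel["length"]), detail)


# Invented identifiers, for illustration only.
snp = parse_variation_name("COX:AL000000.1_12345_AG")
ins = parse_variation_name("COX:AL000000.1_12345_i3ACG")
```

Because the "X" values refer to a look-up table that is not reproduced here, the sketch leaves them as plain strings in the detail field rather than attempting to resolve them.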