UCL Cancer Institute

Medical Genomics

The Medical Genomics Group explores the genomics and epigenomics of phenotypic plasticity in health and disease in order to advance translational, regenerative and personalised medicine.

Contact

Emeritus Professor Stephan Beck

Email: s.beck@ucl.ac.uk

Tel: +44-20-7679-0964

Stephan Beck is Emeritus Professor of Medical Genomics. His group studied the genomics and epigenomics of phenotypic plasticity in health and disease using computational and data science approaches. The research aimed to advance translational, regenerative and personalised medicine and advocated for more open data sharing and science in general.

UCL Profile

The Medical Genomics Group has broad interests in the genomics and epigenomics of phenotypic plasticity in health and disease. We use computational and data science approaches to study genetic and epigenetic variations and how they modulate genome function. Our research aims to advance translational, regenerative and personalized medicine. We also advocate for more open data sharing and governance and science in general.

Research Projects

Methylome Analysis

DNA methylation is involved in many biological processes, health states and treatment options. The aim of this project is to develop novel computational methods for advanced methylome analysis in health and disease. Current developments include new tools for RRBS (CAMDAC), imputation (GIMMEcpg), handling of large data tables (MATT), DNA methylation clocks (CellAgeClock, epiClockR) and BeadChip arrays (ChAMP).

Find out more.

Epigenomics of Common Disease

Having established the concept of epigenome-wide association studies (EWAS) in 2011, we have contributed to numerous EWAS since then and are currently involved in EWAS on T1D and CKDu and related studies across several population cohorts (BIOPATH, HAPIEE, CRELES).

Epigenetics of Aging

Research into healthy aging benefits from having biomarkers for different aspects of the aging process. DNA methylation has the hallmarks to make an excellent biomarker, and epigenetic clocks have turned out to be the most accurate molecular readout of aging to date. We develop, evaluate and use epigenetic clocks in a variety of aging contexts to assess their use as a possible molecular crystal ball for human aging.

See here for mini review.
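As a rough illustration of how an epigenetic clock turns methylation values into an age estimate, the sketch below assumes the common linear form: a weighted sum of CpG beta values plus an intercept. The CpG names, weights and intercept are hypothetical and are not taken from CellAgeClock, epiClockR or any published clock.

```python
from typing import Mapping


def predict_age(betas: Mapping[str, float],
                weights: Mapping[str, float],
                intercept: float) -> float:
    """Return intercept + sum(weight * beta) over the clock's CpG sites."""
    missing = [cpg for cpg in weights if cpg not in betas]
    if missing:
        raise KeyError(f"sample lacks clock CpG sites: {missing}")
    return intercept + sum(w * betas[cpg] for cpg, w in weights.items())


# Hypothetical clock: three made-up CpG sites with made-up weights.
TOY_WEIGHTS = {"cpg_a": 40.0, "cpg_b": -20.0, "cpg_c": 10.0}
TOY_INTERCEPT = 30.0

# Beta values (fraction methylated, between 0 and 1) for one sample.
sample = {"cpg_a": 0.5, "cpg_b": 0.25, "cpg_c": 0.8}
predicted = predict_age(sample, TOY_WEIGHTS, TOY_INTERCEPT)
print(f"predicted age: {predicted:.1f} years")
```

Real clocks typically use many more sites, and some apply a transformation to the predicted value, so this is only the skeleton of the calculation.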

C2c: Cancer to chronic disease

Aberrant DNA methylation is an early event in carcinogenesis. Liquid biopsies hold great promise for the early detection of tumour-specific genetic and epigenetic alterations, enabling early diagnosis and improved cancer management. The aim of the C2c project is to develop novel approaches for detection of tumour-specific DNA methylation changes in cell free DNA (cfDNA) for predictive, prognostic, and diagnostic purposes.

Find out more

OMEGA: A custom genOME GenerAtor

Data science plays an essential role in Genomic Medicine, including the NHS Genomic Medicine Service. However, the development of the underlying algorithms is severely hampered by data access issues. OMEGA will overcome this limitation with open access reference genomes from PGP and GIAB that will be customised by machine learning approaches with quantifiable (epi)genetic variants and signatures of clinically relevant phenotypes and diseases for diverse ancestries.

Personal Genome Project UK

Using Open Consent, PGP-UK provides genomic, epigenomic, transcriptomic and trait data under Open Access to advance personal and medical genomics and to promote Citizen Science. In addition to the data, PGP-UK provides Open Access genome, methylome and pharmacogenetics reports. Details of the Study, Resource and Analysis Pipeline are published.

Find out more

EU-STANDS4PM

The European Standards for Precision Medicine (EU-STANDS4PM) Consortium will initiate an EU-wide mapping process to assess and evaluate strategies for data-driven in silico modelling approaches. A central goal is to develop harmonised transnational standards, recommendations and guidelines that allow a broad application of predictive in silico methodologies in personalised medicine across Europe. Our contribution is to evaluate and implement innovative strategies for sharing data more openly and effectively.

Find out more

GCGR: Glioma Cellular Genetics Resource

GCGR is generating a comprehensive collection of research tools, materials and associated data to lay the foundations for future basic and translational studies into glioma.

EpiMatch

This NIHR Blood Transplant Research Unit project aims to identify and validate donor-specific biomarkers which are predictive of recipient outcome following haematopoietic stem cell transplantation. Such biomarkers would allow guidance of treatment strategy and improved donor selection, in a personalised medicine approach to transplantation.

Find out more

Multiple MS

Using a systems medicine approach, this project aims to develop personalised treatments for multiple sclerosis based on multi-omics biomarkers. Our contributions to MultipleMS consist of integrative computational analyses and the development of a DNA methylation-based biomarker for brain atrophy.

Find out more

Funders

  • H2020
  • Wellcome Trust
  • NIHR-BRC
  • Cancer Research UK

Past projects

Computational Analysis

We utilise and develop computational tools for the analysis of biological data, primarily obtained from studies of DNA methylation. The data predominantly comes from either second-generation sequencing platforms or methylation arrays.

Our current computational analysis approaches entail developing methods for the integration of epigenomic next generation sequencing data with genomic, transcriptomic and other types of datasets. The primary focus is whole genome bisulfite sequencing, yielding a high-resolution genome-wide profile of DNA methylation. Our work so far has benchmarked the potential of this base resolution technique by assessing the existing computational tools, the effects of smoothing, developing new tools (COMETgazer) and performing feature-specific saturation analyses. These highlighted the effects of coverage on biological signature extraction.

 

Our experience with these data brought about the assembly of a WGBS workflow for full analysis including processing compartments for quality control to alignment and extraction and on to the analysis of differentially methylated positions and regions, as well as the determination of blocks of co-methylation (COMETs) with our COMETgazer algorithm, which exploits DNA methylation oscillations. 

CpG methylation can be modelled as a harmonic, where oscillatory patterns are used to segment the methylome.

 

Our approach to differential methylation analysis is based on our experience of the effect of coverage, utilizing point-wise, regional or COMET inference depending on sequencing depth and data resolution. We use the COMET approach for low-resolution analyses as a means to summarize methylation over large DNA stretches and compute differential methylation by assessing the fragmentation of the methylome. Our COMET work is currently in review with Nature Biotechnology. 

Our data integration approaches focus on the use of COMET profiles for assessing a potential relationship with haplotype blocks and for the prediction of genomic enhancers. We are developing methods for harvesting oscillations in DNA methylation for epigenomic signature analysis.

We also have experience with the analysis of other next generation sequencing dataset types including RNA-seq, ChIP-Seq and MeDIPSeq, and the integration of these with transcriptomic datasets. A previous lab member (Gareth Wilson) developed a MeDIPSeq analysis wrapper (the MeDUSA pipeline) for the analysis of MeDIPSeq data (Wilson et al., 2012), the focus of which is to locate differentially methylated regions between cohorts.

Our group is involved in the UK contribution to the Personal Genome Project, which aims to integrate UK genome, health and trait data with epigenomics in an open data, open access framework.

Epigenomics of Common Disease

Genome-wide association studies (GWAS) have identified a multitude of genetic variants associated with complex traits including common diseases. However, their effect sizes are modest, and the majority of causality remains unexplained for most common diseases. This project aimed to integrate GWAS with epigenome-wide association studies (EWAS) to gain a more complete picture of the aetiology of common diseases, including T1D, T2D and UC.

Background

Recent advances in genomic technologies have enabled systematic, large-scale studies of human disease-associated epigenetic variation, specifically variation in DNA methylation. Such epigenome-wide association studies (EWAS) present exciting opportunities for the investigation of disease mechanisms, but also create new challenges that are not encountered in GWAS, such as study design, cohort and sample selections, statistical significance and power, confounding factors and follow-up studies (Rakyan V.K. et al (2011), Nat. Rev. Genet. 12, 529-541; Paul D.S. & Beck S. (2014), Trends Mol. Med. 20, 541-543). A particularly promising area of research is the integrative analysis of GWAS and EWAS to investigate GWAS haplotypes that exert their effect on phenotype through epigenetic mechanisms (see also: EpiTrain Consortium).

Past projects

DNA methylation variation in type I diabetes

As part of the BLUEPRINT Consortium, we aim to identify variation in DNA methylation associated with type 1 diabetes mellitus (T1DM). We have collected immune effector and control cell types from monozygotic twins discordant for T1DM, including CD4+ lymphocytes, CD14+CD16- monocytes, CD19+ B cells and buccal cells. In addition, we have collected Guthrie card samples from progressors and non-progressors to T1DM, and peripheral lymph node and spleen samples of both T1DM cases and healthy controls. Our initial aim is to pre-screen the monozygotic twin pairs discordant for T1DM using the Illumina 450K array platform and whole-genome bisulphite sequencing in selected immune effector cells. Overall, we aim to generate DNA methylation profiles in a total of 1,000 samples. The analysis will include validation of the T1DM methylation signature using a targeted bisulphite sequencing platform, such as RainDrop BS-seq (see: Methylome Analysis), integration with GWAS data, biological pathway analysis and functional follow-up.

We have previously performed epigenome-wide association studies for the following common diseases:

  • Diabetic nephropathy: Bell C.G. et al (2010), BMC Med. Genomics. 3, 33
  • Type II diabetes: Bell C.G. et al (2010), PLoS ONE 5, e14040
  • Type I diabetes: Rakyan V.K. et al (2011), PLoS Genet. 7, e1002300
  • Ulcerative colitis: Häsler R. et al (2012), Genome Res. 22, 2130-2137
  • Pain sensitivity: Bell J.T. et al (2014), Nat. Commun. 5, 2978

Project Members

  • Dirk Paul
  • Charles Breeze
  • Emanuele Libertini

Collaborators

  • Queen Mary University of London: David Leslie, Stephanie Cunningham, Mary Dang, Mohammed Hawa, Claire Bedford, Vardhman Rakyan, Rob Lowe
  • Lund University: Ake Lernmark
  • Ulm University: Bernard Boehm
  • BLUEPRINT Consortium

 

iPSC Methylome

The UCL Cancer Institute, University of Cambridge, CellCentric, Sigma-Aldrich and TAP embark on a collaboration (CellCentric Press Release) to define markers of epigenetic reprogramming, supported by £1.1m from the UK Government's Technology Strategy Board.

Project Members

  • Tiffany Morris
  • Lee Butcher
  • Stephan Beck

Collaborators

  • Anne Ferguson Smith, University of Cambridge
  • Ludovic Vallier, University of Cambridge
  • CellCentric
  • The Automation Partnership
  • Sigma-Aldrich

 

Methylome Analysis

The involvement of DNA methylation in health and disease is well established but not yet fully understood. This project aimed to develop novel experimental and computational methods for the analysis of 5-methyl-cytosine (5-mC) and 5-hydroxymethyl-cytosine (5-hmC).

Background

Variation in DNA methylation has been associated with predisposition, progression and response to treatment of a broad range of clinical conditions such as cancer and autoimmune diseases (Feinberg A.P. (2007), Nature 447, 433-440). The aim of this project is to develop novel experimental and computational methods for the analysis of 5-methyl-cytosine (5-mC) and 5-hydroxymethyl-cytosine (5-hmC) using both genome-wide and targeted approaches.

Methylated DNA immunoprecipitation (MeDIP-Seq)

DNA methylation affects approximately 60-80% of the 28M CpG loci in the genome (Lister R. et al (2009), Nature 462, 315-322; Li Y. et al (2010) PLoS Biol. 8, e1000533). Given this vast, pliable landscape – and taking lessons from GWAS, in which genome-wide, hypothesis-free approaches often yielded surprising results – whole genome approaches to studying DNA methylation (so-called 'methylomes') are prudent. Here, local methylation events are evaluated against the background of methylome signals to separate signal from noise.

A number of techniques for generating whole-methylome datasets are available. The gold standard is Whole Genome Bisulfite Sequencing (WGBS), in which the methylation status of every cytosine in the genome can be resolved at single base pair resolution. Despite this benefit, WGBS is time consuming, costly and (currently at least) inefficient, with 70-80% of sequencing reads providing little to no information (Ziller M. et al (2013) Nature 500, 477-481). One lower-cost alternative is Methylated DNA Immunoprecipitation (MeDIP), which involves co-incubation of DNA with an antibody raised against 5-mC along with magnetic beads, which form bead-antibody-methylated DNA complexes. These complexes are then separated from the unmethylated fraction and, when coupled with next generation sequencing, can assess more than 50% of methylated CpG dinucleotides at 150- to 200-bp resolution, concomitant with sequence insert size.

This technique has been used to resolve the first DNA methylome (Down T.A. et al (2008), Nat. Biotechnol. 26, 779-785), map aberrant DNA methylation in malignant peripheral nerve sheath tumours (Feber A. et al (2011), Genome Res. 21, 515-524), and to locate differentially methylated regions associated with aging in mouse hematopoietic stem cells (Taiwo O. et al (2013), Epigenetics 8, 1114-1122).

Project Members:

  • Lee Butcher
  • Emanuele Libertini
  • Andrew Feber

Large-scale, targeted bisulphite sequencing using RainDrop BS-seq

Several techniques have been established to map DNA methylation at single-CpG resolution in a selected subset of genomic regions of interest, for example to validate signals from epigenome-wide association studies (see: EWAS project page). Such methods include pyrosequencing, Sanger sequencing and high-resolution melting curve analysis. However, most of these have not been optimised for high-throughput applications. Emerging techniques that utilise next-generation DNA sequencing platforms are particularly promising for the large-scale, targeted bisulphite sequencing of genomic regions of interest.

RainDance Technologies developed a fully integrated enrichment system using microdroplet PCR that can also be coupled to next-generation sequencing platforms (Tewhey R. et al (2009), Nat. Biotechnol. 27, 1025-1031). The encapsulation of distinct PCR reactions in microdroplets enables the sensitive, specific and simultaneous amplification of up to 20,000 target loci using either unconverted or bisulphite-converted genomic DNA. We refined the approach into targeted RainDrop BS-seq and used it to validate a hypermethylation phenotype in isocitrate dehydrogenase (IDH) mutant chondrosarcoma (Guilhamon P. et al (2013), Nat. Commun. 4, 2166). Further, we recently presented a systematic assessment of RainDrop BS-seq as a method for large-scale, targeted bisulphite sequencing using a wide range of starting DNA quantity, quality and different cell types (Paul D.S. et al (2014), Epigenetics 9, 678-684).

 

The custom primer panel for the genomic regions of choice is prepared by RainDance Technologies. The workflow comprises the following key steps: (1) bisulphite conversion of genomic template; (2) merger of picoliter-volume droplets of bisulphite-treated template with pre-made primer pair droplets (primer panel) on microfluidic chips; (3) pooled, thermal cycling of the PCR reactions (microdroplet PCR); (4) destabilization of droplets to release the PCR products; (5) purification of PCR products using magnetic beads; (6) incorporation of DNA sequencing barcodes through standard PCR (universal PCR), followed by purification of the PCR products and Illumina sequencing.

Project Members

  • Dirk Paul
  • Paul Guilhamon
  • Pawan Dhami
  • Stefan Stricker
  • Andrew Feber

 

MeDUSA

MeDUSA (Methylated DNA Utility for Sequence Analysis) is a computational pipeline bringing together numerous software packages to perform a full analysis of MeDIP-seq data, including sequence alignment, quality control (QC), and determination and annotation of DMRs. The applications it uses cover alignment (BWA) and subsequent filtering (SAMtools), generation of numerous quality control metrics (FastQC and MEDIPS), DMR identification and, finally, preliminary annotation of the DMRs (utilizing the capabilities of BEDTools).

Our focus is on the analysis of MeDIP-seq, though we are also experienced in the analysis of RNA-seq, ChIP-seq, bis-seq and exome sequencing.

Originally, the Batman algorithm (Down et al., 2008) was used for our MeDIP analyses, including the first cancer methylome (Feber et al., 2011). Since then we have developed the MeDUSA pipeline (Wilson et al., 2012), the focus of which is to locate differentially methylated regions between cohorts.

MeDUSA brings together numerous software packages to perform a full analysis of MeDIP-seq data, including sequence alignment, quality control (QC), and determination and annotation of DMRs. MeDUSA utilises several applications from within the USeq software suite, and in turn uses the R Bioconductor package DESeq for differential count analysis. In addition, MeDUSA will control several other important functions from the alignment (BWA) and subsequent filtering (SAMtools), through generation of numerous quality control metrics (FastQC and MEDIPS), and finally preliminary annotation of the DMRs (utilising the capabilities of BEDTools).

A focus for future research within the group is on the integration of disparate datasets in order to elucidate a fuller understanding of the underlying biology and thus address fundamental questions associated with epigenetic regulation of mammalian cells.

Download MeDUSA Version 1.0 (03/04/2012)

Publication: Wilson GA, Dhami P, Feber A, Cortazar D, Suzuki Y, Schulz R, Schar P, Beck S (2012) Resources for methylome analysis suitable for gene knockout studies of potential epigenome modifiers. GigaScience 1 (1). doi:10.1186/2047-217X-1-3

 

References

  1. Down TA, Rakyan VK, Turner DJ, Flicek P, Li H, Kulesha E, Graf S, Johnson N, Herrero J, Tomazou EM, et al: A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nat Biotechnol 2008, 26:779-785.
  2. Feber A, Wilson GA, Zhang L, Presneau N, Idowu B, Down TA, Rakyan VK, Noon LA, Lloyd AC, Stupka E, et al: Comparative methylome analysis of benign and malignant peripheral nerve sheath tumors. Genome Res 2011, 21:515-524.
  3. Wilson GA, Dhami P, Feber A, Cortazar D, Suzuki Y, Schulz R, Schar P, Beck S: Resources for methylome analysis suitable for gene knockout studies of potential epigenome modifiers. GigaScience 2012, 1:3.

 

Glioma Cancer Stem Cells

Gliomas are the most common form of primary brain tumours. Glioma cancer stem cells (CSC) are cells that possess characteristics associated with normal stem cells but are tumourigenic and can cause relapse and metastasis by giving rise to new tumours. Using methylome analysis, this project aimed to characterize epigenetic changes that occur during the differentiation of glioma CSC to committed progenitor cells.

Project Members

  • Helena Caren
  • Stephan Beck

 

Head and Neck Cancer Epigenome Project

Head and Neck Squamous Cell Cancer (HNSCC) is the sixth most common cancer worldwide and increasingly caused by infection with human papilloma virus (HPV). The aim of this project was to analyse HPV and non HPV-associated HNSCC epigenomes for differences in DNA methylation and microRNA expression for translation into biomarkers and therapeutic targets.

Project Leader

  • Matthias Lechner

Project Members

  • Andy Feber
  • Lee Butcher
  • Gareth Wilson
  • Stephan Beck

Main findings

Epigenetic regulation of inflammatory bowel disease

Ulcerative Colitis (UC) is a type of Inflammatory Bowel Disease (IBD) where inflammation develops in the large intestine (the colon and rectum). This project identified novel epigenetic variations associated with UC by conducting an EWAS on monozygotic twins discordant for the disease.

Project Leader: Liselotte Backdahl

Inflammatory bowel disease (IBD) has a complex aetiology involving a combination of genes and environment. Despite years of research, very few disease-causing factors have been identified. This complexity could be attributed both to disease pathway heterogeneity and to epigenetic modifications of the genomic landscape that lead to an altered gene expression profile in the diseased tissue. The epigenetic machinery consists of short interfering RNA, histone modifications and DNA methylation. New tools are now available that will enable studies of these phenomena. By combining immunoprecipitation of DNA containing methylated cytosine with a high-resolution tiling array, consisting of 390K 50mer probes of significant epigenetic importance, a comprehensive insight into the DNA methylation profile of an individual in a given cell type can be generated. The ideal way to investigate "non-genetic" disease risk factors is to study monozygotic twins for epigenetic and transcriptional differences; transcription and methylation levels will be analysed in mucosa biopsies.

Epigenetic regulation of inflammatory bowel disease

Inflammation is a key biological mechanism in the organism's defence system against harmful intruders. However, dysregulation of inflammatory mechanisms may be detrimental and is the underlying cause of widespread chronic diseases such as asthma, psoriasis, atopic eczema, periodontitis and inflammatory bowel disease (IBD). IBD is a chronic inflammatory disease of the gastrointestinal tract. It is divided into two main subtypes: Crohn's disease (CD) and ulcerative colitis (UC). Their classification is based on clinical, endoscopic, histological and pathological parameters, including the distribution of inflammation in the colon and/or small intestine. Relatives of affected individuals have a significantly increased risk of developing either form of IBD.

 

The project aims to identify and characterize epigenetic modifications, such as altered DNA methylation profiles, that are indicative of or contributory to the pathophysiology of inflammatory bowel disease (IBD). Such methylation profiles will represent the first link between the predetermined genetic basis of susceptibility and flexible environmental factors. By integrating different technologies, such as genome-wide mRNA expression profiling together with high-throughput methylation profiling of the epigenome, a correlated IBD-specific DNA methylation/expression profile will be generated. Discordant monozygotic twins represent an ideal cohort for monitoring the association between epigenetic and transcriptional differences; transcription and methylation levels will be analysed in mucosa biopsies.

DNA methylation and disease

DNA methylation is the most common epigenetic modification and occurs nearly exclusively at cytosines in CpG dinucleotide enriched areas, which are often located in the promoter region of many genes. For instance, hypermethylation of promoter regions often results in gene silencing, while promoters of transcriptionally active genes are typically hypomethylated. Thus a direct link between methylation and the functional phenotype of a cell, organ or even organism can be postulated.

 

Disruption of methylation patterns is a characteristic feature in cancer, where hypermethylation of tumour suppressor genes can lead to tumour progression and global hypomethylation may lead to activation of oncogenes. Methylation positions have been identified that correlate with cancer, type II diabetes, arteriosclerosis, rheumatoid arthritis and neurodegenerative diseases. Methylation at other positions has been shown to correlate with age, gender, nutrition, drug use, and probably a whole range of other environmental triggers. DNA methylation is one of the basic mechanisms that control transcription. It plays a role in chromosome X-inactivation, in silencing of tumour genes, in imprinting, and in many other fields and diseases as described above. Patterns of DNA methylation reflect the pattern of gene activity within a given cell type and functional cellular state. A recent publication indicates that the global methylation profiles of monozygotic twins diverge over time, which indicates regulated differential methylation, possibly due to lifestyle differences. Understanding DNA methylation changes and their correlation to gene expression patterns may therefore provide important novel insights into the complex pathophysiology of inflammation. As DNA methylation may be the most important flexible genomic parameter that can change genome function under an exogenous influence, it most likely provides the main link between the genetics of disease and the environmental components that are widely acknowledged to play the decisive role in the aetiology of all inflammatory diseases in the focus of biomedical research today. New techniques have recently been developed which enable genome-wide methylation profiling.

One promising technique is methylated DNA immunoprecipitation (MeDIP). This method relies on immunorecognition to detect and precipitate methylated DNA, which is later hybridized to a DNA array. Alternative methods to fractionate hyper- or hypo-methylated DNA sequences include the use of methylation-sensitive restriction enzymes. The disadvantage of this method, however, is that it can present a sequence bias. Thus far, two studies have been performed in cancer using the MeDIP/DNA array technique. DNA methylation differences, however, have never been investigated on a global basis in inflammation, and this information, which is now accessible, will yield insights into the role of methylation in transcription generally. IBD is a suitable model disease since it has a clearly demonstrated polygenic component, is also influenced by environmental factors, has well-defined phenotypes, and shares many similarities with other complex diseases. The biomedical importance of a disease-focused approach is emphasised by the fact that only well-characterized phenotypes can be correlated to methylation patterns. Discordant monozygotic twins represent an ideal cohort for monitoring the association between epigenetic and transcriptional differences, since they have identical genomes but differ in disease status. Levels of both gene expression and methylation will be analysed in mucosa biopsies. Using intestinal mucosa biopsies is of great advantage as opposed to e.g. blood, since it is the tissue in which disease occurs, and it is accessible to environmental influences: the human intestine represents the largest interface between an individual and the environment.
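The discordant-twin design lends itself to a paired comparison: for each probe, the methylation level of the unaffected twin is subtracted from that of the affected co-twin, and the within-pair differences are tested against zero. The sketch below is a minimal illustration using a paired t-statistic; the choice of test, the function name and the example values are assumptions for illustration, not the project's actual analysis.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_twin_test(affected: Sequence[float],
                     unaffected: Sequence[float]) -> Tuple[float, float]:
    """Return (mean within-pair difference, paired t-statistic) for one probe.

    Position i in both sequences must refer to the same twin pair.
    """
    if len(affected) != len(unaffected):
        raise ValueError("each affected twin needs a matching co-twin")
    diffs = [a - u for a, u in zip(affected, unaffected)]
    if len(diffs) < 2:
        raise ValueError("at least two twin pairs are required")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("no variation in within-pair differences")
    mean_diff = mean(diffs)
    return mean_diff, mean_diff / (sd / sqrt(len(diffs)))


# Hypothetical methylation levels at one probe for four discordant pairs.
affected_twins = [0.62, 0.58, 0.70, 0.66]
unaffected_twins = [0.50, 0.52, 0.61, 0.60]
mean_diff, t_stat = paired_twin_test(affected_twins, unaffected_twins)
print(f"mean difference = {mean_diff:.4f}, t = {t_stat:.2f}")
```

Because each pair shares a genome, the pairing removes genetic variation between individuals from the comparison. In a genome-wide setting the resulting statistics would still need correction for multiple testing.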

Methods

The IBD methylation profile will be generated from an integrated genome wide mRNA expression/ DNA methylation profiling analysis in mucosal tissues. mRNA expression profiling will be performed by Professor S. Schreiber's research group at U. of Kiel. Both analyses will be done on IBD discordant monozygotic twin pairs and healthy monozygotic twin pairs. The global methylation profiling will be performed using high resolution tiling-path arrays. The analysis will be performed on mucosa biopsies from the monozygotic twins. In brief, sonicated genomic DNA is incubated with antibodies against methylated DNA. Methylated genomic fragments are separated by immuno-precipitation. This fraction is labeled with Cy3 and mixed with equal amounts of un-enriched DNA labeled with Cy5 and subsequently hybridized to the tiling path array. This method will determine methylation levels across the epigenome.
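For two-colour arrays of this kind, enrichment at each probe is conventionally summarised as the log2 ratio of the immunoprecipitated channel (Cy3 here) to the un-enriched input channel (Cy5). The following is a minimal sketch of that calculation; the pseudocount, function name and intensity values are assumptions for illustration, and real analyses would normalise the channels first.

```python
from math import log2


def log2_enrichment(cy3_ip: float, cy5_input: float,
                    pseudocount: float = 1.0) -> float:
    """Log2 ratio of enriched (Cy3) to input (Cy5) signal for one probe.

    The pseudocount (an assumption of this sketch) avoids division by zero
    and dampens ratios at very low intensities.
    """
    if cy3_ip < 0 or cy5_input < 0:
        raise ValueError("intensities must be non-negative")
    return log2((cy3_ip + pseudocount) / (cy5_input + pseudocount))


# Hypothetical probe intensities: (Cy3 enriched, Cy5 input).
probes = {"probe_1": (7.0, 1.0), "probe_2": (3.0, 3.0)}
ratios = {name: log2_enrichment(ip, inp) for name, (ip, inp) in probes.items()}
print(ratios)
```

A positive value indicates that the probe's region is over-represented in the methylated fraction relative to input, and a value near zero indicates no enrichment.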

MHC Haplotype

The MHC Haplotype Project provides genetic resources for association studies into inflammatory, autoimmune and infectious disease as well as forming a framework for population genetic studies.

It was conducted between 2000 and 2006 at the Sanger Institute and offers a framework and resource for association studies of all MHC-linked diseases. It provides the genomic sequences and gene annotation of 8 different HLA-homozygous typing haplotypes (listed below), their resulting variations (see Data below) and ancestral relationships.

The table below lists the eight cell lines used in the project along with their HLA haplotypes and alleles. Links are given to the Sanger Institute Vega database from which sequences and gene annotation can be downloaded, and to the same regions in the UCSC browser for those who prefer that approach.

The sequence from PGF is now incorporated in the reference sequence for chromosome 6. For the other seven haplotype sequences links are given to GRC entries (numbers with GL prefixes). These GRC haplotype contigs, called "alternate loci", are constructed so that they begin with additional anchor sequence derived from the reference. The fasta sequence derived from these links will, therefore, differ from that derived from Vega.

PGF

  • Haplotype: A3-B7-DR15
  • HLA-A: 03:01:01:01
  • HLA-B: 07:02:01
  • HLA-C: 07:02:01:03
  • HLA-DRB1: 15:01:01:01
  • HLA-DQB1: 06:02:01
  • HLA-DPB1: 04:01
  • Links: Vega; UCSC

COX

  • Haplotype: A1-B8-DR3
  • HLA-A: 01:01:01:01
  • HLA-B: 08:01:01
  • HLA-C: 07:01:01
  • HLA-DRB1: 03:01:01:01
  • HLA-DQB1: 02:01
  • HLA-DPB1: 03:01
  • Links: Vega; UCSC; GL000251.1      

APD

  • Haplotype: A1-B60-DR13
  • HLA-A: 01:01:01:01
  • HLA-B: 40:01:01
  • HLA-C: 06:02:01:01
  • HLA-DRB1: 13:01:01
  • HLA-DQB1: 06:03:01
  • HLA-DPB1: 04:02
  • Links: Vega; UCSC; GL000250.1

DBB

  • Haplotype: A2-B57-DR7
  • HLA-A: 02:01:01:01
  • HLA-B: 57:01:01
  • HLA-C: 06:02
  • HLA-DRB1: 07:01:01
  • HLA-DQB1: 03:03:02
  • HLA-DPB1: 04:01:01
  • Links: Vega; UCSC; GL000252.1

MANN

  • Haplotype: A29-B44-DR7
  • HLA-A: 29:02:01
  • HLA-B: 44:03:01
  • HLA-C: 16:01
  • HLA-DRB1: 07:01:01:01
  • HLA-DQB1: 02:02
  • HLA-DPB1: 02:01:02
  • Links: Vega; UCSC; GL000253.1

SSTO

  • Haplotype: A32-B44-DR4
  • HLA-A: 32:01:01
  • HLA-B: 44:02:01:01
  • HLA-C: 05:01:01:02
  • HLA-DRB1: 04:03:01
  • HLA-DQB1: 03:05:01
  • HLA-DPB1: 04:01:01
  • Links: Vega; UCSC; GL000256.1

QBL

  • Haplotype: A26-B18-DR3
  • HLA-A: 26:01:01
  • HLA-B: 18:01:01
  • HLA-C: 05:01:01:01
  • HLA-DRB1: 03:01:01:02
  • HLA-DQB1: 02:01:01
  • HLA-DPB1: 02:02
  • Links: Vega; UCSC; GL000255.1

MCF

  • Haplotype: A2-B62-DR4
  • HLA-A: 02:01
  • HLA-B: 15:01:01:01
  • HLA-C: 03:04:01:01
  • HLA-DRB1: 04:01
  • HLA-DQB1: 03:01
  • HLA-DPB1: 04:02
  • Links: Vega; UCSC; GL000254.1

MHC Haplotype Project Data

The data from the MHC Haplotype Project are available as a .txt file for viewing in the UCSC Genome Browser.

Follow UCSC instructions for the loading of BED file data as a custom track. The data are colour-coded by haplotype and will initially be displayed showing 1Mb in the centre of the MHC. You are then free to adjust the co-ordinates and to zoom in to your region of interest.

When zoomed in you should change the display of the Sanger_MHC custom track from "dense" to "full". You may also want to unhide "Variation and Repeats SNPs (131)" and "Genes and Gene Prediction Tracks Vega genes".

Although the project data were originally issued in the co-ordinates of a previous release of the Human Genome, they have been converted using the UCSC tool LiftOver for use with the February 2009 GRCh37/hg19 Assembly.

The convention used for naming variations is: 

[haplotype name]:[BAC sequence SV number]_[base position in SV]_[variation] 

For single nucleotide polymorphisms, "variation" consists of two letters: first the base in the reference sequence, then the base in the other haplotype. Insertions and deletions are identified by "_i" and "_d" respectively, followed by the numerical value of their length and, if this is 12 bases or fewer, their base sequence. For longer sequences, an "X" value is given, which refers to a look-up table.
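As an illustration of the naming convention, the following Python sketch splits a variation name into its parts. It is not part of the project's tooling. The function name `parse_variation`, the `Variation` tuple and the example names are hypothetical, and the sketch assumes that the SV number itself contains no underscore and that the "i" or "d" marker opens the final field. The text that follows the marker (length, bases or "X" value) is returned uninterpreted, because its exact layout is not specified here.

```python
from typing import NamedTuple


class Variation(NamedTuple):
    haplotype: str  # e.g. "COX"
    sv: str         # BAC sequence SV number (accession.version)
    position: int   # base position within the SV
    kind: str       # "snp", "insertion" or "deletion"
    detail: str     # SNP: reference base + other base; indel: raw text after i/d


def parse_variation(name: str) -> Variation:
    """Split '[haplotype]:[SV]_[position]_[variation]' into fields.

    Assumes the SV number contains no underscore (hypothetical simplification).
    Raises ValueError if the name does not fit the assumed layout.
    """
    haplotype, rest = name.split(":", 1)
    sv, position, variation = rest.split("_", 2)
    variation = variation.lstrip("_")  # tolerate a doubled separator before i/d

    if len(variation) == 2 and set(variation) <= set("ACGT"):
        kind, detail = "snp", variation
    elif variation[:1] == "i":
        kind, detail = "insertion", variation[1:]
    elif variation[:1] == "d":
        kind, detail = "deletion", variation[1:]
    else:
        raise ValueError(f"unrecognised variation field: {variation!r}")

    return Variation(haplotype, sv, int(position), kind, detail)


if __name__ == "__main__":
    # Made-up names, for illustration only.
    print(parse_variation("COX:AL000000.1_1234_AG"))
    print(parse_variation("QBL:AL000000.1_5678_d3ACG"))
```

For a SNP, `detail[0]` is the base in the reference sequence and `detail[1]` the base in the other haplotype. Resolving an "X" value would additionally require the look-up table mentioned above.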

Publications

  • Horton et al 2008 Immunogenetics 60(1):1-18.
  • Traherne et al 2006 PLoS Genet. 2(1):e9.
  • Stewart et al 2004 Genome Res. 14(6):1176-87.
  • Allcock et al 2002 Tissue Antigens 59(6):520-1.

eFORGE

This project involved the development of eFORGE and eFORGE2, computational tools for the analysis and interpretation of DNA methylation data from Epigenome-Wide Association Studies (EWAS).

450K Analysis Pipeline

This project involved the development of ChAMP and ChAMP v2, computational tools for the analysis of DNA methylation data from the Illumina 450k and EPIC arrays.

NOCRC

NOCRC aims to develop and validate high-performance blood-based tests for colorectal cancer in a recently initiated screening program in Denmark, quantifying existing and novel markers by sensitive and cost-effective methods. Our contribution was towards the development of a targeted bisulfite sequencing assay.

BLUEPRINT

This project represents the EU contribution to the International Human Epigenome Consortium (IHEC). It focuses on the analysis of haematopoietic epigenomes from healthy individuals and patients with common leukaemias and autoimmune disease. Among our main contributions to BLUEPRINT were an epigenome-wide association study (EWAS) in Type 1 Diabetes and an epigenetic variability study in immune cells.

Find out more

Oncotrack

This EU Network of Excellence aimed to link epigenetics with systems biology. The multidisciplinary consortium modelled and quantified the dynamics and mechanisms of epigenetic modulation at the cellular and organismal level. Our contribution was to explore and quantify the link between genotype and epigenotype.

Find out more

Epigenesys

This EU Network of Excellence aimed to link epigenetics with systems biology. The multidisciplinary consortium modelled and quantified the dynamics and mechanisms of epigenetic modulation at the cellular and organismal level. Our contribution was to explore and quantify the link between genotype and epigenotype.

Find out more

Cancer Genomics Engineering Facility

The BRC-funded Cancer Genome Engineering (CAGE) Facility develops and provides TALE- and CRISPR-based genome engineering technologies for targeted genetic and epigenetic reprogramming to advance personal and medical genomics.

Find out more

Epigenetics of Urological Cancers

Bladder cancer is the fifth most common cancer in the UK with over 11,000 new cases and almost 5,000 deaths per year. This project provided a better understanding of how aberrant DNA methylation can affect the aetiology of bladder cancer and how we can use this information to identify novel epigenetic biomarkers for the diagnosis and prognosis of the disease. The main outcome of this study was the development of UroMark.

Clinical Epigenetics

This programme aimed to establish integrated (epi)genomic analysis in a clinical setting, utilizing the NET BioBank at the Royal Free and UCL Hospitals. Neuroendocrine Tumors (NETs) are a heterogeneous group of neoplasms which arise from the hormone-producing cells of the body's nervous and endocrine systems and affect 1 in 50,000 people in the UK.

EpiTrain

EpiTrain is an Integrated Training Network (ITN) providing high-level training and career development for PhD students and postdocs at local host institutions, complemented by exchange programs and topical workshops. The scope of the project is to develop a broad scale understanding of epigenetic processes in common disease.

IDEAL

This consortium carried out integrated research on developmental determinants of aging and longevity with a focus on epigenetic mechanisms. Our contribution to IDEAL was to determine the role of DNA methylation in haematopoietic stem cell aging.

IT Future of Medicine (ITFoM)

IT Future of Medicine (ITFoM) is one of six flagship pilot projects taking advantage of recent technological advances to produce computational models of individual patients - virtual patients! These models will follow each patient through their healthcare system enabling physicians to virtually test and optimise personalised treatments.

Nerve Sheath Tumour Methylome

Using comparative methylome analysis of benign and malignant peripheral nerve sheath tumors (MPNST), we identified loss of DNA methylation at satellite repeats as a candidate biomarker for disease progression and challenged the dogma of global hypomethylation in cancer.

Main findings

ZooArray: Epigenetic Insights into Vertebrate Genomes

Epigenetic modifications play crucial roles in organizing chromatin structure, specifically in regions which control gene expression and regulate other cellular processes. The aim of this project is to elucidate and characterize the epigenetic states of evolutionarily conserved sequences.

HEROIC

The EU-FP7 funded HEROIC Project was conducted between 2005 and 2010 and provides a resource (Ensembl Projects) for functional studies into chromatin remodeling, mouse embryonic stem (ES) cell differentiation and regulation of the (epi)genome in general.

Human Epigenome Project

The EU-FP5 and Wellcome Trust funded HEP was conducted between 1999 and 2006 and provides an epigenetic resource of chromosomal DNA methylation reference profiles of human tissues and cell lines.

The LRC Haplotype Project

The LRC Haplotype Project provides genetic resources for association studies into inflammatory, autoimmune and infectious disease as well as forming a framework for population genetic studies.