Curation Process

High-quality human gene annotations are generated through a combination of computational and manual techniques (Barrell et al., 2009; Dimmer et al., 2012), both of which require a team of skilled biologists and software engineers.

Manual gene annotation involves extracting information from published scientific papers (Balakrishnan et al., 2013; Orchard et al., 2014; Huntley et al., 2015). Every Gene Ontology (GO) annotation and every molecular interaction annotation, including protein interaction annotations, is attributed to an identified reference by means of a publication identifier. Each annotation must also indicate what kind of evidence supports the association between the gene product and the GO term, or the molecular or protein interaction.
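The two requirements above (a cited reference and a stated evidence type) can be pictured as a small record with validation. The following is only a minimal sketch, not the curation team's actual tooling: the `Annotation` class, its field names and the example identifiers are hypothetical, and the evidence codes listed are a subset of the standard GO evidence codes.

```python
from dataclasses import dataclass

# A subset of GO evidence codes, for illustration only (not the full list).
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}
OTHER_CODES = {"ISS", "TAS", "NAS", "IC", "ND", "IEA"}
KNOWN_CODES = EXPERIMENTAL_CODES | OTHER_CODES


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record linking a gene product to a GO term."""
    gene_product: str   # e.g. a protein accession
    go_term: str        # e.g. "GO:0003677" (DNA binding)
    reference: str      # publication identifier, e.g. "PMID:<number>"
    evidence_code: str  # what kind of evidence supports the association

    def __post_init__(self):
        # Every annotation must be attributed to an identified reference.
        if not self.reference.strip():
            raise ValueError("annotation must cite a reference")
        # Every annotation must state the kind of supporting evidence.
        if self.evidence_code not in KNOWN_CODES:
            raise ValueError(f"unknown evidence code: {self.evidence_code!r}")


# Placeholder identifiers; these do not describe a real annotation.
example = Annotation(
    gene_product="EXAMPLE_PROTEIN",
    go_term="GO:0003677",
    reference="PMID:0000000",
    evidence_code="IDA",
)
```

An annotation with an empty reference or an unrecognised evidence code is rejected when it is constructed, which mirrors the rule that neither field is optional.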

The large-scale assignment of GO terms to human gene products using computational methods is a fast and efficient way of associating high-level terms with a large number of genes. However, to provide more reliable and specific annotations, GO curators use information from the published scientific literature to ‘manually’ associate highly descriptive GO terms with gene products. Similarly, protein interaction data is captured both from high-throughput datasets, such as yeast two-hybrid experiments, and from small-scale experimental data.
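In GO annotation sets, computationally assigned annotations conventionally carry the evidence code IEA (Inferred from Electronic Annotation), so the two routes described above can be told apart by that code. The sketch below assumes annotations are held as simple `(gene_product, go_term, evidence_code)` tuples; the function name `split_by_method` and the gene names are hypothetical.

```python
def split_by_method(annotations):
    """Group annotation tuples into computational (IEA) and manual sets.

    Each annotation is assumed to be (gene_product, go_term, evidence_code).
    """
    groups = {"computational": [], "manual": []}
    for annotation in annotations:
        evidence_code = annotation[2]
        key = "computational" if evidence_code == "IEA" else "manual"
        groups[key].append(annotation)
    return groups


# Illustrative records with placeholder gene names.
records = [
    ("GENE_A", "GO:0005515", "IEA"),  # electronically inferred
    ("GENE_A", "GO:0003677", "IDA"),  # curated from a direct assay
    ("GENE_B", "GO:0005634", "IEA"),  # electronically inferred
]
groups = split_by_method(records)
```

Here `groups["computational"]` holds the two IEA records and `groups["manual"]` holds the single curator-assigned one.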

Consequently, complete and highly detailed annotation of the processes and networks in which a single gene product is involved may take considerable time, depending on the number of published papers describing the gene product and the complexity of the papers being annotated.

UCL functional gene annotation GO curation pipeline
