
UCL Institute of Cardiovascular Science

Manual Curation

We curate the scientific literature to provide Gene Ontology annotations to the GO Consortium and protein interaction data to the IMEx Consortium. This process is known as 'manual curation'.

Biocuration is the translation of biological knowledge and its integration into databases, thereby making the data easily accessible for visualisation and analysis. High-quality human gene annotations are generated through a combination of computational and manual techniques (Barrell et al., 2009; Dimmer et al., 2012), both of which require a team of skilled biologists (often expert biocurators) and software engineers. As biocurators, we create annotations that are propagated to popular, freely available online knowledgebases, such as UniProt, Ensembl and NCBI Gene, as well as to numerous other public and commercial analysis tools.

Manual gene annotation (biocuration) involves the extraction of information from published scientific papers (Balakrishnan et al., 2013; Orchard et al., 2014; Huntley et al., 2015). In the resources we contribute to, the majority of Gene Ontology (GO) annotations and protein or molecular interaction annotations are attributed to an identified reference by use of a publication identifier. Each annotation must also indicate what kind of evidence supports the association between the gene product and the GO term, or the molecular/protein interaction. In addition, we create new GO terms to enable the full description of the molecular function of a gene product, its biological role in the cell and its cellular location.
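As a rough illustration of the elements described above, a single GO annotation can be thought of as a small record linking a gene product, a GO term, a reference and an evidence code. The sketch below is not the format used by any curation tool or database. The class name, field names and helper function are hypothetical, the identifiers are placeholders, and the evidence codes listed are only a subset of those defined by the GO Consortium.

```python
from dataclasses import dataclass

# A subset of GO evidence codes, listed here for illustration only.
EVIDENCE_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP",
                  "ISS", "TAS", "NAS", "IC", "IEA"}


@dataclass(frozen=True)
class Annotation:
    """Hypothetical minimal record for one GO annotation."""
    gene_product: str  # e.g. a database accession for a protein or RNA
    go_term: str       # GO term identifier, e.g. "GO:0005634"
    reference: str     # publication identifier supporting the annotation
    evidence: str      # evidence code describing the kind of support


def find_problems(annotation: Annotation) -> list[str]:
    """Return a list of human-readable problems; empty if none are found."""
    problems = []
    if not annotation.go_term.startswith("GO:"):
        problems.append("GO term identifier should start with 'GO:'")
    if not annotation.reference.strip():
        problems.append("annotation has no reference")
    if annotation.evidence not in EVIDENCE_CODES:
        problems.append(f"unrecognised evidence code: {annotation.evidence!r}")
    return problems


# Placeholder values only; this is not a real annotation.
example = Annotation(
    gene_product="EXAMPLE:P00000",
    go_term="GO:0005634",
    reference="PMID:0000000",
    evidence="IDA",
)
print(find_problems(example))
```

A record that lacks a reference or carries an unknown evidence code would be reported by `find_problems` rather than silently accepted, which mirrors the requirement that each annotation states its supporting evidence.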

The large-scale assignment of GO terms to human gene products using computational methods is a fast and efficient way of associating high-level terms with a large number of genes. However, to provide more reliable and specific annotations, GO curators use information from the published scientific literature to 'manually' associate highly descriptive GO terms with gene products. Similarly, protein interaction data is 'manually' captured both from high-throughput datasets, such as yeast two-hybrid experiments, and from small-scale experimental data.

Consequently, complete and highly detailed annotation of the processes and networks that a single gene product is involved in may take considerable time, depending on the number of published papers describing the gene product and the complexity of the papers being annotated. These gene product annotations enable researchers to rapidly evaluate and interpret data and to generate hypotheses to guide future research into cardiovascular and neurological processes.

In addition, as active members of the GO Consortium, we participate in discussions about guidelines to ensure consistent approaches to annotation.

Gene Ontology Annotation and Term Development

We use the GOA curation tool to associate Gene Ontology terms with gene products, including proteins, RNAs and macromolecular complexes. The data we produce is incorporated into the GOA and GO Consortium databases. Annotations contributed by UCL are attributed to BHF-UCL, ParkinsonsUK-UCL, ARUK-UCL or SynGO-UCL. In addition, many of the HGNC GO annotations were created by the HGNC while it was based at UCL.

Annotation of Molecular Interactions

The creation of protein interaction networks is an essential step towards the unravelling of the complex molecular interactions in living organisms. Therefore, the UCL annotation team is contributing experimentally verified protein interaction data to two databases:

Contributing to Annotation Guidelines

The UCL team is actively involved in the formulation of annotation guidelines, as part of the Annotation team in the GO Consortium.