Manual Curation

We curate the scientific literature to provide Gene Ontology annotations to the GO Consortium and protein interaction data to the IMEx Consortium. This process is known as 'manual curation'.

Biocuration is the translation of biological knowledge and its integration into databases, thereby making the data easily accessible for visualisation and analysis. High-quality human gene annotations are generated through a combination of computational and manual techniques (Barrell et al., 2009; Dimmer et al., 2012), both of which require a team of skilled biologists (often expert biocurators) and software engineers. As biocurators, we create annotations that are propagated to popular, freely available online knowledgebases such as UniProt, Ensembl and NCBI Gene, as well as to numerous other public and commercial analysis tools.

Manual gene annotation (biocuration) involves the extraction of information from published scientific papers (Balakrishnan et al., 2013; Orchard et al., 2014; Huntley et al., 2015). In the resources we contribute to, the majority of Gene Ontology (GO) and molecular or protein interaction annotations are attributed to an identified reference by a publication identifier. Each annotation must also indicate what kind of evidence supports the association between the gene product and the GO term, or the molecular/protein interaction. In addition, we create new GO terms to enable the full description of the molecular function of a gene product, its biological role in the cell and its cellular location.
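The two requirements described above (an identified reference and a stated kind of evidence) can be pictured as fields on an annotation record. The following is a minimal sketch only, not the schema of any curation tool: the class name, the field names, the checking function and the example values are all hypothetical, and the evidence code set is a small, non-exhaustive subset of the GO evidence codes. It also assumes a PubMed-style identifier, whereas some annotations cite other kinds of reference.

```python
from dataclasses import dataclass

# A small, non-exhaustive subset of GO evidence codes, for illustration only.
KNOWN_EVIDENCE_CODES = {
    "EXP", "IDA", "IPI", "IMP", "IGI", "IEP", "IC", "TAS", "NAS", "ISS", "IEA",
}


@dataclass(frozen=True)
class ManualAnnotation:
    """Hypothetical record linking a gene product to a GO term."""
    gene_product: str   # e.g. a UniProtKB accession (placeholder value below)
    go_term: str        # GO identifier such as "GO:0005515"
    reference: str      # publication identifier such as "PMID:..."
    evidence_code: str  # kind of supporting evidence, e.g. "IPI"


def check_annotation(annotation):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if not annotation.go_term.startswith("GO:"):
        problems.append("GO term should look like 'GO:nnnnnnn'")
    # Assumption: this sketch only accepts PubMed-style identifiers.
    if not annotation.reference.startswith("PMID:"):
        problems.append("reference should be a publication identifier such as 'PMID:...'")
    if annotation.evidence_code not in KNOWN_EVIDENCE_CODES:
        problems.append("unrecognised evidence code: " + annotation.evidence_code)
    return problems


# Placeholder values, not real curated annotations.
example = ManualAnnotation("UniProtKB:P12345", "GO:0005515", "PMID:12345678", "IPI")
incomplete = ManualAnnotation("UniProtKB:P12345", "GO:0005515", "", "")

print(check_annotation(example))
print(check_annotation(incomplete))
```

The first record passes all three checks; the second is reported as lacking both a reference and an evidence code, which are the two items the text above describes as required.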

The large-scale assignment of GO terms to human gene products using computational methods is a fast and efficient way of associating high-level terms with a large number of genes. However, to provide more reliable and specific annotations, GO curators use information from the published scientific literature to 'manually' associate highly descriptive GO terms with gene products. Similarly, protein interaction data is 'manually' captured both from high-throughput datasets, such as yeast two-hybrid experiments, and from small-scale experimental data.

Consequently, complete and highly detailed annotation of the processes and networks in which a single gene product is involved may take considerable time, depending on the number of published papers describing the gene product and the complexity of the papers being annotated. These gene product annotations enable researchers to rapidly evaluate and interpret data and to generate hypotheses that guide future research into cardiovascular and neurological processes.

In addition, as active members of the GO Consortium, we participate in discussions about guidelines to ensure consistent approaches to annotation.

Gene Ontology Annotation and Term Development

We use the GOA curation tool to associate Gene Ontology terms with gene products, including proteins, RNAs and macromolecular complexes. The data we produce is incorporated into the GOA and GO Consortium databases. Annotations contributed by UCL are attributed to BHF-UCL, ParkinsonsUK-UCL, ARUK-UCL or SynGO-UCL. In addition, many of the HGNC GO annotations were created by the HGNC while it was based at UCL.
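Because each annotation carries its attribution, the UCL-contributed subset can in principle be picked out of an annotation export. The sketch below assumes the export is in the tab-separated GO Annotation File (GAF 2.x) layout, in which the "Assigned By" value is the 15th column; the function name and the two sample rows are hypothetical and are not real annotations.

```python
# Attribution labels taken from the text above.
UCL_GROUPS = {"BHF-UCL", "ParkinsonsUK-UCL", "ARUK-UCL", "SynGO-UCL"}


def ucl_annotations(lines):
    """Yield a small dict for each GAF row attributed to a UCL group.

    Assumes GAF 2.x column order: 2 = object ID, 3 = symbol, 5 = GO ID,
    6 = reference, 7 = evidence code, 15 = assigned by.
    """
    for line in lines:
        # Header and comment lines in GAF files start with "!".
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue  # malformed row; skipped in this sketch
        if cols[14] in UCL_GROUPS:
            yield {
                "id": cols[1],
                "symbol": cols[2],
                "go_id": cols[4],
                "reference": cols[5],
                "evidence": cols[6],
                "assigned_by": cols[14],
            }


def make_row(assigned_by):
    """Build a made-up 17-column GAF row for demonstration purposes."""
    cols = [
        "UniProtKB", "P12345", "GENE1", "enables", "GO:0005515",
        "PMID:12345678", "IPI", "", "F", "", "", "protein",
        "taxon:9606", "20200101", assigned_by, "", "",
    ]
    return "\t".join(cols) + "\n"


sample = ["!gaf-version: 2.2\n", make_row("BHF-UCL"), make_row("OtherGroup")]
for record in ucl_annotations(sample):
    print(record["symbol"], record["go_id"], record["assigned_by"])
```

In practice the same filter could be applied to an open file handle instead of the in-memory sample list, since the function only needs an iterable of lines.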

Annotation of Molecular Interactions

The creation of protein interaction networks is an essential step towards the unravelling of the complex molecular interactions in living organisms. Therefore, the UCL annotation team is contributing experimentally verified protein interaction data to two databases:

We use the IntAct editing tool to capture protein interactions at a very detailed level as described in the IMEx guidelines. These annotations are directly incorporated into the IntAct database, from where they are exported to the IMEx Consortium database.
Protein-protein and microRNA-mRNA interactions are also submitted to the Gene Ontology Consortium (GOC) database using the GOA curation tool and can be accessed via the QuickGO website.
All of these datasets ("bhf-ucl", "EBI-GOA-nonIntAct" and "EBI-GOA-miRNA") can also be accessed using the Cytoscape software via PSICQUIC.

Contributing to Annotation Guidelines

The UCL team is actively involved in the formulation of annotation guidelines, as part of the Annotation team in the GO Consortium.

Topic	Details
Gene expression and transcription curation guidelines	Rachael and Barbara created a decision tree, based on the experimental assays used to measure gene expression and transcription, to assist curators in choosing the appropriate GO terms for the evidence provided. Ruth is working with GREEKC to improve the representation of transcription processes in several online resources, including the Gene Ontology.
Evidence code guidelines	Rachael provided several use case examples for the IGI evidence code documentation and Rebecca has initiated a change to the IC evidence code application.
Capturing non-GOC contributions	Ruth was involved in discussions to create GOC webpages that encourage expert scientists to suggest improvements to the GO and submit GO annotations.
Protein complexes	Nancy participated in two protein complex working groups (WGs): the complex annotation WG and the complex ontology WG. The groups discuss the various aspects of creating and annotating protein complexes.
Annotation extension	Ruth, Rachael and Rebecca were involved in discussions about the use of this field and the associated relationships, and in creating wiki pages for each relation to document its application.
microRNA	Ruth chaired discussions on guidelines for the annotation of microRNAs. Rachael has created a miRNA curation manual in consultation with the GOC and miRNA experts.
Response to	Ruth and Varsha were part of the 'response to' working group formulating guidelines to standardise the use of 'response to' GO terms.
Regulation	Ruth and Varsha were part of the 'regulation' working group creating guidelines to standardise the use of 'regulation' GO terms.
Downstream processes	Varsha and Rachael Huntley (previously GOA) co-chaired the 'downstream process' working group, to formulate guidelines for the annotation of downstream processes.
Binding	Ruth was a member of the binding working group, which formulated guidelines for the usage of binding terms.

The Functional Gene Annotation team is supported by Alzheimer's Research UK grant ARUK-NAS2017A-1 and the National Institute for Health Research University College London Hospitals Biomedical Research Centre.
