We curate the scientific literature to provide Gene Ontology annotations to the GO Consortium and protein interaction data to the IMEx Consortium; this process is known as 'manual curation'.
Manual gene annotation (biocuration) involves the extraction of information from published scientific papers (Balakrishnan et al., 2013; Orchard et al., 2014; Huntley et al., 2015). In the resources we contribute to, the majority of Gene Ontology (GO) and protein or molecular interaction annotations are attributed to an identified reference by means of a publication identifier, and each annotation must indicate what kind of evidence supports the association between the gene product and the GO term, or the molecular/protein interaction. In addition, we create new GO terms to enable the full description of the molecular function of a gene product, its biological role in the cell and its cellular location.
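As a rough illustration of the annotation structure described above (a gene product, a GO term, a publication reference and an evidence code), the record could be sketched in Python as follows. This is a minimal sketch, not the GOA curation tool's data model: the class name `Annotation`, its field names and the validation rules are assumptions made for this example, the evidence codes listed are only a subset of those used by the GO Consortium, and the PubMed identifier is a placeholder rather than a real paper.

```python
from dataclasses import dataclass

# A subset of GO evidence codes, listed here for illustration only.
EVIDENCE_CODES = {
    "EXP", "IDA", "IPI", "IMP", "IGI", "IEP",  # experimental evidence
    "ISS", "TAS", "NAS", "IC",                 # similarity / author / curator statements
    "IEA",                                     # electronic (computational) annotation
}


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record for one manually curated GO annotation."""

    gene_product: str   # e.g. a UniProtKB accession
    go_term: str        # e.g. "GO:0005515"
    reference: str      # publication identifier, e.g. "PMID:<number>"
    evidence_code: str  # what kind of evidence supports the association
    assigned_by: str = "BHF-UCL"

    def __post_init__(self) -> None:
        # Every annotation must point at a GO term ...
        if not self.go_term.startswith("GO:"):
            raise ValueError(f"not a GO identifier: {self.go_term!r}")
        # ... be attributed to an identified reference ...
        if not self.reference.startswith("PMID:"):
            raise ValueError(f"missing publication identifier: {self.reference!r}")
        # ... and state the kind of supporting evidence.
        if self.evidence_code not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {self.evidence_code!r}")


# Placeholder values: the PMID below is invented for the example.
example = Annotation(
    gene_product="UniProtKB:P04637",
    go_term="GO:0005515",
    reference="PMID:00000000",
    evidence_code="IPI",
)
```

Rejecting a record at construction time mirrors the requirement that no annotation is accepted without a reference and an evidence code; a real curation tool applies many more checks than this sketch does.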
The large-scale assignment of GO terms to human gene products using computational methods is a fast and efficient way of associating high-level terms with a large number of genes. However, to provide more reliable and specific annotations, GO curators use information from the published scientific literature to 'manually' associate highly descriptive GO terms with gene products. Similarly, protein interaction data is 'manually' captured both from high-throughput datasets, such as yeast two-hybrid experiments, and from small-scale experimental data.
Consequently, the complete, highly detailed annotation of the processes and networks in which a single gene product is involved may take considerable time, depending on the number of published papers describing the gene product and the complexity of the papers being annotated. These gene product annotations enable researchers to rapidly evaluate and interpret data and generate hypotheses to guide future research into cardiovascular and neurological processes.
In addition, as active members of the GO Consortium, we participate in discussions about guidelines to ensure consistent approaches to annotation.
- Gene Ontology Annotation and Term Development
We use the GOA curation tool to associate Gene Ontology terms with gene products, including proteins, RNAs and macromolecular complexes. The data we produce is incorporated into the GOA and GO Consortium databases. Annotations contributed by UCL are attributed to BHF-UCL, ParkinsonsUK-UCL, ARUK-UCL or SynGO-UCL. In addition, many of the HGNC GO annotations were created by the HGNC while it was based at UCL.
- Annotation of Molecular Interactions
The creation of protein interaction networks is an essential step towards the unravelling of the complex molecular interactions in living organisms. Therefore, the UCL annotation team is contributing experimentally verified protein interaction data to two databases:
- We use the IntAct editing tool to capture protein interactions at a very detailed level as described in the IMEx guidelines. These annotations are directly incorporated into the IntAct database, from where they are exported to the IMEx Consortium database.
- Protein-protein and microRNA-mRNA interactions are also submitted to the Gene Ontology Consortium (GOC) database using the GOA curation tool and can be accessed via the QuickGO website.
New! All of these datasets ("bhf-ucl", "EBI-GOA-nonIntAct" and "EBI-GOA-miRNA") can also be accessed using the Cytoscape software via PSICQUIC.
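Besides Cytoscape, PSICQUIC services can usually be queried directly over REST. The sketch below only builds a query URL and does not contact any server. It assumes the common PSICQUIC REST layout (a `query/<MIQL>` path under the service's search address, with `format`, `firstResult` and `maxResults` parameters); the base address used here is a placeholder, and the real address of each dataset should be taken from the PSICQUIC registry. The function name and its defaults are hypothetical.

```python
from urllib.parse import quote, urlencode

# Placeholder only: replace with the search address that the PSICQUIC
# registry lists for the dataset of interest (e.g. "bhf-ucl").
HYPOTHETICAL_BASE = "https://example.org/psicquic/webservices/current/search"


def build_psicquic_query_url(base_url, miql_query, fmt="tab25",
                             first_result=0, max_results=50):
    """Build a PSICQUIC-style REST URL for a MIQL query (no request is sent)."""
    # Percent-encode the whole query so characters such as ':' are safe in the path.
    encoded_query = quote(miql_query, safe="")
    # Paging and output format are passed as ordinary query-string parameters.
    params = urlencode({
        "format": fmt,
        "firstResult": first_result,
        "maxResults": max_results,
    })
    return f"{base_url.rstrip('/')}/query/{encoded_query}?{params}"


# Example: look up interactions involving one UniProtKB accession.
url = build_psicquic_query_url(HYPOTHETICAL_BASE, "identifier:P04637")
print(url)
```

The resulting URL can be fetched with any HTTP client once the placeholder has been replaced by a real service address; keeping URL construction separate from the request makes it easy to check the query before sending it.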
- Contributing to Annotation Guidelines
The UCL team is actively involved in the formulation of annotation guidelines, as part of the Annotation team in the GO Consortium.