Community curation

Contributing to this community curation effort will improve your understanding of Gene Ontology and, potentially, improve the profile of your research papers.

How to submit GO annotations for proteins and non-coding RNAs

By submitting annotations to the GO Consortium resource you have the potential to improve your analysis of high-throughput data. Furthermore, by annotating your own published research, you will have the opportunity to improve the profile of your papers in the world's leading biological databases, such as NCBI Gene, UniProt and GeneCards.

UCL functional gene annotation GO curation pipeline

Before you start, consider registering for an ORCID ID. This will enable all your annotations to be attributed to you; in addition, many journals request that authors provide an ORCID ID. There is also a wide range of bioinformatic resources available, so consider which resource you want to contribute to. For example, you may want to focus on curating a bioinformatic resource that you use because you are aware that it is missing key information. Go to the ISB website for a list of resources that provide facilities to submit annotations.

Consider downloading this tutorial, which covers most of the information below with annotation examples for you to work through.

How to submit a GO annotation

Simple Steps for GO annotation

1. Email Ruth Lovering (r.lovering@ucl.ac.uk) to let her know you plan to submit some GO annotations

2. Download the Annotation Form Excel spreadsheet to capture your annotations or, for protein annotation only, try the Canto tool (see below for Canto tutorial)

3. Select an article for annotation, often based on its abstract

Check that the PubMed Identifier (PMID) has not already been annotated using QuickGO (see below for QuickGO tutorial for filtering with PMID)
If it has already been annotated, go to another paper
If it hasn’t been annotated, see if you can annotate it
Add your name and the PMID for the paper to the file name of the Annotation Form

4. Read method section to confirm you can identify species of gene/protein used for experiment

If not here, check rest of paper for this information (including supplemental data)
Add statement confirming species information to the Annotation Form
Add species and gene symbol/alias to Annotation Form (for human proteins include the HGNC approved symbol)
Find the protein UniProt ID or the non-coding RNA ID in RNAcentral (see below for UniProt or RNAcentral tutorial)
PASTE the UniProt ID or RNAcentral ID in the Annotation Form.

5. Read results section and identify GO terms supported by experimental evidence

Find the appropriate GO term using QuickGO (see below for QuickGO tutorial for finding GO terms)
PASTE the GO ID and GO term name in the Annotation Form
OPTIONAL: Add the evidence code (see below for evidence code tutorial)
OPTIONAL: If you want to capture a molecular interaction then you can use the WITH field to add the 'target' of the interaction.
OPTIONAL: Add the relevant cell or tissue type (and the target protein if relevant) (see below for annotation extension tutorial)
Add the Figure number
Add the supporting statement

6. Optional extra - read paper introduction

Are any of the statements here supported by the results section, but not captured by your annotations? E.g. toxin metabolism supported by an experiment confirming thiourea metabolism
If there are, add these annotations
If there are potential GO terms that are not supported by the results, are these GO terms already in the database?
If they are already in the database, then don’t worry about them.
If there are potential GO terms that are not supported by the results and are not in the database, either try to find papers to support these key statements or annotate them using the TAS or NAS evidence code
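
The bookkeeping in step 3 can be scripted if you are working through many papers. The sketch below normalises a PubMed identifier to the PMID:##### form and builds a file name containing your name and the PMID. The function names, the file-name pattern and the .xlsx extension are illustrative assumptions, not a convention required by the UCL team.

```python
import re


def normalise_pmid(raw: str) -> str:
    """Return a PubMed identifier in the form 'PMID:12345'."""
    match = re.fullmatch(r"\s*(?:PMID:?\s*)?(\d+)\s*", raw, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Not a recognisable PubMed identifier: {raw!r}")
    return f"PMID:{match.group(1)}"


def form_file_name(curator_name: str, pmid: str) -> str:
    """Build a file name holding the curator name and PMID (hypothetical pattern)."""
    # Replace runs of non-word characters so the name is safe in a file name.
    safe_name = re.sub(r"\W+", "_", curator_name.strip()).strip("_")
    digits = normalise_pmid(pmid).split(":")[1]
    return f"AnnotationForm_{safe_name}_PMID{digits}.xlsx"


print(normalise_pmid("15919722"))
print(form_file_name("Jane Doe", "PMID:15919722"))
```

"Jane Doe" is a placeholder name; the PMID is the example used later in the QuickGO section.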

Navigating QuickGO

The QuickGO browser has been developed by EMBL-EBI to browse the GO hierarchy and view annotations for individual gene products. The QuickGO home page provides a text box to start searching for GO information. You may search for any aspect of a GO annotation, including GO term names and synonyms, GO IDs, UniProtKB accessions, or UniProtKB keywords. Below are some example uses for the browser.

1. Browsing GO terms

Open QuickGO and enter in the Search field a cellular component name, such as ‘nucleus’
QuickGO will return any relevant GO terms associated with the word ‘nucleus’
The first 5 GO terms are displayed but the full list can be viewed using the 'show all results' option below the returned list
Select 'show all results' and use the options in the left-hand side menu to view terms from a particular aspect of the GO, i.e. Molecular Function, Biological Process or Cellular Component.
NOTE: Some terms are retrieved due to information in their synonym or definition fields
Click on the GO ID for the term to see the full details of the selected term within the GO term record
The menu on the left-hand side of the GO term record enables quick navigation to the sections you are interested in
All gene products associated with a single GO term are accessible using the blue 'annotation' button below the GO term definition
The ancestor chart is useful for viewing the term's parent terms, whereas the child terms are listed in the section below
Try to choose a term to associate with your gene that is as specific as the data supports. There is no need to select both the parent and child terms

2. Using the filtering options

Filtering allows you to manipulate the dataset according to the attributes you are interested in. Use the filtering tabs in the bar above the table with annotations.

Click on 'annotations' (under the GO term definition) in the QuickGO entry to see all the gene products associated with a single GO term, or on 'view GO annotations' from the home page. The following steps assume you have started from a GO term record
Click on the 'taxon' tab, add a taxon ID, such as 9606 for human, and click ‘apply’ at the bottom of the window. Only human gene products associated with the term are then displayed
Click on the 'Gene Product' tab, add a UniProt or RNAcentral ID, click 'Add' and then click ‘apply’ at the bottom of the window. Only the selected gene products associated with the term are now listed
Click on ‘GO terms’, then on ‘Options’ at the bottom of the drop-down list. Select the bottom option bullet, ‘is_a, part_of, occurs_in, regulates’, to include annotations to the regulation child terms, then click ‘apply’
To clear the selected filters use the 'clear all' button at the end of the filter tab menu
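
The same filters can be expressed as a query to the QuickGO web service, which is convenient when you want to repeat a search. The sketch below only builds the URL and does not contact the server. The endpoint address and the parameter names (goId, taxonId, geneProductId, goUsage, goUsageRelationships) are assumptions about the QuickGO REST API and should be checked against its current documentation before use.

```python
from urllib.parse import urlencode

# Assumed address of the QuickGO annotation search service.
QUICKGO_SEARCH = "https://www.ebi.ac.uk/QuickGO/services/annotation/search"


def annotation_search_url(go_id, taxon_id=None, gene_product_id=None,
                          include_regulates=False):
    """Build a search URL mirroring the GO term, taxon and gene product filters."""
    params = {"goId": go_id}
    if taxon_id is not None:
        params["taxonId"] = str(taxon_id)
    if gene_product_id is not None:
        params["geneProductId"] = gene_product_id
    if include_regulates:
        # Counterpart of the 'is_a, part_of, occurs_in, regulates' option.
        params["goUsage"] = "descendants"
        params["goUsageRelationships"] = "is_a,part_of,occurs_in,regulates"
    return f"{QUICKGO_SEARCH}?{urlencode(params)}"


# GO:0005634 is the GO term 'nucleus'; 9606 is the taxon ID for human.
print(annotation_search_url("GO:0005634", taxon_id=9606))
```

Leaving an argument out corresponds to not setting that filter tab in the browser.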

3. To view GO annotations associated with a specific paper

Click on ‘View GO Annotations’ on the QuickGO homepage
Click on the ‘References’ tab above the Annotation table
Type in the PMID, either with the prefix and without any spaces (e.g. ‘PMID:15919722’) or as the number alone (‘15919722’)
Click ‘Add’ then click ‘apply’.
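
Checking whether a paper has already been annotated can also be sketched as a small script. The example below assumes that the QuickGO annotation search service accepts a 'reference' parameter and returns JSON containing a 'numberOfHits' field; both are assumptions to verify against the QuickGO API documentation. The demonstration at the end uses hand-written payloads, so it runs without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed address of the QuickGO annotation search service.
QUICKGO_SEARCH = "https://www.ebi.ac.uk/QuickGO/services/annotation/search"


def reference_query_url(pmid: str) -> str:
    """URL listing annotations whose reference is the given PMID."""
    return f"{QUICKGO_SEARCH}?{urlencode({'reference': pmid, 'limit': 1})}"


def already_annotated(payload: dict) -> bool:
    """True if the response reports at least one annotation."""
    return int(payload.get("numberOfHits", 0)) > 0


def fetch(pmid: str) -> dict:
    """Call the service and decode the JSON reply (needs network access)."""
    request = Request(reference_query_url(pmid),
                      headers={"Accept": "application/json"})
    with urlopen(request, timeout=30) as response:
        return json.load(response)


# Offline demonstration with payloads shaped like the assumed response.
print(already_annotated({"numberOfHits": 3, "results": []}))
print(already_annotated({"numberOfHits": 0, "results": []}))
```

With network access, already_annotated(fetch("PMID:15919722")) would combine the two steps.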

4. Additional tips

Changing the format of the ancestor chart: at the top of the ancestor chart is a blue 'chart options' button, which enables you to increase the size of the term boxes displayed so that you can read the full GO term names
Use the basket icons to select GO terms and then compare how they are located in the ontology within a single chart

Finding UniProt IDs to curate

It is important to make sure that you are curating the protein you want to curate before you start, as many proteins have multiple aliases and the name you are using might be the alias of more than one protein in that species. To avoid the association of data with the wrong protein, the UCL protein annotations are associated with the unique UniProt identifier (ID) for the protein. So, to start you need to learn how to find the correct identifier for the proteins you would like to curate.

1. At the top of the UniProt home page is the 'Search' field which allows the database to be searched using keywords, similar to how one searches Google (logical operators such as “and” and “but not” can be used to restrict search).

2. In the 'Search' field type in the name of your protein of interest (for example “Myosin light chain kinase” or "BTK") and click on the search button

3. On the left-hand side are options to filter the results, e.g. ‘human’, or you can use the other organism field to cut down on the volume of results

4. UniProt/SwissProt entries have a gold star and UniProt/TrEMBL entries have a grey-blue star. The UniProt/SwissProt gold star entries have been manually annotated by a curator. Ideally, all annotations you submit will be associated with the gold star entry, if there is one for your gene. If you are submitting an annotation for a human protein, there should be a gold star (manually curated) record. At the top of the UniProt record it will state 'Status: reviewed' (for gold star records) or 'unreviewed' (for grey star records)

NOTE: the search looks for any mention of the symbol you submitted within the protein record. Usually a match with the name or alias fields will be near the top. Many of the returned records are for proteins that interact with the protein you have searched for.

5. After selecting the protein record you are interested in, check the record to make sure the information is as expected for your protein, e.g. the chromosomal location, or the other listed names and aliases

6. Click on the ‘FUNCTION’ link (in the blue panel on the left side of the page), for your protein, and scroll down to the “GO - Molecular function” section and click on the link to: Complete GO annotation on QuickGO ...

7. This will list all the existing annotations for your protein of interest

NOTE: if you have a long list of proteins to curate the Retrieve/ID mapping facility is very good
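
Because gene symbols and aliases are easily pasted where an identifier is expected, a quick format check can catch mistakes before you submit. The sketch below uses the accession pattern published in the UniProt help pages. It only checks the shape of the text, not whether the accession exists or belongs to your protein, and the function name is illustrative.

```python
import re

# Accession number format as described in the UniProt help pages.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def looks_like_uniprot_accession(text: str) -> bool:
    """True if the text has the shape of a UniProt accession."""
    return UNIPROT_ACCESSION.fullmatch(text.strip()) is not None


print(looks_like_uniprot_accession("Q06187"))      # six-character accession
print(looks_like_uniprot_accession("A0A024R161"))  # ten-character accession
print(looks_like_uniprot_accession("BTK"))         # a gene symbol, not an ID
```

Isoform suffixes such as "-2" are deliberately rejected here; extend the pattern if you need them.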

Finding RNAcentral IDs to curate

It is important to make sure that you are curating the non-coding RNA (ncRNA) you want to curate before you start. To avoid the association of data with the wrong ncRNA, the UCL non-coding RNA annotations are associated with the RNAcentral identifier (ID) for your ncRNA, which is a unique ID. So, you need to learn how to find the correct identifier for the ncRNAs you would like to curate. For microRNAs this is reasonably straightforward, but for long non-coding RNAs (lncRNAs) it is more complicated due to the large number of transcripts associated with these genes. If you would like to curate lncRNAs, contact Ruth to discuss further.

1. At the top of the RNAcentral home page is the 'Search' field which allows the database to be searched using keywords, similar to how one searches Google (logical operators such as “and” and “but not” can be used to restrict search).

2. In the 'Search' field type in the name of your ncRNA of interest (for example “hsa-mir-126” or "mir-126-3p") and click on the search button

3. On the left-hand side are options to filter the results, e.g. ‘human’, or you can use the other organism field to cut down on the volume of results

4. After selecting the ncRNA you are interested in, check the RNAcentral record to make sure the information is as expected for your ncRNA, e.g. the chromosomal location, or the sequence.

5. If there are any GO annotations associated with your ncRNA these will be listed below the genomic map of the ncRNA
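
A similar format check can be sketched for RNAcentral identifiers. The pattern below assumes that an identifier is 'URS' followed by ten hexadecimal characters, optionally followed by an underscore and an NCBI taxon ID for the species-specific form. That format is an assumption to confirm on the RNAcentral website, and the identifier in the example is used only to illustrate the shape.

```python
import re

# Assumed shape: URS + 10 hexadecimal characters, optional _<taxon ID>.
RNACENTRAL_ID = re.compile(r"(URS[0-9A-F]{10})(?:_(\d+))?")


def split_rnacentral_id(text: str):
    """Split an identifier into its sequence part and optional taxon ID."""
    match = RNACENTRAL_ID.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"Not shaped like an RNAcentral ID: {text!r}")
    sequence_id, taxon = match.groups()
    return sequence_id, (int(taxon) if taxon else None)


print(split_rnacentral_id("URS0000759CF4_9606"))
print(split_rnacentral_id("URS0000759CF4"))
```

A missing taxon part is returned as None, which is a useful prompt to look up the species-specific record before submitting.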

Selecting an Evidence code (Optional)

All GO annotations have an associated evidence code, which indicates the category of evidence used to make the annotation. There are currently six evidence codes used to categorise experimental data (see the GO Consortium website for more information), ten evidence codes for computationally evidenced annotations and five evidence codes for non-experimentally evidenced annotations. The experimental evidence codes you are most likely to need are:

IDA	Inferred from Direct Assay. Examples: enzyme assays; in vitro reconstitution (e.g. transcription); immunofluorescence (for cellular component); cell fractionation (for cellular component); physical interaction/binding assay (sometimes appropriate for cellular component or molecular function).
IGI	Inferred from Genetic Interaction (ideally has a protein accession ID in the ‘WITH’ field; remember to create the reciprocal annotation). Examples: "traditional" genetic interactions such as suppressors, synthetic lethals, etc.; functional complementation; rescue experiments; inference about one gene drawn from the phenotype of a mutation in a different gene.
IMP	Inferred from Mutant Phenotype. Examples: mutations, natural or introduced, that result in partial or complete impairment or alteration of the function of that gene; polymorphism or allelic variation (including where no allele is designated wild-type or mutant); any procedure that disturbs the expression or function of the gene, including RNAi or the use of any molecule or experimental condition that may disturb or affect the normal functioning of the gene, such as inhibitors; overexpression or ectopic expression of wild-type or mutant gene that results in aberrant behavior of the system or aberrant expression where the resulting mutant phenotype is used to make a judgment about the normal activity of that gene product.
IPI	Inferred from Physical Interaction (must have a protein accession ID in the ‘WITH’ field; remember to create the reciprocal annotation). Examples: 2-hybrid interactions; co-purification; co-immunoprecipitation; ion/protein binding experiments.
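
The WITH-field rules attached to these codes lend themselves to a simple check. The sketch below encodes only what is stated above: IPI must have an accession in the WITH field, and IGI ideally has one. The function name and the wording of the messages are illustrative.

```python
EVIDENCE_CODES = {
    "IDA": "Inferred from Direct Assay",
    "IGI": "Inferred from Genetic Interaction",
    "IMP": "Inferred from Mutant Phenotype",
    "IPI": "Inferred from Physical Interaction",
}
WITH_REQUIRED = {"IPI"}      # must have an accession in the WITH field
WITH_RECOMMENDED = {"IGI"}   # ideally has one


def check_evidence(code: str, with_id: str = "") -> list:
    """Return a list of messages about the evidence code and WITH field."""
    code = code.strip().upper()
    if code not in EVIDENCE_CODES:
        return [f"unknown evidence code: {code}"]
    messages = []
    has_with = bool(with_id.strip())
    if code in WITH_REQUIRED and not has_with:
        messages.append(f"{code} needs an accession in the WITH field")
    if code in WITH_RECOMMENDED and not has_with:
        messages.append(f"{code} ideally has an accession in the WITH field")
    return messages


print(check_evidence("IPI"))
print(check_evidence("IPI", "Q06187"))
print(check_evidence("IDA"))
```

An empty list means nothing was flagged; since the evidence code is optional on the form, this check only applies when you choose to supply one.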

Using the Annotation Extension field (Optional)

If the results provide additional information, which would add value to the annotation, e.g. cell type, tissue type, or a regulation target, this can be included in the annotation extension (AE) field. More information about this is available in Huntley and Lovering, 2017.

NOTE: the AE relates to the primary GO term in the annotation.

1. In order to find the correct identifier for the cell, or tissue type, use the Ontology Lookup Service
2. This service allows you to search for terms from any biological ontology including GO.
3. For human annotations, use only the Cell Ontology (CL) for cell types or UBERON for tissue types
4. Or just paste the information into this field and the checker will find the relevant IDs to include
5. To add the target of the protein you are curating (e.g. if you are curating a kinase and know which protein is phosphorylated), add this information too
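
If you prefer to supply identifiers yourself rather than free text, an annotation extension is commonly written as a relation followed by an identifier in brackets. The sketch below assumes that relation(identifier) notation, with occurs_in and has_input as example relations, and assumes a 'UniProtKB' prefix for protein targets. The identifiers shown are placeholders in the right format, not recommendations for any particular annotation.

```python
import re

# Prefixes accepted in this sketch: CL and UBERON as named in the steps above;
# the UniProtKB prefix for a target protein is an assumption.
ALLOWED_PREFIXES = ("CL", "UBERON", "UniProtKB")
CURIE = re.compile(r"([A-Za-z]+):(\S+)")


def extension(relation: str, identifier: str) -> str:
    """Format one annotation extension as relation(identifier)."""
    match = CURIE.fullmatch(identifier.strip())
    if match is None or match.group(1) not in ALLOWED_PREFIXES:
        raise ValueError(f"Unexpected identifier: {identifier!r}")
    return f"{relation}({identifier.strip()})"


print(extension("occurs_in", "CL:0000084"))
print(extension("has_input", "UniProtKB:P12345"))
```

Restricting the prefixes is a design choice that mirrors step 3; relax it if you annotate species for which other ontologies are appropriate.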

Canto curation tool

The Canto - GO Community Curation - tool was developed and is maintained by PomBase. It has been adapted to enable the curation of UniProt IDs across any species. Please read the Canto tutorial before starting, although it is a very easy tool to use.

Email Ruth (r.lovering@ucl.ac.uk) if you have any problems.

Excel Annotation Form

The Annotation Form can be used to submit GO annotations. Once you have completed the annotation of an article, email the Excel document to Ruth (r.lovering@ucl.ac.uk). Information about GO annotation is provided above.

Please ensure you include the following in your Excel spreadsheet:

Name
Email address
PubMed identifier (ID, in the format PMID:#####) of the article
Information in the article that confirms the species of the protein you have curated
Species of the protein you have curated
Gene symbol
UniProt ID
GO ID
GO term name
Figure number(s)
Summary of supporting data in figure(s)
- Evidence code is optional, we can supply this
- WITH is optional, we can supply this if required
- Annotation extension information is optional, we can supply this if data is available in the article
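
Before emailing the form, the list above can be used as a checklist. The sketch below treats one annotation as a dictionary and reports missing required fields, plus GO IDs or PMIDs that are not in the expected format (GO IDs are 'GO:' followed by seven digits). The dictionary keys are shorthand invented for this example and do not necessarily match the column headings in the Annotation Form; the example row contains placeholder values, not a real annotation.

```python
import re

# Required items from the list above; the key names are hypothetical shorthand.
REQUIRED_FIELDS = (
    "Name", "Email address", "PMID", "Species evidence", "Species",
    "Gene symbol", "UniProt ID", "GO ID", "GO term name",
    "Figure number(s)", "Supporting data",
)
GO_ID = re.compile(r"GO:\d{7}")
PMID = re.compile(r"PMID:\d+")


def row_problems(row: dict) -> list:
    """List missing required fields and badly formatted identifiers."""
    problems = [f"missing: {field}" for field in REQUIRED_FIELDS
                if not str(row.get(field, "")).strip()]
    if row.get("GO ID") and not GO_ID.fullmatch(row["GO ID"].strip()):
        problems.append("GO ID should look like GO:0005634")
    if row.get("PMID") and not PMID.fullmatch(row["PMID"].strip()):
        problems.append("PMID should look like PMID:15919722")
    return problems


# Placeholder values only, to show the shape of a complete row.
example_row = {
    "Name": "Jane Doe",
    "Email address": "jane.doe@example.org",
    "PMID": "PMID:15919722",
    "Species evidence": "placeholder statement from the methods section",
    "Species": "Homo sapiens",
    "Gene symbol": "GENE1",
    "UniProt ID": "P12345",
    "GO ID": "GO:0005634",
    "GO term name": "nucleus",
    "Figure number(s)": "1",
    "Supporting data": "placeholder summary of the figure",
}
print(row_problems(example_row))
```

Evidence code, WITH and annotation extension are left out of the required list because they are optional.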

The Functional Gene Annotation team is supported by Alzheimer's Research UK (grant ARUK-NAS2017A-1) and the National Institute for Health Research University College London Hospitals Biomedical Research Centre.
