Additional notes when using Cytoscape with the GOlorize and BiNGO plugins. These notes assume you are already familiar with using Cytoscape. If you haven't used it yet, instructions are available at http://manual.cytoscape.org/en/stable/, or you can attend a training course (available at EMBL-EBI and UCL). These notes are written for Cytoscape 3.7.2 and may not be relevant to other versions of Cytoscape.
- What data is in the PSICQUIC EBI-GOA-miRNA file
Key information listed in the Cytoscape Network view 'Edge Table'
The primary interaction type is listed as 'physical association' if there is experimental evidence that the microRNA binds to an mRNA. It is listed as 'association' if the microRNA is predicted to bind an mRNA and there is experimental evidence that the mRNA levels are altered by the presence or absence of the microRNA.
Key information listed in the Cytoscape Network view 'Node Table'
The 'name' column lists the ENSG identifiers for the gene targets of the microRNA.
- How to make sure each entity is only represented by a single node
Unless you are planning to investigate the different interactions associated with different isoforms, you will probably want each protein to be represented by a single node.
There are several places where care is required to ensure this happens:
- If you are merging several networks, check which column is best to use for the merge.
- It is important to look at the identifiers in the columns that will be used for the merge, and to edit any isoform IDs so that the canonical protein and its isoforms are merged into a single node. Cytoscape will merge these once they are edited, even if they come from the same resource.
- Consider what the identifier is representing. For example, IntAct now includes DNA sites bound by transcription regulators. These are represented by ENSG IDs. It would not be appropriate to merge a protein ID with a DNA ID for the same gene, as this would imply that the protein binds a transcription regulator, when it is the genomic DNA that is binding the transcription regulator.
UniProt represents protein isoforms using '-#', and post-translational modification/cleavage products are represented with ':PRO#'. To merge 2 or more datasets you will probably want to remove the isoform and other specific details so that only 1 node represents each protein. While the UniProt accession column is often used for the merge, if your network has miRs or other IDs, or if you plan to overlay the network with GO terms, it will be more efficient to edit the ‘name’ column.
If you don’t do this, 2 isoforms of a single protein, eg P12345 and P12345-2, will be represented on your network as 2 nodes. Before merging your networks, aim to edit only the ‘name’ column in each network. Identify and edit all isoform and PRO information in the different networks:
- This can be done using the select/filter option: choose column node:name and enter '-' (or ':PRO') in the free text field.
- Then edit the IDs in the name column to remove the isoform suffix '-#' (ie just delete '-1') or the PRO ID, leaving the rest of the UniProt ID intact.
- Use the select option to find any nodes with EBI- identifiers (eg select/filter option: choose column node:name and enter 'EBI-' in the free text field).
- Right-click one of the selected nodes and choose Edit > Cut; this will delete all the selected nodes.
- Check that you are happy with the information in the column being used for the merge in each network before merging.
- Merge the networks based on the 'name' column, as you have now edited it.
- Allow the merge to combine matching nodes within the same network too.
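If you prefer to prepare the identifiers outside Cytoscape (for example in an exported node table), the editing steps above can be sketched in Python. This is a minimal sketch, not part of Cytoscape: the function names are hypothetical, the accession pattern follows UniProt's published accession format, and PRO suffixes are assumed to look like ':PRO_0000012345' or '-PRO_0000012345'. It strips '-#' and PRO suffixes only from UniProt-style accessions, so miRNA names, ENSG and RNAcentral IDs are left unchanged, and it drops EBI- IDs as in the Cut step.

```python
import re

# UniProt accession pattern (6- or 10-character accessions).
UNIPROT_ACC = r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"

# Accession + optional isoform suffix (-2) + optional processed-chain suffix.
# Assumption: PRO suffixes look like ':PRO_0000012345' or '-PRO_0000012345'.
ISOFORM_OR_PRO = re.compile(rf"^({UNIPROT_ACC})(?:-\d+)?(?:[-:]PRO_?\d+)?$")


def canonical_name(name: str) -> str:
    """Return the canonical accession for UniProt isoform/PRO IDs; other IDs are returned unchanged."""
    name = name.strip()
    match = ISOFORM_OR_PRO.match(name)
    return match.group(1) if match else name


def clean_names(names):
    """Canonicalise each name, drop EBI- IDs, and keep one entry per canonical name (in order)."""
    result = []
    for name in names:
        if name.startswith("EBI-"):
            continue  # equivalent of selecting the EBI- nodes and cutting them
        canonical = canonical_name(name)
        if canonical not in result:
            result.append(canonical)
    return result


print(clean_names(["P12345", "P12345-2", "P12345:PRO_0000012345", "EBI-1234", "hsa-mir-21"]))
# ['P12345', 'hsa-mir-21']
```

This is also why the plain '-' filter in Cytoscape needs checking by eye: it selects miRNA names such as hsa-mir-21 as well as isoforms.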
- How to download the current GO ontology
To download a current ontology file (go-basic.obo), open this page in Firefox: http://geneontology.org/page/download-ontology, right-click on the Ontology file URL (http://purl.obolibrary.org/obo/go/go-basic.obo) and use ‘open link in new tab’ > Save File, which should download the file with the name go-basic.obo.
- How to download and merge EBI gene_association files
Note: As of March 2021 the Cytoscape App BiNGO is not able to use the new GAF 2.2 files. Therefore, for BiNGO analyses, download the previous GAF 2.1 files from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/old/
Use the Firefox browser to download the files. To download a current human annotation file, go to ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/, connect as Guest and select goa_human.gaf.gz. After the file is unzipped, rename it to gene_association.goa_human.
For other species go to ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/.
For previous files go to ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/old/
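The unzip-and-rename step can also be scripted. Below is a minimal Python sketch using only the standard library; unzip_gaf is a hypothetical helper name, and it assumes the downloaded file (for example goa_human.gaf.gz, or an older GAF 2.1 file from the old/ directory) is on your computer. The default output name follows the file naming rules described in these notes.

```python
import gzip
import shutil
from pathlib import Path


def unzip_gaf(gz_path, out_dir=".", out_name="gene_association.goa_human"):
    """Decompress a downloaded .gaf.gz file and save it under a name BiNGO/GOlorize accept.

    Change out_name for other species, eg gene_association.goa_mouse.
    """
    out_path = Path(out_dir) / out_name
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream, so large GAF files are not held in memory
    return out_path


# Example (assumed file location): unzip_gaf("goa_human.gaf.gz")
```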
NOTE: There are now multiple files for each species, so you need to make sure you choose the right ones to fully exploit the annotation data. For example, there are 4 files for human: protein, isoforms, RNA and complex. If you are only looking at protein networks, make sure you include both the protein and isoform files in your analysis, or you will miss a lot of annotations.
If more than one of these is needed, all the required files (proteins, isoforms, complexes and RNAs) need to be downloaded and merged into one file.
The easiest way to do this is to download the files and open them with a plain-text editor (not Word, as it will add hidden formatting information). Remove the header information (see below) from the smaller text files, ie delete everything up to the first row of annotations. It is important to make sure no extra returns etc. are present. Example of the header information to remove:
!The set of protein accessions included in this file is based on UniProt reference proteomes, which provide one protein per gene.
!They include the protein sequences annotated in Swiss-Prot or the longest TrEMBL transcript if there is no Swiss-Prot record.
!If a particular protein accession is not annotated with GO, then it will not appear in this file.
!Note that the annotation set in this file is filtered in order to reduce redundancy; the full, unfiltered set can be found in
!date-generated: 2021-02-16 12:37
Select and copy all the annotation text in the smaller file and paste it into the larger file (usually the human protein file), just underneath the header section. Then check that there are no extra returns at the top or at the end of the pasted section. Save the file and repeat as required.
The Cytoscape plugins GOlorize and BiNGO will not be able to use the downloaded files with their current names for files released since June 2016 (release 158 and above). A typical file name format looks like this:
- where 164 is the release number.
- the expression "gene_association" must exist in the file name
- the expression "goa_human" must exist in the file extension (or goa_anytext)
- .gaf should be removed from the file name
- The ideal format is: gene_association.goa_human
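The copy-and-paste merge above can also be sketched in Python. This minimal sketch, with hypothetical function names, keeps the '!' header of the main (largest) file, writes the annotation rows of the smaller files underneath it followed by the main file's own rows, and drops blank lines and stray carriage returns (the "extra returns" the manual method warns about). The file names in the usage line are assumptions; substitute the unzipped files you actually downloaded.

```python
def annotation_rows(path):
    """Yield the annotation rows of a GAF file, skipping '!' header lines and blank lines."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\r\n")  # removes Windows and Unix line endings
            if line.strip() and not line.startswith("!"):
                yield line


def merge_gaf(main_gaf, extra_gafs, out_path="gene_association.goa_human"):
    """Write main_gaf's header, then the extra files' rows, then main_gaf's rows.

    Returns the number of annotation rows written.
    """
    with open(main_gaf, encoding="utf-8") as fh:
        header = [line.rstrip("\r\n") for line in fh if line.startswith("!")]
    written = 0
    with open(out_path, "w", encoding="utf-8", newline="\n") as out:
        for line in header:
            out.write(line + "\n")
        for gaf in [*extra_gafs, main_gaf]:
            for row in annotation_rows(gaf):
                out.write(row + "\n")
                written += 1
    return written


# Example (assumed file names):
# merge_gaf("goa_human.gaf", ["goa_human_isoform.gaf", "goa_human_rna.gaf", "goa_human_complex.gaf"])
```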
- How to add new annotations not yet included in the ftp gene association files
For students submitting annotations for their project.
- Go to QuickGO
- Filter references with the PMIDs of the articles you have annotated
- Download the data from QuickGO as a gaf file
- Paste the annotations into the larger ftp gaf files as above.
GAF files can also be downloaded from Protein2GO on an article-by-article basis.
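If a downloaded GAF file covers more articles than you need, the PMID filter can also be applied locally before pasting. A minimal sketch, assuming the standard GAF layout in which column 6 (DB:Reference) holds pipe-separated references such as PMID:25524771; rows_for_pmids is a hypothetical name.

```python
def rows_for_pmids(gaf_lines, pmids):
    """Yield GAF annotation rows whose DB:Reference column (column 6) cites one of the PMIDs."""
    wanted = {f"PMID:{pmid}" for pmid in pmids}
    for line in gaf_lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("!"):
            continue  # header or blank line
        columns = line.split("\t")
        if len(columns) > 5 and wanted & set(columns[5].split("|")):
            yield line


# Example (assumed file name):
# with open("protein2go_download.gaf", encoding="utf-8") as fh:
#     rows = list(rows_for_pmids(fh, ["25524771"]))
```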
- How to overlay GO terms onto a Cytoscape network
The GO gene association isoform file associates GO terms with UniProt, RNAcentral or Model Organism Identifiers.
1. The BiNGO analysis App uses the information in the ‘name’ column for the GO enrichment analysis.
2. In the GO gene_association files, isoform information is included in column 17, but this information is not interpreted by BiNGO.
a. Therefore, BiNGO only overlays GO terms onto the canonical (non-isoform) IDs.
3. It is therefore necessary to edit the ‘name’ column in the network table to remove isoform and PRO information (eg -1, -2) so that the identifiers are the canonical UniProt identifiers that BiNGO will recognise.
If you don’t do this, 2 isoforms of a single protein, eg P12345 and P12345-2, will be analysed as 2 distinct proteins, and the more specific isoform will not have the full set of GO terms associated with it.
If you have followed the instructions above on editing the ‘name’ column, you may not need to do the following:
1. Select all nodes and view annotation rows in ‘Node Table’.
2. In the ‘name’ column, find any identifiers that are not used in the GO gene association file, such as ENSG or EBI IDs. This column needs to be edited so that these identifiers are changed to the equivalent UniProt, RNAcentral or Model Organism IDs.
a. Note that you might want to just delete the ENSG and EBI IDs (see section below if there are a lot to edit)
b. UniProt has a very good mapping tool to enable you to download the UniProt identifiers: http://www.uniprot.org/uploadlists/.
3. Edit the Node Table, replacing all these identifiers with their equivalent UniProt, RNAcentral or Model Organism identifiers.
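When there are many IDs, the replacement in step 3 can be sketched in Python. This minimal sketch assumes the mapping has been saved as a two-column, tab-separated file with a header row (source ID, then UniProt ID), for example from the UniProt mapping tool; the function names are hypothetical. When an ID maps to several accessions it keeps the first one, so check those cases by hand. Unmapped IDs starting with ENSG or EBI- are left out, as suggested in 2a.

```python
import csv


def load_mapping(path):
    """Read a tab-separated mapping file: a header row, then source ID and target ID per line."""
    mapping = {}
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh, delimiter="\t")
        next(reader, None)  # skip the header row
        for row in reader:
            if len(row) >= 2:
                mapping.setdefault(row[0], row[1])  # keep the first target if an ID maps to several
    return mapping


def remap_names(names, mapping, drop_prefixes=("ENSG", "EBI-")):
    """Return {old name: new name}; unmapped names starting with drop_prefixes are left out (to delete)."""
    renamed = {}
    for name in names:
        if name in mapping:
            renamed[name] = mapping[name]
        elif not name.startswith(drop_prefixes):
            renamed[name] = name  # already a usable ID, eg a UniProt or RNAcentral ID
    return renamed
```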
- Additional revisions required for microRNA networks
It is helpful to see HGNC symbols or other gene, microRNA or protein names as the network node labels.
1. The best column to use for this is the Human Readable Label column. Many of the protein interaction files automatically include appropriate names in this column.
2. However, although the EBI-GOA-miRNA file now has UniProt and RNAcentral IDs in many of the relevant ID columns, it does not have the HGNC symbol in the Human Readable Label column.
3. So, if you want to see the protein/mRNA and microRNA names, this column needs to be edited.
4. In addition, it may be necessary (for example if you are using IntAct files) to edit the ‘name’ column so that the ENSG identifiers are changed to the equivalent UniProt identifiers or removed; for information about this, see the sections above.
In order for the correct names to show in the interaction network, the “Human Readable Label” must have the gene symbol or miRNA name present.
Editing the ‘Human Readable Label’ column
Although it is possible to edit the Human Readable Label names directly in Cytoscape, this is very time-consuming.
When working with microRNA data it is more efficient to change the Human Readable Label names by uploading a table with the data.
In order to change the RNAcentral IDs from the EBI-GOA-miRNA interaction network to human-readable miRNA names, follow these steps:
1. If you have a lot of miRNAs to rename you will need to download a ‘mapping file’, options include:
- RNAcentral FTP site (guest access works fine): ftp://ftp.ebi.ac.uk/pub/databases/RNAcentral/current_release/id_mapping/....
- The tarbase file has the miR-##-## symbol format
- The hgnc file has the HGNC symbols.
- Or you could download annotations from QuickGO and use the names in that file (remove the additional information, eg ‘Homo sapiens (human) hsa-’).
2. Download the EBI-GOA-miRNA interaction network node table from Cytoscape
3. Edit the Human Readable Label column of the node table in Excel. If there are lots of IDs to change, use VLOOKUP with the mapping file from step 1, eg =VLOOKUP(C2, Sheet1!B:C, 2, FALSE), where Sheet1 holds the ID mapping file (IDs in column B, names to import in column C) and C2 holds the ID to look up in the node table.
4. If you use VLOOKUP remember to change the formulas to values.
5. Create a .txt file which has 2 columns only: ‘shared name’ + ‘Human Readable Label’
- ie delete the unwanted columns and rearrange the columns to the correct order and save as a .txt file
- NOTE: remember to ALWAYS add the column titles, otherwise Cytoscape will turn the first row of data into column titles.
6. Go to the Cytoscape tool bar menu on top left of the screen and select “Import table from file”
7. Check the 'Import Columns From Table' pop up window has the following options selected:
- Where to Import Table Data > To a Network Collection
- Network Collection > select the appropriate name of the network
- Import Data as > Node Table Columns
- Key Column for Network > shared name
- Case Sensitive Key Values: ticked
8. Click on OK
NOTE: the column titles must correspond precisely to the titles from the original node table. This way Cytoscape will know into which column(s) the new data needs to be pasted.
9. Check that the Human Readable Label column in the Cytoscape node table has been updated with the information in the imported table.
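Steps 2 to 5 can also be done without Excel. Below is a minimal Python sketch, assuming the node table was exported as a comma-separated file with 'shared name' and 'Human Readable Label' columns, and that the mapping file has already been reduced to a dictionary from ID to name; write_label_table is a hypothetical name. The dictionary lookup plays the role of VLOOKUP, and IDs with no match keep their existing label.

```python
import csv


def write_label_table(node_table_csv, id_to_label, out_txt):
    """Write a tab-separated file with the columns 'shared name' and 'Human Readable Label'.

    Labels come from id_to_label (the VLOOKUP step); unmatched rows keep their existing label.
    Returns the number of rows written.
    """
    # utf-8-sig also copes with a byte-order mark at the start of the exported file
    with open(node_table_csv, newline="", encoding="utf-8-sig") as src:
        rows = list(csv.DictReader(src))
    with open(out_txt, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerow(["shared name", "Human Readable Label"])  # always include the titles
        for row in rows:
            key = row["shared name"]
            writer.writerow([key, id_to_label.get(key, row.get("Human Readable Label", ""))])
    return len(rows)
```

The resulting .txt file can then be imported as in steps 6 to 8.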
- Importing interactions that are not publicly available into Cytoscape
For students submitting annotations for their project.
In order to get miRNA:mRNA interactions that have not yet been publicly released into the network:
1. Download the data from Protein2GO
2. Edit the information in excel with the column headings shown below. The format details can be found here: https://psicquic.github.io/MITAB27Format.html, which has links to the PSI-MI controlled vocabulary.
3. Save the file as a .txt file.
4. Import and merge this file into the EBI-GOA-miRNA interaction network in Cytoscape:
- File -> Import -> Network from File (open .txt file from popup window)
- A window appears
- Click on Advanced Options (at the bottom of the window)
- Select the file import options delimiter: ‘tab’
- Default Interaction: interacts with
- Tick the ‘Use the first line as column names’ option
- Start import row:1
- Press OK
- If the option is present: change the default interaction from ‘pp’ to null
- In the column drop-downs at the top of the heading rows in the Import from Network Table window, the default attribute for each column is edge attribute (indicated by the page icon next to the column name). These need to be set to node or left as edge depending on the column contents. Select 'meaning' options as follows:
- Column Headings > meaning option
- Interactor A (eg UniProt or RNAcentral ID) > Source node
- Interactor B (eg UniProt or RNAcentral ID) > Target node
- Alias A (eg hsa-miR-302a-3p) > Source node attribute
- Alias B (eg STAT3) > Target node attribute
- Interaction Detection Method (eg psi-mi:"MI:0045") > Edge attribute [psi-mi:"MI:0045" is the best term to use and means experimental interaction detection]
- Publication (eg pubmed:25524771) > Edge attribute
- NCBI Taxonomy A (eg taxid:9606(Homo sapiens)) > Source node attribute
- NCBI Taxonomy B (eg taxid:9606(Homo sapiens)) > Target node attribute
- Primary Interaction Type (eg physical association) > Edge attribute
5. Click on OK
6. I haven’t worked out how to get all the gene names into the Human Readable Label column with this import. Therefore, go to the section above, which explains how to:
- Download the network table you have just made
- Open in excel and edit to create 2 columns: shared name and Human Readable Label
- Save as .txt file
- In Cytoscape import table from file to: selected Networks only, then select the appropriate network.
Once the file is imported, you can merge it with the EBI-GOA-miRNA network, using either the name or the shared name column.
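Building the tab-delimited import file (steps 2 and 3) can be sketched in Python. This minimal sketch writes the column headings listed above in the same order and joins values with tabs instead of using the csv module, so that values such as psi-mi:"MI:0045" keep their double quotes unescaped. The interactor IDs in EXAMPLE_ROW are placeholders, and the row is assembled from the examples in the column list, not taken from a real curated interaction; write_interactions is a hypothetical name.

```python
# Column headings in the order listed above.
COLUMNS = [
    "Interactor A", "Interactor B", "Alias A", "Alias B",
    "Interaction Detection Method", "Publication",
    "NCBI Taxonomy A", "NCBI Taxonomy B", "Primary Interaction Type",
]

# Illustrative row built from the examples above; the interactor IDs are placeholders.
EXAMPLE_ROW = {
    "Interactor A": "RNACENTRAL_ID_PLACEHOLDER",
    "Interactor B": "UNIPROT_ID_PLACEHOLDER",
    "Alias A": "hsa-miR-302a-3p",
    "Alias B": "STAT3",
    "Interaction Detection Method": 'psi-mi:"MI:0045"',
    "Publication": "pubmed:25524771",
    "NCBI Taxonomy A": "taxid:9606(Homo sapiens)",
    "NCBI Taxonomy B": "taxid:9606(Homo sapiens)",
    "Primary Interaction Type": "physical association",
}


def write_interactions(rows, out_txt):
    """Write a header line plus one tab-separated line per interaction; missing values become empty."""
    lines = ["\t".join(COLUMNS)]
    for row in rows:
        values = [str(row.get(col, "")) for col in COLUMNS]
        if any("\t" in v or "\n" in v for v in values):
            raise ValueError("values must not contain tabs or line breaks")
        lines.append("\t".join(values))
    with open(out_txt, "w", encoding="utf-8", newline="\n") as fh:
        fh.write("\n".join(lines) + "\n")
    return len(lines) - 1  # number of interaction rows written
```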
- Save an image of the network
To save an image, make sure you have the whole network visible in the view.
File > Export > Network to Image (choose the highest zoom, 500%)
- How to create a GO term mapping file to map GO term IDs to GO term name
This file is very useful when working with BiNGO enrichment analysis data, if you have not managed to save the enrichment file during the analysis.
When you save the enrichment results (eg by copying the data in the output table and pasting into an excel spreadsheet) the 'Description' (GO term name) does not copy across.
Therefore, it can be useful to generate a flat file of the ontology and to use the VLOOKUP option in excel to import the GO term names.
1. A file of GO term IDs and equivalent GO term names can be generated by copying the GO terms from:
2. Save data to an Excel spreadsheet.
3. Select column A and go to 'Data' Menu.
4. Select the icon 'Text to Columns', and 'Fixed width', Next.
5. Drag the cursor where you want to split (e.g. in front of the GO term name), 'Next', 'Finish'. This will produce 2 columns in Excel - one of GO IDs and one of GO names.
6. Remove the space after each GO identifier using find>replace option in Excel.
7. The BiNGO output table lists the GO IDs as plain numbers, not in the GO:####### format. So either add GO:000… to each GO ID in your Cytoscape table, or edit the mapping file to reduce the GO IDs to just numbers, using find>replace on GO:000000, then GO:00000, etc. This file can then be reused in future analyses.
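As an alternative to steps 1 to 7, the mapping can be built directly from the go-basic.obo file downloaded earlier. A minimal Python sketch with hypothetical function names: it reads the id: and name: lines of each [Term] stanza and converts each ID to the plain-number form (GO:0006915 becomes 6915), which matches the find>replace step above. It does not filter out obsolete terms.

```python
def go_term_names(obo_lines):
    """Map GO IDs to term names using the id: and name: lines of each [Term] stanza."""
    names = {}
    in_term = False
    current_id = None
    for raw in obo_lines:
        line = raw.strip()
        if line.startswith("["):  # a new stanza, eg [Term] or [Typedef]
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id: "):
            current_id = line[len("id: "):]
        elif in_term and current_id and line.startswith("name: "):
            names[current_id] = line[len("name: "):]
    return names


def bingo_id(go_id):
    """Convert 'GO:0006915' to '6915', the plain-number form described above."""
    return str(int(go_id.split(":", 1)[1]))


# Usage (assumes go-basic.obo is in the current folder):
# with open("go-basic.obo", encoding="utf-8") as fh:
#     table = {bingo_id(go): name for go, name in go_term_names(fh).items()}
```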
- More Tips will be added here
Please send me more tips that you have found useful