Cytoscape Tips

Additional notes on using Cytoscape with the GOlorize and BiNGO plugins. These notes assume you are already familiar with Cytoscape. If you haven't used it yet, instructions are available at http://manual.cytoscape.org/en/stable/, or you can attend a training course (available at EMBL-EBI and UCL). These notes were written for Cytoscape 3.7.2 and may not apply to other versions of Cytoscape.

What data is in the PSICQUIC EBI-GOA-miRNA file

Key information listed in the Cytoscape Network view 'Edge Table'

The primary interaction type is listed as 'physical association' if there is experimental evidence that the microRNA binds to an mRNA. The primary interaction type is listed as 'association' if the microRNA is predicted to bind an mRNA and there is experimental evidence that the mRNA levels are altered by the presence or absence of the microRNA.

Key information listed in the Cytoscape Network view 'Node Table'

The 'name' lists the UniProt identifiers for the 'gene targets' of the microRNA.

How to make sure each entity is only represented by a single node

Unless you are planning to investigate the different interactions associated with different isoforms it is likely that you will only want each protein to be represented by a single node.

There are several places where care is required to ensure this happens:

If you are merging several networks then you need to check what the best column to use for the merge is.
It is important to look at the identifiers in the columns that will be used for the merge, and to edit any isoform IDs so that the canonical protein and its isoforms are merged into a single node. Cytoscape will merge these if they are edited even if they are from the same resource.
Consider what the identifier is representing. For example, IntAct now includes DNA sites bound by transcription regulators. These are represented by ENSG IDs. It would not be appropriate to merge a protein ID with a DNA ID for the same gene, as this would imply that the protein binds a transcription regulator, when it is the genomic DNA that is binding the transcription regulator.

UniProt represents protein isoforms using '-#', and post-translational modification/cleavage products are represented with ':PRO#'. To merge 2 or more datasets you will probably want to remove the isoform and other specific details so that only 1 node represents each protein. While the UniProt accession column is often used for the merge, if your network has miRs or other IDs, or if you plan to overlay the network with GO terms, it will be more efficient to edit the ‘name’ column.

If you don’t do this then 2 isoforms of a single protein eg P12345 and P12345-2 will be represented on your network as 2 nodes.

Before merging your networks, aim to only edit the ‘name’ column in each network. Identify and edit all isoform and PRO information in the different networks.

This can be done using the select/filter option: choose column: node:name and enter - (a hyphen), or :PRO, in the free text field.
Then edit the IDs in the name column to remove the isoform info '-#' (ie just delete -1) or the PRO ID, leaving the rest of the UniProt ID intact.
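
For a large network, the same clean-up could be scripted on an exported node table rather than done by hand. The following is a minimal Python sketch, not part of Cytoscape. The function name canonical_uniprot is hypothetical, and accepting '-PRO_...' as well as ':PRO_...' is an assumption about how different resources write the PRO suffix. Only identifiers that look like UniProt accessions are changed, so IDs such as EBI-1234567 are left alone.

```python
import re

# UniProt accession pattern (6 or 10 characters long).
_ACCESSION = r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"

# An accession optionally followed by an isoform suffix ("-2") or a PRO
# suffix (":PRO_0000012345"; "-PRO_..." is accepted here as an assumption).
_SUFFIXED = re.compile(rf"({_ACCESSION})(?:-\d+|[:-]PRO_?\d+)?")


def canonical_uniprot(identifier: str) -> str:
    """Return the canonical accession for a UniProt ID; leave other IDs unchanged."""
    match = _SUFFIXED.fullmatch(identifier.strip())
    return match.group(1) if match else identifier


# Made-up identifiers for illustration only.
names = ["P12345", "P12345-2", "P12345:PRO_0000012345", "EBI-1234567"]
cleaned = [canonical_uniprot(name) for name in names]
print(cleaned)
```

The cleaned values could then be pasted back into the ‘name’ column, or imported as a table, before merging.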

There may be other nodes you want to remove or edit, such as EBI-IDs for DNA and mRNAs

Use the select option to find all of these (eg select/filter option: choose column: node:name and include EBI- in the free text field).
Click on one of these selected nodes and right click > edit > cut; this will delete all the selected nodes. Alternatively, edit them, for example to provide the HGNC approved name.
Check that you are happy with the information in the column being used for the merge in each network before merging.

Now that the specific isoform etc information in the network has been removed, you can merge your networks

Merge the networks based on the name, as you have now edited this column.
Allow the merge to merge the same nodes within a network too.

How to download current GO ontology

To download a current ontology file (go-basic.obo) open this page in Firefox http://geneontology.org/page/download-ontology, right-click on the Ontology file URL (http://purl.obolibrary.org/obo/go/go-basic.obo) and use ‘open link in new tab’ > Save File which should download the file with the name go-basic.obo.

How to download and merge EBI gene_association files

Note: As of March 2021 the gene association gaf files have been provided in the new 2.2 format. The Cytoscape app BiNGO is not able to use the new gaf 2.2 files.

The GOA team at EMBL-EBI now provide gaf 2.1 files for current annotation datasets: http://ftp.ebi.ac.uk/pub/contrib/goa/GAF21/. Contact the GOA team if additional model organisms annotation files need to be converted.

To download current human or mouse gaf 2.1 files: Use the Firefox browser and go to http://ftp.ebi.ac.uk/pub/contrib/goa/GAF21/, connect as Guest and select goa_human_21.gaf (for human proteins). After the file is unzipped, rename it to gene_association.goa_human.

To download a current human gaf 2.2 file: Use the Firefox browser and go to ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/, connect as Guest and select goa_human.gaf.gz. After the file is unzipped, rename it to gene_association.goa_human.

For other species go to ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/.

For previous files go to ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/old/

NOTE: There are now multiple files for each species, so you need to make sure you choose the right ones to fully exploit the annotation data. For example, there are 4 files for human: protein, isoforms, RNA and complex. If you are only looking at protein networks, make sure you include both the protein and isoform files in your analysis or you will miss a lot of annotations.

If these are needed, then all the required files for the proteins, isoforms, complexes and RNAs need to be downloaded and merged into one file.

The easiest way is to download the files and open them with a text editing application, not Word, as it will add additional hidden information. Remove the heading information from the smaller text files (ie delete everything before the first row of annotations). It is important to make sure no extra returns etc are present.

Then select and copy all annotation text in the smaller file and paste into the larger file (usually this is the human protein file), just underneath the header section. Then check that there are no extra returns at the end of the pasted section or at the top. Save file and repeat as required.
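
As an alternative to copying and pasting in a text editor, the merge described above could be done with a short script. This is a minimal Python sketch under stated assumptions: the function name merge_gaf_files and the file names are illustrative, header lines are assumed to start with '!', and the extra annotations are appended to the end of the main file rather than pasted directly under its header. The rows in the demonstration are made up and are not real annotations.

```python
import os
import tempfile


def merge_gaf_files(main_path, extra_paths, out_path):
    """Copy the main file (with its header), then append the annotation rows
    of each extra file. Returns the number of lines written."""
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for index, path in enumerate([main_path, *extra_paths]):
            with open(path, encoding="utf-8") as handle:
                for line in handle:
                    if not line.strip():
                        continue  # drop stray blank lines (extra returns)
                    if index > 0 and line.startswith("!"):
                        continue  # keep only the header of the main file
                    out.write(line.rstrip("\n") + "\n")
                    written += 1
    return written


# Small demonstration with made-up, shortened rows.
with tempfile.TemporaryDirectory() as folder:
    protein = os.path.join(folder, "goa_human.gaf")
    isoform = os.path.join(folder, "goa_human_isoform.gaf")
    merged = os.path.join(folder, "gene_association.goa_human")
    with open(protein, "w", encoding="utf-8") as handle:
        handle.write("!gaf-version: 2.1\nUniProtKB\tP12345\tGENE1\n\n")
    with open(isoform, "w", encoding="utf-8") as handle:
        handle.write("!gaf-version: 2.1\n\nUniProtKB\tP12345-2\tGENE1\n")
    lines_written = merge_gaf_files(protein, [isoform], merged)
    with open(merged, encoding="utf-8") as handle:
        merged_text = handle.read()

print(lines_written)
```

It is still worth opening the merged file afterwards to check that the header appears only once and that there are no blank lines.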

Cytoscape plugins GOlorize and BiNGO will not be able to use the downloaded files with their current names for files released since June 2016, which coincides with release 158 and above. A typical file name looks like this:

gene_association.goa_human.164
where 164 is the release number.

In general, the file and its extension must follow these rules:

the expression "gene_association" must exist in the file name
the expression "goa_human" must exist in the file extension (or goa_anytext)
.gaf should be removed from the file name
The ideal format is: gene_association.goa_human

Other combinations can work too. However, we have not tried them all and we can only advise that the above combination will work.
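
The naming rules above can be expressed as a quick check before loading a file. This is only a sketch of the rules as written in these notes, not of how the plugins actually test file names; the function names follows_naming_rules and bingo_friendly_name are hypothetical.

```python
import os


def bingo_friendly_name(species: str = "human") -> str:
    """Build a name in the recommended form, eg gene_association.goa_human."""
    return f"gene_association.goa_{species}"


def follows_naming_rules(filename: str) -> bool:
    """Check a file name against the three rules listed in these notes."""
    stem, extension = os.path.splitext(filename)
    return (
        "gene_association" in stem          # rule 1: expression in the file name
        and extension.startswith(".goa_")   # rule 2: goa_anytext as the extension
        and ".gaf" not in filename          # rule 3: .gaf removed
    )


checks = {
    name: follows_naming_rules(name)
    for name in [
        "gene_association.goa_human",
        "gene_association.goa_human.164",
        "goa_human_21.gaf",
    ]
}
print(checks)
```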

How to convert gaf 2.2 files to gaf 2.1 files

Cytoscape users can download GAF 2.1 files here:

http://ftp.ebi.ac.uk/pub/contrib/goa/GAF21/

Contact the GOA team if additional model organisms annotation files need to be converted

To convert gaf 2.2 files to gaf 2.1 files

1. Open a terminal window
2. Use cd ~/folder/subfolder etc to move to the folder that contains the gaf 2.2 file to be converted.
3. Use the following script (provided by Alex Ignatchenko, EBI) replacing 'filename.gaf' with the name of the file to be converted: grep '^[^!;]' filename.gaf | awk -F '\t' '{printf $1 "\t" $2 "\t" $3 "\t"}; {printf ($4 ~ /NOT/)?"NOT":""}; {for (x=5; x<=17; x++) printf("\t%s", $x);printf("\n")}' >> filename_21.ga
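
For readers who prefer not to use the terminal, the following Python sketch mirrors what the script above appears to do: it drops header lines (those starting with '!' or ';') and blank lines, keeps columns 1 to 3, reduces the qualifier in column 4 to either NOT or an empty value, and copies columns 5 to 17 unchanged. The function names are hypothetical and the example rows are made up, not real annotations. Like the original script, it does not write a gaf 2.1 header.

```python
def gaf22_line_to_21(line: str) -> str:
    """Convert one gaf 2.2 annotation row to the gaf 2.1 layout."""
    columns = line.rstrip("\r\n").split("\t")
    columns += [""] * (17 - len(columns))  # pad short rows to 17 columns
    # gaf 2.2 qualifiers such as "NOT|involved_in" become "NOT";
    # qualifiers without NOT become an empty column.
    qualifier = "NOT" if "NOT" in columns[3] else ""
    return "\t".join(columns[:3] + [qualifier] + columns[4:17])


def convert_gaf22_to_21(lines):
    """Yield converted rows, skipping blank lines and '!' or ';' header lines."""
    for line in lines:
        if line.strip("\r\n") and line[0] not in "!;":
            yield gaf22_line_to_21(line)


# Made-up example rows with 17 columns each.
template = ["UniProtKB", "P12345", "GENE1", "", "GO:0008150", "PMID:0000000",
            "IDA", "", "P", "Example protein", "", "protein", "taxon:9606",
            "20210301", "UniProt", "", ""]
negated = "\t".join(template[:3] + ["NOT|involved_in"] + template[4:]) + "\n"
plain = "\t".join(template[:3] + ["involved_in"] + template[4:]) + "\n"

converted = list(convert_gaf22_to_21(["!gaf-version: 2.2\n", negated, plain, "\n"]))
for row in converted:
    print(row)
```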

How to add new annotations not yet included in the ftp gene association files

For students submitting annotations for their project.

Go to QuickGO
Filter references with the PMIDs of the articles you have annotated
Download the data from QuickGO as a gaf file
Paste the annotations into the larger gaf files as above.

Or you can also download gaf files from Protein2GO, on an article-by-article basis.

How to overlay GO terms onto a Cytoscape network

The GO gene association isoform file associates GO terms with UniProt, RNAcentral or Model Organism Identifiers.
1.   The BiNGO analysis app uses the information in the ‘name’ column for the GO enrichment analysis.
2.   In the GO gene_association files, isoform information is included in column 17, but this information is not interpreted by BiNGO.
a.   Therefore, BiNGO only overlays GO terms onto the canonical non-isoform IDs.
3.   It is therefore necessary to edit the ‘name’ column in the network table to remove isoform and PRO information (eg -1, -2) so that the identifiers are the canonical UniProt Identifiers that BiNGO will recognise.
If you don’t do this then 2 isoforms of a single protein eg P12345 and P12345-2 will be analysed as 2 distinct proteins, and the more specific isoform will not have the full GO terms associated with it.

If you want to do the enrichment analysis with the isoforms as separate entities then you need to merge the gene_association file with the gene_association_isoform file.

Similarly if you want to analyse a network with both RNA and Proteins you need to merge the gene_association file with the gene_association_RNA file.

If you have followed the information above which describes how to edit the ‘name’ column you may not need to do the following:

1.   Select all nodes and view annotation rows in ‘Node Table’.
2.   In the column ‘name’ find any identifiers that are not used in the GO gene association file, such as ENSG or EBI IDs. It is necessary to edit this column so that these identifiers are changed to the equivalent UniProt, RNAcentral or Model Organism IDs.
a.   Note that you might want to just delete the ENSG and EBI IDs (see section below if there are a lot to edit)
b.   UniProt has a very good mapping tool to enable you to download the UniProt identifiers: http://www.uniprot.org/uploadlists/.
3.   Edit the Node table, replacing all the identifiers with their equivalent UniProt RNAcentral or Model Organism identifiers.

For UCL students - Importing interactions that are not publicly available into Cytoscape

For students submitting annotations for their project.

In order to get miRNA:mRNA interactions that have not yet been publicly released into the network:
1.   Download the data from Protein2GO
2.   Edit the information in excel with the column headings shown below. The format details can be found here: https://psicquic.github.io/MITAB27Format.html, which has links to the PSI-MI controlled vocabulary.
3.   Save the file as a .txt file
4.   Import and merge this file into the EBI-GOA-miRNA interaction network in Cytoscape:

File -> Import -> Network from File (open .txt file from popup window)
A window appears
- Click on Advanced Options (at the bottom of the window)
- Select the file import options delimiter: ‘tab’
- Default Interaction: interacts with
- Tick the ‘Use the first line as column names’ option
- Start import row:1
- Press OK
- If option present : Change default interaction from ‘pp’ to null
- In the column drop-downs at the top of the heading rows in the Import from Network Table window, the default attribute for each column is edge attribute (indicated by the page icon next to the column name). These need to be set to node or left as edge depending on the column contents. Select 'meaning' options as follows:
  - Column Headings > meaning option
    - Interactor A (eg UniProt or RNAcentral ID) > Source node
    - Interactor B (eg UniProt or RNAcentral ID) > Target node
    - Alias A (eg hsa-miR-302a-3p) > Source node attribute
    - Alias B (eg STAT3) > Target node attribute
    - Interaction Detection Method (eg psi-mi:"MI:0045") > Edge attribute [psi-mi:"MI:0045" is the best method to use and means experimental interaction detection]
    - Publication (eg pubmed:25524771) > Edge attribute
    - NCBI Taxonomy A (eg taxid:9606(Homo sapiens)) > Source node attribute
    - NCBI Taxonomy B (eg taxid:9606(Homo sapiens)) > Target node attribute
    - Primary Interaction Type (eg physical association) > Edge attribute

5. Click on OK
6. I haven’t worked out how to get all the gene names into the Human Readable Label column with this import. Therefore, go to the section 'Importing or changing names or identifiers', which explains how to:

Download the network table you have just made
Open in excel and edit to create 2 columns: shared name and Human Readable Label
Save as .txt file
In Cytoscape import table from file to: selected Networks only, then select the appropriate network.
Once the file is imported you can merge it with the EBI-GOA-miRNA network, using either name or shared name.
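
The tab-delimited import file from step 2 could also be written with a script instead of Excel. The sketch below is a minimal Python example that uses the column headings listed above; the function name to_tab_delimited is hypothetical. The example row reuses the illustrative values from the column list, the RNAcentral ID is a placeholder, and the row as a whole is not a real curated interaction.

```python
HEADINGS = [
    "Interactor A", "Interactor B", "Alias A", "Alias B",
    "Interaction Detection Method", "Publication",
    "NCBI Taxonomy A", "NCBI Taxonomy B", "Primary Interaction Type",
]


def to_tab_delimited(rows):
    """Return the text of a tab-delimited file: one heading line, one line per row."""
    lines = ["\t".join(HEADINGS)]
    for row in rows:
        missing = [heading for heading in HEADINGS if heading not in row]
        if missing:
            raise ValueError(f"row is missing columns: {missing}")
        lines.append("\t".join(row[heading] for heading in HEADINGS))
    return "\n".join(lines) + "\n"


# Illustrative row only; URS_PLACEHOLDER_9606 is not a real RNAcentral ID.
example = {
    "Interactor A": "URS_PLACEHOLDER_9606",
    "Interactor B": "P40763",
    "Alias A": "hsa-miR-302a-3p",
    "Alias B": "STAT3",
    "Interaction Detection Method": 'psi-mi:"MI:0045"',
    "Publication": "pubmed:25524771",
    "NCBI Taxonomy A": "taxid:9606(Homo sapiens)",
    "NCBI Taxonomy B": "taxid:9606(Homo sapiens)",
    "Primary Interaction Type": "physical association",
}

table_text = to_tab_delimited([example])
print(table_text)
```

Plain string joining is used rather than a CSV writer so that the double quotes in the psi-mi value are written exactly as typed. The text can be saved with a .txt extension and imported as described above.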

Save an image of the network

To save an image, make sure the whole network is visible in the view.

File > Export > Network to Image (choose the highest zoom, 500%)

How to create a GO term mapping file to map GO term IDs to GO term name

This file is very useful when working with BiNGO enrichment analysis data, if you have not managed to save the enrichment file during the analysis.

When you save the enrichment results (eg by copying the data in the output table and pasting into an excel spreadsheet) the 'Description' (GO term name) does not copy across.

Therefore, it can be useful to generate a flat file of the ontology and to use the VLOOKUP option in excel to import the GO term names.

1. A file of GO term Ids and equivalent GO term names can be generated by copying the GO terms from:

http://www.ebi.ac.uk/QuickGO/GSearch?format=termlist&what=Process

http://www.ebi.ac.uk/QuickGO/GSearch?format=termlist&what=Function

http://www.ebi.ac.uk/QuickGO/GSearch?format=termlist&what=Component

2. Save data to an Excel spreadsheet.

3. Select column A and go to 'Data' Menu.

4. Select the icon 'Text to Columns', and 'Fixed width', Next.

5. Drag the cursor where you want to split (e.g. in front of the GO term name), 'Next', 'Finish'. This will produce 2 columns in Excel - one of GO IDs and one of GO names.

6. Remove the space after each GO identifier using find>replace option in Excel.

7. The BiNGO output table just has the GO IDs, not GO:#######. So either add GO:000… to each GO ID in your Cytoscape table, or, edit the mapping file to reduce the GO IDs to just numbers, using find>replace GO:000000, then find>replace GO:00000, etc. This file can then be used in future for other analyses.
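
As an alternative to the Excel steps above, the mapping could be built from the go-basic.obo file described earlier in these notes. This is a minimal Python sketch, assuming the standard OBO layout in which each [Term] stanza has an 'id:' line followed by a 'name:' line. The function names parse_obo_terms and to_go_id are hypothetical, and the short OBO text in the example stands in for the real file.

```python
def parse_obo_terms(lines):
    """Return a dict of GO ID -> GO term name from the lines of an OBO file."""
    terms = {}
    in_term = False
    current_id = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id: "):
            current_id = line[4:]
        elif in_term and current_id and line.startswith("name: "):
            terms[current_id] = line[6:]
    return terms


def to_go_id(number) -> str:
    """Turn a bare number such as 8150 into the full form GO:0008150."""
    return "GO:" + str(number).strip().zfill(7)


# A tiny stand-in for go-basic.obo.
sample_obo = """format-version: 1.2

[Term]
id: GO:0008150
name: biological_process
namespace: biological_process

[Term]
id: GO:0003674
name: molecular_function

[Typedef]
id: part_of
name: part of
"""

go_names = parse_obo_terms(sample_obo.splitlines())
print(to_go_id(8150), go_names[to_go_id(8150)])
```

With the real file, the lines could be read using open("go-basic.obo", encoding="utf-8"), and the resulting dictionary used in place of the VLOOKUP step.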

UCL students - Adding GO annotations not included in the current gaf2.1 file

This section has been written for UCL students that have access to the EMBL-EBI curation tool Protein2GO

To create a gaf 2.1 file which includes gaf 2.1 annotations downloaded from Protein2GO

   •   Paste annotations into excel tab1
   •   Paste same number of annotations from gaf 2.1 file into excel tab2
   •   Note that this pastes into columns A-Q even though there is nothing in column Q, but it is important when copying this back to include Q.
   •   Edit the protein names in column J of P2G gaf so that these are gene symbols
   •   Paste the columns A, B, C, E, F, G, I, J, L, O, P from ‘tab1’ to the equivalent columns in tab2
   •   Copy the annotations from columns A-Q to the annotation gaf2.1 file
   •   Run the analysis to check that the new annotations are included in the BinGO analysis.

Importing or changing names or identifiers

It is nice to see HGNC symbols or other gene, microRNA, protein names as the network node labels.

1.   The best column to use for this is the Human Readable Label column. Many of the protein interaction files automatically include appropriate names in this column.
2.   However, although the EBI-GOA-miRNA file now has UniProt and RNAcentral IDs in many of the relevant ID columns, it does not have the HGNC symbol in the Human Readable Label column
3.   So if you want to see the protein/mRNA and microRNA names this information needs to be changed.
4.   In addition, it may be necessary (for example if you are using IntAct files) to edit the ‘name’ column so that the ENSG identifiers in the 'name' column are changed to the equivalent UniProt Identifiers or removed. For information about this see the sections above.

In order for the correct names to show in the interaction network, the “Human Readable Label” must have the gene symbol or miRNA name present.

Editing the ‘Human Readable Label’ column

Although it is possible to edit the Human Readable Label names directly in Cytoscape this is very time consuming.
When working with microRNA data it is more efficient to change the Human Readable Label names by uploading a table with the data.

In order to change the RNAcentral IDs from the EBI-GOA-miRNA interaction network to human-readable miRNA names, follow these steps:

1. If you have a lot of miRNAs to rename you will need to download a ‘mapping file’, options include:

RNAcentral FTP site (guest access works fine): ftp://ftp.ebi.ac.uk/pub/databases/RNAcentral/current_release/id_mapping/.... The tarbase file has the miR-##-## symbol format. The hgnc file has the HGNC symbols. Or you could download annotations from QuickGO and use the names that are in that file (remove the additional information, eg ‘Homo sapiens (human) hsa-’).

2.   Download the EBI-GOA-miRNA interaction network node table from Cytoscape
3.   Edit the Human Readable Label column of the node table in Excel. If there are lots of IDs to change, use VLOOKUP with the mapping file from (1) above, eg =VLOOKUP(C2, Sheet1!B:C, 2, FALSE) if Sheet1 has the ID mapping file and the name to import is in the second column of the lookup range.
4.   If you use VLOOKUP remember to change the formulas to values.
5.   Create a .txt file which has 2 columns only: ‘shared name’ + ‘Human Readable Label’, ie delete the unwanted columns, rearrange the columns into the correct order and save as a .txt file. NOTE: remember to ALWAYS add the column title, otherwise Cytoscape will turn the first row of data into column titles.

6. Go to the Cytoscape tool bar menu on top left of the screen and select “Import table from file”

7. Check the 'Import Columns From Table' pop up window has the 'key column for network': 'shared name' selected

8. Click on OK

NOTE: the column titles must correspond precisely to the titles from the original node table. This way Cytoscape will know into which column(s) the new data needs to be pasted.

9. Check that the Human Readable Label in Cytoscape in the node table has been updated, as per info in the new imported table.
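
The two-column table from steps 3 to 5 could also be generated with a short script instead of VLOOKUP. The following Python sketch is illustrative only: the function name build_label_table is hypothetical, the RNAcentral ID in the example is a placeholder, and identifiers with no entry in the mapping keep their original ID as the label.

```python
def build_label_table(shared_names, id_to_label):
    """Return tab-delimited text with the columns 'shared name' and
    'Human Readable Label', one row per node."""
    lines = ["shared name\tHuman Readable Label"]
    for name in shared_names:
        # Fall back to the identifier itself when no label is known.
        label = id_to_label.get(name, name)
        lines.append(f"{name}\t{label}")
    return "\n".join(lines) + "\n"


# Illustrative mapping; URS_PLACEHOLDER_9606 is not a real RNAcentral ID.
mapping = {
    "P40763": "STAT3",
    "URS_PLACEHOLDER_9606": "hsa-miR-302a-3p",
}
node_ids = ["P40763", "URS_PLACEHOLDER_9606", "P12345"]

label_table = build_label_table(node_ids, mapping)
print(label_table)
```

The heading line is always written first, which avoids the problem noted above of Cytoscape treating the first data row as column titles.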

Changing the visual features of a molecular interaction network

The visual features of a network can be modified using the ‘Style’ tab in the Control Panel. The default visual features of the network, such as the size of the nodes or the colour of the edges, can be defined using existing characteristics associated with the nodes and edges. For example, the thickness of the edges in a network can depend on each interaction confidence score.

The most common label used on MI network nodes is 'Human Readable Label'.

In the Style tab, select the node subtab. In the 'Properties' section there are now 4 columns: Def. (default), Map. (mapping), ByP. (Bypass), and the last column lists the type of property applied.

Scroll down to 'Label' and click on the 'Label' black arrow in the Map. column. Two new rows will appear, click in the column field and select 'Human Readable Label' from the drop-down list, then set the mapping type to 'passthrough mapping'.

To change the style of all of the edges in the network, go to the Style tab, and the edge subtab and select the style type you want to change.

MSc students undertaking an annotation project at UCL may want to show which interactions they have submitted. To do this, upload a .csv table of your annotations but in the Primary Interaction Type column use a phrase that is specific to your project, such as your name or the project focus. Then select the edge feature you want to change, eg Stroke Colour or Line Type, select the Column: shared interaction, and Mapping Type: Discrete Mapping. Then select your specific project name, and the type of line/colour required.

More tips may be added

Please send me more tips that you have found useful

The Functional Gene Annotation team is supported by Alzheimer's Research UK grant ARUK-NSG2016-13 and the National Institute for Health Research University College London Hospitals Biomedical Research Centre.
