Orengo Group


Gene3D: Assigning CATH Structures to all Protein Sequences

Gene3D takes CATH domain families (derived from PDB structures) and assigns them to the millions of protein sequences that have no PDB structure.

Assigning a CATH superfamily to a region of a protein sequence gives information on its structure and homologous relationships. CATH superfamilies each have a limited set of functions, so the domain family assignments provide functional insights. Furthermore, most proteins contain multiple domain families in a specific order, sometimes referred to as the multi-domain architecture (MDA). Identifying proteins with similar domain family organisations can provide further functional insights.
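
As a rough illustration of the MDA idea, the sketch below orders each protein's domain assignments from N- to C-terminus and groups proteins that share the same ordered architecture. The protein names, coordinates, and the tuple layout are invented for the example, and ordering by start position alone is a simplifying assumption (it ignores discontinuous domains). It is not a description of the Gene3D pipeline itself.

```python
from collections import defaultdict

# Hypothetical assignments: (protein_id, superfamily_id, start, end).
# Protein names and coordinates are invented for illustration only.
assignments = [
    ("protA", "3.30.200.20", 5, 90),
    ("protA", "1.10.510.10", 95, 300),
    ("protB", "1.10.510.10", 110, 310),
    ("protB", "3.30.200.20", 12, 100),
    ("protC", "3.40.50.300", 1, 180),
]

def domain_architectures(rows):
    """Return {protein_id: tuple of superfamily ids in N- to C-terminal order}."""
    by_protein = defaultdict(list)
    for protein_id, superfamily, start, _end in rows:
        by_protein[protein_id].append((start, superfamily))
    return {
        protein_id: tuple(sf for _, sf in sorted(domains))
        for protein_id, domains in by_protein.items()
    }

def group_by_architecture(mdas):
    """Group protein ids that share exactly the same ordered architecture."""
    groups = defaultdict(list)
    for protein_id, mda in mdas.items():
        groups[mda].append(protein_id)
    return dict(groups)

mdas = domain_architectures(assignments)
groups = group_by_architecture(mdas)
for mda, proteins in groups.items():
    print(" -> ".join(mda), sorted(proteins))
```

Exact matching of the ordered tuple is the strictest notion of "similar organisation"; a looser comparison (for example, shared sub-architectures) would need a different grouping key.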

Recently we have subdivided CATH superfamilies, which are sometimes large and functionally diverse, into more functionally coherent functional families (FunFams) (PMID:23514456), improving the functional insights gained from the domain family assignments. Domain family assignments have many other uses. For example, a family may be expanded in a particular species, and such expansions can be detected and related to evolutionary pressures.
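
One simple way to look for a family expansion, sketched here under stated assumptions, is to compare a family's share of all assigned domains in one species against the median share in the other species. The species names, family names, counts, and the cut-off of 2.0 are all invented for the example; this is not the method used by Gene3D, and a real analysis would need proper statistics and phylogenetic context.

```python
from statistics import median

# Hypothetical per-species domain counts per family (invented numbers).
family_counts = {
    "species_1": {"famX": 12, "famY": 40},
    "species_2": {"famX": 10, "famY": 38},
    "species_3": {"famX": 60, "famY": 42},
}

def expansion_ratios(counts, family):
    """Family share in each species divided by the median share in the others.

    Requires at least two species so that a baseline can be computed.
    """
    shares = {
        species: fams.get(family, 0) / sum(fams.values())
        for species, fams in counts.items()
    }
    ratios = {}
    for species, share in shares.items():
        others = [s for name, s in shares.items() if name != species]
        baseline = median(others)
        ratios[species] = share / baseline if baseline > 0 else float("inf")
    return ratios

ratios = expansion_ratios(family_counts, "famX")
# Arbitrary illustrative cut-off: at least twice the baseline share.
expanded = [species for species, ratio in ratios.items() if ratio >= 2.0]
print(expanded)
```

Using shares rather than raw counts is a deliberate choice so that species with larger proteomes are not flagged simply for having more domains overall.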

In regions not assigned a CATH domain, we try to predict SUPERFAMILY or Pfam domain families. Combining the resources in this way gives greater domain sequence coverage.
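
The combination step can be pictured as gap filling: CATH assignments are kept as they are, and a domain from another resource is accepted only if it falls in a region that is still unassigned. The sketch below assumes inclusive residue coordinates, allows no overlap at all, and takes the other candidates in the order given; the identifiers, coordinates, and that ordering are assumptions made for the example rather than documented Gene3D behaviour.

```python
def overlaps(a, b):
    """True if two inclusive (start, end) regions share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]

def fill_unassigned(cath_domains, other_domains):
    """Keep all CATH domains, then add other domains that overlap nothing accepted.

    Each domain is a (label, start, end) tuple with inclusive coordinates.
    """
    accepted = list(cath_domains)
    for candidate in other_domains:
        region = (candidate[1], candidate[2])
        if not any(overlaps(region, (d[1], d[2])) for d in accepted):
            accepted.append(candidate)
    return sorted(accepted, key=lambda d: d[1])

def covered_residues(domains):
    """Residues covered by a set of non-overlapping domains."""
    return sum(end - start + 1 for _label, start, end in domains)

# Hypothetical domains on a single 400-residue protein (invented identifiers).
cath = [("CATH:3.40.50.300", 1, 180)]
others = [
    ("Pfam:hypothetical_1", 150, 260),         # overlaps the CATH domain
    ("SUPERFAMILY:hypothetical_2", 200, 320),  # lies in an unassigned region
    ("Pfam:hypothetical_3", 300, 400),         # overlaps the accepted domain above
]

combined = fill_unassigned(cath, others)
print(covered_residues(cath), covered_residues(combined))
```

In this toy case only the second candidate is accepted, so coverage of the protein rises from the CATH-only figure once the extra domain is added.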