Research Impact


CATH structural classification of proteins aids drug discovery in the pharmaceutical industry

12 December 2014


The CATH classification of protein structure, developed at the UCL Institute of Structural and Molecular Biology, has been widely used to guide experiments on proteins. The structural data has aided drug discovery.

One of the first classifications of proteins, CATH (denoting protein Class, Architecture, Topology, Homologous superfamily), was established by Professors Janet Thornton and Christine Orengo (UCL Institute of Structural & Molecular Biology) in 1994. CATH groups homologues (related proteins) according to their structural, and therefore likely functional, similarity, using a combination of automated and manual procedures.

Fewer than 10% of proteins have detailed experimental characterisation, even in human. Computational approaches have emerged to predict the function of a protein by identifying evolutionarily related proteins (homologues) that have already been experimentally characterised (e.g. in fly or worm) and whose functions are likely to be similar. CATH now classifies 26 million protein sequences: 70% of domain sequences from 6,000 completed genomes and 60% of domain sequences from human.

The CATH classification is available on a UCL-hosted website. This internationally renowned resource is one of the leading protein structure classifications in the field. The website receives between 8,900 and 22,500 unique visits per month, two-thirds of them from industry-based sites.

CATH data is further disseminated through the Distributed Annotation System (DAS) and through the InterPro web server at the European Bioinformatics Institute. InterPro is one of the web portals most widely used by biologists in industry and academia, with more than five million web page accesses per month. It combines protein family data from multiple resources so that family assignments can be made with greater confidence. DAS was established by 30 European partners as part of the Biosapiens network, headed by Professor Thornton, whilst the InterPro server is being developed by a consortium of 11 European partners, including CATH. Information from CATH is also disseminated via the web portal of the international Protein Data Bank (PDB) resource, the primary source of protein structures. Further links to CATH are provided by many international web-based computational biology resources.

Since very few human proteins are experimentally characterised, CATH and related resources are used to search for homologues with known, experimentally validated functions. The structural data associated with these relatives can then be exploited to build 3D models for the human proteins. In addition, the CATH classification can be searched with the structures of proteins that are potential drug targets, to identify close relatives that might also bind the drug and so give rise to side effects.

CATH is routinely used by researchers in the pharmaceutical industry to identify the structures of proteins implicated in disease, to explore protein structure-function relationships and to aid drug design. Its prediction methods and domain assignments are also widely used by industry, as are its methods for analysing the structures.

Funded by the BBSRC and the Wellcome Trust.
