Research Impact


Protein function tool helps accelerate diagnosis and drug discovery

UCL researchers have developed a new online tool to predict the structure and function of proteins, which can help in medical diagnosis and drug discovery in the pharmaceutical industry.


28 April 2022

Proteins are involved in all major biological processes. Knowing their structure and function is essential for detecting pathogenic changes, such as mutations at sites within a protein, and for designing drugs. However, this information has been determined experimentally for less than 10% of proteins, even in humans.

Classification of protein domains

A team of researchers, including Professors Christine Orengo and Janet Thornton and Dr Ian Sillitoe (all UCL Division of Biosciences), developed algorithms for the classification of protein domains known as CATH, which stands for Class, Architecture, Topology (fold family), Homologous superfamily.  

CATH algorithms predict the structures and functions of proteins by identifying related proteins (homologues) that have already been described, for example in Drosophila (fruit flies) or C. elegans (nematode worms), and whose properties are likely to be similar. CATH groups the relatives within each evolutionary superfamily into functional families (FunFams). These families bring together relatives that share features likely to be important for function (FunSites). Knowing how close disease-related mutations are to these FunSites can help clinicians predict the mutations' damaging effects and suggest what may have caused those effects.
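The idea of measuring how close a mutation lies to a FunSite can be sketched in a few lines of Python. This is an illustration only and not the CATH implementation: the function names, the made-up 3D coordinates and the 8.0 cutoff are all assumptions chosen for the example.

```python
import math

def min_distance_to_sites(mutation_xyz, site_xyzs):
    """Smallest Euclidean distance from a mutated residue to any site residue.

    Coordinates are hypothetical (x, y, z) tuples in the same units.
    """
    return min(math.dist(mutation_xyz, site) for site in site_xyzs)

def is_near_site(mutation_xyz, site_xyzs, cutoff=8.0):
    """True if the mutation lies within `cutoff` of at least one site.

    The default cutoff is an arbitrary illustrative value, not a CATH parameter.
    """
    return min_distance_to_sites(mutation_xyz, site_xyzs) <= cutoff

# Made-up coordinates standing in for two functional-site residues
fun_sites = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
# Made-up coordinate standing in for a mutated residue
mutation = (3.0, 4.0, 0.0)

print(min_distance_to_sites(mutation, fun_sites))
print(is_near_site(mutation, fun_sites))
```

In this toy setting, a mutation that falls within the cutoff of a site would be flagged for closer inspection. A real analysis would take residue coordinates from a protein structure and site positions from a resource such as CATH.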

CATH algorithms have helped identify possible cancer genes and antibiotic-resistant microbes. They have also been used to explore targeting enzymes to treat toxoplasmosis and other parasitic infections, and to identify which animals are most susceptible to SARS-CoV-2 (which causes COVID-19), or which might become reservoirs for the infection.

A world-leading protein structure portal

CATH is freely available on a UCL-hosted website and is one of the world’s leading protein structure classifications. The site receives more than 22,500 unique web visits per month, with 2,000,000 pages accessed per month. Two-thirds of these visits are from industry-based sites. CATH has been made an ELIXIR Europe-wide Core Data Resource (CDR), endorsed as meeting the highest standards in data quality and data access. It is the only CDR to be endorsed in the UK.

CATH data is further disseminated through the InterPro web server at the European Bioinformatics Institute (EBI). InterPro is one of the web portals most widely used by biologists in industry, with over 723,000 unique visitors per year.

Research synopsis


CATH structural classification of proteins aids medical diagnostics and drug discovery in the pharmaceutical industry

The CATH classification of protein domains developed at UCL’s Institute of Structural and Molecular Biology is the basis of the CATH database of proteins and their relationships, which receives more than 22,500 unique visitors per month. Outside academia, CATH (Class, Architecture, Topology (fold family), Homologous superfamily) is widely used across the global pharmaceutical industry for drug design and for research and development, including work on COVID-19.