UCL-led research reveals new protein map with potential to transform disease research
14 November 2024
UCL researchers have developed The Encyclopedia of Domains (TED), a tool mapping millions of unknown protein regions, with potential for breakthroughs in drug discovery and disease research.
A team from UCL has launched a new tool that is set to advance our understanding of proteins and their role in health and disease. The Encyclopedia of Domains (TED), created by researchers at UCL Computer Science and the Division of Biosciences, provides the most comprehensive map yet of biologically important protein regions, or ‘domains’, within the AlphaFold Protein Structure Database (AFDB), uncovering millions of previously uncharted areas.
This project builds on foundational work by Professor David Jones and Professor Christine Orengo. Professor Jones, from UCL Computer Science, was among the original contributors to AlphaFold, the revolutionary AI program developed by Google DeepMind that predicts protein structures with unprecedented accuracy. His bioinformatics expertise helped shape AlphaFold’s initial success; the system has since transformed protein science and was recognised by the 2024 Nobel Prize in Chemistry.
Professor Orengo’s widely recognised contributions to protein structure classification complement these advances. Together, they create a foundation that could drive breakthroughs across biology and medicine, from drug discovery to personalised therapies.
With TED, the UCL team are building on AlphaFold’s potential by using AI to identify and categorise millions of new protein domains. This helps researchers better understand how proteins interact, evolve, and function, opening new avenues for exploring complex diseases and biological processes. TED combines advanced AI with structural comparison techniques to map 365 million protein domains, detecting critical biological features that traditional sequence-based methods miss.
The TED project has also highlighted evolutionary trends by identifying both conserved and species-specific protein structures, offering new insights into evolutionary biology and unique targets for future therapies. As its dataset grows, TED will continue to offer researchers a valuable resource for studying protein structure, function and evolution.
Professor David Jones said: “I’m pleased to have played a part in this incredible journey. Google DeepMind was able to take things further than academia could alone, and it’s been rewarding to see the lasting impact of our work.”
He added: “TED enables us to fill in missing pieces of the protein puzzle. With this resource, we’re expanding the catalogue of known protein structures, revealing new biological relationships and potentially paving the way for medical advances.”
Co-author Professor Christine Orengo said: “The massive expansion of protein structure data, captured in TED, will shine a powerful light on how new protein functions evolve through genetic changes in the DNA. This will help us understand the impacts of genetic variations in human proteins linked to disease.”
UCL’s TED project will be updated in line with future releases of AlphaFold data, providing a continually evolving resource for scientists. With its ability to detect previously hidden protein domains, TED is positioned to support advancements in both foundational biological research and applied clinical fields.