The Research Software Development Group includes the “AI Studio” which focuses on data science and machine learning aspects of projects. We also maintain a separate list of AI Studio projects.
School of Laws, Arts & Humanities, and Social & Historical Sciences (SLASH)
Our work with the Open Richly Annotated Cuneiform Corpus (Oracc) continues as part of the Nahrein project ("two rivers" in Arabic), which aims to give local Middle Eastern scholars the tools to do their own digital humanities.
- Soil parameters inference: KaSKA
We are improving KaSKA, a smoothing software package developed by UCL researchers in the Geography department, used to solve inverse problems in Earth Observation and infer soil parameters from satellite and other observations. We are making the software more readable, sustainable and easy to use, contributing to aspects such as refactoring, optimisation, documentation, adding automated tests and package building.
School of the Built Environment, Engineering, and Mathematical and Physical Sciences (BEAMS)
We're creating a web service for the BEM++ boundary element method code, to allow it to be combined with other tools into more complicated workflows and to create a web-based front end for commercial use. This follows on from previous work on parallelisation and refactoring for the simulation code itself.
We're contributing to this project to develop simulation pipelines for non-invasive surgery using High-Intensity Focused Ultrasound.
- Geometrically Unfitted FEM
- Cosmological modelling and mass reconstruction: Pliny/Glimpse
We are working on two software packages for this project with the cosmology group. We are improving the fast, parallel nested sampler Pliny, which has been developed by UCL researchers and is used for cosmological modelling; our contributions will improve the efficiency of the sampler when the underlying distribution is multi-modal or annular. We are also contributing to the weak lensing mass-mapping software Glimpse, which is used for recovering dark matter distributions from gravitational lensing of distant galaxies; we are creating tests and validating the existing 3-D algorithm, which is written for GPUs in CUDA. We will then re-write it in C++ for CPU, and parallelise it with MPI, in order to make it usable on CPU-based hardware.
- Ultrasound treatment planning: k-plan
We are contributing to the development of the k-plan package from the Medical Physics and Biomedical Engineering department, which will be used for treatment planning of brain ultrasound therapy. We are working on the GUI features and workflow, and on the interface between the simulation, database and front-end of the application.
- XNAT medical imaging support
We are setting up an XNAT server to store Covid imaging data within UCL for research use, and extracting these data from Cimar to the UCL XNAT.
- Real-time Advanced Data Assimilation For Digital Simulation Of Numerical Twins On HPC
Digital Twins of physical and human systems are now becoming ubiquitous in Engineering, Science and Urban Analytics. However, the increasingly widespread use of these representations of systems - informed by data in real time - has yet to be accompanied by computationally efficient implementations of the latest methods on High Performance Computing (HPC). Both in terms of quality and speed of methods, Digital Twinning can learn from Data Assimilation (DA) approaches. This project will perform the essential computational groundwork to allow researchers to apply DA methods to coupled human-environmental systems. Specifically it will:
- Computationally scale up an Agent-Based Model from the current toy problems to a realistic coastal city with mobile agents.
- Couple it with a physical tsunami model to simulate the impact of a physical emergency on the flow of individuals (the latter reacting to the emergency by evacuating away from the tsunami).
- Develop new implementations of state-of-the-art DA code (ranging from high-dimensional Particle Filters initially to well-studied ensemble Kalman-Filter approaches, including emulators, as appropriate) targeted at HPC to allow real-time data to be assimilated into the model.
- PI: Serge Guillas
- RSDG team: Mosè Giordano, Tuomas Koskela, Sanaz Jabbari
- Duration: Aug 2019 - Oct 2020
- Languages and technologies: Matlab, C++, Python
- Links: Uncertainty quantification of multi-scale and multi-physics computer models
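To make the data assimilation step in the Digital Twins project above more concrete, here is a minimal sketch of one cycle of a bootstrap particle filter, one of the method families the project mentions. It is not the project's code: the one-dimensional random-walk model, the Gaussian observation error, and the names `particle_filter_step`, `process_std` and `obs_std` are all assumptions made for illustration.

```python
import math
import random


def particle_filter_step(particles, observation, process_std, obs_std, rng):
    """One predict-weight-resample cycle of a bootstrap particle filter."""
    # Predict: push every particle through a hypothetical random-walk model.
    predicted = [x + rng.gauss(0.0, process_std) for x in particles]

    # Weight: Gaussian likelihood of the observation given each particle.
    weights = [
        math.exp(-0.5 * ((observation - x) / obs_std) ** 2) for x in predicted
    ]
    total = sum(weights)
    if total == 0.0:
        # Every likelihood underflowed, so fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]

    # Estimate: weighted mean of the predicted particles.
    estimate = sum(w * x for w, x in zip(weights, predicted))

    # Resample: draw a new, equally weighted ensemble of the same size.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate


rng = random.Random(0)
prior = [rng.gauss(0.0, 1.0) for _ in range(2000)]
posterior, estimate = particle_filter_step(
    prior, observation=2.0, process_std=0.1, obs_std=0.5, rng=rng
)
print(f"state estimate after assimilating the observation: {estimate:.2f}")
```

In this toy setting the estimate is pulled from the prior mean (0) most of the way towards the observation (2), because the observation error is smaller than the prior spread. A realistic coupled model would replace the random walk with the simulation itself, which is where the HPC cost arises.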
School of Life and Medical Sciences (SLMS)
- Big Data for Critical Care with the National Health Informatics Collaborative
We have an ongoing collaboration with the UCLH Critical Care team and colleagues at the Institute for Health Informatics in their work as part of the NIHR Health Informatics Collaborative. Intensive care provides rich, complicated, large, sensitive, high-speed data, with a great deal of information recorded about every patient every moment - as complete an example of the four V's of big data (Volume, Variety, Velocity and Veracity) as you are likely to get. Together we are building tools and systems to allow this data to be used for research, and ultimately, for real-time patient care.
This project with UCLH builds on our collaboration for the Health Informatics Collaborative. It is an ambitious programme of work to define a research programming environment within the hospital's IT structures, whereby researcher-built code can be run against near-real-time patient data, without incurring operational risk to the hospital’s core systems. This platform is known as EMAP – the Experimental Medical Applications Platform. Alongside building the core platform, we will also be collaborating on demonstrator projects using it to enable improved patient outcomes. This system is now being used as part of the trust’s Covid-19 response, providing the data pipeline for the DECOVID national initiative to apply data science modelling in support of proactive care and management during the pandemic. We are also using EMAP to build various internal dashboards for the UCLH intensive care units.
- Thanzi la Onse
This project's name means "Health of All" and is developing epidemiological models which we hope will ultimately inform decision-makers in Malawi on national health care budgets and in allocating resources. The team aims to explore ways to improve the health of the population in Malawi, as well as reducing health inequality in low and middle income countries. RSDG are providing the software framework for building and simulating the mathematical models involved.
- HIV model development
This project will work with a very large HIV epidemiological model developed in SAS over many years by the research team. We will look to fully understand the model, adding tests to ensure reproducibility and improving the documentation before extending it. Beyond just adding new features, the aims are to make it easier for new modellers to contribute, and to increase the simulation speed, potentially re-writing the model in another language if preliminary work suggests sufficient benefit.
- FASt-Mal Laboratory Information Management System
Translating state-of-the-art robotics and machine-learning research into a benchtop prototype capable of Fast, Accurate and Scalable malaria diagnosis. The project aims to overcome diagnostic challenges by replacing human-expert optical-microscopy with a robotic automated computer-expert system that assesses similar digital-optical-microscopy representations of the disease. RSDG is developing the information management system for the project, storing acquired images with all metadata, allowing human experts to annotate images for training and test datasets, and interfacing with the machine learning software being developed by UCL Computer Science.
- Mathematical Modelling Led Design of Tissue-Engineered Constructs
We will be creating a user-friendly interface to the mathematical models developed by the researchers on this project, which will help enable their uptake by tissue engineers and clinicians.
- Reproducible model development with the Web Lab
Models are developed to answer specific scientific questions, and the process of model selection, parameterisation and evaluation is typically manual and laborious. There is no straightforward means to determine which (if any) model is the most appropriate to answer a new question, or be used as a component in a larger model. We aim to make the process of model development documented, automated and repeatable, so that models can easily be tested and updated to incorporate new data. In collaboration with modellers at Oxford, Nottingham and elsewhere, RSDG are building an online resource to run virtual experiments and automate the process of parameterisation of cardiac cell models from data by using state-of-the-art Bayesian inference methods.
- Silver Lab: Neuroscience data analysis pipeline & Open Source Brain
The Silver Lab develops state-of-the-art acousto-optic lens (AOL) two-photon microscopes and uses these to gain understanding of neurophysiology. We are working to integrate and optimise various Matlab analysis scripts developed by lab members into a coherent analysis pipeline, with the data at each stage stored in the open NeurodataWithoutBorders format, based on HDF5. These data can also be visualised in the Open Source Brain website developed by the Silver Lab. As an early part of the collaboration we also did some work developing new API libraries in C++ and Matlab for NeuroML2, a model description language for computational neuroscience.
- Health Data Research UK - London Hub
Health Data Research UK is the national institute for data science for health, which was established in 2018 with long-term funding support from research councils, UKRI, charitable and governmental research funders. HDR UK seeks to drive improvements in the health of patients and populations through research at regional and national scale. The triple mission of HDR UK spans: science, from the discovery of disease mechanisms through precision medicine and trials to public health; establishing platforms and underlying infrastructure to enable research at national scale; and developing training and capacity opportunities. As part of the HDR London Hub, RSDG have recruited a Health Data Specialist to support researchers in making the most effective use possible of resources available at UCL and elsewhere, such as the Data Safe Haven. In particular, we will look at scaling up analyses to deal with massive volumes of highly sensitive personal data, e.g. gathered during routine health care.
- Modelling the rules of sleep-wake organisation in hospitalised infants
Neonatal infants do not fit the standard human model for time spent awake or asleep. They don't follow a circadian rhythm, and their periods of wakefulness are very short (~2% of each day). In this project we are supporting work to develop and simulate a mathematical model of sleep regulation in normal pre-term and full-term infants aged 28-48 weeks corrected gestational age (≥37 weeks is full-term) using an existing dataset of brain and physiological recordings across sleep-wake states. This should provide a definition of 'normal' that can be used to identify infants who deviate from these values and are therefore at risk of health problems, allowing interventions to be put in place.
- PI: Kimberley Whitehead
- RSDG team: Asif Tamuri, Anastasis Georgoulas
- Duration: May 2019 - Ongoing
- Languages and technologies: Matlab, Python
- LEAF - Lab Efficiency Assessment Framework
This project aims to turn a current Excel-based environmental audit for experimental labs within UCL into an online system where labs can submit their survey responses and have them analysed automatically to score their sustainability credentials. The system will allow multiple organisations to participate, with each able to manage their own labs, and report on their progress against environmental goals. We are working with ISD's Digital Service Enablement team to deliver this project, based on the Outsystems toolkit.
School of Laws, Arts & Humanities, and Social & Historical Sciences (SLASH)
The Open Richly Annotated Cuneiform Corpus (Oracc) supports editing of translations and transliterations of ancient Mesopotamian (Iraqi) texts. The principal aim of this project is to create a local GUI for Oracc.
- DataSpring: Enabling complex analysis of large scale digital collections
Funded through the JISC "Research Data Spring" initiative, this project seeks to make it possible to efficiently query a corpus of 81000 out-of-copyright books using UCL's research computing infrastructure, and to thereby understand the issues that arise in using traditional HPC resources for humanities work.
- PI: Professor Melissa Terras - UCL Centre for Digital Humanities, James Baker - British Library
- Funding: JISC Research Data Spring
- RSDG team: James Hetherington
- Duration: Apr-Jun 2015
- Languages and technologies: Python, libxml, zip, mpi4py
- Links: Bluclobber or enabling complex analysis of large scale digital collections
- Times Digital Archive queries
We did some initial investigation into using Spark on UCL supercomputers to query large newspaper archives for terms of interest, accounting for mis-spellings, OCR failures, etc.
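As a rough illustration of what "accounting for mis-spellings and OCR failures" can mean in practice, the sketch below matches a search term against the words of a text by similarity rather than exact equality. It uses only Python's standard `difflib` module on a single string; it is not the Spark-based approach investigated in the project, and the function name `find_fuzzy_matches` and the 0.8 cutoff are assumptions chosen for the example.

```python
import difflib
import re


def find_fuzzy_matches(text, term, cutoff=0.8):
    """Return distinct words in `text` that resemble `term`, best match first."""
    # Lower-case and split into alphabetic words; OCR noise stays inside words.
    words = sorted(set(re.findall(r"[a-z]+", text.lower())))
    if not words:
        return []
    # get_close_matches keeps candidates whose similarity ratio is >= cutoff.
    return difflib.get_close_matches(
        term.lower(), words, n=len(words), cutoff=cutoff
    )


sample = "The telegraph office in Londcn received a tclegraph from Paris."
print(find_fuzzy_matches(sample, "telegraph"))
```

Here the OCR-damaged word `tclegraph` is still found alongside the correctly recognised `telegraph`, while unrelated words are rejected. Lowering the cutoff catches more damaged words at the cost of more false positives, which is the main tuning decision in this kind of query.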
- Gaussian Process Emulator
Researchers in the Department of Geography have developed this code to monitor the historical and current state of terrestrial vegetation cover using satellite images. We are working to enable the most computationally intensive aspects of this code to run on GPUs in order to accelerate the analysis of these images.
- Oceanic Exchanges
We're working with the Oceanic Exchanges project to enable the analysis of large collections of newspaper articles, from multiple countries, on UCL's high performance computing platforms.
- Bentham Transcription Desk Migration
We're helping the Transcribe Bentham project to put their server technologies on a more stable footing.
School of Education
PopChat uses web-based technologies and pedagogical research to improve English comprehension of kids in primary and secondary school through music, song lyrics and rhymes. We are collaborating with schools in the Philippines, where focus groups of students will be testing a set of interactive games where they can play their favourite songs and guess part of the lyrics, as well as create and share their own.
School of the Built Environment, Engineering, and Mathematical and Physical Sciences (BEAMS)
- BICO: Big data compressive sensing
Collaborating with Dr Jason McEwen of the Mullard Space Science Laboratory, we are contributing to a reusable high performance framework for the application of compressive sensing to image cleanup, with application to the Square Kilometre Array.
- Electrical Impedance Tomography
Prof. David Holder's group in the Department of Medical Physics and Biomedical Engineering have been pioneering the use of Electrical Impedance Tomography for imaging brain function. We are helping to identify areas of improvement in the software used to produce 4D EIT visualisations and mentoring the EIT team on how to adopt good software development practices.
We worked with Dr Nicolae Panoiu on the OPTIMET-3D code, a fast and massively distributed electromagnetic solver for advanced HPC studies of 3D photonic nanostructures. The objective was to further scale the code from running efficiently on Legion to running efficiently on ARCHER.
- Radiance Monte Carlo
The aim of this project is to strike a performance/accuracy balance between Radiance Monte Carlo algorithms that operate on a polyhedral mesh (slow but accurate) and a regular grid (fast but less accurate) by using an Octree.
- PI: Dr Ben Cox, Medical Physics and Biomedical Engineering
- Funding: UCL
- RSDG team: Mayeul D'Avezac, Gary Macindoe
- Duration: 2015
- Languages and technologies: C++11, MATLAB, Boost, CMake, VTK
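The idea behind using an octree in the Radiance Monte Carlo project, fine cells where detail is needed and coarse cells elsewhere, can be sketched in a few lines. This is an illustration only, written in Python rather than the project's C++: the refinement rule (split a cell once it holds more than `capacity` points), the class name `OctreeNode` and its methods are assumptions, not the project's design.

```python
class OctreeNode:
    """Cubic cell that splits into eight children when it holds too many points."""

    def __init__(self, center, half_size, capacity=4, max_depth=8):
        self.center = center
        self.half_size = half_size
        self.capacity = capacity
        self.max_depth = max_depth
        self.points = []
        self.children = None  # None while this node is a leaf

    def _child_index(self, point):
        # One bit per axis, set when the point lies on the upper side of the centre.
        return sum(1 << axis for axis in range(3) if point[axis] >= self.center[axis])

    def _split(self):
        quarter = self.half_size / 2.0
        self.children = []
        for index in range(8):
            offset = [quarter if index & (1 << axis) else -quarter for axis in range(3)]
            child_center = tuple(c + o for c, o in zip(self.center, offset))
            self.children.append(
                OctreeNode(child_center, quarter, self.capacity, self.max_depth - 1)
            )
        # Push the stored points down into the new children.
        for point in self.points:
            self.children[self._child_index(point)].insert(point)
        self.points = []

    def insert(self, point):
        if self.children is not None:
            self.children[self._child_index(point)].insert(point)
            return
        self.points.append(point)
        if len(self.points) > self.capacity and self.max_depth > 0:
            self._split()

    def leaf_count(self):
        if self.children is None:
            return 1
        return sum(child.leaf_count() for child in self.children)

    def depth_at(self, point):
        """Number of subdivisions above the leaf that contains `point`."""
        if self.children is None:
            return 0
        return 1 + self.children[self._child_index(point)].depth_at(point)


root = OctreeNode(center=(0.0, 0.0, 0.0), half_size=1.0, capacity=2)
for p in [(0.9, 0.9, 0.9), (0.8, 0.8, 0.8), (0.7, 0.7, 0.7), (-0.5, -0.5, -0.5)]:
    root.insert(p)
print(root.depth_at((0.9, 0.9, 0.9)), root.depth_at((-0.5, -0.5, -0.5)))
```

The clustered corner is resolved three levels deep while the sparse corner stays at one level, so the tree uses 22 leaf cells where a uniform grid at the finest resolution would need 512. That is the kind of balance between grid speed and mesh accuracy the project description refers to.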
- Bahler Lab
We worked with the group to organise, stabilise and refactor a number of pieces of software relating to yeast genomics.
- PI: Jorg Bahler
- Funding: RSDG free call
- RSDG team: Sinan Shi
- Duration: 2015
- Links: www.bahlerlab.info
- CWA Downsampling
We helped a PhD student process FitBit data, filtering large datasets to extract information of relevance and helping optimise the analysis code.
- PI: Harry Kennard
- Funding: UCL
- RSDG team: Raquel Alegre, Stuart Grieve
- Duration: September 2017 - May 2018
- Languages and Technologies: Python, SQLAlchemy, Postgres
After a successful first round of collaboration we are now working to make further improvements to this open-source computational suite for fluid dynamics simulations of blood flow. We have combined an elastic model of a red blood cell with the underlying Computational Fluid Dynamics simulation of blood flow. Case study
- ShipViz: AIS Data Visualisation
This project is funded by the European Climate Foundation to substantiate shipping policy debates with high-quality infographics. The researchers need RSDG to create 4D visualisations of ship tracks, similar to previous work we have done to help researchers analyse the 1 billion records of shipping track data.
GloTraM is a global transport model that combines multi-disciplinary analysis and modelling techniques to estimate foreseeable futures of the shipping industry, forecasting the evolution of a fleet and its activity in response to external stimuli: changing fuel prices, transport demand, regulation, technology availability...
We are working with Dr. Tristan Smith and Dr. Carlo Raucci at the UCL Energy Institute to improve the status and performance of this model written in MATLAB, enabling other projects which will be based on this outcome.
We acted as consultants to advise the UCL Energy Institute and Baringa consulting partners on software architecture for complex multi-scale models of the future of the UK energy and housing infrastructure.
- DIRAC 2 RSE
We're working with the Dirac STFC supercomputing project to benchmark physics and astronomy codes on different computing platforms and cloud providers.
We are contributing to the Zacros Project, a Kinetic Monte Carlo (KMC) software package written in Fortran, for simulating molecular phenomena on catalytic surfaces. Over the years, we have improved the structure and maintainability of the package, added automated testing, and parallelised it with OpenMP and MPI.
- Novel multimodality imaging for navigation in skull base surgery
This project aims to develop a system that will combine MRI and ultrasound imaging to enhance the surgeon’s view of a tumour, the facial nerve and other surrounding critical structures during surgery. This information will be made available in the MRI guidance system in the operating room so that operations are more precise, resulting in better tumour removal rates and fewer complications.
- Catastrophe modelling for Tsunamis in the Indian Ocean
Our task is to automatically extract bathymetry data from scanned maps provided by the Department of Statistics team (Dimitra and Serge). They require certain digitised information from the maps in order to develop a model to forecast the effects of tsunamis on the coasts of India and, to some extent, Pakistan and Iran. The bathymetry information is marked on the map in a consistent format (a digit and a subscript digit), but the maps also contain other information, such as contours and other text and digits. Our task is to distinguish the target digits from the rest of the information, localise them, find their coordinates and provide the 3-dimensional coordinates of these data points. To accomplish this, we need to implement a pipeline of image processing tasks, including object detection, localisation and digit recognition.
- The value of interconnection in a changing EU electricity system
This project aims to assess the value of UK interconnectors to EU-27 countries and Norway, examining both the GB and Single Irish electricity markets. It aims to identify the optimum level of UK interconnection to other markets using an analysis that addresses limitations of previous studies. We created a package that aggregates data for different sectors and energy products for each of the countries relevant to this study, for easier preparation of the input to modelling software.
- Enabling high performance computing for Electrical Impedance Tomography (EIT)
This was a feasibility study to refactor and improve the availability of two key pieces of software for Electrical Impedance Tomography (EIT), to take better advantage of HPC resources. EIT-PEITS is an EIT forward solver, and EIT-MESHER is C++ based software which generates stable, good quality meshes for solving the EIT forward solution. MESHER was ported to run on UCL's Myriad supercomputer. PEITS was updated to work with the latest version of its dependency libraries, and packaged using Spack, Docker and Singularity for easy development & deployment.
- Agile Architecture
We worked with a PhD student who needed to process long log files. We looked at what tools they were using, and helped them add in a database to better handle the quantity of data. We then ran regular sessions to teach the student how to better use Python and SQL within the context of her research.
- PI: Jemima Unwin
- RSDG Team: Asif Tamuri, Roma Klapaukh, Tim Spain
- Duration: July - September 2019
- Languages and technologies: Python, SQL
School of Life and Medical Sciences (SLMS)
- Moorfields collaboration
Moorfields Eye Hospital tasked RSDG with a challenging project: building a data pipeline that handles all of the hospital's image metadata from optical coherence tomography images (eye scans), together with details of visual acuity measurements on patients who have received medication through eye injections. RSDG have put together a database that makes all of this information accessible to the researchers and updates automatically each day with information on new appointments. This will enable a range of research studies by Dr Pearse Keane's team.
We helped Professor David Balding of the UCL Genetics Institute to prepare his forensic DNA analysis package for submission to CRAN, the online repository for sharing R packages. Case study
- PI: Professor David Balding - UCL Genetics Institute
- Funding: Free call
- RSDG team: Mayeul D'Avezac
- Duration: Jan-May 2015
- Languages and technologies: R
We worked with Dr Ben Cox to build a parallel simulation code for propagation of high-frequency ultrasound in anisotropic media.
We worked with researchers at Great Ormond Street Hospital and the UCL Institute for Child Health to develop a web interface for a new non-invasive Down's syndrome test.
Abysis is an antibody discovery system supporting the analysis of antibody sequence and structure. We refactored the existing codebase and added new features such as the ability to discover and annotate patterns in antibody protein sequences.
- DCProgs HJCFIT
HJCFIT is a library for the maximum likelihood fitting of kinetic mechanisms to sequences of open and shut time intervals from single-channel experiments. It is a part of the DCProgs suite of tools. In this project, we have transformed HJCFIT from a single-process library running on desktop computers to a multi-precision library that can utilise a full Archer node and is thus 14 times faster than the original serial version. We have implemented multi-precision arithmetic, made the code easier to use on high-performance systems and made several other improvements to the overall codebase.
- PI: Prof. Lucia Sivilotti, Dr. Remigijus Lape, UCL Department of Neuroscience, Physiology, and Pharmacology
- Funding: Archer eCSE + RSDG free call
- RSDG team: Mayeul D'Avezac, Jens Nielsen, Raquel Alegre
- Duration: 2013, 2015-2016
- Languages and technologies: C++, Python, Swig, OpenMP, MPI, MPI4Py, CMake, Eigen, GMP, MPFR
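One common reason likelihood-fitting codes reach for extended precision is that a likelihood built from many small factors can underflow ordinary double-precision floats. Whether that is the specific motivation in HJCFIT is not stated here, so treat the sketch below purely as an illustration of the general issue. It uses Python's standard `decimal` module as a stand-in for the GMP/MPFR libraries listed above, and the function name `likelihood_product` is hypothetical.

```python
from decimal import Decimal, localcontext


def likelihood_product(factors, use_decimal=False, digits=50):
    """Multiply per-interval likelihood factors, optionally in extended precision."""
    if use_decimal:
        # A local context keeps the precision change from leaking to other code.
        with localcontext() as ctx:
            ctx.prec = digits
            result = Decimal(1)
            for factor in factors:
                result *= Decimal(str(factor))
            return result
    result = 1.0
    for factor in factors:
        result *= factor
    return result


factors = [1e-3] * 400  # hypothetical per-interval likelihood contributions
print(likelihood_product(factors))                    # double precision
print(likelihood_product(factors, use_decimal=True))  # extended exponent range
```

In double precision the product collapses to zero, which would make every candidate parameter set look equally bad to an optimiser, while the extended-range calculation keeps the value 10^-1200. Working with log-likelihoods is the other standard remedy; multi-precision arithmetic is useful when the algorithm needs the likelihood itself or intermediate matrices with extreme dynamic range.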
- RFH-GFR Web Calculator
The Glomerular Filtration Rate (GFR) calculator is based on research produced at the Royal Free Hospital and UCL. It is a web app for educational purposes that can provide insight to doctors taking care of patients with cirrhosis.
- Modelling and Optimisation of Antibody Purification Processes
Multi-product biopharmaceutical facilities need flexible process configurations that can adapt to products with diverse characteristics and impurity loads so as to avoid bottlenecks and delays, whilst meeting final product specifications and cost targets. In this project we are working with the group to convert existing bioprocess models and optimisers from Excel and C# to Python. The emphasis is on making the model representations clear and easy for the researchers to modify, with robust testing to verify expected behaviour. We are also building web interfaces to these tools - Jupyter notebooks for use by researchers, and a Flask application for end users.
- PI: Professor Suzanne Farid - UCL Department of Biochemical Engineering
- Funding: UCL
- RSDG team: Jonathan Cooper, Anastasis Georgoulas
- Duration: Jan 2016 - Jan 2018
- Languages and Technologies: Python, C#, Flask, DEAP, pint
- Links: While the model is closed source, the generic framework is available on GitHub
- Delivering accurate structural bioinformatics to the yeast community with the HHprY data
Progress in cell biology is hampered by the relatively high proportion of proteins for which there is no known function at the molecular level. Such proteins have no domains annotated in the databases. Structural bioinformaticians have for many years been developing profile-profile search tools (such as the HHsearch tool developed by Soeding and colleagues) that are far more sensitive than the standard tools, but due to computational demand these tools have not been widely applied to create fully annotated complete genomes. We are working to deploy these tools on UCL supercomputing infrastructure, and we are developing an automated pipeline and web interface, to make available a fully augmented annotation of the entire yeast genome.
The aim of this project is to improve the allocation and evaluation of critical care within UCLH. This will involve using existing electronic health records to monitor patients at risk of deterioration outside of intensive care units, enabling health care teams to make decisions on critical care admissions. Phase 1 will build a near-future forecasting system for ICU bed occupancy that takes into account the current workload and planned high-risk surgical admissions. Phase 2 will evaluate whether this decreases surgical cancellations, allows fairer allocation of beds and reduces harm by admitting the right patient at the right time. RSDG are working on extracting the data in various hospital systems and incorporating it into the modelling work.
- PI: Steve Harris, UCLH
- Funding: Health Foundation
- RSDG team: Jonathan Cooper, Roma Klapaukh, Sanaz Jabbari
- Duration: Apr 2017 - Mar 2020
- Languages and Technologies: Python, Electronic Health Records
- Links: https://www.health.org.uk/programmes/improvement-science-fellowships/projects/dr-steve-harris-improvement-science-fellow
We worked with colleagues across ISD to develop an easy-to-use hybrid cloud facility for UCL researchers. This project is currently on hold due to a lack of funding.
- Research Software Dashboard
This is a UCL-wide integrated web-based environment for UCL researchers to manage, share and promote their software research outputs. The dashboard consists of an automated online list of all the software created and maintained by UCL researchers. It allows UCL to promote and measure the quantity and quality of UCL computational research, raising the institution's profile in this space, and facilitating both obtaining and delivering research grants with a significant software component.
- Global River Concavity (Cardiff University)
Project to develop an HPC topographic analysis workflow using LSDTopoTools and Legion to quantify structural variations in river channel morphology with climate data at a global scale.