Advanced Research Computing


Research software development projects

The Research Software Development Group includes the “AI Studio” which focuses on data science and machine learning aspects of projects. We also maintain a separate list of AI Studio projects.


​​​Generic Tooling

Research Software Dashboard

This is a UCL-wide integrated web-based environment for UCL researchers to manage, share and promote their software research outputs. The dashboard consists of an automated online list of all the software created and maintained by UCL researchers. It allows UCL to promote and measure the quantity and quality of UCL computational research, raising the institution's profile in this space, and facilitating both obtaining and delivering research grants with a significant software component.

  • PI: Paul Ayris, Library Services
  • Funding: UCL
  • ARC team: Jonathan Cooper, Helen Sherwood-Taylor, Gary Macindoe, Ben Laken, Roland Guichard, Asif Tamuri, Amanda Ho-Lyn
  • Duration: Feb 2015 - ongoing
  • Languages and Technologies: Python, Django
  • Links: dashboard.rc.ucl.ac.uk
XNAT - Medical Imaging Support

Our medical imaging research software group (MIRSG), a subgroup run jointly with Prof Geoff Parker in CMIC, aims to help the medical imaging community develop and maintain high-quality software as well as provide solutions for the storage and curation of data. The key activity is the development, deployment, and maintenance of the UCL XNAT Service.

  • PI: Eric de Silva, Danny Alexander, Folkert Asselbergs, Geoff Parker
  • Funding: BRC seed funding; now a cost-recovered service
  • ARC team: Dan Mathews, Haroon Chughtai
  • Duration: Oct 2019 - ongoing
  • Languages and Technologies: XNAT
Digital Pathology & Microscopy

Establishing a data storage service for histopathology data using OMERO, an open source package developed by the Open Microscopy Environment.

  • PI: Geoff Parker
  • Funding: UCL seed funding; now a cost-recovered service
  • ARC team: Daniel Mathews
  • Duration: Aug 2021 - Jul 2022
Experimental Medicine Applications Platform (EMAP)

This project with UCLH builds on our long-standing collaboration for the Health Informatics Collaborative. EMAP is a translational data science platform built in and for the NHS and has been specifically created to support research. Today, the typical way for a researcher to access hospital data is to extract it from the hospital into the outside world. This introduces privacy risks, as the data leave the protected environment of the NHS. EMAP reverses this process. By providing a software environment within the hospital, we enable research to happen inside the NHS, so that patient data never have to leave. EMAP has been developed as a non-operational “mirror” of a subset of UCLH data (historical and live). The underpinning aim is to ensure that no clinical data are corrupted or destroyed during the interaction between the research process and the hospital’s systems and that the systems are not compromised (for instance, if they were interrupted or slowed down by research enquiries).

  • PI: Mark White, UCLH
  • Funding: UCLH Charity and NIHR BRC
  • ARC team: Sarah Keating, Jonathan Cooper, Roma Klapaukh, Jeremy Stein, Stef Piatek, Steve Roderick, Harvey Mannering, Anika Cawthorn
  • Duration: Apr 2018 - ongoing
  • Languages and Technologies: Java, R, Python, Docker
  • Links: www.uclhospitals.brc.nihr.ac.uk/criu/data-sources-infrastructure

    Current Projects

    HIV Modelling

    This project aims to maintain and expand a model of HIV in Sub-Saharan Africa. The model covers the transmission, effects, and management of HIV under different assumptions and policy interventions. UCL's role is mainly to make it simpler to modify the model and verify its outputs. As part of this, the RSDG team is rewriting it in Python, which will make it easier for others to contribute.

    • PI: Andrew Phillips
    • Funding: 
    • RSDG team: Anastasis Georgoulas, Michael Mcleod, Peter Schmidt 
    • Duration: Mar 2020 - Sep 2023
    • Languages and Technologies: SAS, Python, pandas 
    Oracc 9

    We are continuing to support the Open Richly Annotated Cuneiform Corpus by developing tools for scholars around the world. Among other things, we are developing a new user-friendly environment for working with digital records of cuneiform texts, which will automate many of the tasks involved. We are also expanding the capabilities of the ORACC website, making it easier to search through the vast collection of information it contains, for both researchers and interested members of the public.

    • PI: Eleanor Robson
    • Funding: Donation
    • RSDG team: Anastasis Georgoulas, Mose Giordano, James Hughes, Rachel Alcraft
    • Duration: May 2021 - May 2031
    Silver Lab

    Building on previous collaborations with the Silver Lab, we continued our work on automatically converting experimental measurements and metadata to the NWB2 format. This new standard offers increased portability and interoperability for the lab's acousto-optic lens microscopy data. This phase included handling more complex geometries, to properly describe the results of the lab's arboreal functional imaging, as well as more detailed timing information captured by the new microscope setup.

    • PI: Angus Silver
    • Funding: NIH
    • RSDG team: Anastasis Georgoulas, Jonathan Cooper, Alessandro Felder
    • Duration: Sep 2019 - Mar 2025
    • Languages and Technologies: Python, MATLAB, HDF5
    Deep Time

    A web application where users can view events that have occurred throughout all of time.

    • PI: Pieter Vermeesch
    • Funding: Existing Grant
    • RSDG team: James Hughes, Amanda Ho-Lyn, Tim Band 
    • Duration: Apr 2022 - Jul 2022
    Stewardship of AntiMicrobials Using Real-Time Artificial Intelligence

    Supporting the infrastructure creation for a prediction pipeline inside UCLH, ensuring data governance and documentation to scale with requirements for the development of a medical device.

    • PI: Laura Shallcross and Peter Dutey
    • Funding: NIHR 
    • RSDG team: Anika Cawthorn 
    • Duration: Dec 2020 - Sep 2022
    Disease Atlas

    Assisting with the transferral of DOME initiative work into the UCLH environment. This includes ensuring data access as well as enabling researchers to run code.

    • PI: Harry Hemingway and Spiros Denaxas
    • Funding: BRC 
    • RSDG team: Anika Cawthorn, Nel Swanepoel, Tom Young, Sarah Jaffa 
    • Duration: Nov 2021 - Nov 2022
    EPPI-Reviewer Covid Evidence Base Classification

    As part of a funded research project for the Dept of Health & Social Care, IoE is maintaining a constant ‘surveillance’ of the COVID evidence base. We are developing machine learning models for text classification and flowchart detection.

    • PI: James Thomas
    • Funding: 
    • RSDG team: Sanaz Jabbari, Orod Razeghi
    • Duration: Oct 2021 - Dec 2022
    • Languages and Technologies: Python, Azure, Deep Learning, BERT, Graph Neural Networks, Transformers
     Using ML to Identify Questions in Longitudinal Studies

    We currently hold one of the premier metadata repositories in social science for longitudinal data (discovery.closer.ac.uk). This rich metadata resource enables researchers to identify, at a granular level, information on the provenance of the UK's leading longitudinal studies in social science, as well as biomedical studies.

    • PI: Jon Johnson
    • Funding: DiRAC & ESRC
    • RSDG team: Sanaz Jabbari, Harry Moss  
    • Duration: Feb 2021 - Dec 2022 

      Over 9,000 babies die each day in low-resource settings. An estimated 70% of these deaths are preventable through evidence-based guidelines and low-cost interventions. Neotree is a digital learning health care system for newborn care in low-resource settings. The aim is to build on a seven-year, multi-disciplinary research project to co-develop and implement a digital data capture and quality improvement system for newborn care called the Neotree. The current phase of work aims to optimise our modelling strategy for predicting neonatal sepsis from routine healthcare data.

      • PI: Michelle Heys
      • Funding: Internal UCL
      • RSDG team: Sanaz Jabbari, Ed Lowther 
      • Duration: Jan 2021 - Jul 2022
      XNAT: International Alliance for Cancer Early Detection (ACED)

      Data storage using XNAT for ACED.

      • PI: Danny Alexander
      • Funding: CRUK
      • RSDG team: Daniel Mathews, Haroon Chughtai
      • Duration: Nov 2020 - Feb 2024
      ExCALIBUR H&ES: HPC Hardware Piloting Service & Benchmarking

      With our colleagues in RITS Research Computing and UCL Computer Science, we are running the Interconnect Demonstrator for the UK exascale computing initiative. We will measure the impacts of a variety of in-network technologies – doing computation at the switch level and looking at the possibility of using acceleration on the network adaptor to off-load some of the work of the host machine. The focus of RSDG's work will be developing benchmarking and profiling approaches to understand application performance and how best to write effective research software for new exascale systems, in a co-design process with the hardware developments.

      • PI: Jeremy Yates
      • Funding: DiRAC, STFC, RCNIC, ExCALIBUR
      • RSDG team: Mose Giordano, Tuomas Koskela
      • Duration: Nov 2020 - Feb 2024
      • Languages and Technologies: Julia, Fortran, C++, ARMforge
      Data Infrastructure Capacity For EOSC (DICE)

      The DICE project is a multi-institutional EU-funded project which aims to enable a European storage and data management infrastructure for EOSC, providing generic services and building blocks to store, find, access and process data in a consistent and persistent way.

      UCL’s involvement in the project includes the installation and configuration of two EUDAT services (B2Safe and B2Find). These will be used to transfer and share large volumes of data from partner institutions (including the Barcelona SuperComputing Centre) in order to support the CompBioMed2 project. We will evaluate the benefits of EUDAT tools as future components of UCL’s suite of research data services, and our work will help inform EUDAT’s future development, enabling it to better serve the needs of large multi-disciplinary research institutions.

      • PI: James Wilson
      • Funding: EC H2020
      • Duration: Jan 2021 - Jun 2023
      Genomic Medicine Service Alliance

      This project is with Great Ormond Street Hospital (GOSH), which leads the North Thames Genomic Medicine Service Alliance (GMSA). The project is part of an overall suite of work to store patients' genomic reports in an interoperability standard for healthcare: FHIR. Our involvement is to build on the data modelling work already done at GOSH to produce a front-end web application that enables input of genomic report information and its transformation into the FHIR data model, persisting the data using a hospital FHIR API. Alongside this, we are using best practices in software development to ensure that the FHIR data model definition is robust and to validate that other sources of FHIR data adhere to this model.

      • PI: Elias Zapantis
      • Funding: UCLPartners, NHS
      • RSDG team: Stefan Piatek, James Hughes
      • Duration: Apr 2022 - Oct 2022
      • Languages and Technologies: TypeScript, JavaScript, React, FHIR
      Health Data Research UK - London Hub

      Health Data Research UK is the national institute for data science for health, which was established in 2018 with long-term funding support from research councils, UKRI, and charitable and governmental research funders. HDR UK seeks to drive improvements in the health of patients and populations through research at regional and national scales. The triple mission of HDR UK spans the discovery of disease mechanisms in science through precision medicine and trials for public health; establishing platforms and the underlying infrastructure to enable research at a national scale; and developing training and capacity opportunities. As part of the HDR London Hub, RSDG has recruited a Health Data Specialist to support researchers in making the most effective use possible of resources available at UCL and elsewhere, such as the Data Safe Haven. In particular, we will look at scaling up analyses to deal with massive volumes of highly sensitive personal data, e.g. gathered during routine health care.

      • PI: Harry Hemingway
      • Funding: HDR UK
      • RSDG team: Nel Swanepoel, Anika Cawthorn, Haroon Chughtai, Jonathan Cooper
      • Duration: Aug 2018 - Mar 2023
      SUMMIT Data Repository

      The SUMMIT clinical trial is the largest lung screening programme in the UK. ARC oversaw the development of an in-house blood samples management app which has significantly reduced the administrative overhead in receiving, cross-checking, and automatically validating reports from the multiple clinical centres involved in the trial. We host this app on Cloud@UCL and assist the data manager with updates as needed. ARC also hosts imaging data for the trial researchers on our XNAT platform and has advised the team on best practices in this area.

      • PI: John McCabe
      • Funding: Pharma Company
      • RSDG team: Jonathan Cooper, Nel Swanepoel, Haroon Chughtai
      • Duration: Apr 2020 - Jan 2025
      Manolis Mavrikis & EdTech Opportunities

      The purpose of this project is to consolidate use cases and technical requirements for applications requested by researchers and teams at the Institute of Education. There is a sense that requests for software solutions can be grouped into categories, with the aim of avoiding repetitive effort and “reinventing the wheel”.

      • PI: Manolis Mavrikis
      • Funding: Internal UCL
      • RSDG team: Peter Schmidt
      • Duration: Jun 2022 - Sep 2022
      QNICE - Quantitative Neuroradiology Initiative Central Engine

      The Quantitative Neuroradiology Initiative (QNI) is a collaboration between researchers in the Neuroradiological Academic Unit and the UCL Centre for Medical Image Computing to rapidly translate innovative computing solutions into practical clinical tools deployed in NHS imaging departments. The project aims to automatically derive robust, objective markers of disease from clinical neuro-images and integrate these measurements back into the clinical workflow for patient management, changing the paradigm for image-based diagnosis, follow-up, and treatment.

      Our work is to develop a central management engine that will control the automated analysis of a 3D MR scan. The engine will determine the appropriate analysis pathway based on the DICOM metadata from the scan series, run a Dockerised version of the analysis algorithm, pick up the report produced and convert it to a DICOM-encapsulated PDF which can then be returned to PACS for clinicians to view. The system must be adaptable to different kinds of analyses, and future applications to a range of diseases.

      • PI:  John Thornton
      • Funding: 
      • RSDG team: Peter Schmidt, Daniel Mathews
      • Duration: Feb 2022 - Feb 2023
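
The pathway-selection step described above, in which the engine chooses an analysis from a scan's DICOM metadata, might look roughly like the following. This is a minimal illustrative sketch and not the QNICE implementation: the pathway names, Docker image names, and matching rules are all hypothetical, and the metadata is assumed to have already been read into a plain dictionary keyed by standard DICOM attribute keywords (`Modality`, `MRAcquisitionType`, `SeriesDescription`).

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class Pathway:
    """One analysis route: a label, the container to run, and a match rule."""
    name: str                                     # hypothetical pathway label
    docker_image: str                             # hypothetical image name
    matches: Callable[[Mapping[str, str]], bool]  # rule applied to the metadata


def _is_3d_mr(meta: Mapping[str, str]) -> bool:
    # Only 3D MR series are considered, as in the project description.
    return meta.get("Modality") == "MR" and meta.get("MRAcquisitionType") == "3D"


def _is_3d_t1(meta: Mapping[str, str]) -> bool:
    return _is_3d_mr(meta) and "t1" in meta.get("SeriesDescription", "").lower()


def _is_3d_flair(meta: Mapping[str, str]) -> bool:
    return _is_3d_mr(meta) and "flair" in meta.get("SeriesDescription", "").lower()


# Ordered list of rules; the first match wins. New analyses are added by
# appending an entry, which keeps the engine adaptable to other diseases.
PATHWAYS = [
    Pathway("volumetry", "example/volumetry:latest", _is_3d_t1),
    Pathway("lesion-load", "example/lesion-load:latest", _is_3d_flair),
]


def select_pathway(meta: Mapping[str, str]) -> Optional[Pathway]:
    """Return the first pathway whose rule accepts the series metadata."""
    for pathway in PATHWAYS:
        if pathway.matches(meta):
            return pathway
    return None  # no analysis applies; the series is left untouched


if __name__ == "__main__":
    series = {
        "Modality": "MR",
        "MRAcquisitionType": "3D",
        "SeriesDescription": "Sag T1 MPRAGE",
    }
    chosen = select_pathway(series)
    print(chosen.name if chosen else "no matching pathway")
```

Keeping the rules in a data table rather than in branching code is one way to meet the stated requirement that the system be adaptable to different kinds of analyses; running the container and encapsulating the report as a DICOM PDF would be separate steps after this selection.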
      Understanding Mutations Through Protein Structure

      A pan-tissue analysis of the somatic mutations observed in aging, using DNA sequence data from human samples. The effects of each mutation on protein structure stability will be estimated computationally.

      • PI: Ben Hall
      • Funding: Royal Society
      • RSDG team: Robert Vickerstaff, Asif Tamuri, Rachel Alcraft
      • Duration: Mar 2022 - Mar 2023
      Learned Exascale Computation Imaging (LEXCI)

      Maintenance and development of a TensorFlow interface to astrophysics imaging codes.

      • PI: Jason McEwen
      • Funding: Royal Society
      • RSDG team: Robert Vickerstaff, Asif Tamuri, Rachel Alcraft
      • Duration: Mar 2022 - Mar 2023
      OMOP Pipeline 

      This project involved working with the OHDSI tools (https://www.ohdsi.org/) to enable applying them to an OMOP database. It also involved some work on the OMOP extract tools being developed at UCLH, namely the development of a Mock Caboodle testing database and a Validation Module for the package.

      • PI: Tim Roberts, Wai Keong Wong
      • Funding: UCLH BRC
      • RSDG team: Stefan Piatek, Sarah Keating
      • Duration: Feb 2022 - Jul 2022
      • Languages and Technologies: R
      CHIMERA: Collaborative Healthcare Innovation through Mathematics, EngineeRing and AI

      This is a large healthcare project involving many aspects. Our current involvement is the design of a secure pipeline for collecting and anonymising data and transferring it from the partner hospitals to the Data Safe Haven.

      • PI: Becky Shipley
      • Funding: EPSRC
      • RSDG team: Sarah Keating, Stefan Piatek
      • Duration: Aug 2021 - Jul 2024
        XNAT: CCP in Synergistic Reconstruction for Biomedical Imaging

        Biomedical imaging increasingly involves multiple modalities and/or imaging time points. Advances in instrumentation open up the exciting potential for synergistic imaging. This EPSRC-funded collaborative computational project (CCP) aims to bring together the best of the UK's and international image reconstruction expertise to make this potential a reality. CCP SyneRBI (Synergistic Reconstruction for Biomedical Imaging) is the successor of CCP PETMR, extending the scope to other multi-modal biomedical imaging systems, concentrating on SPECT, PET, and MR reconstruction, but including PET/CT and SPECT/CT systems. RITS' Medical Imaging Research Software Group, part of RSDG, will be providing the data management and analysis platform for the project.

        • PI:  Kris Thielemans
        • Funding: EPSRC
        • RSDG team: Daniel Mathews
        • Duration: Apr 2020 - Apr 2025
        • Languages and Technologies: XNAT
        XNAT: Second European Carotid Surgery Trial (ECST-2)

        Data storage and support using XNAT to support an international multisite study.

        • PI:  Martin Brown
        • Funding: Departmental Discretionary Funds
        • RSDG team: Sarah Keating, Stefan Piatek
        • Duration: Aug 2020 - Jul 2025

        Past Projects

        UKRI-DiRAC Federation Project

        Benchmark ML/AI: benchmarking neuroscience Python code developed by a UCL group. This differs from SciML in that the code is multi-node, so the whole code has to scale, not just its (single-node) components. An open question is whether its scaling will differ from that of other multi-node codes; probably not, but it is interesting to find out. The group has a benchmark and will work with us to study performance.

        • PI: Jeremy Yates and Mark Wilkinson
        • Funding: STFC
        • RSDG team: Ilektra Christidi, Jonathan Cooper, Krishnakumar-Gopalakrishnan, Sanaz Jabbari, Tom Couch, Tuomas Koskela, Tom Young
        • Duration: Jan 2021 - Jun 2022 
        SBML/COMBINE Library Support CompBioLibs

        SBML (the Systems Biology Markup Language) is a standard format that facilitates the reuse and exchange of unambiguous models in the field of computational biology. This grant provided the seed funding to ensure the long-term sustainability of libSBML (a parsing, validation, and manipulation/conversion library for SBML) and Deviser (a code generation tool), with the goal of being able to expand this to other computational biology standards libraries. Establishing a base with a permanent RSE (Research Software Engineering) group will allow expertise to be shared and thus retained within the institute.

        The libSBML code base was reorganized to make community contribution easier, and the build process was established using GitHub Actions. In addition, a separate XML parsing layer allowing access to one or more of the three most popular XML parsing libraries was created by separating it from existing libSBML code. Deviser was extended to use this layer to allow easy generation of API libraries for other XML-based standards.

        • PI: Sarah Keating
        • Funding: CZI
        • RSDG team: Sarah Keating
        • Duration: Sep 2020 - Aug 2021
        • Languages and Technologies: C, C++, C#, Java, JavaScript, MATLAB, Perl, PHP, Python, R, Ruby
        Geometrically Unfitted FEM

        We worked on updating the CutFEM library to work with newer versions of the popular FEniCS library for solving partial differential equations.

        • PI: Erik Burman
        • Funding:  EP/P01576X/1
        • RSDG team: David Perez Suarez, Anastasis Georgoulas
        • Duration: Nov 2020 - Oct 2020
        • Languages and Technologies: C++, Python, pybind11
        Parallelize TROVE

        TROVE is a Fortran codebase for nuclear motion calculations. Funded by DiRAC, we updated the codebase, integrated different versions, added tests, and standardised the process of building and running the code. We also investigated the performance of TROVE on new DiRAC HPC platforms.

        • PI: Sergey Yurchenko, Jonathan Tennyson
        • Funding:  DiRAC
        • RSDG team:  Anastasis Georgoulas, Jamie J Quinn, Sarah Jaffa
        • Duration: Jan 2021 - Dec 2021
        • Languages and Technologies: Fortran, MPI
        A Brain Atlas

        A web application for viewing histological slices of brain matter.

        • PI: Eugenio Iglesias González
        • Funding:  EC Horizon 2020
        • RSDG team:  James Hughes
        • Duration: Apr 2021 - Jan 2022
        Precision UTI

        Providing code reviews and general advice for a SNOMED CT server API for R.

        • PI: Laura Shallcross, Peter Dutey
        • Funding:  PrecisionAMR
        • RSDG team: Anika Cawthorn
        • Duration: Nov 2021 - May 2022
        Data Analysis for Fluorescence Lifetime Microscopy

        Project aims:

        1. Development of segmentation tools and Jupyter notebooks for the analysis of exosome fluorescence lifetime microscopy data.
        2. Extending functionality of an ImageJ/FIJI plugin for analysis of fluorescence lifetime microscopy data.
        • PI: Paul Barber
        • Funding:  CRUK
        • RSDG team: Daniel Mathews, Haroon Chughtai
        XNAT: Mirror of NHSx NCCID

        Creation of a mirror of the National COVID-19 Chest Imaging Database (NCCID) on an XNAT server hosted by the Department of Computer Science at UCL and by Microsoft through the use of Azure. The intention is to give researchers at UCL (mainly CMIC) access to the national database.

        • PI: Joe Jacob
        • Funding: Microsoft
        • RSDG team: Daniel Mathews, Haroon Chughtai
        • Duration: Mar 2021 - Mar 2022
        XNAT: Hosting of PHOSP-COVID Data

        Technical support and data storage for the Post-hospitalisation COVID-19 (PHOSP-COVID) study, with data transferred from NCIMI, Oxford to UCL for analysis.

        • PI: Joe Jacob
        • Funding: Wellcome Trust
        • RSDG team: Daniel Mathews
        • Duration: Mar 2021 - Mar 2022
        Napari Plugins

        This project involves writing a napari plugin that enables users to run the cell-tracking software BayesianTracker from the napari GUI. It also involves further improvements to Arboretum, a napari plugin to visualise cell lineage trees.

        • PI: Alan Lowe
        • Funding: CZI napari Plugin Accelerator Grants
        • RSDG team: David Stansby, Alessandro Felder, David Perez Suarez
        • Duration: Jan 2022 - Jun 2022
        • Languages and Technologies: C++, Napari
        Coast Erosion - Geography Citizen Science Data Collection app

        Using data gathered by a dedicated team of citizen scientists, we created a web app to visualise the effect of coast erosion and storms on the shoreline at Bawdsey Beach.

        • PI: Helene Burningham
        • Funding: Internal UCL 
        • RSDG team: David Perez Suarez, Sarah Jaffa
        • Duration: Jun 2021 - Apr 2022
        Social Contagion of Violence 

        This project’s focus is the prevention of serious violence, and in particular interrupting the spread of violence across social networks. Evidence from the US shows that violence exhibits contagion-like properties, whereby violent incidents trigger follow-up violent victimisations among the social contacts of the original victim. Our aim is to establish whether violence in the UK displays similar patterns of contagion, in the hope of bolstering a move towards a public-health approach to violent crime, which takes a population-level perspective and seeks to identify upstream opportunities for prevention.

        • PI: Toby Davies
        • Funding: Home Office 
        • RSDG team: Ed Lowther
        • Duration: Feb 2022 - Jun 2022
        Next Generation Trusted Research Environments

        A collaboration with the Turing Institute and the University of Cambridge aims to assess the readiness of a Turing-developed Information Governance app for deployment by other research organisations to support their data management policies and Trusted Research Environment infrastructure. The current phase of this project combines technical work, user testing, and developing an understanding of the existing information governance processes and needs at the three collaborating institutions, with the ultimate aim of releasing an open-source version of the app.

        • PI: James Hetherington
        • Funding: Turing Institute 
        • RSDG team: Ed Lowther, James Hetherington, Nel Swanepoel, Tom Couch
        • Duration: Oct 2021 - Jul 2022
        Hospital Onset Covid-19 Infection (HOCI)

        This project continued the work started during the COVID-19 pandemic by COG-UK-HOCI, which provided model-based predictions of hospital-acquired COVID-19 infection. The COG-UK-HOCI study required patient demographic information, infection status, admission, and location information within the hospital, all of which were prohibitively time-consuming to collect manually. This work automated the collection and linking of patient data with SARS-CoV-2 sequencing within UCLH. The final result was a system that was able to run without manual patient data collection, resulting in a report of the likelihood of hospital-acquired COVID-19 infection. This proof of principle is being used to apply for a clinical trial that will model the spread of multiple respiratory viruses in multiple NHS hospitals.

        • PI: Judith Breuer
        • Funding: NIHR
        • RSDG team: Stefan Piatek
        • Duration: Sep 2021 - Mar 2022
        • Languages and Technologies: Java, Spring, Hibernate

        The aim is to bring together Next-Gen Sequencing (NGS) and patient data from Electronic Health Records (EHRs) to diagnose antimicrobial resistance and track its spread around the hospital (UCLH, and also GOSH, although we are not particularly involved in that part). The project team hopes to employ two bioinformaticians as well as having support from us.

        • PI: Judith Breuer
        • Funding: NIHR
        • RSDG team: Stefan Piatek
        • Duration: Aug 2020 - Nov 2021
        A Brain Atlas

        The objective of the BrainAtlas project is to use a web-based application to navigate the human brain and visualise anatomical details. The team plans to use this tool in a teaching environment. The underlying data are based on MRI and histology image sequences from 5 patients. The team developed the app using the React framework and the web app is available through GitHub pages on http://github-pages.ucl.ac.uk/BrainAtlas/#/atlas.

        • PI: Eugenio Iglesias González
        • Funding: EC Horizon 2020
        • RSDG team: Peter Schmidt, James Hughes
        • Duration: Jan 2021 - Jan 2022
        Real-time Advanced Data Assimilation for Digital Simulation of Numerical Twins on HPC / RADDISH

        Improving interfaces and growing the user base of the software developed in RADDISH.

        • PI: Serge Guillas
        • Funding: ATI
        • RSDG team: Tuomas Koskela, Mosè Giordano
        • Duration: Jul 2021 - Apr 2022
        • Languages and Technologies: Julia, GitHub Actions, MPI, HDF5
        Improve CeMMAP Statistical Software

        Removing proprietary tools from econometrics optimisation software and providing a user interface from R.

        • PI: Lars Nesheim
        • Funding: ESRC via the IFS
        • RSDG team: Tuomas Koskela
        • Duration: Apr 2021 - Aug 2021

        The Open Richly Annotated Cuneiform Corpus (Oracc) supports the editing of translations and transliterations of ancient Mesopotamian (Iraqi) texts. The principal aim of this project is to create a local GUI for Oracc.

        • PI: Professor Eleanor Robson
        • Funding: UCL
        • RSDG team: Raquel Alegre, Stuart Grieve, James Hetherington, Jens Nielsen, Benjamin Laken
        • Duration: 2014 - 2017
        • Languages and Technologies: PLY, Jython, Swing, Maven, SOAP/WSDL
        DataSpring: Enabling complex analysis of large-scale digital collections

        Funded through the JISC "Research Data Spring" initiative, this project seeks to make it possible to efficiently query a corpus of 81000 out-of-copyright books using UCL's research computing infrastructure and to thereby understand the issues that arise in using traditional HPC resources for humanities work.

        Times Digital Archive queries

        We did some initial investigation into using Spark on UCL supercomputers to query large newspaper archives for terms of interest, accounting for mis-spellings, OCR failures, etc.

        • PI: Peter Guillery, Melissa Terras
        • RSDG team: Raquel Alegre, Roma Klapaukh
        • Duration: 2017
        • Languages and Technologies: PySpark
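
To illustrate what searching for a term while accounting for mis-spellings and OCR failures can involve, here is a small sketch of approximate term matching. It is not the code used in the Times Digital Archive investigation: it uses plain Python and the standard library's `difflib` rather than Spark, and the function name `fuzzy_hits` and the 0.8 similarity threshold are assumptions chosen for the example.

```python
import re
from difflib import SequenceMatcher
from typing import List


def fuzzy_hits(text: str, term: str, threshold: float = 0.8) -> List[str]:
    """Return the words in `text` that are approximately equal to `term`.

    Similarity is difflib's ratio (0.0 to 1.0); words scoring at or above
    `threshold` count as hits, which tolerates small OCR or spelling errors.
    """
    term = term.lower()
    hits = []
    # Split the lower-cased text into alphanumeric tokens.
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        if SequenceMatcher(None, token, term).ratio() >= threshold:
            hits.append(token)
    return hits


if __name__ == "__main__":
    # "parliarnent" imitates a common OCR confusion of "m" with "rn".
    page = "The parliarnent met; Parliament adjourned. The park was empty."
    print(fuzzy_hits(page, "parliament"))
```

In a Spark setting, a per-document function like this could be applied across the archive in parallel, with the threshold tuned against a sample of known OCR errors.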