AI Studio @ UCL-RITS
24 September 2018
Research IT Services are launching a new service called AI Studio: a data consultancy service in artificial intelligence (AI) and data science.
An offshoot of the Research Software Development Group (RSDG), the service aims to help the scientific community across UCL by providing consultancy and software development for data science and machine learning projects.
Bridging the gap
Data science has its roots in the integration of statistics and computer science and is the discipline that deals with collecting, preparing, managing, analysing, interpreting and visualising data sets. As more research domains move to intersect with data-intensive computation and nearly every field of discovery is transitioning from data-poor to data-rich, techniques such as machine learning and AI are driving scientific and technological advancement in diverse areas such as astrophysics, particle physics, biology, meteorology, medicine, finance, healthcare, and social sciences.
With such widespread interest, it can be difficult for academics to find contributors with the necessary practical experience to apply computational and data science tools within their research domains. On the other hand, potential collaborators in computer science and statistics are motivated to work on problems that result in new publishable advances rather than the application of tried and tested methods.
In much the same way that RITS’s Research Software Development Group (RSDG) currently provides software engineering as a service, there is an opportunity for collaborative science-as-a-service roles to step in and provide data science expertise to academics. With the launch of the AI Studio, our motivation is to reduce the barrier for those academic researchers who are less experienced in the application of data science and machine learning to their research problems by providing two services: project collaboration and consultation.
How does this relate to Research Software Development?
The working practices of Research Software Engineers and Research Data Scientists overlap in many respects. Both roles share skillsets that include literate programming, performant programming, algorithmic understanding, verification and testing, data wrangling and data visualisation. From data collection through to analysis and visualisation, most research software projects contain data science components, and for our RSDG team applied data science has always been a part of software development.
But the data available for research is growing, and a wider variety of data types is used in any given domain. There are daily advancements in machine learning techniques, tools, and algorithms specialised for specific data types and problems (e.g., natural language processing (NLP), image processing, reinforcement learning), and each is becoming a new speciality to be resourced.
The AI Studio will add the distinct skills that are required to provide a high-quality data science support service. This includes a deeper familiarity with Machine Learning (ML) theories, principles and algorithms, as well as hands-on knowledge of the latest data science software packages. The AI Studio will be able to complement our current RSD activities by:
Providing support and consultation where a deeper focus is needed in a data science component of a project
Building machine learning and AI applications, especially those requiring NLP and image processing solutions
Streamlining data wrangling and all pre-processing steps needed for realising ML applications
Liaising with departments across UCL on ML and AI activities, and establishing collaborations between RITS and those departments on ML-focused projects
As has been true for the RSDG, by working closely with other RITS service teams, the AI Studio service will be backed up by first-class resources and support in the form of the Research Data Storage service and the research computing platforms Legion, Grace and Myriad. As such, RITS is the perfect host for such a campus-wide AI service and its computational underpinnings.
Working with AI Studio
Since joining RITS in March 2018, Dr Sanaz Jabbari has been leading the development of the AI Studio service and envisions a service that will appeal to both novice and experienced data scientists. For researchers who are already programmers and machine learning savvy, Sanaz will be able to advise on issues such as model selection, parameter tuning, or achieving higher performance. For researchers who are newer to data science, a consultation can be a brainstorming session on what sort of predictions and analysis are possible from their data and what type of data they need to collect.
Sanaz has been discussing a range of potential projects, some examples of which are below, and will be working on a couple of these over the next few months. We're looking to take on new projects starting in 2019, so do talk to us about how we can help you!
Email “email@example.com” if you would like to discuss a project.
Sanaz will also join the RITS drop-in sessions on a regular basis to provide quick consultations on any tricky data science related problems.
Extracting bathymetry digits from map images (status: started)
This project requires automatically extracting bathymetry (depth) data from scanned maps provided by our academic collaborators in the Department of Statistics. They require certain digitised information from the maps in order to develop a model to forecast the effects of tsunamis on the coasts of India, and to some extent Pakistan and Iran. To accomplish this, we are implementing a pipeline of image processing tasks. The pipeline addresses object detection, boundary detection, localisation, and finally digit recognition.
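To give a flavour of the localisation step only, here is a minimal sketch that finds bounding boxes of connected groups of "ink" pixels in a tiny binary grid. It is an illustration under stated assumptions, not the project's actual method: the function name find_blobs, the toy grid, and the choice of breadth-first connected-component labelling are all hypothetical, and a real pipeline would run on thresholded scans and pass each box to a digit recogniser.

```python
from collections import deque


def find_blobs(grid):
    """Return bounding boxes (top, left, bottom, right) of 4-connected
    regions of 1s in a binary grid, ordered left to right.

    Hypothetical helper for illustration only.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    # Sort by left edge so candidate digits come out in reading order.
    return sorted(boxes, key=lambda box: box[1])


# Toy "scan": a thin vertical stroke and a bracket-like shape.
page = [
    [0, 1, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 1],
]
print(find_blobs(page))
```

Each box marks a candidate region that a later stage could crop and classify as a digit.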
UCH Critical Care Forecast (status: starting 2019)
Here we aim to estimate the effect of prompt admission to critical care on mortality and propose a model for optimal bed occupancy. The project tackles a classical prediction and modelling problem, while working with highly sensitive data and avoiding “black box” solutions in order to maintain transparency.
Moorfields Eye Hospital (status: scoping exercise in progress)
The task is to automate extraction of structured information from the text of electronic reports (letters), where the consultant may write their findings either in a structured format or in narrative free text (e.g. “the visual Acuity was excellent”, or “was 20”). One of the major issues is ascribing a value or description to the correct eye. If everyone only had one eye we could do a lot with keyword search, but as they have two we need to use NLP techniques to work out whether a visual acuity, pressure, etc. corresponds to the right or left eye. There are many permutations to consider; records can be written “X on the right and Y on the left” or “X and Y on the right and left respectively”, or “XR and YL”, and so on.
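The three phrasings quoted above can be made concrete with a small rule-based sketch. This is only an illustration of why the permutations matter, not the approach the project will take: the function name extract_by_eye, the value pattern, and the sample sentences are assumptions, and hand-written rules like these break down quickly on real free text, which is the reason NLP techniques are needed.

```python
import re

# Hypothetical value pattern: a number such as "20", "6/6" or "0.8".
VALUE = r"(\d+(?:[/.]\d+)?)"
SIDE = {"R": "right", "L": "left"}

PATTERNS = [
    # "X on the right and Y on the left"
    (re.compile(rf"{VALUE} on the (right|left) and {VALUE} on the (right|left)", re.I),
     lambda m: {m.group(2).lower(): m.group(1), m.group(4).lower(): m.group(3)}),
    # "X and Y on the right and left respectively"
    (re.compile(rf"{VALUE} and {VALUE} on the (right|left) and (right|left) respectively", re.I),
     lambda m: {m.group(3).lower(): m.group(1), m.group(4).lower(): m.group(2)}),
    # "XR and YL"
    (re.compile(rf"{VALUE}\s?([RL]) and {VALUE}\s?([RL])\b"),
     lambda m: {SIDE[m.group(2)]: m.group(1), SIDE[m.group(4)]: m.group(3)}),
]


def extract_by_eye(text):
    """Return a dict mapping 'right'/'left' to the value found for that eye,
    or an empty dict when none of the known phrasings match."""
    for pattern, build in PATTERNS:
        match = pattern.search(text)
        if match:
            return build(match)
    return {}


for sentence in (
    "Acuity was 6/6 on the right and 6/9 on the left",
    "Pressures 14 and 16 on the left and right respectively",
    "VA 6/6R and 6/12L",
    "the visual acuity was excellent",
):
    print(sentence, "->", extract_by_eye(sentence))
```

The last sentence returns nothing at all: a qualitative description with no stated eye falls outside every rule, and each new phrasing would need yet another pattern.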