Advanced Research Computing


Opinion: Tsunami modelling and AI map reading

In March 2018, RITS launched a new service to complement and augment its current activities in Data Science and expand it further to include AI and Machine Learning applications in research.

Giant tsunami wave crashing onto a tropical beach, Photo by Matt Paul Catalano on Unsplash (https://unsplash.com/s/photos/tsunami)

26 April 2019

Nearly every field of discovery, from astrophysics and biology to meteorology, medicine, finance, healthcare and the social sciences, is evolving from being data-poor to being data-rich. High Performance Computing, coupled with Artificial Intelligence (AI), Machine Learning and Data Science, provides us with the tools and methods for handling, understanding, analysing and modelling large data sets.

In March 2018, the Research Software Development Group within Research IT Services (RITS), launched a new service to complement and augment its current activities in Data Science and expand it further to include AI and Machine Learning applications in research.

The new service, called “AI Studio”, aims to help the UCL research community by providing consultancy and developing software for their data science and machine learning problems, in line with sound software engineering principles.

Integral to this goal is lowering the barrier for researchers who are less experienced in applying AI and in understanding how it may benefit their research.

Working closely with the other Research IT Services teams, we have access to computing and storage resources and support. Being part of ISD makes us an ideal host for a campus-wide AI service and its computational underpinnings.

The “AI Studio” is able to provide training courses related to data science, and we actively support and facilitate knowledge transfer activities between departments. Our team organises and participates in “Software Carpentry” sessions and is active in supervising PhD students within the Centre for Doctoral Training in Data Intensive Science. We also work closely with other institutions, for example through a Crick Institute Networking Fund project to initiate the development of a knowledge hub supporting researchers in the rapidly growing domain of medical imaging.

As with the wider Research Software Development Group, the main activity for AI Studio is working on collaborative projects with researchers. At present our main expertise is in developing natural language processing (NLP) and image processing applications, but we are looking to grow the team and hence range of expertise with time. If you would like to work with us, email rc-softdev@ucl.ac.uk with your ideas.

Below we give some details of two projects we are currently supporting.

“We have been seeing steadily increasing demand from researchers for assistance in data analytics, applying machine learning and AI to their datasets. This new service will provide a focus to accelerate work in this area.”

Dr Jonathan Cooper, Head of Research Software Engineering

1. Catastrophe modelling for tsunamis in the Indian Ocean (in collaboration with the Department of Statistical Science)

This is an image processing application in which our task is to automatically extract bathymetry (depth) data from scanned maps provided by UCL's Department of Statistical Science. They require digitised information from the maps in order to develop a model to forecast the effects of tsunamis on the coasts of India and, to some extent, Pakistan and Iran. The bathymetry information is marked on the map in a consistent format (a digit and a subscript digit). We must identify these target digits among the rest of the information on the map, localise them, and combine their positions with the recognised depth values to produce 3-dimensional data points. To accomplish this, we implement a pipeline of image processing tasks, including object detection, localisation, and digit recognition.
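As an illustration of the localisation step, the sketch below is an assumption on our part rather than the project's actual code: it finds connected blobs of “ink” pixels in a binarised raster using a breadth-first flood fill, and returns their bounding boxes, each of which would then be cropped and handed to a digit classifier.

```python
from collections import deque

def find_components(image):
    """Locate connected blobs of ink pixels (value 1) in a binary
    raster and return bounding boxes as (row0, col0, row1, col1).
    Each box is a candidate region for the digit recogniser."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] and not seen[r][c]:
                # Breadth-first flood fill over 4-connected neighbours,
                # growing the bounding box as pixels are visited.
                queue = deque([(r, c)])
                seen[r][c] = True
                r0 = r1 = r
                c0 = c1 = c
                while queue:
                    y, x = queue.popleft()
                    r0, r1 = min(r0, y), max(r1, y)
                    c0, c1 = min(c0, x), max(c1, x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((r0, c0, r1, c1))
    return boxes
```

In practice a real pipeline would also filter boxes by size and aspect ratio to separate digit marks from coastlines and grid lines.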

Scanned map after object detection and character recognition pipeline applied

An example of a scanned map providing the input for our pipeline.


After we apply the object detection and character recognition pipeline to these maps, we extract the coordinates of the bathymetry figures on the map.
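For illustration, one simple way to turn a recognised depth figure into a 3-dimensional data point is a linear pixel-to-geographic mapping. The functions below are a sketch under assumptions of our own (a north-up, axis-aligned chart calibrated by two known corner points); the actual pipeline may well use a proper georeferencing transform instead.

```python
def pixel_to_lonlat(px, py, calib):
    """Map pixel coordinates on a scanned chart to (lon, lat).
    Assumes the chart is north-up and axis-aligned, so linear
    interpolation between two calibration corners suffices.
    calib = ((px0, py0, lon0, lat0), (px1, py1, lon1, lat1))."""
    (px0, py0, lon0, lat0), (px1, py1, lon1, lat1) = calib
    lon = lon0 + (px - px0) * (lon1 - lon0) / (px1 - px0)
    lat = lat0 + (py - py0) * (lat1 - lat0) / (py1 - py0)
    return lon, lat

def to_sounding(px, py, depth_m, calib):
    """Combine a recognised depth figure with its position to give
    one 3-D bathymetry data point: (lon, lat, depth in metres)."""
    lon, lat = pixel_to_lonlat(px, py, calib)
    return (lon, lat, depth_m)

# Hypothetical calibration: pixel (0, 0) is at 70E 10N and
# pixel (1000, 1000) is at 71E 9N.
calib = ((0, 0, 70.0, 10.0), (1000, 1000, 71.0, 9.0))
point = to_sounding(500, 500, 42.0, calib)  # -> (70.5, 9.5, 42.0)
```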

The extracted bathymetry data will be further used by the researchers for tsunami modelling, as part of a project that seeks to better understand the complex multi-disciplinary phenomena associated with the entire life cycle of a tsunami, to develop better physical source models, and to carry out statistical emulation of numerical tsunami models, in order to efficiently assess the hazard, especially for India.

“The superb work of the Research Software Development Group has greatly accelerated our research. By creating and employing automatic methods to digitise maps, the coverage of our model has quickly and greatly increased. It would have been impossible, or very expensive and lengthy, to do this manually. As a result we can also readily work with updated maps or in other areas of the world. The interaction with the AI Studio has been excellent, with multiple iterations. Such an inclusion of Software Engineering and AI tools into projects will become the norm in modern research endeavours.” — Professor Serge Guillas

PI: Professor Serge Guillas
Project: Uncertainty quantification of multi-scale and multi-physics computer models
Funding: EPSRC IAA Discovery to Use
RSDG and AI Studio team: Sanaz Jabbari, Raquel Alegre

2. Extracting information from medical letters (in collaboration with Moorfields Eye Hospital)

This is a Natural Language Processing (NLP) application in which the task is the automatic extraction of structured information from the text of electronic reports (letters) about patients. Hospital consultants may record their findings either in a structured format or as narrative free text.

A typical summary letter from an ophthalmologist starts with patient attributes such as visual acuity measurements, followed by some medical history background of the patient, followed by diagnosis and course of action to be taken. Not all letters have all the clinical information in them. However, there are often multiple letters about the same patient, so by considering many letters we can build a fuller picture. Once information is extracted correctly one could potentially answer questions like “how does visual acuity change over time in a cohort of patients with central serous retinopathy”.

One of the major issues is ascribing a value or description to the correct eye. If everyone only had one eye we could do a lot with keyword search, but as they have two we need the NLP to work out whether a visual acuity, pressure, etc. corresponds to the right or left eye. This can be expressed in different ways, for instance “X on the right and Y on the left”, or “X and Y on the right and left respectively”, or “XR and YL”, and so on.
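The phrasings above can be handled by a small cascade of rules. The sketch below is illustrative only: the `attribute_to_eye` helper and the Snellen-style value pattern are our assumptions, not the project's actual code, and a real system would need many more rules (and likely a trained model) to cover clinical language.

```python
import re

# Hypothetical value pattern: a Snellen visual acuity such as "6/9".
VALUE = r"(\d+/\d+)"

PATTERNS = [
    # "6/9 on the right and 6/12 on the left"
    (re.compile(VALUE + r" on the right and " + VALUE + r" on the left"),
     lambda m: {"right": m.group(1), "left": m.group(2)}),
    # "6/9 and 6/12 on the right and left respectively"
    (re.compile(VALUE + r" and " + VALUE
                + r" on the right and left respectively"),
     lambda m: {"right": m.group(1), "left": m.group(2)}),
    # "6/9R and 6/12L"
    (re.compile(VALUE + r"R and " + VALUE + r"L"),
     lambda m: {"right": m.group(1), "left": m.group(2)}),
]

def attribute_to_eye(sentence):
    """Return {'right': value, 'left': value} for the first rule
    that matches the sentence, or None when no rule applies."""
    for regex, build in PATTERNS:
        match = regex.search(sentence)
        if match:
            return build(match)
    return None
```

Each rule pairs a surface pattern with a function that assigns the captured values to the correct eye, which is exactly the attribution problem keyword search cannot solve.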

Sample fictitious letter from ophthalmologist that demonstrates features similar to real letters

A sample fictitious letter that demonstrates features similar to real letters.

We are still in the pilot phase of this project, determining the scope of what can be extracted reliably from the letters. Follow-on projects, once the information is extracted in a structured format, could look into scenarios that include longitudinal work, e.g. building a timeline of treatment by pooling data across visits, or building an automatic audit tool to ask questions like “what is the average fall in IOP after treatment with cyclodiode in a patient with CRVO”.
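As a sketch of the longitudinal idea, pooling per-letter extractions into a chronological per-patient timeline might look like the following; the record keys (`patient_id`, `date`, and the measurement fields) are hypothetical, not the project's actual schema.

```python
from collections import defaultdict

def build_timeline(records):
    """Pool per-letter extractions into one chronological timeline
    per patient. Each record is a dict with hypothetical keys
    'patient_id' and 'date' (an ISO-format string, so lexicographic
    order is chronological) plus whatever measurements were
    extracted from that letter."""
    timelines = defaultdict(list)
    for record in records:
        timelines[record["patient_id"]].append(record)
    for visits in timelines.values():
        visits.sort(key=lambda record: record["date"])
    return dict(timelines)
```

With such timelines in place, a cohort question reduces to filtering patients by diagnosis and comparing measurements before and after a treatment date.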

To find us:

Email me: s.jabbari@ucl.ac.uk
Email: rc-softdev@ucl.ac.uk