Advanced Research Computing

Data Stewardship Case Study: Convalescence Project

ARC's Research Data Stewards worked with the MRC Unit for Lifelong Health & Ageing, making data analysis pipelines more open and reproducible and ensuring the security of underlying sensitive data.

25 January 2024

Background

UCL's Medical Research Council Unit for Lifelong Health and Ageing (MRC LHA) is home to the oldest birth cohort study in the UK. The study holds over 75 years of longitudinal data, collected from birth, on the health and social circumstances of a representative sample (5362) of men and women born in England, Scotland, or Wales in March 1946. Much of the current data management is designed to fit within existing infrastructures, including a public-facing data discovery portal called Skylark.

ARC’s Research Data Stewardship team began a collaboration with the MRC LHA team in January 2023 to enhance the analysis pipelines for their data so that the pipelines would better meet open and reproducible standards. Because the work involved special category data, the Data Safe Haven (UCL) service was used to host the underlying cohort information.

Project Goals

To ensure future accessibility and reproducibility, and to provide methods no longer reliant on legacy device outputs and standards, the team would modernise the codebase to accepted open standards. This would provide a resource that could be managed more readily and would increase the efficiency of analysis. The code would then be stored as open-source software in a public repository, in line with UCL’s Open Science and Scholarship policy.

During the initial discussions, multiple workflows were identified. Each planned output was prioritised, and the decision was made to focus on unifying the data collected from clinical measurements into an open, reproducible format: in this case, a tabular format such as CSV. To achieve this, the Data Stewards would produce modular code in Python/R that could be reused for future analysis of similar data types.

What we did

A team of Data Stewards joined the collaboration with the MRC LHA team and started work in April/May 2023, with weekly meetings between the ARC and MRC LHA teams held to discuss progress and address challenges. A Teams group also provided direct communication throughout the collaboration for more rapid queries and feedback. As part of the Agile process that ARC uses for its projects, the Data Stewards held a weekly retrospective meeting to review progress and the challenges encountered over the course of their work.

  • The first task was to produce a script that would extract bioimpedance data (data used to establish body composition, such as body fat and muscle mass) from XLSX output files to CSV format, whilst maintaining a specific output order requested by the researchers.
  • The second task was to produce a script that would extract Cardiopulmonary Exercise Testing (CPET) data, which captures how the heart, lungs and muscles respond to exercise, from XLSX to CSV, again following the output layout requested by the researchers. MATLAB code had been developed for this analysis prior to the collaboration. However, the MRC LHA team requested Python code to perform the same operation, for consistency and general code accessibility. After consulting a MATLAB specialist in the Data Steward team, the decision was made to focus on the Python translation.
  • The final task was to produce a script that would extract spirometry data, which measures lung function, from XML to CSV, with the output layout again specified by the researchers.
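
As an illustration of the kind of conversion described in the final task, the following is a minimal Python sketch of turning XML records into a CSV with a fixed column order. It is not the project's code: the XML element and attribute names (`test`, `measurement`, `participant_id`), the column names in `COLUMN_ORDER`, and the sample values are all hypothetical, chosen only to show how a requested output layout can be enforced using the standard library.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical column order; in the project the layout was specified by the researchers.
COLUMN_ORDER = ["participant_id", "fvc_l", "fev1_l", "pef_l_per_s"]


def spirometry_xml_to_rows(xml_text):
    """Parse hypothetical <test> elements into a list of dicts keyed by column name."""
    root = ET.fromstring(xml_text)
    rows = []
    for test in root.iter("test"):
        row = {"participant_id": test.get("participant_id", "")}
        for measurement in test.findall("measurement"):
            row[measurement.get("name")] = (measurement.text or "").strip()
        rows.append(row)
    return rows


def rows_to_csv(rows, column_order=COLUMN_ORDER):
    """Write rows as CSV text, with columns always in the requested order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=column_order,
        restval="",             # missing measurements become empty cells
        extrasaction="ignore",  # fields outside the requested layout are dropped
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample input, for illustration only.
SAMPLE_XML = """<session>
  <test participant_id="P001">
    <measurement name="fev1_l">3.10</measurement>
    <measurement name="fvc_l">4.02</measurement>
    <measurement name="unused_field">x</measurement>
  </test>
  <test participant_id="P002">
    <measurement name="fvc_l">3.75</measurement>
  </test>
</session>"""

if __name__ == "__main__":
    print(rows_to_csv(spirometry_xml_to_rows(SAMPLE_XML)), end="")
```

Keeping the parsing step separate from the writing step means the same `rows_to_csv` function could, in principle, be reused for rows read from other sources, such as the XLSX files in the first two tasks, which is one way to read the "modular code" goal stated earlier.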

Outcomes

At the end of the development and testing cycles, the Data Stewards had produced a codebase for each objective that met the requirements and standards of the MRC LHA. After ensuring that the staff understood the pipeline processes, the code was exported from the Safe Haven for storage as open-source software in a public repository within the UCL organisation on GitHub.

"The ARC team were professional throughout; they understood the issues presented and were able to work collaboratively to provide effective solutions." - MRC LHA Team