Advanced Research Computing


Getting a handle on third-party datasets: researcher needs and challenges

ARC's Data Stewards have completed the first phase of work on the third-party datasets project, which will help researchers better access and manage data provided to UCL by external organisations.

Data Stewards

25 January 2024

The problem

Modern research frequently demands access to large volumes of data generated outside of universities. These datasets, which are provided to UCL by third parties, are normally generated during routine delivery of a service or other activity and are used in research to identify patterns and make predictions about the world. UCL research and teaching increasingly rely on access to these datasets to achieve their objectives. This includes everything from NHS data and data provided by other government departments to large-scale commercial datasets, such as those provided by ‘X’, formerly known as Twitter.

Currently, there is no centrally supported process for research groups wanting to access third-party datasets. Researchers sometimes use departmental procedures to acquire personal or university-wide licences for third-party datasets. They then transfer, store, document and extract the data, and take steps to minimise information risk, before using it for various analyses. Obtaining third-party data carries a large overhead involving contracts, information governance (IG) compliance and finance, and delays in acquiring access can be a significant barrier to research. Some UCL research teams also provide additional support services, such as sharing, managing access to, licensing and (re)distributing specialist third-party datasets for other research teams. These teams increasingly take on governance and training responsibilities for these specialist datasets. Concurrently, the e-resources team in the library negotiates access to third-party datasets for UCL staff and students following established library procedures.

It has long been recognised that UCL's processes for acquiring and managing third-party data, such as they exist, are uncoordinated and inefficient, leading to inadvertent duplication, unnecessary expense, and the underutilisation of datasets that in some cases could support transformative research across multiple projects or research groups. The “Data First, 2019 UCL Research Data Strategy” acknowledged this problem.

What we did

Last year, the ARC Data Stewards team reached out to UCL professional services staff and researchers to understand the processes and challenges they faced in relation to accessing and using third-party research datasets. We hoped that insights from these conversations could be used to develop more streamlined support and services for researchers and make it easier for them to find and use data already provided to UCL by third parties (where this is within licensing conditions).

During this phase of work, we spoke with 14 members of staff:

  • 7 members of research teams that manage third-party datasets
  • 7 members of professional services that support or may support the process, including contracts, data protection, legal, Information Services Division (databases), information security, research ethics and integrity, and the library

What we've learnt

An important aspect of this work involved capturing the existing processes researchers use when accessing, managing, storing, sharing and deleting third-party research data at UCL. This enabled us to understand the range of processes involved in handling this type of data and to identify the stakeholders who are involved, or who may need to be involved.

In practice, we found that researchers follow broadly similar processes to access and manage third-party research data, with variations depending on the security requirements of the dataset. However, as there is no central, agreed procedure to support the management of third-party datasets in the organisation, different teams may implement parts of the process differently, using the methods and resources available to them.

We turned the challenges researchers identified in accessing and managing this type of data into requirements for a suite of services to support the delivery and management of third-party datasets at UCL.

Next steps

We have been working on addressing some of the common challenges researchers identified. The table below provides a brief overview of what we have been doing.

| You said... | We did... |
| --- | --- |
| Getting contracts agreed and signed off takes too long | We have reached out to the RIS Contract Services Team who are actively working to build additional capacity into the service as part of a wider transformation programme. |
| Information about how to access third-party datasets is fragmented and researchers don’t know where to go for help/advice, in particular governance and technical advice | We are bringing relevant professional services together to agree a process for supporting access to third-party datasets. |
| There’s too much duplication of data, costs are high and it’s not easy to know what’s already available internally | We are building a searchable catalogue of third-party datasets already licensed to UCL researchers and available for others to request access to reuse. |

Our progress will be reported to the Research Data Working Group, which acts as a central point of contact and a forum for discussion on aspects of research data support at UCL. The group advocates for continual improvement of research data governance.

If you would like to know more about any of these strands of work, please do not hesitate to reach out (email: researchdata-support@ucl.ac.uk). We are keen to work with researchers and other professional services to solve these shared challenges and accelerate research and collaboration using third-party datasets.