Advanced Research Computing

Harbour - UCL's External Data Service

Harbour helps UCL researchers to access and manage data provided to UCL by external sources.

Research often depends on access to, and links between, increasingly diverse data to achieve its objectives. These datasets often come with complex requirements necessitating coordination between multiple services to manage the data efficiently and responsibly. 

Harbour accelerates research and interdisciplinary collaboration using these high-volume, externally provided datasets. It does this by simplifying the processes by which researchers access and manage these data. As a result, Harbour fosters communication and collaboration across disciplines, helping researchers maximise the value of these important resources.

How can Harbour help me?

Harbour licenses and manages external data resources on behalf of UCL. We facilitate access to these datasets to accelerate data-driven research. You can learn more about the data we currently manage below.

BBC AVS: 10k dataset

The BBCAVS: 10k dataset is a collection of BBC TV programmes with subtitles and associated metadata. The dataset comprises 10,160 programmes publicly broadcast by the BBC in the UK between June 2007 and December 2021 and contains content originally recorded between 1962 and 2017.

The BBC provides this data for machine learning and data science research – but it may have other applications.

You can learn more about the BBCAVS: 10k dataset on its dedicated GitHub page.

Submit a request to us to learn more about accessing this data under the UCL licence.

Clinical Practice Research Datalink

CPRD is a collection of anonymised patient data from a network of GP practices across the UK. CPRD also links primary care data to a range of other health-related data to provide a longitudinal, representative UK population health dataset. The data encompass 60 million patients, including 18 million currently registered patients.

Harbour delivers CPRD in collaboration with UCL researchers as part of the Clinical Data Science and Technology Platform (STP).

Collaborations and consultancy

The Harbour collaborations and consultancy service is available for researchers looking for help accessing and managing data from external sources. We focus on controlled or restricted-access datasets. Harbour does not support access to open-source, third-party datasets.

We can help with free drop-in advice if you are struggling with a particular problem, or with longer-term support through a funded collaboration. Visit our collaborations and consultancy pages to learn more about how ARC can support you with delivering your research.

Training

We are currently developing training modules to support researchers in using Harbour datasets. Training for CPRD will be delivered by the Clinical Data STP. Training for BBC data is in development and will be advertised on ARC's Training pages.

Contact us

You can contact the Harbour team through My Services.