The Research Data Services team are in the process of developing an institutional Research Data Repository for UCL researchers across all academic disciplines.
The Repository will enable the long-term preservation and curation of data that underpins published research or is otherwise of value.
Many research funders now mandate that the data generated during research projects they pay for is preserved and made available to others (where appropriate) for periods of 10 years or more beyond the point of research publication or the end of a project. Even when preservation is not a requirement of funding, it may be beneficial to preserve and make available valuable data in order to:
- Improve the reproducibility of your research findings
- Receive credit via the citation of your data in future research
- Ensure that your data can be re-used, re-analysed, or combined with other datasets in future, to contribute to the advancement of your field
- Ensure you can locate and access the data yourself in the future, without having to actively manage and maintain it
UCL supports the principles of Open Science and FAIR data1. The UCL Research Data Policy states that members of UCL should respect the principle that data should be “as open as possible, as closed as necessary”2.
Whilst UCL is developing a general research data repository intended to serve any academic discipline, there are already specialist data repositories for many types of academic data. If a specialist data repository already exists in your field, you should in most circumstances use that in preference to the UCL repository. The Re3data website lists most existing data repositories.
How does the UCL Research Data Repository Work?
We have awarded the contract for providing the Repository to Figshare, part of Digital Science. Figshare already run a popular commercial research data repository and have several years of experience in this field. The institutional version of their service keeps the intuitive interfaces, search, and visualization tools, whilst allowing for customizations and for a local copy to be taken of uploaded data and accompanying metadata.
The Repository will have a graphical interface where researchers can describe and upload their datasets.
When uploading data, researchers will need to fill in some information to ensure that the dataset can be discovered, accurately cited, and understood. Whether or not the data itself is generally accessible, people should be able to find a record that the data exists and have a reasonable idea of what the data covers.
Data can be embargoed or access restricted where necessary.
It may be necessary to keep data that should ultimately be made public under embargo for a period to maximise the potential for the researchers that produced it to publish their results. The timed embargo functionality will be in place from launch. Other forms of access restriction will be considered by the Repository team and implemented over the first year of operations.
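As an illustration only, the logic of a timed embargo can be sketched in a few lines of Python. This is not how the Repository implements embargoes; the function name `is_publicly_accessible` and the rule that files are released on the embargo end date itself are assumptions made for the example.

```python
from datetime import date
from typing import Optional


def is_publicly_accessible(embargo_until: Optional[date], today: date) -> bool:
    """Return True once any timed embargo on a dataset has expired."""
    # No embargo date recorded: the files are open from the moment of publication.
    if embargo_until is None:
        return True
    # Assumption: files become available on the embargo end date itself.
    return today >= embargo_until


# A dataset embargoed until 1 January 2020 is still closed in May 2019.
print(is_publicly_accessible(date(2020, 1, 1), date(2019, 5, 31)))
```

Under this rule the metadata record would stay visible throughout, and only access to the files would depend on the date.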
Datasets and collections will be ‘published’, and assigned a unique identifier such as a DOI (Digital Object Identifier). This enables data to be cited.
Datasets or collections of datasets can be referenced in research publications, enabling researchers to gain credit for their data and a case to be made for its impact. This can support the REF process.
Data can be linked with associated publications, software code, and other datasets.
The metadata record for each dataset or collection will hold relationship information, enabling data to be linked to publications that reference it, to the software code that generated it, or to other datasets from which the data was selected, adapted, and so forth.
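To make the idea of relationship information concrete, here is a minimal sketch of what such a record might look like. The class and field names are hypothetical and do not reflect the Repository's actual metadata schema; the relation names are borrowed from the DataCite relation type vocabulary, and the identifiers are placeholders rather than real DOIs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RelatedItem:
    relation: str    # e.g. "IsCitedBy", "IsCompiledBy", "IsDerivedFrom"
    identifier: str  # DOI or URL of the related publication, code, or dataset


@dataclass
class DatasetRecord:
    title: str
    doi: str
    related: List[RelatedItem] = field(default_factory=list)

    def related_by(self, relation: str) -> List[str]:
        """Return the identifiers of all items linked by the given relation."""
        return [item.identifier for item in self.related if item.relation == relation]


# Placeholder identifiers, for illustration only.
record = DatasetRecord(
    title="Example survey responses",
    doi="10.5555/example.dataset",
    related=[
        RelatedItem("IsCitedBy", "10.5555/example.article"),       # a publication that references the data
        RelatedItem("IsCompiledBy", "10.5555/example.software"),   # the code that generated the data
        RelatedItem("IsDerivedFrom", "10.5555/example.rawdata"),   # the dataset it was adapted from
    ],
)

print(record.related_by("IsCitedBy"))
```

Each link is a typed pointer to another identifier, so a reader can move between the data, the paper, and the code in either direction.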
Data will be preserved over the long term.
The repository will enable information professionals to help ensure data remains readable far into the future, via integrity checks, migrations between storage platforms, format migrations, and other curation activities.
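One common form of integrity check is a fixity check: a checksum is recorded when a file is deposited and recomputed later to confirm that the file has not changed. The sketch below shows the general technique using SHA-256 from the Python standard library. It is an assumption that a check of this kind would be used; the function names are hypothetical and the Repository's actual tooling may differ.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in fixed-size chunks so large datasets need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, recorded_checksum: str) -> bool:
    """Return True if the file still matches the checksum recorded at deposit."""
    return sha256_of(path) == recorded_checksum
```

In practice the checksum returned by `sha256_of` would be stored alongside the metadata at deposit, and `is_intact` would be run periodically and after each move between storage platforms.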
It is not anticipated that the Research Data Repository will initially be certified to hold sensitive personal data such as non-anonymised patient data. UCL is currently in the process of upgrading the Data Safe Haven, and will look at developing a long-term repository suitable for sensitive data over the coming years.
It is unlikely that the Repository will initially be able to harvest information about UCL datasets described and deposited in external data repositories. Users may therefore be required to manually add references to datasets deposited elsewhere to the UCL Repository. We will look at automating this process in due course.
The Research Data Repository is currently available to early adopters for testing. It will be opened up to all UCL staff and doctoral students towards the end of May 2019. If you are keen to start using the Repository before then and are happy to provide us with some feedback, let us know and we will add you to the early adopter programme. You can get in touch with us via email@example.com