Driven by consistent demand, the number of data repositories in the UK has grown steadily over the past decade, among both institutional [2] and subject-specific [3] repositories. Institutional repositories are vital for showcasing a university’s research outputs of all types [4], and being able to present a comprehensive overview of such outputs is of growing importance for exercises such as the Research Excellence Framework (REF) [5]. Benefits for researchers are also increasingly recognised: the UCL Publications Policy, for example, states that the repository should also “provide each researcher with a central hub for a comprehensive personal record of his/her outputs” [6].

At the same time, higher education institutions do not always have the resources and skills to curate and preserve datasets [7]. Several recent studies have concluded both that there is “a clear value in aggregating a large number of data sets by subject or discipline” [3] and that more such repositories are needed to cater for all disciplines [8].

There are therefore two competing drivers for managing repositories: institutions would like to present all of their outputs from all disciplines in one place, while subject repositories would like to present all of the data from a single discipline across many institutions. This situation was recognised in the RIN report Data Centres: Their Use, Value and Impact: “… better understanding and coordination are needed of the relationship between local, national and international provision of data curation and aggregation services.” [3]

Any solution that lets both kinds of repository hold records of all research outputs relevant to their missions must place as low a burden as possible on the researchers themselves, who will not want to enter the same information into multiple systems. Wherever possible, automated systems should be developed: “Core challenges in the near future will be implementing systematic techniques for populating repositories, perhaps with mediated deposit workflows, and developing value-added service layers.” [9]

In particular, three JISC-funded projects have done important work towards developing solutions in this area:

DryadUK: Among other things, the DryadUK project (Sep-2010 to Oct-2011) demonstrated how the deposit of data in a subject repository could be integrated into the workflows of a wide range of publishers, with metadata transferred from the journal to the repository in order to lower the burden for the researcher [10].

REWARD: Running at UCL from Oct-2011 to Mar-2012, this short pilot project has demonstrated that it is relatively easy to modify an institutional EPrints repository such as UCL Discovery to accept datasets, and that this can be manually integrated into a publisher’s workflow [11].

SWORD-ARM: Currently underway at the Archaeology Data Service in York, the SWORD-ARM project is piloting automation of data deposit to the repository using the SWORD protocol [12].

By each tackling a different aspect of data publication, these projects have together laid a foundation for designing a system that would enable records to be automatically exchanged between different types of repositories and with publishers. Such a system is a potential solution to the issues facing both institutional and subject repositories in terms of the completeness of their respective collections.

The PRIME Project

The PRIME project (Publisher, Repository and Institutional Metadata Exchange) will pilot a system to exchange metadata between institutional repositories, subject repositories and publishers. The scope of the project is to be kept narrow in order to ensure that the objectives are realistic and obtainable within the available one-year timeframe. The project will therefore focus solely on archaeology data, and metadata exchange between the UCL Discovery EPrints repository, the Archaeology Data Service (ADS) repository, and the Journal of Open Archaeology Data (JOAD) from Ubiquity Press. 

While the project takes a focused approach, its benefits will be far-reaching. Archaeological data is extremely diverse and multidisciplinary in nature, and is therefore well suited to proof-of-concept work. UCL Discovery was modified to accept data deposits during the short REWARD project and as a result already contains datasets relevant to PRIME. The ADS have been investigating the use of the SWORD protocol for data transfer as part of the SWORD-ARM project, and have previously implemented a metadata exchange project with the US-based TDAR repository. JOAD has been set up to manually direct users to both UCL Discovery and the ADS, and already has papers about several datasets in each repository.

As a pilot project, PRIME will consist of three main stages: scoping and design, development and integration, and community feedback. In the first stage, a one-day workshop will be held at UCL to determine the common metadata elements to be exchanged and the technology or technologies to be employed. The metadata design will draw upon existing standards such as the Dryad Metadata Application Profile, the DataCite Metadata Schema and the CERIF data model, and will be based on Dublin Core with the long-term goal of being aligned to the semantic web [13]. As a transfer protocol, SWORD is both very promising and a JISC priority, and it will be evaluated alongside OAI-PMH. SWORD and OAI-PMH are quite different approaches, and the system could be built on a push basis, a pull basis, or a combination of the two. An important aspect of the system design will be to ensure that the metadata packages involved can be used with either protocol in future, for flexibility and scalability. Additional options will also be considered, including integration with the Symplectic Elements system used by UCL and many other UK HEIs for repository ingest. The workshop will include delegates from the following organisations: ADS, DCC, Dryad, EPrints, Figshare, Symplectic, Ubiquity Press, UCL Library Services, and 2-3 additional UK-based open access publishers.

Representatives from additional publishers are being included to validate the technical feasibility of the approach for a broad range of back-end publishing systems. The decisions taken at the workshop will then be written up and circulated more widely for feedback.
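To make the idea of a protocol-neutral metadata package more concrete, the following is a minimal sketch in Python, not the project's actual design. It assumes a small, hypothetical element set (title, creators, identifier, publisher, date) serialised as simple Dublin Core XML; the element set PRIME will actually exchange is for the workshop to decide. The `DatasetRecord` class, the function names, and the example DOI are illustrative assumptions. Because the package is plain XML, the same document could in principle be pushed in a deposit or exposed for harvesting.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

# Namespace URIs for Dublin Core elements and the oai_dc container element.
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


@dataclass
class DatasetRecord:
    """Hypothetical minimal element set, for illustration only."""
    title: str
    creators: List[str]
    identifier: str  # e.g. the dataset DOI
    publisher: str   # the repository holding the data
    date: str


def to_dublin_core(record: DatasetRecord) -> str:
    """Serialise a record as a simple Dublin Core XML string."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    pairs = [("title", record.title)]
    pairs += [("creator", creator) for creator in record.creators]
    pairs += [
        ("identifier", record.identifier),
        ("publisher", record.publisher),
        ("date", record.date),
    ]
    for name, value in pairs:
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


def from_dublin_core(xml_text: str) -> DatasetRecord:
    """Rebuild a record from the XML produced by to_dublin_core."""
    root = ET.fromstring(xml_text)

    def values(name: str) -> List[str]:
        return [el.text for el in root.findall(f"{{{DC_NS}}}{name}")]

    return DatasetRecord(
        title=values("title")[0],
        creators=values("creator"),
        identifier=values("identifier")[0],
        publisher=values("publisher")[0],
        date=values("date")[0],
    )


# Placeholder values throughout; the DOI is not a real identifier.
example = DatasetRecord(
    title="Example excavation dataset",
    creators=["Researcher, A.", "Researcher, B."],
    identifier="doi:10.0000/example-1",
    publisher="Example Subject Repository",
    date="2012",
)
package = to_dublin_core(example)
```

The round trip through `from_dublin_core` is the property that matters here: whichever system receives the package should be able to recover the same record without the researcher re-entering anything.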

Development and integration of the agreed system will then take place within each of the three systems. A full-time developer based with Ubiquity Press at the UCL Institute of Archaeology will be provided for this stage, and staff will also be allocated on a targeted, part-time basis by UCL Management Systems (who manage the Symplectic Elements ingest system used by UCL Discovery) and the ADS. Data produced by researchers at the UCL Institute of Archaeology during the project will be used for testing the system, and five case studies will be recorded. 

In the third stage of the project, feedback will be sought from the repository and publishing community on the success of the system developed, and Dryad and Figshare in particular will each allocate 1-2 days to provide an evaluation. This will include a critique of the business plan developed to ensure the sustainability of the system. A second workshop will then be held at UCL to disseminate and discuss the results.

Use Cases

figure 1

Use case 1: A UCL Researcher deposits data in an external subject repository. The subject repository sends the metadata and DOI of the data to the UCL institutional repository so that it has a record of the output.

figure 2

Use case 2: A UCL Researcher deposits data in their institutional repository. The institutional repository sends the metadata and DOI of the data to the appropriate subject repository so that it has a record of the output. 

figure 3

Use case 3: A UCL Researcher submits an article to a journal, and is asked to archive the data as a precondition of publication. The journal sends the metadata to the institutional repository so that the author does not have to re-enter it. The institutional repository sends the metadata and DOI of the data to the appropriate subject repository so that it has a record of the output, and sends the DOI back to the journal to link the article with the data.
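The three use cases can be walked through with a small in-memory simulation. This is a sketch under stated assumptions rather than the system PRIME will build: the `Record`, `Repository` and `Journal` classes, their method names, and the DOI prefixes are all hypothetical, no network protocol is involved, and DOIs are simply generated from a counter, whereas real DOIs are registered through a registration agency.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Record:
    title: str
    creator: str
    doi: Optional[str] = None  # assigned when the data is deposited


@dataclass
class Repository:
    name: str
    doi_prefix: str  # hypothetical placeholder prefix
    # Data deposited directly in this repository, keyed by DOI.
    holdings: Dict[str, Record] = field(default_factory=dict)
    # Metadata-only records of data held in a partner repository.
    external: Dict[str, Record] = field(default_factory=dict)

    def deposit(self, record: Record, partner: "Repository") -> str:
        """Store the data here, then send metadata and DOI to the partner."""
        record.doi = f"{self.doi_prefix}/{len(self.holdings) + 1}"
        self.holdings[record.doi] = record
        partner.external[record.doi] = record
        return record.doi


@dataclass
class Journal:
    name: str
    # Article title -> DOI of the archived data.
    data_links: Dict[str, str] = field(default_factory=dict)

    def submit(self, article_title: str, record: Record,
               repository: Repository, partner: Repository) -> str:
        """Pass the author's metadata to the repository and keep the DOI."""
        doi = repository.deposit(record, partner)
        self.data_links[article_title] = doi
        return doi


institutional = Repository("institutional", "10.0000/inst")
subject = Repository("subject", "10.0000/subj")
journal = Journal("journal")

# Use case 1: deposit in the subject repository, record sent to the institution.
doi1 = subject.deposit(Record("Survey data", "Researcher, A."), partner=institutional)

# Use case 2: deposit in the institutional repository, record sent to the subject repository.
doi2 = institutional.deposit(Record("Finds database", "Researcher, B."), partner=subject)

# Use case 3: the journal forwards the metadata and receives the DOI back.
doi3 = journal.submit("Data paper", Record("Site archive", "Researcher, C."),
                      institutional, subject)
```

In each case the researcher enters the metadata once, and every system that needs a record of the output ends up holding the same DOI.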


[1] Heery, R. and Anderson, S. 2005 Digital Repositories Review. UKOLN and AHDS.

[2] Cullen, R. and Chawner, B. 2011 Institutional Repositories, Open Access, and Scholarly Communication: A Study of Conflicting Paradigms. The Journal of Academic Librarianship, 37(6): 460-470.

[3] RIN 2011 Data Centres: Their Use, Value and Impact. London: Research Information Network.

[4] Bankier, J.-G. and Perciali, I. 2008 The Institutional Repository Rediscovered: What Can a University Do for Open Access Publishing? Serials Review, 34(1): 21-26.

[5] Day, M. 2004 Institutional Repositories and Research Assessment. University of Bath: UKOLN.

[6] UCL 2010 UCL Publications Policy 2010.

[7] Hockx-Yu, H. 2006 Digital Preservation in the Context of Institutional Repositories. Program: Electronic Library and Information Systems.

[8] PARSE.Insight 2010 Insight into Digital Preservation of Research Output in Europe: Insight Report.

[9] Palmer, C. L., Teffeau, L. C. and Newton, M. P. 2008 Strategies for Institutional Repository Development: A Case Study of Three Evolving Initiatives. Library Trends, 57(2): 142-167.

[10] Hole, B. 2011 DryadUK Final Report.

[11] Hole, B. (forthcoming) REWARD Final Report.

[12] SWORD-ARM Project Website.

[13] Greenberg, J., White, H. C., Carrier, S. and Scherle, R. 2008 A Metadata Best Practice for a Scientific Data Repository. Journal of Library Metadata, 9(3-4): 194-212.

[14] Registry of Open Access Repositories, University of Southampton. Accessed 14/03/12.

Page last modified on 12 nov 12 13:06