REWARD Project



There is strong support for the archiving, dissemination and reuse of research data from government, funding bodies, institutions and researchers. A recent study has shown that researchers in particular highly value the resulting re-analysis of existing data, its potential for future validation purposes, role in the advancement of science, and potential for stimulating interdisciplinary collaborations [1]. More efficient research through the reuse of data is especially appealing in a tight funding environment.

Despite this, data archiving in Higher Education Institutions is not yet widespread, and 80% of data is not adequately preserved. The 2010 PARSE.Insight project found that 81% of researchers stored their data on a PC at work, 66% used portable storage, 51% a PC at home, and only 20% used a digital archive [1]. This is particularly the case for “long tail” data from smaller research efforts and in areas such as the humanities and social sciences, as described by a 2011 RIN report on humanities research habits [3].

Institutional repositories for research data are being developed in some locations within the UK, but most of these projects are still some way from completion. At the same time, institutional repositories for textual outputs are already well established, with 126 in the UK [4]. These hold the currently “overwhelmingly dominant channels” for research dissemination, including journal articles, conference reports, books and book chapters [3]. The use of such electronic materials has also been shown to be increasing, and already accounts for the majority of reading done by researchers [5]. Both publishing research articles electronically and depositing them in institutional repositories can therefore be seen as established parts of researcher workflows.

Such repositories play central roles in fulfilling the strategic aims of institutions such as UCL and its library [6; 7]. This is especially important for promoting research outputs for the Research Excellence Framework (REF) exercise [8]. While the 2014 REF will include research data as a recognised output [2], few of these repositories have been set up to take data as well [9], and those designed specifically for the purpose are not likely to be ready in time to have an impact on the exercise. This makes it particularly difficult for departments both to disseminate their data and to argue its impact to the REF panels. UCL Discovery is an example of a repository that will need to begin taking data if it is to fulfil its mandate to take all REF-submitted outputs [7].

The REWARD Project

The REWARD (Researchers using Established Workflows to Archive Research Data) project aims to address the issues outlined above. A six-month pilot project based at the UCL Institute of Archaeology, REWARD will introduce data management planning and work with several case study research projects to prepare data for archiving. The project will explore adding the archiving of data to the current researcher workflow of publishing a paper, by integrating the UCL Discovery repository, already familiar to researchers, with the Journal of Open Archaeological Data (which will have been launched prior to the project in September 2011). Because the project uses existing workflows and systems, it is realistic to expect it to achieve its objectives within the short timeframe.

UCL Discovery is the most heavily used institutional repository in the UK [4], and is well patronised by the Institute of Archaeology with 3,012 deposits, 14.3% of the total number [10]. Researchers at the Institute are therefore familiar with depositing research articles in the repository, and it can be said to form a part of their normal research workflow. Adding data to a repository is a “natural extension” [11], and UCL Discovery is based on EPrints, meaning that it requires little modification to be suitable for ingesting datasets. Such a modification is already being trialled with reported success at the University of Southampton. In addition to being familiar and convenient for researchers, UCL Discovery is appropriate because it complements the strategic objectives of UCL Library Services, with increased realisation of the following benefits identified in the 2011 RIN and RLUK report The Value of Libraries for Research and Researchers [12]:

· A wider range of research content will be conveniently available to researchers

· An increase in the visibility of the institution and the raising of its research profile

· Enabling a closer link between the library and academic departments

In addition to its healthy use of UCL Discovery for research articles, the Institute of Archaeology has been chosen as a representative field of the humanities that produces a wide range of types and sizes of datasets. There is an acknowledged risk of a ‘digital divide’ opening between the humanities and social sciences and the sciences due to a disproportionate investment in data management and infrastructure [1]. A summary overview of projects funded by the JISC MRD programme to date suggests that only around 5 out of 29, or 17%, are from the humanities or social sciences [13]. It has also been noted elsewhere that there is great potential in linking arts and humanities research data more closely with cultural heritage data [9], which will be one longer-term result of this project. As such, it is felt that work to improve management and archiving of research data in the humanities is of great urgency and importance.

The project will introduce structured research data management to the Institute of Archaeology by holding 2-3 workshops to introduce researchers to the processes and tools involved, and will bring in an archaeological data expert for this. All researchers in the Institute will be encouraged to participate, some of whom may already be familiar with data management plans due to AHRC requirements. The Digital Curation Centre’s DMP Online data management planning tool will be used for this purpose, and the first workshop will involve a DCC staff member. Recommendations for any modifications to the tool will be submitted to the DCC at the end of the pilot. Researchers will be provided with mentoring, and all use of the tool will be tracked. A group of 5 research projects due to complete during the project timeframe will be chosen for in-depth case studies. Projects that have already begun will be included, but preference will be given to those that fit entirely within the 6-month timeframe. At the end of the pilot, recommendations will be made to the Institute of Archaeology on how to ensure that data management plans are produced for both funded and unfunded projects and made available for reuse. Recommendations will also be made to UCL for its institutional Data Management, Preservation and Sharing Policy.

Due to the small number of changes required, initial modification of UCL Discovery will be fairly light, and can be achieved in the first few weeks of the project. When users from the Institute of Archaeology come to UCL Discovery, they will be able to select ‘research data’ as a category for deposit, and either deposit the full dataset plus metadata, or only the metadata if the data has already been archived elsewhere (e.g. the Archaeology Data Service, or the UCL Research Data Repository under development). This will ensure that UCL has a record of all datasets produced by its researchers, even if they are archived elsewhere, and also increase discoverability.
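The two deposit routes described above (full dataset plus metadata, or metadata only for data archived elsewhere) can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the UCL Discovery or EPrints data model: the class name DatasetDeposit and every field name in it are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetDeposit:
    """Hypothetical deposit record; field names are illustrative, not EPrints fields."""

    title: str
    creators: List[str]
    category: str = "research data"                  # deposit category chosen by the user
    files: List[str] = field(default_factory=list)   # names of uploaded data files, if any
    archived_at: Optional[str] = None                # external archive holding the data, if any

    def deposit_mode(self) -> str:
        """Return 'full' when data files are attached, 'metadata-only' when the
        data is held in an external archive, and raise ValueError otherwise."""
        if self.files:
            return "full"
        if self.archived_at:
            return "metadata-only"
        raise ValueError("deposit needs data files or an external archive location")
```

Under these assumptions, a record with no files but with archived_at set to the name of an external archive would be classed as metadata-only, so the repository still holds a discoverable entry for the dataset.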

The DataShare project concluded that introducing data management is important, but not sufficient to create culture change [14] in research data archiving and sharing. REWARD will explore whether greater culture change can be achieved by coupling data archiving with the existing researcher workflow, as has been recommended by Heery and Anderson [8], in this case with research paper publication and the associated incentives and rewards. This pilot project does not involve the production of a data publication system. It will however make use of the Journal of Open Archaeological Data (JOAD), which is being launched by Ubiquity Press at UCL in September before REWARD has begun, and already has links to subject-specific repositories such as the Archaeology Data Service in York, and the Data Archiving and Networked Services (DANS) repository in the Netherlands. JOAD contains data papers, which are short, peer-reviewed descriptions of archaeological datasets, focusing on the methodology of their production and detailing their reuse potential, along with rich metadata. When a paper is accepted by JOAD, the author is guided to an appropriate repository to deposit the associated dataset, and UCL researchers will be sent to UCL Discovery. There they will be able to deposit the data in the same way they currently do their research papers, along with a DOI that points back to the relevant data paper. They will also be asked to provide the permanent identifier of the UCL Discovery record to JOAD, so that the published paper is linked to the data.
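The two-way link between a data paper and its deposited dataset can be sketched as follows. This is a minimal illustration only: the classes DataPaper and RepositoryRecord, their fields, and the helper functions are hypothetical names introduced here, and do not describe the JOAD or UCL Discovery systems.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataPaper:
    """Hypothetical data paper record held by the journal."""
    doi: str
    dataset_identifier: Optional[str] = None   # permanent identifier of the repository record


@dataclass
class RepositoryRecord:
    """Hypothetical dataset record held by the repository."""
    identifier: str
    data_paper_doi: Optional[str] = None       # DOI pointing back to the data paper


def link(paper: DataPaper, record: RepositoryRecord) -> None:
    """Store each side's identifier on the other, so either can be reached from its partner."""
    record.data_paper_doi = paper.doi
    paper.dataset_identifier = record.identifier


def is_linked(paper: DataPaper, record: RepositoryRecord) -> bool:
    """True only when both directions of the link are present and consistent."""
    return (record.data_paper_doi == paper.doi
            and paper.dataset_identifier == record.identifier)
```

In this sketch the link counts as complete only when both halves are recorded, mirroring the two steps described above: the depositor supplies the paper's DOI to the repository, and the repository identifier is then passed back to the journal.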

Linking the deposited data to a citable research paper will enable the researcher to accumulate citations and recognition of the impact of their research in the way they are accustomed to. Within the 6-month timeframe this project will be able to collect feedback on how much of an incentive researchers find this to be, and in the longer term it is expected that the use of OAI and RDF by the journal will result in the datasets being much more discoverable. Recent research by Piwowar et al. has estimated that papers associated with openly archived data are cited 70% more frequently [15], which may provide an even stronger incentive to deposit data via this route in the longer term. To make the most of this increased citation and author incentive, application of DataCite standards for data citation will be encouraged in the workshops.

By increasing the number of datasets archived by Institute of Archaeology researchers, and by improving their discoverability through citation, it will be much easier to assess their impact as research outputs. It could be argued for example that a dataset with 10 citations for reuse has had a significantly greater impact than a research paper with the equivalent number. It is only by making data available in a repository and making it citable in the academic literature that this kind of impact information can be made available to a REF panel in a form that they are already familiar with assessing.

[1] PARSE.Insight, 2010, Insight into Digital Preservation of Research Output in Europe: Insight Report.

[2] HEFCE, 2011, REF 2014 Assessment Framework and Guidance on Submissions.

[3] RIN, 2011, Reinventing Research? Information Practices in the Humanities. RIN.

[4] Registry of Open Access Repositories, University of Southampton. Accessed 24/07/11.

[5] Tenopir, C., King, D. W., Edwards, S. and Wu, L., 2009, Electronic journals and changes in scholarly article seeking and reading patterns. Aslib Proceedings: New Information Perspectives. 61(1), 5-32. doi: 10.1108/00012530910932267.

[6] UCL, 2011, Delivering a Culture of Wisdom: 2011 UCL Research Strategy & Implementation Plan, Office of the UCL Vice-Provost (Research).

[7] UCL Library Services, 2007, UCL Library Services e-strategy.

[8] Heery, R. and Anderson, S., 2005, Digital Repositories Review. UKOLN and AHDS.

[9] Lyon, L., 2007, Dealing with Data: Roles, Rights, Responsibilities and Relationships: Consultancy Report.

[10] UCL Discovery: breakdown of deposits for Institute of Archaeology. Accessed 24/07/11.

[11] Swan, A. and Brown, S., 2008, The Skills, Role and Career Structure of Data Scientists and Curators: An Assessment of Current Practice and Future Needs: Report to the JISC.

[12] RIN and RLUK, 2011, The Value of Libraries for Research and Researchers.

[13] JISC, 2011, Managing Research Data (JISCMRD) website. Accessed 24/07/11.

[14] Rice, R., 2009, DISC-UK DataShare Project: Final Report.

[15] Piwowar, H., Day, R. S. and Fridsma, D. B., 2007, Sharing Detailed Research Data is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308.

Page last modified on 27 oct 11 14:33