This guide offers advice on how to preserve and share research software effectively.
Software and code play an increasingly important role in research. Whether you have written a small script for a particular paper or are developing a larger software project, it is worth thinking about how best to preserve your software.
Preserving software helps others verify your results and reuse or adapt your code. Software is usually updated regularly, either through the release of new versions or on a rolling basis. These updates, which may add new features, fix bugs or address security issues, can change the software's behaviour. In a research context, it should therefore be possible for anyone to verify or reproduce results using the specific version of the software that was used for the original analysis, which means preserving that version of the software or code. Even so, true reproducibility can be very difficult to achieve: code may rely on multiple external libraries, or may have run on specialist platforms that are not available to others.
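As a minimal sketch of recording which version was used, the Python snippet below builds a small provenance record for an analysis run. It assumes your analysis code has a version string (the `ANALYSIS_VERSION` constant here is hypothetical) and captures only the Python interpreter and operating system; it does not capture external libraries or hardware, so it supports reproducibility rather than guaranteeing it.

```python
import json
import platform
from datetime import datetime, timezone

# Hypothetical version string for the analysis code. In practice it should
# match the release (for example a GitHub release tag) used for the results.
ANALYSIS_VERSION = "1.0.0"


def provenance_record(analysis_version: str) -> dict:
    """Describe the software environment used to produce a set of results."""
    return {
        "analysis_version": analysis_version,
        "python_version": platform.python_version(),
        "python_implementation": platform.python_implementation(),
        "operating_system": platform.platform(),
        "recorded_at_utc": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    # Print the record as JSON; in practice you might save it alongside the results.
    print(json.dumps(provenance_record(ANALYSIS_VERSION), indent=2))
```

Storing a record like this next to your results makes it easier to match each result to the release of the code that produced it.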
How to preserve software
Perfectly preserving software is difficult, but there are several steps you can take to make it more likely that your code and software remain available for verification, reuse and adaptation.
A software management plan has similar aims to a data management plan. If software is going to form a major part of your research, you may need to describe how you will manage it in a data management or outputs management plan. Even if this is not a funder requirement, developing a plan can still be beneficial, because it helps you anticipate issues that can arise during the development of research software.
The Software Sustainability Institute provide further guidance on writing and using a software management plan. You can find a template Software Management Plan on DMPOnline by selecting the Software Sustainability Institute as your organisation, choosing 'no funder', and then choosing either the full or minimal software management plan.
Documenting your software includes writing useful comments in the code, documentation for users and/or tutorials on using your software. Good documentation makes it easier for external users to make use of your software, and also benefits the team developing it. You do not need to take an all-or-nothing approach to documentation: you could begin with a quick start guide that explains how to start using the software, then expand the documentation over time. You should also state what your software aims to do, as a repository with no explanation of its purpose is unlikely to be reused by other researchers.
Documenting your application programming interface (API) will help people use your software in their research. You can supplement this with code vignettes: short worked examples that demonstrate how the software can be used.
Whilst developing extensive documentation can be time-consuming, tools can make the process easier: GitHub Pages can host your documentation, and Continuous Integration tools can build API documentation automatically. Advanced Research Computing have further information on the use of Continuous Integration services.
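One lightweight way to write a vignette in Python is to embed a worked example in a docstring, which the standard-library `doctest` module can then check automatically. The `mean` function below is a hypothetical illustration, not part of any particular project.

```python
import doctest


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers.

    Example (this doubles as a short vignette and is checked by doctest):

    >>> mean([1, 2, 3, 4])
    2.5
    """
    if len(values) == 0:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Run the examples embedded in this module's docstrings; silent if they all pass.
    doctest.testmod()
```

Because `doctest` executes the example, it fails loudly if the documented behaviour and the code drift apart.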
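As an illustration of generating API documentation automatically, the sketch below uses Python's standard-library `pydoc` module to render an HTML reference page for a module. The function name `build_api_docs` and the `docs` output folder are assumptions; a Continuous Integration job could run a script like this on each change, and the resulting files could then be published, for example with GitHub Pages. Dedicated documentation generators produce richer output, so treat this as a minimal sketch.

```python
import importlib
import pathlib
import pydoc


def build_api_docs(module_name: str, output_dir: str = "docs") -> pathlib.Path:
    """Render an HTML API reference page for *module_name* using pydoc."""
    module = importlib.import_module(module_name)
    renderer = pydoc.HTMLDoc()
    # docmodule() documents the module's classes, functions and data;
    # page() wraps that content in a complete HTML document.
    page = renderer.page(pydoc.describe(module), renderer.docmodule(module))
    out_dir = pathlib.Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / f"{module_name}.html"
    out_file.write_text(page, encoding="utf-8")
    return out_file


if __name__ == "__main__":
    # "json" is only a stand-in; a CI job would pass your own package's name.
    print(build_api_docs("json"))
```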
You can find further guidance on documentation as part of UCL's Research Software Development Group's Research Software Engineering course. The Software Sustainability Institute also have tips on best practices for documentation.
Licensing your software
A software licence tells other people the terms under which they may use your software. It is important to assign a licence to any software you share so that it is clear to others whether and how they may use it. With no licence, copyright law implies that all rights are reserved, so by default others have no permission to copy, modify or redistribute the software. Before assigning a licence, you should clarify whether you own the copyright of the software, and if you have reused existing code as part of your software, you should ensure that its licence is compatible with the one you choose. Further guidance on licensing software is available.
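Alongside a licence file in the repository, some projects also state the licence at the top of each source file using an SPDX licence identifier comment such as `# SPDX-License-Identifier: MIT` (MIT is used here purely as an example, not a recommendation). The hypothetical helper below is a minimal sketch that lists Python files lacking such a comment; it does not check whether the licences of reused code are compatible, which still needs to be assessed separately.

```python
import pathlib
from typing import List, Optional

# Tag defined by the SPDX convention for declaring a file's licence.
SPDX_TAG = "SPDX-License-Identifier:"


def spdx_identifier(source: str, max_lines: int = 5) -> Optional[str]:
    """Return the SPDX identifier from a '#' comment in the first lines of *source*."""
    for line in source.splitlines()[:max_lines]:
        stripped = line.strip()
        if stripped.startswith("#") and SPDX_TAG in stripped:
            return stripped.split(SPDX_TAG, 1)[1].strip()
    return None


def files_missing_identifier(root: str) -> List[pathlib.Path]:
    """List Python files under *root* with no SPDX identifier near the top."""
    missing = []
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if spdx_identifier(text) is None:
            missing.append(path)
    return missing


if __name__ == "__main__":
    for path in files_missing_identifier("."):
        print(f"No SPDX licence identifier: {path}")
```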
It is important to give credit to other researchers for software they have developed and that you have used in your research, so you should cite that software in your publications. Traditionally this has been done by citing a paper describing the software, but publishers increasingly allow researchers to cite software directly. Further guidance is available on how to cite and describe software, along with a discussion of the best approaches.
It is good practice to cite software and libraries you have reused in your code. This is not always easy when you have a large number of dependencies, but listing your dependencies should provide a starting point for citing other software and libraries you have used.
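In Python, a simple starting point is to look up the installed version of each package your code depends on, using the standard-library `importlib.metadata` module. The lookup uses distribution names (the names used to install packages, which can differ from import names), and the dependency names in this sketch are hypothetical placeholders; replace them with the packages your own code uses.

```python
from importlib import metadata
from typing import Dict, Iterable


def dependency_versions(names: Iterable[str]) -> Dict[str, str]:
    """Map each distribution name to its installed version, or 'not installed'."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Hypothetical dependency list: replace with the packages your code uses.
    for name, version in dependency_versions(["numpy", "pandas"]).items():
        print(f"{name}: {version}")
```

Recording versions as well as names also helps others recreate the environment used for your results.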
You should also make it possible for other people to cite your software. GitHub and other code hosting platforms increasingly offer tools that make software easier to cite, including ways to assign DOIs to your code. DataCite provides further guidance on software citation workflows.
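One widely used way to make a repository citable is a `CITATION.cff` file in the Citation File Format, which GitHub recognises when it is placed in the root of a repository. The sketch below builds a minimal version 1.2.0 file as a string; the title, author, version, date and DOI in the usage example are hypothetical placeholders (`10.5281` is the prefix Zenodo uses for its DOIs, but the suffix here is invented). Strings are quoted with `json.dumps`, relying on the fact that JSON strings are also valid YAML 1.2 double-quoted scalars.

```python
import json
from typing import List, Tuple


def citation_cff(title: str, authors: List[Tuple[str, str]], version: str,
                 doi: str, date_released: str) -> str:
    """Return a minimal CITATION.cff (Citation File Format 1.2.0) document.

    *authors* is a list of (family names, given names) pairs and
    *date_released* uses the YYYY-MM-DD format.
    """
    quote = json.dumps  # a JSON string is also a valid YAML double-quoted scalar
    lines = [
        "cff-version: 1.2.0",
        "message: " + quote("If you use this software, please cite it as below."),
        "title: " + quote(title),
        "authors:",
    ]
    for family_names, given_names in authors:
        lines.append("  - family-names: " + quote(family_names))
        lines.append("    given-names: " + quote(given_names))
    lines.append("version: " + quote(version))
    lines.append("doi: " + quote(doi))
    lines.append("date-released: " + quote(date_released))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical metadata; save the output as CITATION.cff in the repository root.
    print(citation_cff(
        title="Example Analysis Toolkit",
        authors=[("Doe", "Jane")],
        version="1.0.0",
        doi="10.5281/zenodo.1234567",
        date_released="2024-01-15",
    ), end="")
```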
Preserving software in a GitHub repository
It is good practice to use version control when developing code, and GitHub is a popular choice for open source software projects. Whilst GitHub is a good place to develop and share software, it does not provide persistent, long-term preservation. You should therefore use version control for collaboration and an archival repository for long-term preservation. You can easily add long-term preservation to software on GitHub by using Zenodo, which lets you share a particular release of your software, create a DOI for it and keep it available in the future.
Zenodo is a digital repository funded by the EU OpenAIRE project and managed by CERN. Its GitHub integration automatically archives a zip file of each new release and registers a DOI for it. Using GitHub releases alongside Zenodo not only helps preserve your software but also lets you capture a particular stage of development while the software continues to evolve. For example, you could create a release on GitHub when you publish a paper and cite the DOI for that version of the software in the paper. You can then continue to release new versions of your software on GitHub while keeping persistent access to the earlier version you used to produce the results in your publication.
Zenodo is free to use up to a storage limit of 50GB per dataset, which is likely to cover most software preservation requirements. You can find guidance on archiving a GitHub repository using Zenodo.
If you use another version control system, you could follow a similar workflow to deposit your code in Zenodo or another research repository that accepts software.
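If you are not using the GitHub integration, a minimal sketch of preparing a release for manual deposit is to zip the project folder with Python's standard-library `shutil.make_archive` and check the result against the 50GB storage limit. The function names, the `make_release_zip.py` file name and the `release_archives` output folder are hypothetical, and the limit is read as 50,000,000,000 bytes, which is an assumption; check Zenodo's current policy before relying on it.

```python
import pathlib
import shutil
import sys

# Per-dataset limit, read here as 50 * 10**9 bytes (an assumption).
STORAGE_LIMIT_BYTES = 50 * 10**9


def fits_storage_limit(size_bytes: int) -> bool:
    """Return True if an archive of *size_bytes* is within the assumed limit."""
    return size_bytes <= STORAGE_LIMIT_BYTES


def archive_release(source_dir: str, version: str, out_dir: str) -> pathlib.Path:
    """Zip the contents of *source_dir* as <out_dir>/<dirname>-<version>.zip.

    *out_dir* should lie outside *source_dir* so the archive does not include itself.
    """
    source = pathlib.Path(source_dir).resolve()
    if not source.is_dir():
        raise NotADirectoryError(f"{source} is not a directory")
    base_name = pathlib.Path(out_dir).resolve() / f"{source.name}-{version}"
    archive = pathlib.Path(shutil.make_archive(str(base_name), "zip", root_dir=str(source)))
    if not fits_storage_limit(archive.stat().st_size):
        raise ValueError(f"{archive} exceeds the assumed 50GB limit")
    return archive


if __name__ == "__main__":
    if len(sys.argv) == 3:
        print(archive_release(sys.argv[1], sys.argv[2], "release_archives"))
    else:
        print("usage: python make_release_zip.py <source_dir> <version>")
```

For example, `python make_release_zip.py my_project v1.0.0` would write `release_archives/my_project-v1.0.0.zip`, ready to upload alongside the release's metadata.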
Further research software resources
Further support on best practices for research software can be found through the following organisations.
- The Software Sustainability Institute provides a range of support to help improve the quality and recognition of research software.
- Software Carpentry aims 'to help researchers become more productive by teaching them basic computing skills like program design, version control, testing, and task automation.' Sessions are regularly organised at UCL.
- The UCL Research Software Development Group is a team of professional software developers with particular expertise in creating software for academic research. Their goal is to enhance UCL's capacity to produce high quality scientific software by collaborating with researchers to create readable, reliable and efficient code. They provide a range of services including programming work and training.
- UCL Research IT Services provide a range of services to UCL researchers including Research Computing and Research Data storage.