
Information Services Division


Research Data Storage FAQs

At present, you must be on the UCL network in order to register or manage a project. If you are not physically connected to the network or wirelessly using Eduroam, please connect to the UCL VPN. This restriction will be removed in the near future.

Registering a project

How do I register a project and what information do I need to provide?

Only the Principal Investigator for the project can register.

Please visit the administration interface and sign in using your UCL credentials:

Research Data Storage Administration Interface

Click “New project” to open the application form.

You will be asked to provide the following information:

  • One or more project administrators: people you authorise to act as our point of contact for decisions about adding or removing group members and other matters concerning the storage.
  • A list of the project members (other than yourself and the optional administrators).
  • Start and end dates for your project.
  • Volume of data that you expect to have in terabytes. This is an upper limit. If you expect to use less than 1TB, enter 1.
  • If you ask for more than 5TB you will be required to submit further information regarding your intended usage. This will help us to consider how best to cater for your needs. Note that we can’t guarantee to accommodate very large storage requests.
  • Project title: We suggest choosing something that is likely to be unique to your project. A title such as “neuroscience” may not adequately distinguish your project from others.
  • Description: A brief description, approximately the length of an abstract for a paper. It may match the description used in the grant application.
  • Grants: optionally, enter one or more grants that fund the project. Enter each grant on a new line.
  • Agreement for you and your project team to abide by the conditions of use.

What is a project in the context of RDS?

A project (from the Research Data Services perspective) is a body of work requiring data storage that has a start date, an end date, a title, a description and one or more people that are granted access to it. A project will often relate to a particular grant application, but ‘unfunded’ research projects can use the service.

Who can register to use the service?

You need to be a current member of UCL staff and the Principal Investigator (PI) of the project with which the data is associated. The PI will be the person who applied for the grant to fund the project or, in the case of ‘unfunded’ research, the person leading the research project.

Why do you require that the project PI completes the registration form?
  • Though there are usually several people involved in one project, it is useful to have one person as a single point of contact for the group who doesn’t change or disappear at short notice. PIs move around to other institutions less frequently than other members of staff and students. 
  • The storage at the time of writing is free to users. However, there is a considerable cost in terms of equipment and time for RDS. It is useful to have someone in a position of responsibility to be accountable for its usage.
  • At present we don’t charge for the service, but we are interested in associating a grant application with the storage that we offer in order to assist the documentation of the data throughout the research life cycle.

Why do you require an end date for a project?

We are trying to guard against the following situations:

  • Important research data being forgotten about after the conclusion of a project, eventually becoming ‘orphaned’ as researchers move jobs or retire.
  • Unwanted data being left on the service, taking up space that could be better used by others.
  • Data not being moved to more appropriate long-term archive and repository services. Many research funders now require this, and it is useful to have a reminder before a project concludes that it’s time to archive the data.

Applying an end date to a project doesn’t mean that research on that topic should come to an end. However, it encourages our users to tidy up, organise and annotate their data. Once decisions have been made about what could reasonably be useful to themselves or to others, this data can be publicly archived and the rest discarded.

Why do I need to provide the title and description of the project?

Knowing what our users are working on helps us to make decisions about which storage technologies we should be investing in for the future. Occasionally a project description can alert us to special considerations that we may be able to advise on.

As part of the normal Research Data Repository service this information may be harvested into the metadata for the archive, saving you the effort of having to enter the same data again.

Do I need to justify my usage of the service in the description?

Provided that your project falls within our guidelines, you will be allocated space on our live storage service. It is not necessary to justify the value of your research to us; it’s not like applying for a grant!

Can I register more than one project?

Yes, but it should be clear that the purpose is different. There should be a distinct title and description.

I’m the PI but wish to nominate someone else to manage the account. Is that possible?

Yes, we have the concept of an administrator. You can nominate one or more administrators during the registration process, and can add or remove administrators at any time using the administration interface.

An administrator can add / remove members and make other changes to the project. Like the Principal Investigator, they can request changes to storage volume and project end date.

Can I store sensitive data in the RDS?

The RDS takes data security seriously and it should not be possible for unauthorised users to access your project folder. The hardware we use is kept in secure server rooms, and we apply strict user access controls to ensure only those authorised to do so can access the data hosted by the service. That said, we are not at present certified to the ISO 27001 standard nor authorised to host restricted NHS data. We therefore prohibit the use of the Research Data Storage Service for hosting unencrypted personally identifiable information or anything else that falls under the GDPR legislation of the UK Data Protection Act 2018.

As part of our terms of use, the Principal Investigator assumes the responsibility of ensuring that data uploaded to the service is not in breach of the Data Protection Act 1998 or other relevant legal and contractual agreements, including 3rd party agreements.

UCL Researchers working with sensitive data should consider using the Data Safe Haven, which provides a highly-secure ISO 27001-certified facility for data storage and processing.

UCL provides guidance on research data anonymisation in its web pages on Information Governance.

How much space can I have? Is there a limit?

We don’t have a specific project size limit per se. So far we have been accepting storage requests up to 5 TB for all projects. For projects requesting larger quotas we will make decisions based on a balance of project size vs the remaining capacity on our storage system.

If your project has the budget to pay for storage beyond 5 TB, we will be happy to discuss this with you.

Do I have to pay?

There is no charge at present for this service, although if you require a large allocation (>5 TB) and have the budget, then we would like to discuss whether a contribution to the cost of the storage can be made.

We plan to introduce a more formal charging model in future; however, this will not be applied retrospectively to existing projects.

It is likely that the future service charge will be based on the underlying cost of the storage hardware (including the second copy), which is currently around £50/TB/year. We can’t yet make promises about what a future pricing policy will be like though. In the meantime, if you have storage needs in excess of our standard allocation, please get in touch as we’d like to have a conversation about it.
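As a rough illustration of the figure quoted above, the following sketch multiplies an allocation size by a duration and the approximate hardware cost of £50/TB/year. It is only an arithmetic aid under that assumption: the function name is made up for this example, and the result is not a quote or a statement of any future pricing policy.

```python
from decimal import Decimal

# Approximate hardware cost quoted in this FAQ (including the second copy).
# This is an illustrative assumption, not a confirmed price.
HARDWARE_COST_GBP_PER_TB_YEAR = Decimal("50")


def estimate_hardware_cost(terabytes, years, rate=HARDWARE_COST_GBP_PER_TB_YEAR):
    """Return an indicative hardware cost in GBP for an allocation held for some years."""
    if terabytes < 0 or years < 0:
        raise ValueError("terabytes and years must be non-negative")
    # Convert via str() so that float inputs such as 7.5 stay exact.
    return Decimal(str(terabytes)) * Decimal(str(years)) * rate


if __name__ == "__main__":
    # Example: a 20 TB allocation held for 3 years.
    print(f"Indicative hardware cost: £{estimate_hardware_cost(20, 3)}")
```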

What happens when the project time limit runs out?

At the moment we don’t do anything when your project time runs out. This will probably remain the status quo until our Research Data Repository service is in operation. The idea is that when your project comes to an end, your work can be organised, curated and packaged up to be put in a publicly facing environment. The researchers and PI would of course be the ones to cause this to happen (we would not try to publish your work without permission). When we do start enforcing the end of a project there will be a succession of events to close it:

  1. We will give notice by email that the project is coming to an end
  2. Access will be blocked on or shortly after the end date (not irrevocably so)
  3. We can’t keep everything forever, so we will eventually delete the content of the project. This will be a year after we block access.

Why are there two different types of storage?

The Research Data Storage Service provides two different storage facilities: the GPFS facility consists of ‘block’ storage, which is useful for projects wishing to use the UCL high-performance computing services, as well as general storage for ‘active’ data; the iRODS/WOS facility is based on ‘object’ storage, which allows the addition of project-specific metadata that was formerly not so straightforward with GPFS.

We expanded and upgraded our GPFS storage facility in November 2017, partly in response to repeated performance issues with the WOS facility, and this is now the default storage option for all new projects. The new GPFS facility can be mounted as a local drive, provides better metadata support, and offers optional functionality such as snapshotting. We are in the process of migrating users of the iRODS/WOS facility to the new storage, although this will take some time. We no longer recommend that projects are created on the iRODS/WOS facility unless particular data management workflow issues are identified that might benefit from its specific characteristics.

Can I choose to be put on the block storage system or object storage?

We would strongly recommend that new projects use the default GPFS storage facilities. This offers simpler access to data, it’s quick, and it has proven to be more reliable. If you think that the iRODS/WOS facility would be more appropriate for your particular data processing workflow, please get in touch with us at researchdata-support@ucl.ac.uk so that we can go through the pros and cons of each option.

I would like to transfer multiple projects from my department to RDS; what should I do?

If you have responsibility for supporting the research data storage needs of multiple projects in your department, and would like to talk to us about transferring them to RDS, please get in contact with us at researchdata-support@ucl.ac.uk. We will arrange to meet up and discuss this in person. 

Can an undergraduate/postgraduate student register a project with RDS?

No, your PI must register the project.

Can someone from outside of UCL be the PI for a project with RDS?

No 

Access and permissions

Can non-UCL collaborators be given access?

Our live storage facilities currently require UCL login credentials.

If external collaborators are given UCL credentials due to honorary researcher status then we can add them to the list of users with access to your project in the usual way.

Requests for honorary researcher status are handled by academic departments, but usually involve making an inquiry to your head of department. Many departmental websites have a form that you can complete for this.

Can an undergraduate/postgraduate student have read/write access to a project?

Yes, if the PI nominates them as members. 

Will anyone else be able to access my data?

In principle, RDS staff and, on occasion, our hardware vendors can access your data. In practice we only look in your project directories if you ask us to, or to investigate a problem with the operation of the service.

To what extent can I control access for different members of my group?

There isn’t much that a PI can do that is privileged compared to the other members of their project group, except for choosing the project members. We also cannot assign read-only access to certain members at present. Our how-to guide explains the options that are available:

RDS: How to control access for different members of a project

Can I host content from your service so that the public can access it?

Not for our Research Data Storage service, which is currently the only service we offer. Our planned Research Data Repository service will allow datasets to be published alongside descriptive metadata and allow the assignment of Digital Object Identifiers (DOIs). The service is due to be launched in January 2019. See our research data repository overview for further information.

Can I access our storage like a drive on my computer? (i.e. as a mounted drive)

In short, yes, but there are a few caveats and considerations involved. To view the storage as though it is a mounted drive you will either need to use third-party software, or have a degree of technical knowledge. See the how-to guide:

How to mount RDS on your computer

Can I access the Research Data Service via Desktop @ UCL Anywhere?

If you are using our block store (GPFS) service then you can use WinSCP, which is already installed. If you are using the iRODS service then we recommend using Cyberduck. Cyberduck is scheduled to be available via Desktop @ UCL by the end of 2016.

How do I access my data from Legion?
How do I access my data from outside of UCL?
What programs can I use to access this service?

Block storage (GPFS) service

A list of recommended programs can be found in our access guide.

The underlying access mechanisms that we use for our service are SSH, SCP and SFTP. These are ubiquitous protocols and there are many client programs that can connect using them. Please see the bottom of our storage access guide for a list of some programs that can be used.
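Because the service is reached over standard SSH, SCP and SFTP, a transfer can also be scripted. The sketch below only assembles an scp command line; it does not run it. The user name, host name and remote directory are placeholders invented for this example, so substitute the values from your registration email.

```python
import shlex


def build_scp_upload(local_path, username, host, remote_dir, recursive=False):
    """Build (but do not run) an scp command that copies local_path to remote_dir."""
    command = ["scp"]
    if recursive:
        command.append("-r")  # copy a directory and everything beneath it
    command += [local_path, f"{username}@{host}:{remote_dir}"]
    return command


if __name__ == "__main__":
    # Every value here is a placeholder, not a real RDS setting.
    cmd = build_scp_upload(
        "results.tar", "your-user-id", "storage-host.example", "/path/to/project/"
    )
    print(shlex.join(cmd))
    # To perform the transfer you could pass cmd to subprocess.run(cmd, check=True).
```

Keeping the command as a list of arguments avoids quoting problems if a file name contains spaces.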

iRODS service

For Windows and Mac users we recommend ‘Cyberduck’ as a graphical interface to iRODS. See the iRODS access guide for further information.

iRODS can also be accessed through a Command Line Interface (CLI) using iCommands. These are available in binary form for a few flavours of Linux and the source code is available to be compiled for other operating systems:

Support and troubleshooting

Which email address should I use for contacting Research Data Services?

Please email researchdata-support@ucl.ac.uk from your UCL email address to ensure your message reaches us.

Can we talk to you face to face?

You can come along and discuss things or get help with setting up your connection to our service at the regular research IT and data management drop-in sessions. Feel free to drop by with any questions about our services or for general advice regarding research data management.

Alternatively, if you would prefer a house-call, then you can email us at researchdata-support@ucl.ac.uk

What operating systems and software do you support?

Our service isn’t OS dependent, though within the RDS team we have experience in using Windows, OS X and Linux-based systems. If you wish to use our service with a GUI, the best support is for Windows and OS X, and if you want to use it from a command line, the best support is in Linux.

How do I check my usage and quota?

The easiest way is to log in to the administration interface, find the project in the ‘My Projects’ list and click the title. This will take you to the project details page where you can see details of storage type, usage and quota.

Our how-to guide explains other options for checking your usage and quota:

How to check your usage and quota on RDS
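If your project directory is reachable as a local path (for example over a mounted drive), you can also get a rough figure yourself by adding up file sizes. The sketch below is one way to do that. It is an assumption-laden estimate, not the service's own accounting, so the number may differ from the usage shown in the administration interface.

```python
import os


def directory_usage_bytes(root):
    """Sum the sizes of all regular files under root, skipping symbolic links."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; skip it rather than abort.
                continue
    return total


def human_readable(num_bytes):
    """Format a byte count using decimal units (1 TB = 10**12 bytes)."""
    for unit in ("B", "kB", "MB", "GB"):
        if num_bytes < 1000:
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1000
    return f"{num_bytes:.1f} TB"


if __name__ == "__main__":
    # "." is a stand-in; point this at your own project directory.
    print(human_readable(directory_usage_bytes(".")))
```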

How do I add/remove members of my project?

We are currently developing a web interface that will allow project PIs and administrators to add and remove project members themselves.

In the meantime, the PI or administrator of the project should send an email to researchdata-support@ucl.ac.uk. They should ideally include the registration email they were sent, but we at least need the group name (rd00##). For the members you would like to add or remove, please specify the:

  • name
  • user name
  • email address

Can I rename my project?

Please don’t attempt this! The way the system manages and registers data depends on the project name remaining the same as it was when it was created. We are looking into ways to allow a project to be renamed without causing confusion. 

Can I change the end date for my project once I've started?

Log in to the administration interface, find the project in the ‘My projects’ list and click the Edit button. Enter a new End Date and click ‘Submit update’. Your request will be sent to researchdata-support@ucl.ac.uk and you will be contacted to discuss your project’s requirements.

Is my data encrypted?

We do not currently encrypt the data on our storage systems. Data is transmitted between our datacentres along optical fibres unencrypted, though these are difficult to intercept without datacentre access.

Encryption of SSH connections to our GPFS facility is enabled by default, though you may wish to select a weaker but faster algorithm if transfers are too slow (e.g. add "-c arcfour" or "-c none" as an argument to your scp command; the latter turns off encryption for data transfers).
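Whether a given cipher can be selected depends on your SSH client as well as on the server. Many current OpenSSH builds may no longer offer "arcfour", and "none" is generally only present in specially patched builds. The sketch below asks the local OpenSSH client for its supported ciphers (using `ssh -Q cipher`) and picks the first preferred one that is present. The fallback cipher name is only an illustrative assumption, not a recommendation from RDS.

```python
import subprocess


def parse_cipher_list(output):
    """Turn the output of `ssh -Q cipher` (one name per line) into a list."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def pick_cipher(preferred, available):
    """Return the first preferred cipher that the local client offers, else None."""
    for name in preferred:
        if name in available:
            return name
    return None


def local_ciphers():
    """Ask the local OpenSSH client which ciphers it supports."""
    result = subprocess.run(
        ["ssh", "-Q", "cipher"], capture_output=True, text=True, check=True
    )
    return parse_cipher_list(result.stdout)


if __name__ == "__main__":
    try:
        # "arcfour" is the cipher named in this FAQ; "aes128-ctr" is an
        # illustrative fallback only and may not be faster on your machine.
        choice = pick_cipher(["arcfour", "aes128-ctr"], local_ciphers())
    except (OSError, subprocess.CalledProcessError):
        choice = None
    if choice is None:
        print("No preferred cipher available; leave scp on its default.")
    else:
        print(f"scp -c {choice} <local file> <user>@<host>:<remote path>")
```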

With the exception of authentication, connections to our iRODS facility are not encrypted. Encrypted data transfer has been disabled for performance reasons.

 

Is my data backed up?

The RDS has two types of live data store: a GPFS-based system, which is accessed via SSH or SFTP, and an object storage system that uses iRODS for administration and access. Neither of these systems is backed up in the sense of having a separate tape archive where periodic updates are made. However, both GPFS and object store use redundancy to safeguard your data.

The GPFS system uses two replicas of your data at different locations at UCL. The object store system works by dividing the storage hardware up into three zones such that any one zone can be lost or unreachable while still being able to retrieve your data. Additionally, both of these systems use erasure encoding to allow for the failure of individual hard drives within zones; in the case of GPFS, the erasure encoding is RAID6 and in the object store, a proprietary form of erasure encoding is used.

A conventional tape-based backup system is being developed for the GPFS storage system. It will allow the recovery of previous versions of files and provide an even greater level of resilience in the event of a large-scale disaster.

Do you make the kind of backups that allow me to recover a previous version of a file in case I delete or modify it by mistake?

Not at the moment.

What kind of data transfer speeds am I likely to get?

It isn’t possible to accurately predict the transfer speeds that you’ll get using our service as it depends on a number of factors that include:

  • current loading on our service
  • underlying speed of the network where you are (local network)
  • current loading on your local network
  • the protocol used (SFTP, SCP or iRODS)
  • the programs you use for file transfer
  • the cryptographic cipher used (an option on some programs)
  • whether you are transferring lots of small files or a smaller number of larger files (each new file transfer entails an overhead, hence one large archive file will transfer quicker than many small files of the same total size)
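The last factor above can be addressed by bundling a directory of small files into one archive before uploading it. The following is a minimal sketch using Python's standard tarfile module. The directory and archive names are invented for the demonstration, and gzip compression is just one reasonable choice.

```python
import os
import tarfile
import tempfile


def bundle_directory(source_dir, archive_path):
    """Pack source_dir into a single gzip-compressed tar archive and return its path."""
    # arcname stores paths relative to the directory's own name, not its full path.
    arcname = os.path.basename(os.path.normpath(source_dir))
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(source_dir, arcname=arcname)
    return archive_path


if __name__ == "__main__":
    # Self-contained demonstration in a throwaway directory.
    with tempfile.TemporaryDirectory() as workdir:
        data_dir = os.path.join(workdir, "run01")
        os.makedirs(data_dir)
        for i in range(100):
            with open(os.path.join(data_dir, f"sample_{i:03d}.txt"), "w") as handle:
                handle.write("example\n")
        archive = bundle_directory(data_dir, os.path.join(workdir, "run01.tar.gz"))
        print("Created", os.path.basename(archive), os.path.getsize(archive), "bytes")
```

The single archive can then be transferred in one operation and unpacked at the destination.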

We have seen transfer rates from a few MB/s to a few 10s of MB/s (megabytes per second) on the block storage facility and up to about 80MB/s using iRODS.
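To get a feel for what these rates mean in practice, the sketch below estimates how long a transfer would take at a constant rate. The 500 GB dataset and the 5 and 30 MB/s rates are illustrative assumptions; 80 MB/s is the upper figure mentioned above. Real transfers fluctuate, so treat the results as rough guides only.

```python
def transfer_time_seconds(size_gb, rate_mb_per_s):
    """Estimate transfer time, taking 1 GB = 1000 MB and a constant rate."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_gb * 1000 / rate_mb_per_s


def format_duration(seconds):
    """Format a number of seconds as hours and minutes (minutes rounded down)."""
    hours, remainder = divmod(int(round(seconds)), 3600)
    minutes = remainder // 60
    return f"{hours} h {minutes:02d} min"


if __name__ == "__main__":
    for rate in (5, 30, 80):
        estimate = format_duration(transfer_time_seconds(500, rate))
        print(f"500 GB at {rate} MB/s: about {estimate}")
```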

Is there anything I can do to improve the speed of data transfer?

Some suggestions for improving data transfer speed are:

  • Changing the time that the data transfer takes place – local network loading or loading of our system will vary over time. Work hours are likely to be busier than out of hours.
  • If it is practical, you could try connecting to the service from other locations around the university.
  • We have seen that some researchers are on 100 Mb/s networks (at most 12.5 megabytes per second, and typically about 10 in practice), which is rather slow by modern standards. You may be able to encourage your local IT support to upgrade you to a gigabit (1 Gb/s) network.
  • If you are transferring a large number of small files, it may be better to compress them together into a smaller number of archive files such as zip, 7z or tar before transferring the data.

Why do you have separate home and project space directories on the block storage (GPFS) service?

Some people are members of multiple projects. When someone logs in, unless the starting directory is specified, they need to be placed somewhere in the directory tree that relates to the individual, so that they can decide which project they wish to interact with. We decided to opt for the standard where everybody has a small bit of personal space on the system.

The iRODS facility operates differently and puts you in the directory (collection) behind all of the project directories. You only have visibility of projects that you have access to.

I've written about 500MB to my GPFS storage and I'm now getting messages saying no further writing is possible; what's going on?

By default when you first log in, you will be in your home area. The home area only has a small quota set on it as it’s not intended to be used for storing project data. In the registration email that you were sent there should be a location for your shared project directory. This will be found under 

/mnt/gpfs/UCL/

or

/mnt/gpfs/live/

All new projects are being placed under ‘live’.
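A script that writes data can guard against this mistake by checking that its destination is inside one of the two project roots named above before it starts. The sketch below does a purely textual check; it does not resolve symbolic links or ".." components. The example project name and home directory are hypothetical.

```python
from pathlib import PurePosixPath

# Project roots named in this FAQ.
PROJECT_ROOTS = (PurePosixPath("/mnt/gpfs/UCL"), PurePosixPath("/mnt/gpfs/live"))


def is_project_path(path):
    """Return True if path lies strictly inside one of the project roots."""
    candidate = PurePosixPath(path)
    return any(root in candidate.parents for root in PROJECT_ROOTS)


if __name__ == "__main__":
    # Both paths below are made-up examples.
    for destination in ("/mnt/gpfs/live/my_project/raw", "/home/someuser/raw"):
        verdict = "project space" if is_project_path(destination) else "NOT project space"
        print(f"{destination}: {verdict}")
```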

How will I know what kind of storage my project is on?

It should say in the email you received when you joined our service. Possibly we have moved your project, but this will have been discussed with you in subsequent emails.

Do you offer CIFS/NFS exports of the storage?

Not as a formal part of our service. If you believe it is essential for your needs then please get in contact with us at researchdata-support@ucl.ac.uk and we can discuss this.

As a guide to terminology, CIFS (Common Internet File System) is the technology that Windows uses to connect network drives/folder shares. NFS (Network File System) serves a similar purpose but is aimed more towards Unix-like operating systems such as Linux.

Do you offer block level access to the storage?

No 

What is SSH?

SSH is short for Secure Shell. It provides a way to connect two computers together using command line instructions. This is great if you are comfortable using Linux and want to make changes to your data remotely. If not then you will probably be more comfortable using one of the GUI type applications; e.g., WinSCP or Cyberduck. 

Cyberduck crashes or locks up when attempting to transfer files

Sometimes the Cyberduck team makes an update to their software, which causes it to break when using pre-existing profiles* set up for iRODS. Reinstalling by itself may fix this, but certain configuration files are not replaced when you do this. The following is a solution for Windows based systems (Macintosh users may find a similar approach works):

  1. Uninstall Cyberduck using Windows Add/Remove Programs
  2. Navigate to %AppData%\Cyberduck in Windows Explorer†
  3. Remove the contents of the folders: "Bookmarks", "Profiles" and "Sessions"‡
  4. Reinstall a fresh version of Cyberduck from its website
  5. Download a fresh copy of the UCL-RDS-iRODS.cyberduckprofile file and run it. Note that this file may have changed since the copy you previously downloaded.
  6. Follow the instructions on our access page

* Cyberduck supports a number of protocols, which in most cases have a single "profile" corresponding to each (these are selected at the very top of the settings for a Bookmark). In order to use the iRODS protocol, users have to run a profile file that RDS has created and which contains certain bits of connection information. This will add to the list of profiles that you can connect to. Neither re-running the Cyberduck profile file nor reinstalling Cyberduck will replace the profile file in the Cyberduck configuration.

† The '%' characters evaluate a built-in environment variable and "%AppData%" may resolve to "C:\Users\<user name>\AppData\Roaming\". The '%' characters should be included when you copy and paste into your file browser.

‡ You probably only really need to remove "Profiles". If you use Cyberduck for other things, you may wish to be more careful about which files you remove.
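If you would like to see where the folders from step 3 are before removing anything, the sketch below expands "%AppData%" and lists the three folder paths. It deliberately deletes nothing, and it assumes the folder layout described in the steps above. The function name is made up for this example.

```python
import ntpath
import os

# Folder names taken from step 3 above.
FOLDERS_TO_CLEAR = ("Bookmarks", "Profiles", "Sessions")


def cyberduck_folders(appdata=None):
    """Return the Cyberduck settings folders from step 3 as Windows-style paths."""
    if appdata is None:
        # Expands %AppData% from the environment; if the variable is not
        # set, the text "%AppData%" is returned unchanged.
        appdata = ntpath.expandvars("%AppData%")
    base = ntpath.join(appdata, "Cyberduck")
    return [ntpath.join(base, name) for name in FOLDERS_TO_CLEAR]


if __name__ == "__main__":
    # Read-only check: report which folders exist, without changing them.
    for folder in cyberduck_folders():
        status = "exists" if os.path.isdir(folder) else "not found"
        print(f"{folder}: {status}")
```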

How do I know whether my data is sufficiently anonymised?

UCL provides guidance on research data anonymisation in its web pages on Information Governance.

How do I view my project details?

Log in to the administration interface, find the project in the ‘My Projects’ list and click the title.

How do I change my project details?

Log in to the administration interface, find the project in the ‘My projects’ list and click the Edit button. Most changes will be immediately executed, but changes to storage volume and project end date will be sent to researchdata-support@ucl.ac.uk for approval.

How do I make someone else the PI?

Please send an email to researchdata-support@ucl.ac.uk with the details of the new PI and the reason why they should be assigned the role. They will need to be a current member of UCL staff.

State whether you want to be removed from the project altogether or remain as a member or administrator.

How do I apply for more storage space or an extension to the end date?

Log in to the administration interface, find the project in the ‘My projects’ list and click the Edit button. Enter a new End Date and/or volume and click ‘Submit update’. Your request will be sent to researchdata-support@ucl.ac.uk and you will be contacted to discuss your project’s requirements.

Other data management services

Is there somebody who can help me write a data management plan?

One to one support and guidance for data management planning is available from UCL Library Services. See their website for details: Research Data Management

Where can I store my data when the project comes to an end?

In the first instance, nothing will be removed without notifying you and giving you fair warning. If we haven’t heard from you, we will move your data to a medium term repository facility from which it can be recovered on request.

We are actively developing a long term repository facility which will allow you to store and cite your datasets for public access.

Alternative data repositories may also be available to you. See the Research Data Management pages on the UCL Library Service website for guidance.