Frequently asked questions

What kind of data does CAVA accept?

The CAVA repository is aimed at researchers looking at human communication and interaction. Suitable data may include any rights-cleared primary audio and video recordings, especially raw data featuring the use of natural language. If you think your data might be suitable for inclusion in the repository, please contact the CAVA Project Officer.

How do I get access to the data in the repository?

To access CAVA you need to apply for a user licence by contacting the Project Officer. The CAVA team will issue you with a login that will give you access to all the data that does not have a further access restriction.

What will I need to do to submit data to the repository?

The most important thing we require, aside from the data itself, is information that makes it searchable. You will need to complete the metadata spreadsheet (a user guide can be found in the documents) and sign a depositor's licence, which gives us permission to store and manage the data. The licence also confirms that the data you provide has appropriate permission to be used in the repository. Once the spreadsheet and the licence are complete, the CAVA team will begin uploading your data to the repository. If you have data that you are interested in storing with CAVA, please contact the Project Officer.

Will there be a cost for storing the data?

There is currently no charge for the services that CAVA offers. At present CAVA accepts only data that will be made available through the repository; it does not accept preservation-only versions of recordings. However, the team are investigating the possibility of long-term offline storage of preservation-quality data.

What formats does CAVA accept?

You can see the preferred file formats in the report. However, the repository is designed to be flexible. If you have a query about a particular format you would prefer to use, please contact the Project Officer.

Can I use data from CAVA for teaching purposes?

All the data that CAVA accepts has appropriate permission for reasonable use in teaching and research. Do not use CAVA data in a situation where you would not use data you collected yourself. If you have a specific question about the use of a particular dataset, please contact the researcher who submitted it (found in the metadata record under 'Project Contact'), or alternatively contact the Project Officer.

Can students access the repository?

Due to the consent arrangements for data in the CAVA repository, MSc students must discuss their need for membership with the CAVA team. If you wish to use data from the archive in an MSc project, please contact Dr Suzanne Beeke (if your research focuses on adult subjects) or Dr Merle Mahon (children), with details of your project and supervisor.

Researchers (including PhD students) and staff should contact the Project Officer to request membership.

Where can I find a form of words for consent to store and use my data?

Most recent consent forms provide the necessary permission to archive and store data. The CAVA team has produced pro forma permission forms, available in the documents.

How should I cite and acknowledge CAVA data?

All users of our data must acknowledge and cite data sources correctly in any publications and outputs. Details of how to cite data can be derived from the catalogue records.

An acknowledgement is a general statement giving credit to the source and distributor and includes copyright information. It can be given at the start of, or within, the text, or at the end of the article before the bibliographic references/citations. You can find this information (e.g. depositor, sponsor) in the metadata records. Please include:

  • The project ID and title
  • The sponsor, funder or owner (if different from the institution at which the research was conducted)
  • The name of the CAVA repository and its web address
  • Copyright information

A suggested format for acknowledging data, using the example of research based on the TSA2007/05 (The evaluation of a novel conversation-focused therapy for agrammatism) project, is:

"The data in this article were collected for the following project: Evaluation of a novel conversation-focused therapy for agrammatism (TSA2007/05), 2010, at University College London, funded by the Stroke Association, and supplied by the CAVA repository (web address). The data are copyright."

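If you need acknowledgements for several datasets, the suggested wording can be filled in from the metadata fields with a short script. The following is only a minimal Python sketch: the ProjectRecord class, its field names, and the example.org address are illustrative assumptions, not part of CAVA or its catalogue, and the copyright sentence should be taken from the metadata record of the dataset you are using.

```python
from dataclasses import dataclass


@dataclass
class ProjectRecord:
    """Hypothetical container for fields copied from a metadata record."""
    project_id: str
    title: str
    year: int
    institution: str
    funder: str
    repository_url: str       # placeholder; use the repository's real web address
    copyright_statement: str  # full sentence taken from the metadata record


def build_acknowledgement(record: ProjectRecord) -> str:
    """Fill the suggested acknowledgement wording with the record's fields."""
    return (
        "The data in this article were collected for the following project: "
        f"{record.title} ({record.project_id}), {record.year}, "
        f"at {record.institution}, funded by {record.funder}, "
        f"and supplied by the CAVA repository ({record.repository_url}). "
        f"{record.copyright_statement}"
    )


# Example values taken from the TSA2007/05 project described above.
example = ProjectRecord(
    project_id="TSA2007/05",
    title="Evaluation of a novel conversation-focused therapy for agrammatism",
    year=2010,
    institution="University College London",
    funder="the Stroke Association",
    repository_url="https://example.org/cava",  # assumed placeholder address
    copyright_statement="The data are copyright.",
)

print(build_acknowledgement(example))
```

Keeping the wording in one function means every acknowledgement lists the same elements (project ID and title, funder, repository and web address, copyright information) in the same order.
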
A citation is more formal than an acknowledgement. It follows a standard format and should include enough information for the exact version of the data being cited to be located. Standards for citing data are still emerging, and the exact format will depend on the preferred style of the journal or publisher. For further information, refer to the UKDA's guidance.