Selection and retention

Discover how to effectively appraise your research data in order to decide what should be retained from your project.

 

Data management plan

It is important to appraise the data that you have collected to determine what should be retained and what should be destroyed. Consider how your data might be reused, any obligations to retain data, and the resources required to prepare your data for sharing and preservation.

It is not always practical or advisable to keep all the research data that you collect or generate during your project. Long-term data storage has economic and environmental costs, and ongoing stewardship of the data requires staff time.

Data appraisal

Researchers are responsible for selecting which research data to retain, and potentially share, by considering factors such as reproducibility, legal or funder requirements, and potential long-term value of the data.

Data underpinning research findings

Research data that underpins a research publication or postgraduate thesis should be kept so that others can reproduce or verify your findings, and potentially build upon your research. You should also retain any software that you created to generate, process or analyse the data as this will be needed to validate your results.

It is more important to retain the raw data and the record of how it was processed or transformed than any ‘intermediate’ data. The processed data in its final format may also be useful to retain.
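One way to keep the raw data and the record of its processing together is a small manifest file stored alongside the data. The following is a minimal sketch, not a prescribed method: the function names, the manifest layout and the idea of describing processing as a list of free-text steps are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(raw_dir: Path, processing_steps: list[str]) -> dict:
    """List every file under raw_dir with its checksum, plus the processing record.

    The manifest layout is hypothetical; adapt it to your repository's needs.
    """
    files = {
        p.relative_to(raw_dir).as_posix(): sha256_of(p)
        for p in sorted(raw_dir.rglob("*"))
        if p.is_file()
    }
    return {"files": files, "processing_steps": processing_steps}


def write_manifest(raw_dir: Path, out_path: Path, processing_steps: list[str]) -> None:
    """Save the manifest as JSON so it can be archived with the raw data."""
    manifest = build_manifest(raw_dir, processing_steps)
    out_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
```

Calling write_manifest(Path("raw"), Path("MANIFEST.json"), ["removed incomplete responses"]) would record a checksum for each raw file, so that anyone reusing the data can confirm the files are unchanged and see how they were processed.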

Policy requirements

Many funders, and increasingly publishers, require data to be shared as part of their research data policy. Be aware of what data you will need to keep in order to comply with these requirements. More information is available from the funder requirements page.

Data with long-term value

Research data with acknowledged long-term value should be preserved and remain accessible for future research. Consider how widely the data might be used and whether the dataset would be especially difficult or costly to reproduce. For extremely large simulation datasets, for example, it may be more valuable to preserve the code and input parameters from which the results can be regenerated than the output itself.
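The point about simulation data can be illustrated with a toy example. In this sketch, a seeded random walk stands in for a real simulation; the function names and parameter names are hypothetical. Archiving the small parameter record and the code is enough to regenerate the output, provided the same software versions are used.

```python
import json
import random


def run_simulation(params: dict) -> list[float]:
    """Toy stand-in for a real simulation: a seeded Gaussian random walk."""
    rng = random.Random(params["seed"])
    position = 0.0
    trajectory = []
    for _ in range(params["steps"]):
        position += rng.gauss(0.0, params["step_size"])
        trajectory.append(position)
    return trajectory


def params_record(params: dict) -> str:
    """Serialise the input parameters as JSON, the part worth archiving."""
    return json.dumps(params, sort_keys=True, indent=2)


def reproduce(record: str) -> list[float]:
    """Regenerate the results from an archived parameter record."""
    return run_simulation(json.loads(record))
```

Here the archived record is a few lines of JSON, however long the trajectory is. Real simulations may also depend on library versions, hardware and compiler settings, so these would need to be documented as well.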

You should also review the quality of the documentation and metadata accompanying the data – is there enough information to contextualise the data to enable it to be discovered, understood and reused?

Data preservation costs

Preparing data for archiving takes time and money, as do storage and curation beyond the lifespan of the project. Funders such as UKRI, the European Commission and the Wellcome Trust ask that you include costs relating to data management activities in your funding proposal. For extremely large datasets, it may not be cost-effective to preserve the research data.
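A rough estimate of these costs can help when preparing a funding proposal. The sketch below assumes a simple model: a flat annual storage charge per terabyte plus a one-off staff cost for preparing the data. The function name, the cost model and every figure passed to it are hypothetical; use the rates quoted by your own institution or repository.

```python
def preservation_cost(
    size_tb: float,
    years: int,
    storage_cost_per_tb_year: float,
    prep_hours: float,
    hourly_rate: float,
) -> float:
    """Estimate the total preservation cost under a simple, assumed model."""
    # Recurring cost: storage charged per terabyte for each year of retention.
    storage = size_tb * years * storage_cost_per_tb_year
    # One-off cost: staff time to clean, document and deposit the data.
    preparation = prep_hours * hourly_rate
    return storage + preparation
```

With illustrative figures only, preservation_cost(2, 10, 100, 40, 30) combines 2,000 for storage (2 TB for 10 years at 100 per TB per year) with 1,200 for preparation (40 hours at 30 per hour), giving 3,200 in total.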

Further information

The Digital Curation Centre (DCC) has an excellent resource, ‘Five steps to decide what data to keep’, to help with the appraisal process.

Disposal of data

You must securely dispose of research data identified for deletion, taking particular care over sensitive data. For example, where data has been anonymised, the original identifiable data may need to be destroyed.

When deciding which data to dispose of, you should be aware of any data retention policies in place. Disposal of research data should be carried out in accordance with legal, contractual, regulatory, or ethical requirements and the University’s Information Security Policy.