Documentation and metadata

Documentation and metadata are essential in ensuring your research data is findable, accessible, interoperable, and reusable (FAIR).

Data management plan

Rich documentation and metadata should accompany your data to help others find, understand and reuse it. Consider what information is needed for the data to be read and interpreted, how you will capture this information, and what metadata standards, if any, you plan to use.

What are documentation and metadata?

Researchers are responsible for preparing data prior to deposit and/or sharing, as set out in the Research Data Management Policy. Data must be described in sufficient detail to allow other researchers to find and understand it. Where research data has been made openly accessible, it should be accompanied by rich metadata and documentation to enable reading, reuse and interoperation by humans and machines.

Metadata is structured, machine-readable data that provides information about other data, such as its source, format and location.

Documentation is descriptive data which provides humans with the contextual information required to interpret, understand and reuse the data.

Metadata and documentation are typically created at two different levels:

  • Project-level metadata describes what the study is about and what methodology was used; it aims to provide context for understanding why the data were collected.
  • Data-level metadata provides more granular information such as origin, data type, file formats, authorship, access conditions, and terms of use. 
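The two levels described above can be sketched as a single structured record. This is a minimal illustration only: every field name and value below is hypothetical and does not follow any particular repository schema or metadata standard.

```python
import json

# Hypothetical record: all field names and values are illustrative assumptions,
# not a required schema.
record = {
    # Project-level metadata: what the study is about and how it was done.
    "project": {
        "title": "Example study of river water quality",
        "description": "Why the data were collected and the questions addressed.",
        "methodology": "Monthly sampling at fixed sites.",
    },
    # Data-level metadata: one entry per file, with more granular detail.
    "data": [
        {
            "file": "samples_2023.csv",
            "format": "text/csv",
            "creator": "A. Researcher",
            "access": "open",
            "licence": "CC BY 4.0",
        }
    ],
}


def to_json(rec):
    """Serialise the record so that both humans and machines can read it."""
    return json.dumps(rec, indent=2, sort_keys=True)


print(to_json(record))
```

Keeping the record in a structured format such as JSON is one possible choice; it means the same information can later be mapped onto the fields of a repository submission form or a formal standard.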

Good metadata facilitates the discovery and reuse of your research data. When considering what to include, remember the FAIR Data Principles: rich metadata and documentation help to ensure your data is Findable, Accessible, Interoperable and Reusable.

Documentation

You should start to document your data from the outset of your project. The type and format will depend on what data you are generating; some common examples include:

  • README file: a simple text file which provides information on the contents and structure of your data. This might include an explanation of your file naming convention and how the data are organised.
  • Data dictionary/codebook: documentation of the variables, structure, content, and layout of your data. This might include descriptions of variable names, measurement units, allowed values and codes for missing data.
  • Laboratory/research notebook: a record of research activity which documents key information about your experiments. This might include notes and observations on the data you have collected.
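As a rough illustration of the data dictionary described above, the sketch below reads the header row of a CSV file and produces an empty dictionary with one row per variable, ready to be completed by hand. The column names (`variable`, `description`, `units`, `allowed_values`, `missing_code`), the sample data and the `NA` missing-data code are assumptions made for this example, not a prescribed layout.

```python
import csv
import io


def dictionary_skeleton(csv_text, missing_code="NA"):
    """Return one data-dictionary row per column found in the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return [
        {
            "variable": name.strip(),
            "description": "",     # to be written by the researcher
            "units": "",           # e.g. measurement units
            "allowed_values": "",  # e.g. valid range or list of codes
            "missing_code": missing_code,
        }
        for name in header
    ]


def write_dictionary(rows):
    """Serialise the skeleton as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical sample data, used only to demonstrate the functions.
sample = "site_id,temp_c,ph\nA1,12.5,7.1\nA2,NA,6.9\n"
rows = dictionary_skeleton(sample)
print(write_dictionary(rows))
```

Generating the skeleton from the data file itself helps keep the dictionary and the data in step, although the descriptions, units and allowed values still need to be written by someone who knows the data.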

The UK Data Service has a good breakdown of the types of documentation you should typically include with qualitative, quantitative, and secondary sources.

Example

The authors of this Figshare dataset have provided documentation in the form of a data dictionary to clearly explain what all the variables in the data mean. They have also highlighted where data might be missing or incomplete.

Metadata

Rich metadata will help your data to be found, understood and reused. Consider what kind of metadata is required for your data to be interpreted, how you will capture it, and which standards, if any, you will use.

Using a data repository is a simple way to add high-quality metadata to your research data. Fill out all of the required metadata fields and as many optional fields as possible in the submission form. Try to include links to relevant publications, research data and related documentation.
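Before submitting, it can help to check a draft record for gaps. The sketch below is a simple completeness check under stated assumptions: the lists of required and optional fields are hypothetical and will differ between repositories, so they should be replaced with the fields your chosen repository actually asks for.

```python
# Hypothetical field lists: real repositories define their own.
REQUIRED = ("title", "authors", "description", "licence")
OPTIONAL = ("keywords", "related_publications", "funder")


def check_record(record):
    """Report which required and optional fields are absent or empty."""

    def is_empty(field):
        value = record.get(field)
        return value is None or value == "" or value == []

    return {
        "missing_required": [f for f in REQUIRED if is_empty(f)],
        "missing_optional": [f for f in OPTIONAL if is_empty(f)],
    }


# Illustrative draft record with some fields left blank.
draft = {
    "title": "Example dataset",
    "authors": ["A. Researcher"],
    "description": "",
    "keywords": ["water quality"],
}
report = check_record(draft)
print(report)
```

In this example the report lists `description` and `licence` as missing required fields, and `related_publications` and `funder` as optional fields still to be filled in.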

Example

The authors of this example from the University of Salford Figshare repository have provided rich, project-level metadata: the title is a useful summary of the data being presented, and the details section includes detailed information about the objective and methodology of the project. They have also included data-level metadata in the description to improve understanding.

A handful of detailed keywords have been provided to boost the findability of the data, and the choice of a CC BY licence facilitates accessibility and reusability.

Metadata standards

A metadata standard is a common set of guidelines which define the structure and format of metadata. Standards make data more consistent, enabling integration and comparison with other datasets. They are a vital step towards interoperable data.

General-purpose metadata standards such as Dublin Core and MODS are available, alongside discipline-specific standards such as Darwin Core for biological sciences and the Data Documentation Initiative (DDI) for social and behavioural sciences.
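To give a sense of what a standard looks like in practice, the sketch below expresses a few descriptive fields as Dublin Core elements in XML, using Python's standard library. The element names (`title`, `creator`, `date`, `format`, `rights`) and the namespace belong to the Dublin Core element set; the wrapping `metadata` element and all of the values are assumptions chosen for illustration, and a repository may expect a different wrapper or profile.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_xml(fields):
    """Build an XML string with one Dublin Core element per field."""
    # "metadata" is an illustrative wrapper element, not part of Dublin Core.
    root = ET.Element("metadata")
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical values, for demonstration only.
xml_text = dublin_core_xml({
    "title": "Example dataset",
    "creator": "A. Researcher",
    "date": "2024-01-31",
    "format": "text/csv",
    "rights": "CC BY 4.0",
})
print(xml_text)
```

Because the element names come from a shared standard, another system can recognise `dc:title` or `dc:creator` without knowing anything else about the project, which is what makes the record interoperable.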

FAIRsharing is a resource that describes and links community-driven standards, databases, repositories and data policies. You can use FAIRsharing to search over 1,700 standards across natural sciences, engineering, and humanities and social sciences. The Research Data Alliance have also created a catalogue of metadata standards which you may find helpful.