Data Curation

Planning what to keep (and share)

In some ways, sharing data is the natural counterpart to a much older tradition of writing up a description of a project and a summary of its outcome(s) and sharing it, e.g. via a journal publication. In fact, publishing your data in the right repository can have an analogous outcome to publishing a paper in the right outlet: People find your work, cite it, and build on it to make new discoveries. 

Preserving your data, including publishing it in a repository, typically happens at the completion of a project, but it’s best to start planning well before then. Things to consider when developing a plan for publishing your data include:

  • What data and documentation do you need to keep?
  • What are the funder or journal requirements?
  • How long does the data need to be preserved?
  • Who is responsible for the data at the end of the project?
  • Is there sufficient documentation, including the software needed and the file structures, for anyone to use your data without your assistance?
  • Are file formats open and sustainable?
  • What repository do you want to publish your data in?
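
One way to start answering the file-format question is to take an inventory of what a project folder actually contains. The following is a minimal Python sketch, not an official tool: the function name `summarize_formats` and the extension list `ASSUMED_OPEN_EXTENSIONS` are hypothetical, and the list is only an illustrative assumption. Check your chosen repository's guidance for the formats it actually recommends.

```python
from collections import Counter
from pathlib import Path

# Assumption: an illustrative set of commonly open formats, not an
# authoritative list. Replace it with your repository's recommendations.
ASSUMED_OPEN_EXTENSIONS = {".csv", ".txt", ".json", ".xml", ".png", ".tif", ".tiff"}


def summarize_formats(root):
    """Count files under `root` by extension, split into listed and unlisted.

    Extensions are compared in lowercase. Files with no extension are
    reported under the empty string "" in the unlisted group.
    """
    counts = Counter(
        path.suffix.lower() for path in Path(root).rglob("*") if path.is_file()
    )
    listed = {ext: n for ext, n in counts.items() if ext in ASSUMED_OPEN_EXTENSIONS}
    unlisted = {ext: n for ext, n in counts.items() if ext not in ASSUMED_OPEN_EXTENSIONS}
    return listed, unlisted
```

Calling `summarize_formats("my_project")` (a hypothetical folder name) returns two dictionaries of counts per extension. Anything in the second dictionary is a candidate for conversion or for extra documentation about the software needed to open it.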

Not sure what data to share? 

The Digital Curation Centre has both a thorough guide[1] and a checklist[2] that can help. Here are some of the considerations from the guide for deciding what data to keep:

  • Relevance to Mission: The resource content fits priorities stated in the research institution or funding body’s current strategy, including any legal requirement to retain the data beyond its immediate use.
  • Uniqueness: The extent to which the resource is the only or most complete source of the information that can be derived from it, and whether it is at risk of loss if not accepted, or may be preserved elsewhere.
  • Potential for Redistribution: The reliability, integrity, and usability of the data files can be determined; these are received in formats that meet designated technical criteria; and Intellectual Property and/or human subjects issues are addressed.
  • Non-Replicability: It would not be feasible to replicate the data/resource, or doing so would not be financially viable.
  • Documentation: The information necessary to facilitate future discovery, access, and reuse is comprehensive and accurate; including metadata on the resource’s provenance and the context of its creation and use.

In some cases, publishing your dataset may not be feasible or even possible. In that case, it’s still important to plan for preserving it. Many of the considerations above still apply, and some additional things to think about are:

  • Most fields have professional standards for how long you need to retain your data; make sure you’re familiar with the best practices in your research area.
  • As a minimum baseline, the US federal government requires data from projects supported by federal funds to be retained for at least 3 years after the end of the project (OMB Circular A-110, now incorporated into the Uniform Guidance, 2 CFR 200), but individual agencies may have stricter guidelines.
  • There are options for Princeton researchers to store research data in a secure, long-term way, so you don’t have to rely on the shelf-life of local hardware!
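
To make the three-year federal baseline mentioned above concrete, here is a minimal Python sketch of the date arithmetic. The function name `earliest_disposal_date` and its parameters are hypothetical, and the calculation simply counts from the project end date as described above. Your funder's terms may define the start of the retention period differently or require a longer period.

```python
from datetime import date


def earliest_disposal_date(project_end, retention_years=3):
    """Return the first date on which the retention period has fully elapsed.

    Assumption: the period is counted in whole calendar years from
    `project_end`. The default of 3 years is the baseline quoted above.
    """
    target_year = project_end.year + retention_years
    try:
        return project_end.replace(year=target_year)
    except ValueError:
        # project_end is 29 February and the target year is not a leap year.
        # Round forward to 1 March so the data is kept slightly longer
        # rather than slightly shorter than required.
        return date(target_year, 3, 1)
```

For example, `earliest_disposal_date(date(2021, 6, 30))` returns `date(2024, 6, 30)`. Passing a larger `retention_years` covers agencies or fields with stricter guidelines.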

References

[1] Whyte, A. & Wilson, A. (2010). "How to Appraise and Select Research Data for Curation". DCC How-to Guides. Edinburgh: Digital Curation Centre. Available online: http://www.dcc.ac.uk/resources/how-guides
A guide from the Digital Curation Centre on appraising and selecting research data.

[2] DCC (2014). "Five steps to decide what data to keep: a checklist for appraising research data v.1". Edinburgh: Digital Curation Centre. Available online: http://www.dcc.ac.uk/resources/how-guides
A checklist to help determine what research data should be retained.