Planning how you'll manage your data

Several steps you take at the beginning of a project will make your data easier to manage later. Plan how you will manage not only the data used or generated by your project but also the documentation that describes them.

Questions to answer now that will help with your data later

  • What types and formats of data will this project collect or generate?
  • If human subjects or non-human vertebrate animals are involved, what ethical issues need to be addressed? What specific steps will be taken to address them, including privacy and confidentiality for human subjects?
  • What types of documentation and metadata standards will I use to describe the data?
  • Who will (re)use the data? What are they likely to use them for? What information will they need for the data to be useful?
  • What access restrictions will be placed on the data?
  • Where will the data be stored at the end of the project?
  • What costs will the project incur for documenting, formatting, storing, cleaning, anonymizing, and archiving the data?

Think about data and documentation in the broadest terms, including:

  • Quantitative and qualitative data
  • Primary (raw) and secondary (cleaned or analyzed) data
  • Notes
  • Laboratory and/or research notebooks
  • Codebooks
  • Code or software used to run data analyses
  • The computational environment
  • Data workflows and/or pipelines
  • Metadata (documentation describing the data)

Document the processes involved with your data clearly. The more you document up front and as you go, the easier it will be to work with the data later, including writing up these details for publications and progress reports. Ideally, your documentation will give others enough information to use your data and to understand or replicate your work.

Examples of Research Project Documentation

  • Rationale and context for data collection
  • Data collection methods
  • Data sources used
  • Structure and organization of data files
  • Processes for data validation and quality assurance
  • Analytical steps used to process the data
  • Information on data confidentiality, access, and use conditions

Examples of Dataset Documentation

  • Variable names and descriptions
  • Explanation of codes or other classification schemes
  • Algorithms used to transform data (including code)
  • File formats and any software used (including version numbers)
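Dataset documentation such as variable names, descriptions, and code explanations can be kept in a machine-readable codebook (sometimes called a data dictionary) stored alongside the data. The following is a minimal Python sketch, assuming a CSV codebook; the variables, codes, and the helper functions `codebook_to_csv` and `undocumented_columns` are hypothetical examples for illustration, not a required standard.

```python
import csv
import io

# Hypothetical variables for illustration only; replace with your own.
CODEBOOK = [
    {
        "variable": "participant_id",
        "description": "Anonymized participant identifier",
        "type": "string",
        "units": "",
        "codes": "",
    },
    {
        "variable": "age_years",
        "description": "Age at time of enrollment",
        "type": "integer",
        "units": "years",
        "codes": "-9 = missing",
    },
    {
        "variable": "smoker",
        "description": "Self-reported current smoking status",
        "type": "integer",
        "units": "",
        "codes": "0 = no; 1 = yes; -9 = missing",
    },
]

FIELDS = ["variable", "description", "type", "units", "codes"]


def codebook_to_csv(entries):
    """Return the codebook as CSV text, one row per variable (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented_columns(data_columns, entries):
    """Return data columns that have no codebook entry (hypothetical helper)."""
    documented = {entry["variable"] for entry in entries}
    return [col for col in data_columns if col not in documented]


if __name__ == "__main__":
    print(codebook_to_csv(CODEBOOK))
    # Re-running a check like this whenever the dataset changes
    # flags new columns that still need to be described.
    print(undocumented_columns(["participant_id", "age_years", "bmi"], CODEBOOK))
```

A plain CSV file was chosen here because it opens in any spreadsheet program and is easy to archive with the data; a discipline-specific metadata standard may be more appropriate for your project.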