What’s a DOI and what should I know about citing datasets?
Data citation is analogous to the citation of any other published work. Cite the dataset (and software) supporting your research using a standard citation in the reference section of your published work. This gives credit to the data authors and makes research datasets findable and accessible.
Most journals now provide standards for how to cite datasets, and most data repositories automatically generate a citation when a dataset is published. This citation includes the data authors, the repository where the data are archived, and a persistent identifier, most often a digital object identifier (DOI).
A DOI is a unique string of numbers, letters, and symbols that provides a persistent link to the location of content (such as a paper or dataset) on the Internet. DOIs are assigned through registration agencies (such as DataCite or Crossref) operating under the non-profit DOI Foundation. The DOI system is standardized by the International Organization for Standardization (ISO 26324), and DOIs are used to unambiguously identify (and access) published content, usually by resolving to a URL.
You have likely seen DOIs in the references of journal articles, most often written as a URL, e.g. https://doi.org/10.1000/182, though they can also appear as “doi:” followed by the alphanumeric identifier, e.g. doi:10.1000/182.
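Both forms carry the same identifier; only the wrapper differs. As a rough illustration, the Python sketch below strips either prefix to recover the bare DOI name and then rebuilds the https://doi.org/ link. The function names `normalize_doi` and `doi_to_url` are hypothetical, and the pattern check is a simplified assumption about DOI syntax (a "10." prefix, a numeric registrant code, a slash, then a suffix), not a full implementation of the DOI specification.

```python
import re

# Hypothetical list of wrappers commonly seen around a DOI name in references.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)

# Simplified assumption: "10." + numeric registrant code (optionally subdivided),
# then "/" and a non-empty suffix without whitespace.
_DOI_PATTERN = re.compile(r"^10\.\d{4,9}(?:\.\d+)*/\S+$")


def normalize_doi(text: str) -> str:
    """Return the bare DOI name (e.g. '10.1000/182') from a URL or 'doi:' form."""
    value = text.strip()
    lowered = value.lower()
    # Remove the first matching wrapper, compared case-insensitively.
    for prefix in _DOI_PREFIXES:
        if lowered.startswith(prefix):
            value = value[len(prefix):]
            break
    value = value.strip()
    if not _DOI_PATTERN.match(value):
        raise ValueError(f"not a recognizable DOI: {text!r}")
    return value


def doi_to_url(text: str) -> str:
    """Build an https://doi.org/ resolver link from any accepted DOI form."""
    return "https://doi.org/" + normalize_doi(text)
```

For example, `normalize_doi("doi:10.1000/182")` and `normalize_doi("https://doi.org/10.1000/182")` both return `"10.1000/182"`, and `doi_to_url("doi:10.1000/182")` returns `"https://doi.org/10.1000/182"`. Storing the bare name and generating the link on demand keeps a reference list consistent regardless of which form an author originally copied.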
DOIs are widely adopted internationally as a way to provide persistent, unambiguous links to content online, and they should be included when citing datasets.
The Joint Declaration of Data Citation Principles provides general guidance on why data citation is important and how it is defined.
You can also find more information, including specific examples of how to cite datasets, here: https://libguides.princeton.edu/citingdata