Data Provenance

Definition

The term “data provenance,” sometimes called “data lineage,” refers to a documented trail that records where a piece of data originated and how it has moved from that origin to its present location. The purpose of data provenance is to tell researchers where research data came from, what changes have been made to it, and what details support confidence in its validity. Provenance helps ensure that data creators are transparent about their work and its sources. It also provides a chain of information through which data can be tracked as researchers reuse other researchers’ data and adapt it for their own purposes.
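To make the idea of a “chain of information” concrete, here is a minimal sketch that assumes each dataset simply stores links to the datasets it was derived from. The `Dataset` class, the `trace_lineage` function, and the lab and dataset names are hypothetical illustrations, not part of any provenance standard or tool. Following the links upstream recovers the trail back to the original source.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A dataset and the upstream datasets it was derived from (hypothetical model)."""
    name: str
    creator: str
    derived_from: list["Dataset"] = field(default_factory=list)


def trace_lineage(ds: Dataset) -> list[str]:
    """Return every upstream ancestor of `ds`, nearest first, each listed once."""
    seen: set[str] = set()
    lineage: list[str] = []
    queue = deque(ds.derived_from)  # breadth-first walk back toward the origin
    while queue:
        parent = queue.popleft()
        if parent.name in seen:  # a dataset reached along two paths is reported once
            continue
        seen.add(parent.name)
        lineage.append(f"{parent.name} (created by {parent.creator})")
        queue.extend(parent.derived_from)
    return lineage


# Hypothetical chain: Lab A collects data, Lab B cleans it, Lab C adapts it for its own study.
raw = Dataset("field_survey", "Lab A")
cleaned = Dataset("survey_cleaned", "Lab B", [raw])
adapted = Dataset("survey_for_model", "Lab C", [cleaned])
print(trace_lineage(adapted))
# ['survey_cleaned (created by Lab B)', 'field_survey (created by Lab A)']
```

Real provenance systems typically record much more (timestamps, agents, the processes involved), but even this simple parent-link structure shows how each reuse extends the chain rather than erasing it.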

Examples

A molecular biologist uses data derived from public databases, some of which in turn draw on academic papers and experimental observations. A provenance record keeps this history for each piece of data, including where it came from, who originally collected it, and what modifications or transformations have been applied to it.
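The record described in this example can be sketched as a small data structure. This is a minimal illustration under assumed names: `ProvenanceRecord`, `Step`, the `log` and `summary` methods, and the sample values are all hypothetical and do not reflect any particular database or provenance schema. The record holds the three kinds of information listed above: where the data came from, who collected it, and each transformation applied afterward.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Step:
    """One modification or transformation applied to the data."""
    action: str
    performed_by: str
    on: date


@dataclass
class ProvenanceRecord:
    """Origin plus a chronological history of changes for one piece of data (hypothetical schema)."""
    item_id: str
    source: str
    collected_by: str
    history: list[Step] = field(default_factory=list)

    def log(self, action: str, performed_by: str, on: date) -> None:
        # log() only appends, so earlier steps are never overwritten by new ones.
        self.history.append(Step(action, performed_by, on))

    def summary(self) -> str:
        lines = [f"{self.item_id}: from {self.source}, collected by {self.collected_by}"]
        lines += [f"  {s.on.isoformat()} {s.performed_by}: {s.action}" for s in self.history]
        return "\n".join(lines)


rec = ProvenanceRecord("gene_x_expression", "public sequence database", "original submitting lab")
rec.log("normalized against a reference gene", "Analyst B", date(2024, 3, 1))
rec.log("log2-transformed", "Analyst B", date(2024, 3, 2))
print(rec.summary())
```

Keeping the origin fields separate from the list of steps means a later reader can see both the original source and every change made since, which is the information a reviewer needs to judge whether the data can be trusted.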

Further Resources

Werder, K., Ramesh, B., & Zhang, R. (Sophia). (2022). Establishing Data Provenance for Responsible Artificial Intelligence Systems. ACM Transactions on Management Information Systems, 13(2), 22:1–22:23. https://doi.org/10.1145/3503488

Mayernik, M. S., DiLauro, T., Duerr, R., Metsger, E., Thessen, A. E., & Choudhury, G. S. (2013). Data Conservancy Provenance, Context, and Lineage Services: Key Components for Data Preservation and Curation. Data Science Journal, 12, 158–171. https://doi.org/10.2481/dsj.12-039

Viglas, S. D. (2013). Data Provenance and Trust. Data Science Journal, 12, GRDI58–GRDI64. https://doi.org/10.2481/dsj.GRDI-010
