Data Provenance


The term “data provenance,” sometimes called “data lineage,” refers to a documented trail that accounts for the origin of a piece of data and traces its movement from that origin to its present location. The purpose of data provenance is to tell researchers where data originated, how it has changed, and what details support confidence in its validity. Provenance helps ensure that data creators are transparent about their work and its sources, and it provides a chain of information through which data can be tracked as researchers reuse and adapt other researchers’ data for their own purposes.


For example, a molecular biologist may use data drawn from public databases, some of which are in turn derived from academic papers and experimental observations. A provenance record preserves this history for each piece of data, including where it came from, who originally collected it, and what modifications or transformations have been applied to it.
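
One way to picture such a record is as a small data structure that links each piece of data to the record it was derived from. The following Python sketch is illustrative only: the class name ProvenanceRecord, its fields, and the sample values (such as "Lab A") are hypothetical and do not follow any particular provenance standard or tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProvenanceRecord:
    """Hypothetical record describing where one piece of data came from."""
    data_id: str
    source: str                      # where the data came from
    collected_by: str                # who originally collected or curated it
    derived_from: Optional["ProvenanceRecord"] = None  # upstream record, if any
    transformations: List[str] = field(default_factory=list)  # changes applied

    def lineage(self) -> List[str]:
        """Return the chain of sources, from the original origin to this record."""
        chain = []
        record = self
        while record is not None:
            chain.append(record.source)
            record = record.derived_from
        return list(reversed(chain))


# Sample values are invented for illustration.
observation = ProvenanceRecord(
    data_id="obs-001",
    source="experimental observation",
    collected_by="Lab A",
)
paper = ProvenanceRecord(
    data_id="paper-001",
    source="academic paper",
    collected_by="Lab A",
    derived_from=observation,
    transformations=["summarized measurements in a table"],
)
database_entry = ProvenanceRecord(
    data_id="db-001",
    source="public database",
    collected_by="Database curators",
    derived_from=paper,
    transformations=["extracted values from the paper", "normalized units"],
)

print(database_entry.lineage())
```

Calling lineage() on the database entry walks back through the paper to the original observation, which mirrors the chain of information described above. Real provenance systems typically record more detail, such as timestamps and the software used for each transformation.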

Further Resources

Werder, K., Ramesh, B., & Zhang, R. (Sophia). (2022). Establishing Data Provenance for Responsible Artificial Intelligence Systems. ACM Transactions on Management Information Systems, 13(2), 22:1-22:23.

Mayernik, M. S., DiLauro, T., Duerr, R., Metsger, E., Thessen, A. E., & Choudhury, G. S. (2013). Data Conservancy Provenance, Context, and Lineage Services: Key Components for Data Preservation and Curation. Data Science Journal, 12, 158–171.

Viglas, S. D. (2013). Data Provenance and Trust. Data Science Journal, 12, GRDI58–GRDI64.

