Capturing and Using Scientific Data Provenance

Barbara Lerner
Elizabeth Fong
Mount Holyoke College
Emery Boose
Aaron Ellison
Harvard Forest
Margo Seltzer
University of British Columbia
Thomas Pasquier
University of Bristol
Joe Wonsil
Carthage College
Orenna Brand
Columbia University

Papers and Presentations

Barbara Lerner, Emery Boose and Luis Perez, "Using Introspection to Collect Provenance in R", Informatics 2018, 5, 12. (Abstract, Paper)

Thomas Pasquier, Matthew Lau, Xueyuan Han, Elizabeth Fong, Barbara Lerner, Emery Boose, Merce Crosas, Aaron Ellison, and Margo Seltzer (2018). Sharing and Preserving Computational Analyses for Posterity with encapsulator. IEEE Computing in Science and Engineering (CiSE). (Abstract)

Emery R. Boose and Barbara S. Lerner, "Replication of data and metadata: a case-study of the analytic web", in There and back again: the challenge of replication in long-term biodiversity research, Ayelet Shavit and Aaron M. Ellison, eds., Yale University Press, 2017.

Barbara Lerner and Emery Boose, "RDataTracker: Collecting Provenance in an Interactive Scripting Environment", 6th USENIX Workshop on the Theory and Practice of Provenance, Cologne, Germany, June 2014. (Abstract, Paper)

Barbara Lerner and Emery Boose, "RDataTracker and DDG Explorer: Capture, Visualization and Querying of Provenance from R Scripts", 5th International Provenance and Annotation Workshop (IPAW '14), Cologne, Germany, June 2014. (Paper)

Xiang Zhao, Emery R. Boose, Yuriy Brun, Barbara Staudt Lerner and Leon J. Osterweil, "Supporting Undo and Redo in Scientific Data Analysis", Workshop on the Theory and Practice of Provenance, 2013. (Abstract, Paper)

Xiang Zhao, Barbara Lerner, Leon Osterweil, Emery Boose, Aaron Ellison, "Provenance Support for Rework", 4th USENIX Workshop on the Theory and Practice of Provenance (TaPP '12), Cambridge, Massachusetts, June 2012. (Abstract) (Paper (pdf))

Sofiya Taskova, Capturing, Persisting and Querying the Provenance of Scientific Data, Honors Thesis, May 2012, Summer 2010 and 2011 REU student.

Barbara Lerner, Emery Boose, Leon Osterweil, Aaron Ellison and Lori Clarke, "Provenance and Quality Control in Sensor Networks", Environmental Information Management 2011 Conference, Santa Barbara, California, September 2011. (Abstract) (Paper (pdf))

Corietta L. Teshera-Sterne, A Software Engineering Approach to Scientific Data Management, Summer 2009 REU student, Independent Study project at Mount Holyoke College, May 2010. This project was presented at the New England Undergraduate Computing Symposium.

Emery R. Boose, Aaron M. Ellison, Leon J. Osterweil, Rodion Podorozhny, Lori Clarke, Alexander Wise, Julian L. Hadley, David R. Foster. 2007. "Ensuring Reliable Datasets for Environmental Models and Forecasts", Ecological Informatics, 2: 237-247. (Paper (pdf))

Aaron M. Ellison, Leon J. Osterweil, Julian L. Hadley, Alexander Wise, Emery R. Boose, Lori Clarke, David R. Foster, Allen Hanson, David Jensen, P. S. Kuzeja, Ed Riseman, Howard Schultz. 2006. "Analytic Webs Support the Synthesis of Ecological Data Sets", Ecology, 87: 1345-1358. (Paper (pdf))

Harvard Forest REU Students

Joe Wonsil, Using Provenance to Make a Better Debugger, Summer 2018.

Orenna Brand, Increasing the Use of Provenance Through a User-Friendly Debugger in R, Summer 2018.

Connor Gregorich-Trevor, Data Provenance in R and Python Across Multiple Scripts, Summer 2017.

Jen Johnson, Collecting Provenance in Python, Summer 2017.

Alex Liu, Improving RDataTracker Accessibility and Functionality, Summer 2016.

Moe Pwint Phyu, Accessible Data Provenance with Debugging Feature in R, Summer 2016.

Marios Dardas, Searching Data Provenance, Summer 2015.

Lia Poulos, Providing Context for Provenance, Summer 2015.

Luis Perez, Fixing Science: Accessible and Efficient Data Provenance in the R Scripting Environment, Summer 2014.

Nikki Hoffler, The Aesthetics of Data Derivation, Summer 2014.

Shay Adams, Capturing Data Provenance from R Script Execution, Summer 2013 REU student.

Vasco Carinhas, The Data's Story Made Accessible, Summer 2013.

Miruna Oprescu, Visualization Tools for Digital Dataset Derivation Graphs, Summer 2012 REU student.

Yujia Zhou, Quality Control of Raw Data and Data Provenance Tracking, Summer 2012.

Snickers, The Blog of an Ecologist Dog, Summer 2012 REU mascot