Barbara Lerner, Elizabeth Fong (Mount Holyoke College)
Emery Boose, Aaron Ellison (Harvard Forest)
Margo Seltzer (University of British Columbia)
Thomas Pasquier (University of Bristol)
Joe Wonsil (Carthage College)
Orenna Brand (Columbia University)
RDataTracker is an R package that collects data provenance during an R console session or while an R script executes. The user can either record a console session or run a script stored in a file. A script run under RDataTracker executes normally, and RDataTracker additionally writes a JSON file containing the provenance of that execution. It also stores the intermediate values computed during execution and saves copies of the script, its input and output files, and any plots created.
Here is an example of how a scientist would collect provenance from an interactive console session:
    library(RDataTracker)
    prov.init()
Initializes provenance collection.

    ...
    calibrated.data <- data * calibration.factor
Then, the scientist enters normal R code.

    ...
    plot.data(calibrated.data, "calibrated-plot.jpeg")
    ...
    prov.quit()
Finally, prov.quit saves the provenance.
Alternatively, if the script resides in a file named calibrate.R, the user can call prov.run("calibrate.R") to run the script and collect provenance as it executes.