MTCProv: A practical provenance query framework for many-task scientific computing
L. M. R. Gadelha Jr., M. Wilde, M. Mattoso, I. Foster
Distrib. Parallel Databases (2012) 30:351-370
This article presents a system for storing provenance of runs of scientific computations. It is fairly high-level, but presents two technical contributions:
- a detailed description of the schema/data model used, which supports repeated tasks, annotations describing relevant parameters from content or configuration files, linkage between the program graph and the run graph, and nesting of tasks (analogous to our hierarchical provenance model)
- a higher-level query language (SPQL) that lets typical queries over the results be stated more directly
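To make the four data-model features concrete, here is a minimal sketch of a provenance record that supports them. All names (ProgramNode, TaskRun, attempt, annotations, runs_with_annotation, runs_of) are hypothetical illustrations, not MTCProv's actual relational schema, and the query is plain Python rather than SPQL.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical, simplified provenance model loosely following the features
# listed above; NOT MTCProv's actual schema.

@dataclass
class ProgramNode:
    """A step in the program (script) graph, e.g. an app declaration."""
    name: str

@dataclass
class TaskRun:
    """One execution of a program node in the run graph."""
    task_id: str
    program: ProgramNode                 # link from run graph back to program graph
    attempt: int = 1                     # repeated executions of a task get distinct attempts
    parent: Optional["TaskRun"] = None   # nesting: a task may contain subtasks
    annotations: Dict[str, str] = field(default_factory=dict)  # e.g. config-file parameters

    def ancestors(self) -> List["TaskRun"]:
        """Walk up the nesting hierarchy, innermost enclosing task first."""
        chain = []
        node = self.parent
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain

def runs_with_annotation(runs: List[TaskRun], key: str, value: str) -> List[str]:
    """Ids of runs annotated with key=value (a typical provenance query)."""
    return [r.task_id for r in runs if r.annotations.get(key) == value]

def runs_of(runs: List[TaskRun], program: ProgramNode) -> List[str]:
    """Ids of all runs of one program-graph node."""
    return [r.task_id for r in runs if r.program is program]

# Usage: one workflow run containing three runs of the same program step.
sim = ProgramNode("simulate")
workflow = TaskRun("wf-1", ProgramNode("main"))
t1 = TaskRun("t1", sim, attempt=1, parent=workflow, annotations={"temperature": "300"})
t2 = TaskRun("t2", sim, attempt=2, parent=workflow, annotations={"temperature": "300"})
t3 = TaskRun("t3", sim, parent=workflow, annotations={"temperature": "350"})
runs = [workflow, t1, t2, t3]

print(runs_with_annotation(runs, "temperature", "300"))  # ['t1', 't2']
print(runs_of(runs, sim))                                 # ['t1', 't2', 't3']
print([a.task_id for a in t1.ancestors()])                # ['wf-1']
```

Even in this toy form, queries must join run records back to program nodes and annotations, which is the kind of boilerplate a higher-level language such as SPQL aims to hide.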
I would have liked to see more about the computational model (Swift scripts/programs) and how it is mapped to the provenance records. Most workflow systems/languages lack a formal specification of their semantics (with or without provenance). Similarly, a detailed comparison with other proposed provenance query languages (e.g., the Harvard systems group's PQL) would be interesting.
Labels: provenance, query languages, workflows