Wednesday, May 15, 2013

MTCProv: A practical provenance query framework for many-task scientific computing

MTCProv: A practical provenance query framework for many-task scientific computing
L. M. R. Gadelha Jr., M. Wilde, M. Mattoso, I. Foster
Distrib. Parallel Databases (2012) 30:351-370

This article presents a system for storing provenance of runs of scientific computations.  It is fairly high-level, but presents two technical contributions:
  1. a detailed description of the schema / data model used, which allows for repeated tasks, annotations describing relevant content/configuration file parameters, linkage between the program graph and run graph, and nesting of tasks (analogous to our hierarchical provenance model)
  2. a higher-level query language (SPQL) that allows for more direct statements of typical queries over the results
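
To make the data-model features listed above concrete, here is a minimal sketch of how runs, repeated and nested tasks, and parameter annotations could be stored relationally and queried. This is my own illustration and not MTCProv's actual schema or SPQL: the table names (`run`, `task`, `annotation`), the column names, the sample data and the helper functions are all hypothetical, and plain SQL via Python's `sqlite3` stands in for the higher-level query language.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a toy provenance schema (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE run (id INTEGER PRIMARY KEY, script TEXT);
        CREATE TABLE task (
            id        INTEGER PRIMARY KEY,
            run_id    INTEGER REFERENCES run(id),
            parent_id INTEGER REFERENCES task(id),  -- NULL for a top-level task
            name      TEXT                          -- program-level task name
        );
        CREATE TABLE annotation (
            task_id INTEGER REFERENCES task(id),
            key     TEXT,
            value   TEXT
        );
    """)
    conn.execute("INSERT INTO run VALUES (1, 'demo.swift')")
    conn.executemany("INSERT INTO task VALUES (?, ?, ?, ?)", [
        (1, 1, None, "main"),
        (2, 1, 1, "simulate"),
        (3, 1, 1, "simulate"),     # same program task, executed a second time
        (4, 1, 2, "postprocess"),  # nested under the first simulate call
    ])
    conn.executemany("INSERT INTO annotation VALUES (?, ?, ?)", [
        (2, "temperature", "300"),
        (3, "temperature", "350"),
    ])
    return conn


def descendants(conn, task_id):
    """Return (id, name) for every task nested, at any depth, under task_id."""
    return conn.execute("""
        WITH RECURSIVE sub(id, name) AS (
            SELECT id, name FROM task WHERE parent_id = ?
            UNION ALL
            SELECT t.id, t.name FROM task t JOIN sub s ON t.parent_id = s.id
        )
        SELECT id, name FROM sub ORDER BY id
    """, (task_id,)).fetchall()


def tasks_with_annotation(conn, key, value):
    """Return (id, name) for tasks carrying the given key/value annotation."""
    return conn.execute("""
        SELECT t.id, t.name
        FROM task t JOIN annotation a ON a.task_id = t.id
        WHERE a.key = ? AND a.value = ?
        ORDER BY t.id
    """, (key, value)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(descendants(conn, 1))
    print(tasks_with_annotation(conn, "temperature", "350"))
```

Even in this toy version the nesting query needs a recursive common table expression, which is the sort of boilerplate a higher-level provenance query language can hide.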

I would have liked to see more about the computational model (Swift scripts/programs) and how these are mapped to the provenance records.  Most workflow systems/languages lack a formal specification of their semantics (with or without provenance).  Similarly, a detailed comparison with other proposed provenance query languages (e.g. the Harvard systems group's PQL) would be interesting.
