Thursday, May 01, 2014

Observing SQL queries in their natural habitat

Torsten Grust and Jan Rittinger. 2013. Observing SQL queries in their natural habitat. ACM Trans. Database Syst. 38, 1, Article 3 (April 2013), 33 pages.

This paper presents a system called Habitat, that allows highlighting a subexpression of an SQL query whose results are of interest (e.g. due to surprising/incorrect query results).  The system then compiles it into another query whose results are rows containing the values of variables in the "closure" (that is, the values of variables bound outside the subexpression) and the associated values of the selected query.



As a simple example, suppose the query (with underlined part being the WHERE clause):

SELECT r.A, s.D FROM R,S WHERE B=C 

\[R = \begin{array}{c|c}
A & B\\
\hline
1 & 2\\
3 & 7
\end{array}\quad
S = \begin{array}{c|c}
A & B\\
\hline
3 & 4\\
7 & 8
\end{array}
\]

Then the normal result would be something like:

\[\begin{array}{cc}
A & D \\
\hline  3
&8
\end{array}\]

and the result of the debug query arising from the above highlighting would be:


\[\begin{array}{ccccc}
R.A & R.B & S.C & S.D & B = C\\
\hline
1 & 2 & 3 & 4 & false\\
1 & 2 & 7 & 8 & false\\
3 & 7 & 3 & 4 & false\\
3 & 7 & 7 & 8 & true
\end{array}\]

Of course, this is a very simplistic example.  Habitat supports much richer queries including aggregation and grouping operations, and deals with the case that some highlighed part of the query is never reached for some set of bindings.

Stray thoughts:
  • The compilation process essentially generates one new column per subexpression, and postprocessing is needed to eliminate columns/queries that are not actually needed to answer the user's request.  It could be interesting to generate just what is needed directly.
  • it would be interesting to compare the results with the information provided by different provenance systems (which allow selecting a part of the result and asking for an explanation), or combine the information somehow to understand how "this" part of the query influenced "that" part of the result
  • it might also be interesting to formulate the compilation in terms of a higher-level / comprehension-based query language (perhaps with some syntactic marking indicating the subexpression of interest).

Labels: ,

0 Comments:

Post a Comment

Subscribe to Post Comments [Atom]

<< Home