Abstract
An answer to a query has a well-defined lineage expression (alternatively called how-provenance) that explains how the answer was derived. Recent work has also shown how to compute the lineage of a non-answer to a query. However, the cause of an answer or non-answer is a more subtle notion and consists, in general, of only a fragment of the lineage. In this paper, we adapt Halpern, Pearl, and Chockler's recent definitions of causality and responsibility to define the causes of answers and non-answers to queries, and their degree of responsibility. Responsibility captures the notion of degree of causality and serves to rank potentially many causes by their relative contributions to the effect. Then, we study the complexity of computing causes and responsibilities for conjunctive queries. It is known that computing causes is NP-complete in general. Our first main result shows that all causes for conjunctive queries can be computed by a relational query which may involve negation. Thus, causality can be computed in PTIME, and very efficiently so. Next, we study computing responsibility. Here, we prove that the complexity depends on the conjunctive query and demonstrate a dichotomy between PTIME and NP-complete cases. For the PTIME cases, we give a non-trivial algorithm, consisting of a reduction to the max-flow computation problem. Finally, we prove that, even when it is in PTIME, responsibility is complete for LOGSPACE, implying that, unlike causality, it cannot be computed by a relational query.
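To make the notions of cause and responsibility concrete, the following is a minimal brute-force sketch in Python. It assumes the Chockler-Halpern style measure in which a tuple's responsibility is 1/(1+k), where k is the size of a smallest contingency set (tuples whose removal makes the tuple counterfactual), and it assumes that every tuple is endogenous. The example query, the tagged-tuple encoding, and the names `q` and `responsibility` are hypothetical choices made for illustration. The search is exponential in the database size and is not the relational-query or max-flow method referred to in the abstract.

```python
from fractions import Fraction
from itertools import combinations


def q(db):
    """Boolean query q :- R(x, y), S(y) over tagged tuples such as ("R", 1, 2) and ("S", 2)."""
    s_values = {t[1] for t in db if t[0] == "S"}
    return any(t[0] == "R" and t[2] in s_values for t in db)


def responsibility(t, db, query):
    """Return 1/(1+k) for a smallest contingency set of size k, or 0 if t is not a cause.

    Assumption: all tuples in db are endogenous, so any of them may appear
    in a contingency set.
    """
    if t not in db or not query(db):
        return Fraction(0)
    others = sorted(db - {t})
    # Try contingency sets in order of increasing size, so the first hit is minimal.
    for k in range(len(others) + 1):
        for gamma in combinations(others, k):
            rest = db - set(gamma)
            # t is counterfactual in rest: the answer holds with t and fails without it.
            if query(rest) and not query(rest - {t}):
                return Fraction(1, 1 + k)
    return Fraction(0)


db = frozenset({("R", 1, 2), ("R", 3, 2), ("R", 4, 5), ("S", 2)})
for t in sorted(db):
    print(t, responsibility(t, db, q))
```

In this toy instance, `S(2)` is a counterfactual cause (removing it alone falsifies the query), `R(1, 2)` becomes counterfactual only after `R(3, 2)` is removed, and `R(4, 5)` never contributes to the answer.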
- S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995.
- P. Buneman, S. Khanna, and W. C. Tan. Why and where: A characterization of data provenance. In ICDT, 2001.
- A. Chapman and H. V. Jagadish. Why not? In SIGMOD, 2009.
- J. Cheney, L. Chiticariu, and W. C. Tan. Provenance in databases: Why, how, and where. Foundations and Trends in Databases, 1(4):379--474, 2009.
- H. Chockler and J. Y. Halpern. Responsibility and blame: A structural-model approach. J. Artif. Intell. Res. (JAIR), 22:93--115, 2004.
- H. Chockler, J. Y. Halpern, and O. Kupferman. What causes a system to satisfy a specification? ACM Trans. Comput. Log., 9(3), 2008.
- Y. Cui, J. Widom, and J. L. Wiener. Tracing the lineage of view data in a warehousing environment. ACM Trans. Database Syst., 25(2):179--227, 2000.
- N. Dalvi and D. Suciu. Management of probabilistic data: Foundations and challenges. In PODS, pages 1--12, Beijing, China, 2007. (invited talk).
- T. Eiter and T. Lukasiewicz. Complexity results for structure-based causality. Artif. Intell., 142(1):53--89, 2002. (Conference version in IJCAI, 2002).
- T. Eiter and T. Lukasiewicz. Causes and explanations in the structural-model approach: Tractable cases. Artif. Intell., 170(6--7):542--580, 2006.
- G. Gottlob, N. Leone, and F. Scarcello. The complexity of acyclic conjunctive queries. J. ACM, 48(3):431--498, 2001.
- T. J. Green, G. Karvounarakis, and V. Tannen. Provenance semirings. In PODS, 2007.
- J. Y. Halpern and J. Pearl. Causes and explanations: A structural-model approach. Part I: Causes. Brit. J. Phil. Sci., 56:843--887, 2005. (Conference version in UAI, 2001).
- M. Herschel, M. A. Hernández, and W. C. Tan. Artemis: A system for analyzing missing answers. PVLDB, 2(2):1550--1553, 2009.
- J. Huang, T. Chen, A. Doan, and J. F. Naughton. On the provenance of non-answers to queries over extracted data. PVLDB, 1(1):736--747, 2008.
- D. Lewis. Causation. The Journal of Philosophy, 70(17):556--567, 1973.
- A. Meliou, W. Gatterbauer, J. Halpern, C. Koch, K. F. Moore, and D. Suciu. Causality in databases. IEEE Data Engineering Bulletin, Sept. 2010.
- A. Meliou, W. Gatterbauer, K. F. Moore, and D. Suciu. The complexity of causality and responsibility for query answers and non-answers. CoRR, abs/1009.2021, 2010.
- A. Meliou, W. Gatterbauer, K. F. Moore, and D. Suciu. Why so? or Why no? Functional causality for explaining query answers. In MUD, 2010. Full version: CoRR abs/0912.5340 (2009).
- P. Menzies. Counterfactual theories of causation. Stanford Encyclopedia of Philosophy, 2008.
- D. Olteanu and J. Huang. Secondary-storage confidence computation for conjunctive queries with inequalities. In SIGMOD, 2009.
- P. Senellart and G. Gottlob. On the complexity of deriving schema mappings from database instances. In PODS, 2008.
- Q. T. Tran and C.-Y. Chan. How to conquer why-not questions. In SIGMOD, 2010.
- International multidisciplinary workshop on causality. IRIT, Toulouse, June 2009.