ABSTRACT
We show that relational algebra calculations for incomplete databases, probabilistic databases, bag semantics and why-provenance are particular cases of the same general algorithms involving semirings. This further suggests a comprehensive provenance representation that uses semirings of polynomials. We extend these considerations to datalog and semirings of formal power series. We give algorithms for datalog provenance calculation as well as datalog evaluation for incomplete and probabilistic databases. Finally, we show that for some semirings containment of conjunctive queries is the same as for standard set semantics.
Supplemental Material
- S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. Google ScholarDigital Library
- F. N. Afrati and C. H. Papadimitriou. The parallel complexity of simple logic programs. J. ACM, 40(4):891--916, 1993. Google ScholarDigital Library
- O. Benjelloun, A. D. Sarma, A. Y. Halevy, and J. Widom. ULDBs: Databases with uncertainty and lineage. In VLDB, 2006. Google ScholarDigital Library
- S. Bistarelli, U. Montanari, and F. Rossi. Semiring-based constraint satisfaction and optimization. JACM, 44(2):201--236, 1997. Google ScholarDigital Library
- S. Bistarelli, U. Montanari, and F. Rossi. Semiring-based constraint logic programming: syntax and semantics. ACM TOPLAS, 23(1):1--29, 2001. Google ScholarDigital Library
- P. Buneman, S. Khanna, and W. -C. Tan. Why and where: A characterization of data provenance. In ICDT, 2001. Google ScholarDigital Library
- A. K. Chandra and P. M. Merlin. Optimal implementation of conjunctive queries in relational data bases. In STOC, 1977. Google ScholarDigital Library
- L. Chiticariu and W.-C. Tan. Debugging schema mappings with routes. In VLDB, 2006. Google ScholarDigital Library
- N. Chomsky and M. -P. Schützenberger. The algebraic theory of context-free languages. Computer Programming and Formal Systems, pages 118--161, 1963.Google ScholarCross Ref
- M. P. Consens and A. O. Mendelzon. Low complexity aggregation in graphlog and datalog. In ICDT, 1990. Google ScholarDigital Library
- P. Crawley and R. P. Dilworth. Algebraic Theory of Lattices. Prentice Hall, 1973.Google Scholar
- Y. Cui. Lineage Tracing in Data Warehouses. PhD thesis, Stanford University, 2001. Google ScholarDigital Library
- Y. Cui, J. Widom, and J. L. Wiener. Tracing the lineage of view data in a warehousing environment. TODS, 25(2), 2000. Google ScholarDigital Library
- N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In VLDB, 2004.Google ScholarDigital Library
- F. Dong and L. V. S. Lakshmanan. Deductive databases with incomplete information. In Symposium on Logic Programming, 1992.Google Scholar
- N. Fuhr. Probabilistic datalog -- a logic for powerful retrieval methods. In SIGIR, 1995. Google ScholarDigital Library
- N. Fuhr and T. Rölleke. A probabilistic relational algebra for the integration of information retrieval and database systems. TOIS, 14(1), 1997. Google ScholarDigital Library
- T. J. Green and V. Tannen. Models for incomplete and probabilistic information. In EDBT Workshops, 2006. Google ScholarDigital Library
- T. Imielinski and W. Lipski. Incomplete information in relational databases. JACM, 31(4), 1984. Google ScholarDigital Library
- Y. E. Ioannidis and R. Ramakrishnan. Containment of conjunctive queries: beyond relations as sets. TODS, 20(3), 1995. Google ScholarDigital Library
- P. Kolaitis and M. Vardi. Conjunctive-query containment and constraint satisfaction. In PODS, 1998. Google ScholarDigital Library
- W. Kuich. Semirings and formal power series. In Handbook of formal languages, volume 1. Springer, 1997.Google ScholarDigital Library
- L. V. S. Lakshmanan, N. Leone, R. Ross, and V. S. Subrahmanian. Probview: a flexible probabilistic database system. TODS, 22(3), 1997. Google ScholarDigital Library
- L. V. S. Lakshmanan and F. Sadri. Probabilistic deductive databases. In Symposium on Logic Programming, 1994. Google ScholarDigital Library
- M. Maher and R. Ramakrishnan. Déjàvu in fixpoints of logic programs. In NACLP, 1989.Google Scholar
- I. S. Mumick. Query Optimization in Deductive and Relational Databases. PhD thesis, Stanford University, 1991. Google ScholarDigital Library
- I. S. Mumick, H. Pirahesh, and R. Ramakrishnan. The magic of duplicates and aggregates. In VLDB Journal, 1990. Google ScholarDigital Library
- I. S. Mumick and O. Shmueli. Finiteness properties of database queries. In Fourth Australian Database Conference, 1993.Google Scholar
- Y. Sagiv and M. Yannakakis. Equivalences among relational expressions with the union and difference operators. J. ACM, 27(4), 1980. Google ScholarDigital Library
- A. D. Sarma, O. Benjelloun, A. Halevy, and J. Widom. Working models for uncertain data. In ICDE, 2006. Google ScholarDigital Library
- J. D. Ullman. Principles of Database and Knowledge-Base Systems, volume II. Computer Science Press, 1989. Google ScholarDigital Library
- L. A. Zadeh. Fuzzy sets. Inf. Control, 8(3), 1965.Google Scholar
- E. Zimányi. Query evaluation in probabilistic relational databases. TCS, 171(1-2), 1997. Google ScholarDigital Library
Index Terms
- Provenance semirings
Recommendations
Querying data provenance
SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of dataMany advanced data management operations (e.g., incremental maintenance, trust assessment, debugging schema mappings, keyword search over databases, or query answering in probabilistic databases), involve computations that look at how a tuple was ...
Data Provenance for Historical Queries in Relational Database
Compute '15: Proceedings of the 8th Annual ACM India ConferenceCapturing, modeling, and querying data provenance in databases has gained considerable importance in the last decade. All kinds of applications developed on top of databases, now a days collect provenance for various purposes like trustworthiness of ...
PUG: a framework and practical implementation for why and why-not provenance
Explaining why an answer is (or is not) returned by a query is important for many applications including auditing, debugging data and queries, and answering hypothetical questions about data. In this work, we present the first practical approach for ...
Comments