ABSTRACT
Provenance, or information about the origin or derivation of data, is important for assessing the trustworthiness of data and for identifying and correcting mistakes. Most prior implementations of data provenance have involved heavyweight modifications to database systems, and little attention has been paid to how the provenance data can be used outside such a system. We present extensions to the Links programming language that build on its support for language-integrated query. Provenance queries are supported by rewriting and normalizing monadic comprehensions, and the type system is extended to distinguish provenance metadata from normal data. The main contribution of this paper is to show that the two most common forms of provenance can be implemented efficiently and used safely as a programming language feature, with no changes to the database system.
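To make the idea of provenance-annotated comprehensions more concrete, here is a minimal Python sketch of one widely used form of provenance, lineage: the set of input rows that a result was derived from. It is not the Links implementation. The `Lineage` wrapper, the `annotate` helper, the `(table, row index)` identifier scheme and the `agencies`/`tours` tables are all hypothetical and exist only for illustration. The wrapper type keeps provenance apart from ordinary data, loosely echoing the type-system separation described above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Generic, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical row identifier: (table name, position of the row in that table).
RowId = Tuple[str, int]


@dataclass(frozen=True)
class Lineage(Generic[T]):
    """A value paired with the set of source rows it was derived from.

    Lineage lives in its own wrapper type rather than as an extra data
    column, so provenance metadata cannot be confused with normal data.
    """
    data: T
    rows: FrozenSet[RowId]


def annotate(table_name, rows):
    """Give every row of a base table a singleton lineage set."""
    return [Lineage(row, frozenset({(table_name, i)})) for i, row in enumerate(rows)]


def boat_phones(agencies, tours):
    """Join comprehension over lineage-annotated rows.

    Each result's lineage is the union of the lineage sets of the rows
    bound by the generators that produced it.
    """
    return [
        Lineage((a.data["name"], a.data["phone"]), a.rows | t.rows)
        for a in agencies
        for t in tours
        if a.data["name"] == t.data["name"] and t.data["type"] == "boat"
    ]


# Hypothetical example tables.
agencies = [
    {"name": "Acme", "phone": "555 0100"},
    {"name": "Borealis", "phone": "555 0199"},
]
tours = [
    {"name": "Acme", "destination": "Glasgow", "type": "bus"},
    {"name": "Acme", "destination": "Skye", "type": "bus"},
    {"name": "Borealis", "destination": "Islay", "type": "boat"},
]

result = boat_phones(annotate("agencies", agencies), annotate("tours", tours))
for r in result:
    print(r.data, sorted(r.rows))
```

This sketch evaluates the comprehension in memory. The approach summarized in the abstract instead rewrites and normalizes the query so that the database computes the annotations itself, with no changes to the database system.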