skip to main content
research-article
Public Access

Approximation Algorithms for Schema-Mapping Discovery from Data Examples

Published:28 April 2017Publication History
Skip Abstract Section

Abstract

In recent years, data examples have been at the core of several different approaches to schema-mapping design. In particular, Gottlob and Senellart introduced a framework for schema-mapping discovery from a single data example, in which the derivation of a schema mapping is cast as an optimization problem. Our goal is to refine and study this framework in more depth. Among other results, we design a polynomial-time log(n)-approximation algorithm for computing optimal schema mappings from a given set of data examples (where n is the combined size of the given data examples) for a restricted class of schema mappings; moreover, we show that this approximation ratio cannot be improved. In addition to the complexity-theoretic results, we implemented the aforementioned log(n)-approximation algorithm and carried out an experimental evaluation in a real-world mapping scenario.

References

  1. Bogdan Alexe, Laura Chiticariu, Renée J. Miller, and Wang Chiew Tan. 2008a. Muse: Mapping understanding and design by example. In ICDE. 10--19.Google ScholarGoogle Scholar
  2. Bogdan Alexe, Wang-Chiew Tan, and Yannis Velegrakis. 2008b. STBenchmark: Towards a benchmark for mapping systems. Proc. VLDB Endow. 1, 1 (Aug. 2008), 230--244. DOI:http://dx.doi.org/10.14778/1453856.1453886 Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Bogdan Alexe, Balder ten Cate, Phokion G. Kolaitis, and Wang-Chiew Tan. 2011a. Characterizing schema mappings via data examples. ACM Trans. Database Syst. 36, 4, (Dec. 2011) Article 23, 48 pages. DOI:http://dx.doi.org/10.1145/2043652.2043656 Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Bogdan Alexe, Balder ten Cate, Phokion G. Kolaitis, and Wang Chiew Tan. 2011b. Designing and refining schema mappings via data examples. In SIGMOD. 133--144. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Noga Alon, Dana Moshkovitz, and Shmuel Safra. 2006. Algorithmic construction of sets for K-restrictions. ACM Trans. Algor. 2, 2 (Apr. 2006), 153--177. DOI:http://dx.doi.org/10.1145/1150334.1150336 Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Patricia C. Arocena, Boris Glavic, Radu Ciucanu, and Renée J. Miller. 2015. The iBench integration metadata generator. Proc. VLDB Endow. 9, 3 (Nov. 2015), 108--119. DOI:http://dx.doi.org/10.14778/2850583.2850586 Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Alessandro Artale, Diego Calvanese, Roman Kontchakov, and Michael Zakharyaschev. 2009. The DL-Lite family and relations. J. Artif. Intell. Res. 36 (2009), 1--69.Google ScholarGoogle ScholarCross RefCross Ref
  8. Pablo Barceló. 2009. Logical foundations of relational data exchange. SIGMOD Record 38, 1 (2009), 49--58. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. A. Bonifati, E. Q. Chang, T. Ho, V. S. Lakshmanan, and R. Pottinger. 2005. HePToX: Marrying XML and heterogeneity in your P2P databases. In VLDB. 1267--1270.Google ScholarGoogle Scholar
  10. Balder ten Cate, Víctor Dalmau, and Phokion G. Kolaitis. 2012. Learning schema mappings. In Proceedings of ICDT. ACM, New York, NY, 182--195. DOI:http://dx.doi.org/10.1145/2274576.2274596 Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Balder ten Cate, Víctor Dalmau, and Phokion G. Kolaitis. 2013. Learning schema mappings. ACM Trans. Database Syst. 38, 4 (Dec. 2013), Article 28, 31 pages. DOI:http://dx.doi.org/10.1145/2539032.2539035 Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Balder ten Cate, Richard Halpert, and Phokion Kolaitis. 2016. Exchange-repairs: Managing inconsistency in data exchange. J. Data Seman. 5, 2 (2016), 77--97. Google ScholarGoogle ScholarCross RefCross Ref
  13. Balder ten Cate, Phokion Kolaitis, Kun Qian, and Wang-Chiew Tan. 2015. Approximation algorithms for schema-mapping discovery from data examples. In AMW 2015.Google ScholarGoogle Scholar
  14. V. Chvtal. 1979. A greedy heuristic for the set covering problem. Math. Oper. Res. 4 (1979), 233--235. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Anish Das Sarma, Aditya G. Parameswaran, Hector Garcia-Molina, and Jennifer Widom. 2010. Synthesizing view definitions from data. In ICDT. 89--103.Google ScholarGoogle Scholar
  16. R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. 2005a. Data exchange: Semantics and query answering. TCS 336, 1 (2005), 89--124. Google ScholarGoogle ScholarCross RefCross Ref
  17. R. Fagin, P. G. Kolaitis, L. Popa, and W.-C. Tan. 2005b. Composing schema mappings: Second-order dependencies to the rescue. ACM Trans. Database Syst. 30, 4 (2005), 994--1055. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. George H. L. Fletcher and Catharine M. Wyss. 2009. Towards a general framework for effective solutions to the data mapping problem. J. Data Seman. XIV (2009), 37--73. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Georg Gottlob and Pierre Senellart. 2010. Schema mapping discovery from data instances. J. ACM 57, 2 (2010). Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Wolfang Gutjahr, Emo Welzl, and Gerhart Woeginger. 1992. Polynomial graph-colorings. Discr. Appl. Math. 35 (1992), 29--46. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. L. M. Haas, M. A. Hernández, H. Ho, L. Popa, and M. Roth. 2005. Clio grows up: From research prototype to industrial tool. In ACM SIGMOD. 805--810.Google ScholarGoogle Scholar
  22. David S. Johnson. 1973. Approximation algorithms for combinatorial problems. In Proceedings of STOC. ACM, New York, NY, 38--49. DOI:http://dx.doi.org/10.1145/800125.804034 Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Richard M. Karp. 1972. Reducibility among combinatorial problems. In Complexity of Computer Computations, R. E. Miller and J. W. Thatcher (Eds.). Plenum, 85--103. Google ScholarGoogle ScholarCross RefCross Ref
  24. P. G. Kolaitis. 2005. Schema mappings, data exchange, and metadata management. In ACM PODS. 61--75. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. KRDB. 2013a. OBDA Mappings from IMDB to Ontology. Retreived from https://raw.githubusercontent.com/wiki/ontop/ontop/attachments/Example_MovieOntology/movieontology.obda.Google ScholarGoogle Scholar
  26. KRDB. 2013b. Ontop Project. Retrieved from https://github.com/ontop/ontop/wiki/Example_MovieOntology.Google ScholarGoogle Scholar
  27. KRDB. 2013c. SQL Script to Generate the Schema Only IMDB Database. Retrieved from https://raw.githubusercontent.com/wiki/ontop/ontop/attachments/Example_MovieOntology/imdbpg.sql.Google ScholarGoogle Scholar
  28. M. Lenzerini. 2002. Data integration: A theoretical perspective. In ACM PODS. 233--246. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Heikki Mannila and Kari-Jouko Räihä. 1989. Automatic generation of test data for relational queries. JCSS 38, 2 (1989), 240--258. Google ScholarGoogle ScholarCross RefCross Ref
  30. R. J. Miller, L. M. Haas, and M. A. Hernández. 2000. Schema mapping as query discovery. In Proceedings of the International Conference on Very Large Data Bases (VLDB). 77--88.Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Christopher Olston, Shubham Chopra, and Utkarsh Srivastava. 2009. Generating example data for dataflow programs. In ACM SIGMOD. 245--256. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Christos Papadimitriou and Mihalis Yannakakis. 1988. Optimization, approximation, and complexity classes. In Proceedings of STOC. ACM, New York, NY, 229--234. DOI:http://dx.doi.org/10.1145/62212.62233 Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Ran Raz and Shmuel Safra. 1997. A sub-constant error-probability low-degree test, and a sub-constant error-probability PCP characterization of NP. In Proceedings of the 29th Annual ACM Symposium on Theory of Computing (STOC’97). ACM, New York, NY, 475--484. DOI:http://dx.doi.org/10.1145/258533.258641 Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. L. Yan, R. J. Miller, L. M. Haas, and R. Fagin. 2001. Data-driven understanding and refinement of schema mappings. In ACM SIGMOD. 485--496. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Approximation Algorithms for Schema-Mapping Discovery from Data Examples

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Transactions on Database Systems
        ACM Transactions on Database Systems  Volume 42, Issue 2
        Invited Paper from SIGMOD 2015, Invited Paper from PODS 2015 and Regular Papers
        June 2017
        251 pages
        ISSN:0362-5915
        EISSN:1557-4644
        DOI:10.1145/3086510
        Issue’s Table of Contents

        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 28 April 2017
        • Accepted: 1 January 2017
        • Revised: 1 October 2016
        • Received: 1 December 2015
        Published in tods Volume 42, Issue 2

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader