Abstract
In recent years, data examples have been at the core of several different approaches to schema-mapping design. In particular, Gottlob and Senellart introduced a framework for schema-mapping discovery from a single data example, in which the derivation of a schema mapping is cast as an optimization problem. Our goal is to refine and study this framework in more depth. Among other results, we design a polynomial-time log(n)-approximation algorithm for computing optimal schema mappings from a given set of data examples (where n is the combined size of the given data examples) for a restricted class of schema mappings; moreover, we show that this approximation ratio cannot be improved. In addition to the complexity-theoretic results, we implemented the aforementioned log(n)-approximation algorithm and carried out an experimental evaluation in a real-world mapping scenario.
- Bogdan Alexe, Laura Chiticariu, Renée J. Miller, and Wang Chiew Tan. 2008a. Muse: Mapping understanding and design by example. In ICDE. 10--19.Google Scholar
- Bogdan Alexe, Wang-Chiew Tan, and Yannis Velegrakis. 2008b. STBenchmark: Towards a benchmark for mapping systems. Proc. VLDB Endow. 1, 1 (Aug. 2008), 230--244. DOI:http://dx.doi.org/10.14778/1453856.1453886 Google ScholarDigital Library
- Bogdan Alexe, Balder ten Cate, Phokion G. Kolaitis, and Wang-Chiew Tan. 2011a. Characterizing schema mappings via data examples. ACM Trans. Database Syst. 36, 4, (Dec. 2011) Article 23, 48 pages. DOI:http://dx.doi.org/10.1145/2043652.2043656 Google ScholarDigital Library
- Bogdan Alexe, Balder ten Cate, Phokion G. Kolaitis, and Wang Chiew Tan. 2011b. Designing and refining schema mappings via data examples. In SIGMOD. 133--144. Google ScholarDigital Library
- Noga Alon, Dana Moshkovitz, and Shmuel Safra. 2006. Algorithmic construction of sets for K-restrictions. ACM Trans. Algor. 2, 2 (Apr. 2006), 153--177. DOI:http://dx.doi.org/10.1145/1150334.1150336 Google ScholarDigital Library
- Patricia C. Arocena, Boris Glavic, Radu Ciucanu, and Renée J. Miller. 2015. The iBench integration metadata generator. Proc. VLDB Endow. 9, 3 (Nov. 2015), 108--119. DOI:http://dx.doi.org/10.14778/2850583.2850586 Google ScholarDigital Library
- Alessandro Artale, Diego Calvanese, Roman Kontchakov, and Michael Zakharyaschev. 2009. The DL-Lite family and relations. J. Artif. Intell. Res. 36 (2009), 1--69.Google ScholarCross Ref
- Pablo Barceló. 2009. Logical foundations of relational data exchange. SIGMOD Record 38, 1 (2009), 49--58. Google ScholarDigital Library
- A. Bonifati, E. Q. Chang, T. Ho, V. S. Lakshmanan, and R. Pottinger. 2005. HePToX: Marrying XML and heterogeneity in your P2P databases. In VLDB. 1267--1270.Google Scholar
- Balder ten Cate, Víctor Dalmau, and Phokion G. Kolaitis. 2012. Learning schema mappings. In Proceedings of ICDT. ACM, New York, NY, 182--195. DOI:http://dx.doi.org/10.1145/2274576.2274596 Google ScholarDigital Library
- Balder ten Cate, Víctor Dalmau, and Phokion G. Kolaitis. 2013. Learning schema mappings. ACM Trans. Database Syst. 38, 4 (Dec. 2013), Article 28, 31 pages. DOI:http://dx.doi.org/10.1145/2539032.2539035 Google ScholarDigital Library
- Balder ten Cate, Richard Halpert, and Phokion Kolaitis. 2016. Exchange-repairs: Managing inconsistency in data exchange. J. Data Seman. 5, 2 (2016), 77--97. Google ScholarCross Ref
- Balder ten Cate, Phokion Kolaitis, Kun Qian, and Wang-Chiew Tan. 2015. Approximation algorithms for schema-mapping discovery from data examples. In AMW 2015.Google Scholar
- V. Chvtal. 1979. A greedy heuristic for the set covering problem. Math. Oper. Res. 4 (1979), 233--235. Google ScholarDigital Library
- Anish Das Sarma, Aditya G. Parameswaran, Hector Garcia-Molina, and Jennifer Widom. 2010. Synthesizing view definitions from data. In ICDT. 89--103.Google Scholar
- R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. 2005a. Data exchange: Semantics and query answering. TCS 336, 1 (2005), 89--124. Google ScholarCross Ref
- R. Fagin, P. G. Kolaitis, L. Popa, and W.-C. Tan. 2005b. Composing schema mappings: Second-order dependencies to the rescue. ACM Trans. Database Syst. 30, 4 (2005), 994--1055. Google ScholarDigital Library
- George H. L. Fletcher and Catharine M. Wyss. 2009. Towards a general framework for effective solutions to the data mapping problem. J. Data Seman. XIV (2009), 37--73. Google ScholarDigital Library
- Georg Gottlob and Pierre Senellart. 2010. Schema mapping discovery from data instances. J. ACM 57, 2 (2010). Google ScholarDigital Library
- Wolfang Gutjahr, Emo Welzl, and Gerhart Woeginger. 1992. Polynomial graph-colorings. Discr. Appl. Math. 35 (1992), 29--46. Google ScholarDigital Library
- L. M. Haas, M. A. Hernández, H. Ho, L. Popa, and M. Roth. 2005. Clio grows up: From research prototype to industrial tool. In ACM SIGMOD. 805--810.Google Scholar
- David S. Johnson. 1973. Approximation algorithms for combinatorial problems. In Proceedings of STOC. ACM, New York, NY, 38--49. DOI:http://dx.doi.org/10.1145/800125.804034 Google ScholarDigital Library
- Richard M. Karp. 1972. Reducibility among combinatorial problems. In Complexity of Computer Computations, R. E. Miller and J. W. Thatcher (Eds.). Plenum, 85--103. Google ScholarCross Ref
- P. G. Kolaitis. 2005. Schema mappings, data exchange, and metadata management. In ACM PODS. 61--75. Google ScholarDigital Library
- KRDB. 2013a. OBDA Mappings from IMDB to Ontology. Retreived from https://raw.githubusercontent.com/wiki/ontop/ontop/attachments/Example_MovieOntology/movieontology.obda.Google Scholar
- KRDB. 2013b. Ontop Project. Retrieved from https://github.com/ontop/ontop/wiki/Example_MovieOntology.Google Scholar
- KRDB. 2013c. SQL Script to Generate the Schema Only IMDB Database. Retrieved from https://raw.githubusercontent.com/wiki/ontop/ontop/attachments/Example_MovieOntology/imdbpg.sql.Google Scholar
- M. Lenzerini. 2002. Data integration: A theoretical perspective. In ACM PODS. 233--246. Google ScholarDigital Library
- Heikki Mannila and Kari-Jouko Räihä. 1989. Automatic generation of test data for relational queries. JCSS 38, 2 (1989), 240--258. Google ScholarCross Ref
- R. J. Miller, L. M. Haas, and M. A. Hernández. 2000. Schema mapping as query discovery. In Proceedings of the International Conference on Very Large Data Bases (VLDB). 77--88.Google ScholarDigital Library
- Christopher Olston, Shubham Chopra, and Utkarsh Srivastava. 2009. Generating example data for dataflow programs. In ACM SIGMOD. 245--256. Google ScholarDigital Library
- Christos Papadimitriou and Mihalis Yannakakis. 1988. Optimization, approximation, and complexity classes. In Proceedings of STOC. ACM, New York, NY, 229--234. DOI:http://dx.doi.org/10.1145/62212.62233 Google ScholarDigital Library
- Ran Raz and Shmuel Safra. 1997. A sub-constant error-probability low-degree test, and a sub-constant error-probability PCP characterization of NP. In Proceedings of the 29th Annual ACM Symposium on Theory of Computing (STOC’97). ACM, New York, NY, 475--484. DOI:http://dx.doi.org/10.1145/258533.258641 Google ScholarDigital Library
- L. Yan, R. J. Miller, L. M. Haas, and R. Fagin. 2001. Data-driven understanding and refinement of schema mappings. In ACM SIGMOD. 485--496. Google ScholarDigital Library
Index Terms
- Approximation Algorithms for Schema-Mapping Discovery from Data Examples
Recommendations
Learning schema mappings
Invited papers issueA schema mapping is a high-level specification of the relationship between a source schema and a target schema. Recently, a line of research has emerged that aims at deriving schema mappings automatically or semi-automatically with the help of data ...
Characterizing schema mappings via data examples
Schema mappings are high-level specifications that describe the relationship between two database schemas; they are considered to be the essential building blocks in data exchange and data integration, and have been the object of extensive research ...
Characterizing schema mappings via data examples
PODS '10: Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsSchema mappings are high-level specifications that describe the relationship between two database schemas; they are considered to be the essential building blocks in data exchange and data integration, and have been the object of extensive research ...
Comments