ABSTRACT
We study the problem of answering queries through a target schema, given a set of mappings between one or more source schemas and this target schema, where the data resides at the sources. The schemas can be any combination of relational or XML schemas, and can be designed independently. In addition to the source-to-target mappings, we consider as part of the mapping scenario a set of target constraints specifying additional properties of the target schema. Such constraints become particularly important when integrating data from multiple sources with overlapping data, and when they express data merging rules at the target. We define the semantics of query answering in such an integration scenario, and design two novel algorithms, basic query rewrite and query resolution, to implement this semantics. The basic query rewrite algorithm reformulates target queries in terms of the source schemas, based on the mappings. The query resolution algorithm incorporates the target constraints to generate additional rewritings that merge related information from multiple sources and assemble a coherent view of the data. We implemented both algorithms and evaluated them in a comprehensive set of experiments based on both synthetic and real-life data integration scenarios.
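To make the two rewriting steps concrete, here is a toy relational sketch, assuming flat relations, mappings that each copy attributes from one source relation into one target relation, and a single target key constraint acting as the data merging rule. All names (`Mapping`, `SourceQuery`, `basic_rewrite`, `resolve`, `evaluate`, and the example relations `S1_emp` and `S2_pay`) are hypothetical. This is not the paper's algorithm, which also covers XML schemas; it only illustrates the idea of rewriting through mappings and then using a target constraint to merge answers.

```python
# Hypothetical names throughout; a toy relational sketch, not the paper's algorithm.
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, object]


@dataclass
class Mapping:
    """Hypothetical mapping: one source relation populates some
    attributes of a single target relation."""
    source: str            # source relation name
    attrs: Dict[str, str]  # target attribute -> source attribute


@dataclass
class SourceQuery:
    """A rewriting over the sources: the mappings' relations joined on
    `key` (no join when there is a single part)."""
    parts: Tuple[Mapping, ...]
    key: Optional[str] = None


def basic_rewrite(target_attrs: Sequence[str],
                  mappings: List[Mapping]) -> List[SourceQuery]:
    # One rewriting per relevant mapping; target attributes the mapping
    # does not cover stay unknown (None) in the answers.
    return [SourceQuery((m,)) for m in mappings
            if any(a in m.attrs for a in target_attrs)]


def resolve(target_attrs: Sequence[str], mappings: List[Mapping],
            key: str) -> List[SourceQuery]:
    # Treat a target key constraint as a merging rule: mappings that all
    # export the key are joined on it, so data about the same entity
    # from different sources is assembled into one target tuple.
    keyed = tuple(m for m in mappings
                  if key in m.attrs and any(a in m.attrs for a in target_attrs))
    return [SourceQuery(keyed, key)] if len(keyed) > 1 else []


def evaluate(q: SourceQuery, target_attrs: Sequence[str],
             db: Dict[str, List[Row]]) -> List[tuple]:
    # Build partial target tuples part by part (a nested-loop join).
    partials: List[Row] = [{}]
    for m in q.parts:
        extended = []
        for partial in partials:
            for row in db[m.source]:
                t = {ta: row[sa] for ta, sa in m.attrs.items()}
                # Join condition: tuples merged under the key must agree on it.
                if q.key is not None and partial and partial[q.key] != t[q.key]:
                    continue
                extended.append({**partial, **t})
        partials = extended
    return [tuple(p.get(a) for a in target_attrs) for p in partials]


# Usage: two sources with overlapping data about employees.
db = {
    "S1_emp": [{"name": "ann", "dept": "db"}, {"name": "bob", "dept": "ai"}],
    "S2_pay": [{"ename": "ann", "amount": 100}],
}
m1 = Mapping("S1_emp", {"name": "name", "dept": "dept"})
m2 = Mapping("S2_pay", {"name": "ename", "salary": "amount"})
target = ("name", "dept", "salary")

for q in basic_rewrite(target, [m1, m2]) + resolve(target, [m1, m2], "name"):
    print(evaluate(q, target, db))
```

In this toy run, the basic rewritings return partial tuples (salary or department unknown), while the resolved rewriting uses the key `name` to merge Ann's department and salary from the two sources into a single tuple. Removing partial answers that are subsumed by merged ones is left out for brevity.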
Constraint-based XML query rewriting for data integration