ABSTRACT
Peer data management systems (PDMS) offer a flexible architecture for decentralized data sharing. In a PDMS, every peer is associated with a schema that represents the peer's domain of interest, and semantic relationships between peers are provided locally between pairs (or small sets) of peers. By traversing semantic paths of mappings, a query over one peer can obtain relevant data from any reachable peer in the network. Semantic paths are traversed by reformulating queries at a peer into queries on its neighbors.Naively following semantic paths is highly inefficient in practice. We describe several techniques for optimizing the reformulation process in a PDMS and validate their effectiveness using real-life data sets. In particular, we develop techniques for pruning paths in the reformulation process and for minimizing the reformulated queries as they are created. In addition, we consider the effect of the strategy we use to search through the space of reformulations. Finally, we show that pre-computing semantic paths in a PDMS can greatly improve the efficiency of the reformulation process. Together, all of these techniques form a basis for scalable query reformulation in PDMS.To enable our optimizations, we developed practical algorithms, of independent interest, for checking containment and minimization of XML queries, and for composing XML mappings.
- K. Aberer, P. Cudre-Mauroux, and M. Hauswirth. A framework for semantic gossiping. SIGMOD Record, 31(4), 2002. Google ScholarDigital Library
- S. Abiteboul and O. Duschka. Complexity of answering queries using materialized views. In Proc. of PODS, pages 254--263, Seattle, WA, 1998. Google ScholarDigital Library
- S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison Weseley, 1995. Google ScholarDigital Library
- S. Amer-Yahia, S. Cho. L. Lakshmanan, and D. Srivastava. Minimization of tree pattern queries. In Proc. of SIGMOD, 2001. Google ScholarDigital Library
- M. Arenas, V. Kantere, A. Kementsietsidis, I. Kiringa, R. J. Miller, and J. Mylopoulos. The hyperion project: From data integration to data coordination. SIGMOD Record, September 2003. Google ScholarDigital Library
- T. Berners-Lee, J. Hendler, and O. Lassila. The semantic web. Scientific American, May 2001.Google ScholarCross Ref
- P. Bernstein, F. Giunchiglia, A. Kementsietsidis, J. Mylopoulos, L. Serafini, and I. Zaihrayeu. Data management for peer-to-peer computing: A vision. In Proceedings of the WebDB Workshop, 2002.Google Scholar
- S. Cluet, P. Veltri, and D. Vodislav. Views in a large scale XML repository. In Proc. of VLDB, 2001. Google ScholarDigital Library
- A. Deutsch and V. Tannen. Containment and integrity constraints for xpath fragments. In KRDB, 2001.Google Scholar
- A. Deutsch and V. Tannen. Mars: A system for publishing xml from mixed and redundant storage. In Proc. of VLDB, 2003. Google ScholarDigital Library
- H.-H. Do and E. Rahm. COMA - a system for flexible combination of schema matching approaches. In Proc. of VLDB, 2002. Google ScholarDigital Library
- A. Doan, P. Domingos, and A. Halevy. Reconciling schemas of disparate data sources: a machine learning approach. In Proc. of SIGMOD, 2001. Google ScholarDigital Library
- X. Dong, A. Halevy, and I. Tatarinov. Containment of nested XML queries. Submitted for publication, 2004.Google Scholar
- O. M. Duschka and M. R. Genesereth. Query planning in Informaster. In Proc. of the ACM Symposium on Applied Computing, 1997. Google ScholarDigital Library
- R. Fagin, P. Kolaitis, L. Popa, and W.-C. Tan. Composing schema mappings: Second-order dependencies to the rescue. In Proc. of PODS, 2004. Google ScholarDigital Library
- M. Fernandez, W.-C. Tan, and D. Suciu. Silkroute: Trading between relations and xml. In Proc. of the Int. WWW Conf., 1999. Google ScholarDigital Library
- S. Flesca, F. Furfaro, and E. Masciari. On the minimization of xpath queries. In VLDB, 2003.Google ScholarDigital Library
- D. Florescu, A. Levy, and A. Mendelzon. Database techniques for the world-wide web: A survey. SIGMOD Record, 27(3):59--74, September 1998. Google ScholarDigital Library
- A. Halevy, Z. Ives, P. Mork, and I. Tatarinov. Piazza: Data management infrastructure for semantic web applications. In Proc. of the Int. WWW Conf., 2003. Google ScholarDigital Library
- A. Halevy, Z. Ives, D. Suciu, and I. Tatarinov. Schema mediation in peer data management systems. In Proc. of ICDE, 2003.Google ScholarCross Ref
- A. Y. Halevy. Answering queries using views: A survey. VLDB Journal, 10(4), 2001. Google ScholarDigital Library
- A. Y. Halevy, Z. G. Ives, D. Suciu, and I. Tatarinov. Schema mediation in peer data management systems. In Proc. of ICDE, 2003.Google ScholarCross Ref
- M. Lenzerini. Data integration: A theoretical perspective. In Proc. of PODS, 2002. Google ScholarDigital Library
- A. Y. Levy and D. Suciu. Deciding containment for queries with complex objects and aggregations. In Proc. of PODS, Tucson, Arizona., 1997. Google ScholarDigital Library
- J. Madhavan and A. Halevy. Composing mappings among data sources. In Proc. of VLDB, 2003. Google ScholarDigital Library
- G. Miklau and D. Suciu. Containment and equivalence for an xpath fragment. In Proc. of PODS, 2002. Google ScholarDigital Library
- B. Ooi, Y. Shu, and K.-L. Tan. Relational data sharing in peer-based data management systems. SIGMOD Record, 23(3), 2003. Google ScholarDigital Library
- E. Rahm and P. A. Bernstein. A survey of approaches to automatic schema matching. VLDB Journal, 10(4):334--350, 2001. Google ScholarDigital Library
- Efficient query reformulation in peer data management systems
Recommendations
Efficient Range Query Processing in Peer-to-Peer Systems
With the increasing popularity of the peer-to-peer (P2P) computing paradigm, many general range query schemes for distributed hash table (DHT)-based P2P systems have been proposed in recent years. Although those schemes can provide range query ...
Towards OLAP query reformulation in peer-to-peer data warehousing
DOLAP '10: Proceedings of the ACM 13th international workshop on Data warehousing and OLAPInter-business collaborative contexts prefigure a distributed scenario where companies organize and coordinate themselves to develop common and shared opportunities. Traditional business intelligence systems do not provide support to this end. Peer Data ...
OLAP query reformulation in peer-to-peer data warehousing
Inter-business collaborative contexts prefigure a distributed scenario where companies organize and coordinate themselves to develop common and shared opportunities, but traditional business intelligence systems do not provide support to this end. To ...
Comments