Article

Efficient query reformulation in peer data management systems

Authors:
Igor Tatarinov

University of Washington, Seattle, WA

University of Washington, Seattle, WA
View Profile

,
Alon Halevy

University of Washington, Seattle, WA

University of Washington, Seattle, WA
View Profile

SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of dataJune 2004Pages 539–550https://doi.org/10.1145/1007568.1007629

Published:13 June 2004Publication History

SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data

Pages 539–550

ABSTRACT

Peer data management systems (PDMS) offer a flexible architecture for decentralized data sharing. In a PDMS, every peer is associated with a schema that represents the peer's domain of interest, and semantic relationships between peers are provided locally between pairs (or small sets) of peers. By traversing semantic paths of mappings, a query over one peer can obtain relevant data from any reachable peer in the network. Semantic paths are traversed by reformulating queries at a peer into queries on its neighbors.Naively following semantic paths is highly inefficient in practice. We describe several techniques for optimizing the reformulation process in a PDMS and validate their effectiveness using real-life data sets. In particular, we develop techniques for pruning paths in the reformulation process and for minimizing the reformulated queries as they are created. In addition, we consider the effect of the strategy we use to search through the space of reformulations. Finally, we show that pre-computing semantic paths in a PDMS can greatly improve the efficiency of the reformulation process. Together, all of these techniques form a basis for scalable query reformulation in PDMS.To enable our optimizations, we developed practical algorithms, of independent interest, for checking containment and minimization of XML queries, and for composing XML mappings.

References

K. Aberer, P. Cudre-Mauroux, and M. Hauswirth. A framework for semantic gossiping. SIGMOD Record, 31(4), 2002. Google ScholarDigital Library
S. Abiteboul and O. Duschka. Complexity of answering queries using materialized views. In Proc. of PODS, pages 254--263, Seattle, WA, 1998. Google ScholarDigital Library
S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison Weseley, 1995. Google ScholarDigital Library
S. Amer-Yahia, S. Cho. L. Lakshmanan, and D. Srivastava. Minimization of tree pattern queries. In Proc. of SIGMOD, 2001. Google ScholarDigital Library
M. Arenas, V. Kantere, A. Kementsietsidis, I. Kiringa, R. J. Miller, and J. Mylopoulos. The hyperion project: From data integration to data coordination. SIGMOD Record, September 2003. Google ScholarDigital Library
T. Berners-Lee, J. Hendler, and O. Lassila. The semantic web. Scientific American, May 2001.Google ScholarCross Ref
P. Bernstein, F. Giunchiglia, A. Kementsietsidis, J. Mylopoulos, L. Serafini, and I. Zaihrayeu. Data management for peer-to-peer computing: A vision. In Proceedings of the WebDB Workshop, 2002.Google Scholar
S. Cluet, P. Veltri, and D. Vodislav. Views in a large scale XML repository. In Proc. of VLDB, 2001. Google ScholarDigital Library
A. Deutsch and V. Tannen. Containment and integrity constraints for xpath fragments. In KRDB, 2001.Google Scholar
A. Deutsch and V. Tannen. Mars: A system for publishing xml from mixed and redundant storage. In Proc. of VLDB, 2003. Google ScholarDigital Library
H.-H. Do and E. Rahm. COMA - a system for flexible combination of schema matching approaches. In Proc. of VLDB, 2002. Google ScholarDigital Library
A. Doan, P. Domingos, and A. Halevy. Reconciling schemas of disparate data sources: a machine learning approach. In Proc. of SIGMOD, 2001. Google ScholarDigital Library
X. Dong, A. Halevy, and I. Tatarinov. Containment of nested XML queries. Submitted for publication, 2004.Google Scholar
O. M. Duschka and M. R. Genesereth. Query planning in Informaster. In Proc. of the ACM Symposium on Applied Computing, 1997. Google ScholarDigital Library
R. Fagin, P. Kolaitis, L. Popa, and W.-C. Tan. Composing schema mappings: Second-order dependencies to the rescue. In Proc. of PODS, 2004. Google ScholarDigital Library
M. Fernandez, W.-C. Tan, and D. Suciu. Silkroute: Trading between relations and xml. In Proc. of the Int. WWW Conf., 1999. Google ScholarDigital Library
S. Flesca, F. Furfaro, and E. Masciari. On the minimization of xpath queries. In VLDB, 2003.Google ScholarDigital Library
D. Florescu, A. Levy, and A. Mendelzon. Database techniques for the world-wide web: A survey. SIGMOD Record, 27(3):59--74, September 1998. Google ScholarDigital Library
A. Halevy, Z. Ives, P. Mork, and I. Tatarinov. Piazza: Data management infrastructure for semantic web applications. In Proc. of the Int. WWW Conf., 2003. Google ScholarDigital Library
A. Halevy, Z. Ives, D. Suciu, and I. Tatarinov. Schema mediation in peer data management systems. In Proc. of ICDE, 2003.Google ScholarCross Ref
A. Y. Halevy. Answering queries using views: A survey. VLDB Journal, 10(4), 2001. Google ScholarDigital Library
A. Y. Halevy, Z. G. Ives, D. Suciu, and I. Tatarinov. Schema mediation in peer data management systems. In Proc. of ICDE, 2003.Google ScholarCross Ref
M. Lenzerini. Data integration: A theoretical perspective. In Proc. of PODS, 2002. Google ScholarDigital Library
A. Y. Levy and D. Suciu. Deciding containment for queries with complex objects and aggregations. In Proc. of PODS, Tucson, Arizona., 1997. Google ScholarDigital Library
J. Madhavan and A. Halevy. Composing mappings among data sources. In Proc. of VLDB, 2003. Google ScholarDigital Library
G. Miklau and D. Suciu. Containment and equivalence for an xpath fragment. In Proc. of PODS, 2002. Google ScholarDigital Library
B. Ooi, Y. Shu, and K.-L. Tan. Relational data sharing in peer-based data management systems. SIGMOD Record, 23(3), 2003. Google ScholarDigital Library
E. Rahm and P. A. Bernstein. A survey of approaches to automatic schema matching. VLDB Journal, 10(4):334--350, 2001. Google ScholarDigital Library

Efficient query reformulation in peer data management systems
1. Information systems
  1. Data management systems
    1. Database management system engines
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory

Recommendations

Efficient Range Query Processing in Peer-to-Peer Systems

With the increasing popularity of the peer-to-peer (P2P) computing paradigm, many general range query schemes for distributed hash table (DHT)-based P2P systems have been proposed in recent years. Although those schemes can provide range query ...
Read More
Towards OLAP query reformulation in peer-to-peer data warehousing
DOLAP '10: Proceedings of the ACM 13th international workshop on Data warehousing and OLAP

Inter-business collaborative contexts prefigure a distributed scenario where companies organize and coordinate themselves to develop common and shared opportunities. Traditional business intelligence systems do not provide support to this end. Peer Data ...
Read More
OLAP query reformulation in peer-to-peer data warehousing

Inter-business collaborative contexts prefigure a distributed scenario where companies organize and coordinate themselves to develop common and shared opportunities, but traditional business intelligence systems do not provide support to this end. To ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data
June 2004
988 pages
ISBN:1581138598
DOI:10.1145/1007568
Conference Chairs:
Arnd Christian König
Microsoft Research
,
Stefan Dessloch
University of Kaiserslautern, Germany
,
General Chair:
Patrick Valduriez
INRIA, France
,
Program Chair:
Gerhard Weikum
University of the Saarland
Copyright © 2004 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 13 June 2004
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate785of4,003submissions,20%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 137
  Total Citations
  View Citations
- 1,131
  Total Downloads
- Downloads (Last 12 months)16
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Efficient query reformulation in peer data management systems

SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data

ABSTRACT

References

Cited By

Recommendations

Efficient Range Query Processing in Peer-to-Peer Systems

Towards OLAP query reformulation in peer-to-peer data warehousing

OLAP query reformulation in peer-to-peer data warehousing