ABSTRACT
Partial evaluation has recently proven an effective technique for evaluating Boolean XPath queries over a fragmented tree that is distributed over a number of sites. What left open is whether or not the technique is applicable to generic data-selecting XPath queries. In contrast to Boolean queries that return a single truth value, a generic XPath query returns a set of elements, and its evaluation introduces difficulties to avoiding excessive data shipping. This paper settles this question in positive by providing evaluation algorithms and optimizations for generic XPath queries in the same distributed and fragmented setting. These algorithms explore parallelism and retain the performance guarantees of their counterpart for Boolean queries, regardless of how the tree is fragmented and distributed. First, each site is visited at most three times, and down to at most twice when optimizations are in place. Second, the network traffic is determined by the final answer of the query, rather than the size of the tree, without incurring unnecessary data shipping. Third, the total computation is comparable to that of centralized algorithms on the tree stored in a single site. We show both analytically and experimentally that our algorithms and optimizations are scalable and efficient on large trees and complex XPath queries.
- S. Abiteboul, A. Bonifati, G. Cobena, I. Manolescu, and T. Milo. Dynamic XML documents with distribution and replication. In SIGMOD, 2003. Google ScholarDigital Library
- S. Amer-Yahia, D. Srivastava, and D. Suciu. Distributed evaluation of network directory queries. TKDE, 16(4):474--486, 2004. Google ScholarDigital Library
- J. M. Bremer and M. Gertz. On distributing XML repositories. In WebDB, 2003.Google Scholar
- N. Bruno, N. Koudas, and D. Srivastava. Holistic twig joins: optimal XML pattern matching. In SIGMOD, 2002. Google ScholarDigital Library
- P. Buneman, G. Cong, W. Fan, and A. Kementsietsidis. Using partial evaluation in distributed query evaluation. In VLDB, 2006. Google ScholarDigital Library
- E. Colen, H. Kaplan, and T. Milo. Labeling dynamic XML tree. In PODS, 2002. Google ScholarDigital Library
- A. Crainiceanu, P. Linga, J. Gehrke, and J. Shanmugasundaram. Querying peer-to-peer networks using P-Trees. In WebDB, 2004. Google ScholarDigital Library
- D. J. DeWitt and J. Gray. Parallel database systems: The future of high performance database systems. Commun. ACM, 35(6), 1992. Google ScholarDigital Library
- P. Ganesan, M. Bawa, and H. Garcia-Molina. Online balancing of range-partitioned data with applications to Peer-to-Peer systems. In VLDB, 2004. Google ScholarDigital Library
- P. Godfrey and J. Gryz. A strategy for partial evaluation of views. In Intelligent Information Systems, 2000. Google ScholarDigital Library
- G. Gottlob, C. Koch, and R. Pichler. Efficient algorithms for processing XPath queries. In VLDB, 2002. Google ScholarDigital Library
- A. Gupta, Y. Sagiv, J. D. Ullman, and J. Widom. Constraint checking with partial information. In PODS, 1994.Google ScholarDigital Library
- A. Y. Halevy, Z. G. Ives, J. Madhavan, P. Mork, D. Suciu, and I. Tatarinov. The Piazza peer data management system. TKDE, 16(7):787--798, 2004. Google ScholarDigital Library
- H. Hsiao and D. J. DeWitt. Chained Declustering: A New Availability Strategy for Multiprocessor Database Machines. In ICDE, 1990.Google ScholarDigital Library
- Z. G. Ives, A. Y. Halevy, and D. S. Weld. An XML query engine for network-bound data. The VLDB Journal, 11(4):380--402, 2002. Google ScholarDigital Library
- H. V. Jagadish, L. V. S. Lakshmanan, T. Milo, D. Srivastava, and D. Vista. Querying network directories. In SIGMOD, 1999.Google ScholarDigital Library
- H. V. Jagadish, B. C. Ooi, and Q. H. Vu. Baton: A balanced tree structure for peer-to-peer networks. In SIGMOD, 2005. Google ScholarDigital Library
- N. D. Jones. An introduction to partial evaluation. ACM Computing Surveys, 28(3), 1996. Google ScholarDigital Library
- C. C. Kanne, M. Brantner, and G. Moerkotte. Cost-sensitive reordering of navigational primitives. In SIGMOD, 2005. Google ScholarDigital Library
- C. Koch. Efficient processing of expressive node-selecting queries on XML data in secondary storage: A tree automata-based approach. In VLDB, 2003. Google ScholarDigital Library
- D. Kossman. The State of the Art in Distributed Query Processing. ACM Computing Surveys, 32(4):422--469, 2000. Google ScholarDigital Library
- V. Papadimos and D. Maier. Distributed queries without distributed state. In WebDB, 2002.Google Scholar
- P. Ramanan. Efficient algorithms for minimizing tree pattern queries. In SIGMOD, 2002. Google ScholarDigital Library
- Saxon. http://saxon.sourceforge.net/.Google Scholar
- A. Schmidt, F. Waas, M. Kersten, M. Carey, I. Manolescu, and R. Busse. XMark: A benchmark for XML data management. In VLDB, 2002. Google ScholarDigital Library
- M. Smith and T. A. Howes. LDAP : Programming Directory-Enabled Apps. Sams, 1997. Google ScholarDigital Library
- D. Suciu. Distributed query evaluation on semistructured data. TODS, 27(1):1--62, 2002. Google ScholarDigital Library
- A. Tomasic, L. Raschid, and P. Valduriez. Scaling heterogeneous databases and the design of Disco. In ICDCS, pages 449--457, 1996. Google ScholarDigital Library
- Xerces and Xalan. http://xalan.apache.org.Google Scholar
Index Terms
- Distributed query evaluation with performance guarantees
Recommendations
Partial Evaluation for Distributed XPath Query Processing and Beyond
This article proposes algorithms for evaluating XPath queries over an XML tree that is partitioned horizontally and vertically, and is distributed across a number of sites. The key idea is based on partial evaluation: it is to send the whole query to ...
Performance analysis of "Groupby-After-Join" query processing in parallel database systems
Queries containing aggregate functions often combine multiple tables through join operations. This query is subsequently called "Groupby-Join". There is a special category of this query whereby the group-by operation can only be performed after the join ...
Experimental evaluation of query processing on encrypted Telemedicine data
CBMS '10: Proceedings of the 2010 IEEE 23rd International Symposium on Computer-Based Medical SystemsWhile Telemedicine technology greatly increases the quality of medical care with less health care costs, it also raises critical challenges for data security and privacy in the course of data entry, data review, patient education, video conference, and ...
Comments