skip to main content
10.1145/1247480.1247537acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article

Distributed query evaluation with performance guarantees

Published:11 June 2007Publication History

ABSTRACT

Partial evaluation has recently proven an effective technique for evaluating Boolean XPath queries over a fragmented tree that is distributed over a number of sites. What left open is whether or not the technique is applicable to generic data-selecting XPath queries. In contrast to Boolean queries that return a single truth value, a generic XPath query returns a set of elements, and its evaluation introduces difficulties to avoiding excessive data shipping. This paper settles this question in positive by providing evaluation algorithms and optimizations for generic XPath queries in the same distributed and fragmented setting. These algorithms explore parallelism and retain the performance guarantees of their counterpart for Boolean queries, regardless of how the tree is fragmented and distributed. First, each site is visited at most three times, and down to at most twice when optimizations are in place. Second, the network traffic is determined by the final answer of the query, rather than the size of the tree, without incurring unnecessary data shipping. Third, the total computation is comparable to that of centralized algorithms on the tree stored in a single site. We show both analytically and experimentally that our algorithms and optimizations are scalable and efficient on large trees and complex XPath queries.

References

  1. S. Abiteboul, A. Bonifati, G. Cobena, I. Manolescu, and T. Milo. Dynamic XML documents with distribution and replication. In SIGMOD, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. Amer-Yahia, D. Srivastava, and D. Suciu. Distributed evaluation of network directory queries. TKDE, 16(4):474--486, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. J. M. Bremer and M. Gertz. On distributing XML repositories. In WebDB, 2003.Google ScholarGoogle Scholar
  4. N. Bruno, N. Koudas, and D. Srivastava. Holistic twig joins: optimal XML pattern matching. In SIGMOD, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. P. Buneman, G. Cong, W. Fan, and A. Kementsietsidis. Using partial evaluation in distributed query evaluation. In VLDB, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. E. Colen, H. Kaplan, and T. Milo. Labeling dynamic XML tree. In PODS, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. A. Crainiceanu, P. Linga, J. Gehrke, and J. Shanmugasundaram. Querying peer-to-peer networks using P-Trees. In WebDB, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. D. J. DeWitt and J. Gray. Parallel database systems: The future of high performance database systems. Commun. ACM, 35(6), 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. P. Ganesan, M. Bawa, and H. Garcia-Molina. Online balancing of range-partitioned data with applications to Peer-to-Peer systems. In VLDB, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. P. Godfrey and J. Gryz. A strategy for partial evaluation of views. In Intelligent Information Systems, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. G. Gottlob, C. Koch, and R. Pichler. Efficient algorithms for processing XPath queries. In VLDB, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. A. Gupta, Y. Sagiv, J. D. Ullman, and J. Widom. Constraint checking with partial information. In PODS, 1994.Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. A. Y. Halevy, Z. G. Ives, J. Madhavan, P. Mork, D. Suciu, and I. Tatarinov. The Piazza peer data management system. TKDE, 16(7):787--798, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. H. Hsiao and D. J. DeWitt. Chained Declustering: A New Availability Strategy for Multiprocessor Database Machines. In ICDE, 1990.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Z. G. Ives, A. Y. Halevy, and D. S. Weld. An XML query engine for network-bound data. The VLDB Journal, 11(4):380--402, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. H. V. Jagadish, L. V. S. Lakshmanan, T. Milo, D. Srivastava, and D. Vista. Querying network directories. In SIGMOD, 1999.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. H. V. Jagadish, B. C. Ooi, and Q. H. Vu. Baton: A balanced tree structure for peer-to-peer networks. In SIGMOD, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. N. D. Jones. An introduction to partial evaluation. ACM Computing Surveys, 28(3), 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. C. C. Kanne, M. Brantner, and G. Moerkotte. Cost-sensitive reordering of navigational primitives. In SIGMOD, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. C. Koch. Efficient processing of expressive node-selecting queries on XML data in secondary storage: A tree automata-based approach. In VLDB, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. D. Kossman. The State of the Art in Distributed Query Processing. ACM Computing Surveys, 32(4):422--469, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. V. Papadimos and D. Maier. Distributed queries without distributed state. In WebDB, 2002.Google ScholarGoogle Scholar
  23. P. Ramanan. Efficient algorithms for minimizing tree pattern queries. In SIGMOD, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Saxon. http://saxon.sourceforge.net/.Google ScholarGoogle Scholar
  25. A. Schmidt, F. Waas, M. Kersten, M. Carey, I. Manolescu, and R. Busse. XMark: A benchmark for XML data management. In VLDB, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. M. Smith and T. A. Howes. LDAP : Programming Directory-Enabled Apps. Sams, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. D. Suciu. Distributed query evaluation on semistructured data. TODS, 27(1):1--62, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. A. Tomasic, L. Raschid, and P. Valduriez. Scaling heterogeneous databases and the design of Disco. In ICDCS, pages 449--457, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Xerces and Xalan. http://xalan.apache.org.Google ScholarGoogle Scholar

Index Terms

  1. Distributed query evaluation with performance guarantees

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data
        June 2007
        1210 pages
        ISBN:9781595936868
        DOI:10.1145/1247480
        • General Chairs:
        • Lizhu Zhou,
        • Tok Wang Ling,
        • Program Chair:
        • Beng Chin Ooi

        Copyright © 2007 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 11 June 2007

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • Article

        Acceptance Rates

        Overall Acceptance Rate785of4,003submissions,20%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader