Skip to main content
Erschienen in: Information Systems Frontiers 6/2016

01.12.2016

XHQE: A hybrid system for scalable selectivity estimation of XML queries

verfasst von: E.-S. M. El-Alfy, S. Mohammed, A. F. Barradah

Erschienen in: Information Systems Frontiers | Ausgabe 6/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

With the increasing popularity of XML applications in enterprise and big data systems, the use of efficient query optimizers is becoming very essential. The performance of an XML query optimizer depends heavily on the query selectivity estimators it uses to find the best possible query execution plan. In this work, we propose a novel selectivity estimator which is a hybrid of structural synopsis and statistics, called XHQE. The structural synopsis enhances the accuracy of estimation and the structural statistics makes it scalable to the allocated memory space. The structural synopsis is generated by labeling the nodes of the source XML dataset using a fingerprint function and merging subtrees with similar fingerprints (i.e. having similar structures). The generated structural synopsis and structural statistics are then used to estimate the selectivity of given queries. We studied the performance of the proposed approach using different types of queries and four benchmark datasets with different structural characteristics. We compared XHQE with existing algorithms such as Sampling, TreeSketch and one histogram-based algorithm. The experimental results showed that the XHQE is significantly better than other algorithms in terms of estimation accuracy and scalability for semi-uniform datasets. For non-uniform datasets, the proposed algorithm has comparable estimation accuracy to TreeSketch as the allocated memory size is highly reduced, yet the estimation data generation time of the proposed approach is much lower (e.g., TreeSketch took more than 50 times longer than that of the proposed approach for XMark dataset). Comparing to the histogram-based algorithm, our approach supports regular twig quires in addition to having higher accuracy when both run under similar memory constraints.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Aboulnaga, A, & Naughton, JF (2003). Building XML statistics for the hidden web. In Proceedings of the twelfth ACM International Conference on Information and Knowledge Management, pp 358–365. Aboulnaga, A, & Naughton, JF (2003). Building XML statistics for the hidden web. In Proceedings of the twelfth ACM International Conference on Information and Knowledge Management, pp 358–365.
Zurück zum Zitat Aboulnaga, A, Alameldeen, AR, & Naughton, JF (2001). Estimating the selectivity of XML path expressions for Internet scale applications. In Proceedings of the 27th International Conference on Very Large Data Bases, San Francisco, CA, USA, VLDB’01. Aboulnaga, A, Alameldeen, AR, & Naughton, JF (2001). Estimating the selectivity of XML path expressions for Internet scale applications. In Proceedings of the 27th International Conference on Very Large Data Bases, San Francisco, CA, USA, VLDB’01.
Zurück zum Zitat Agrawal, R, Ailamaki, A, Bernstein, PA, Brewer, EA, Carey, MJ, Chaudhuri, S, Doan, A, Florescu, D, Franklin, MJ, Garcia-Molina, H, & et al (2009). The claremont report on database research. Communications of the ACM, 52(6), 56–65.CrossRef Agrawal, R, Ailamaki, A, Bernstein, PA, Brewer, EA, Carey, MJ, Chaudhuri, S, Doan, A, Florescu, D, Franklin, MJ, Garcia-Molina, H, & et al (2009). The claremont report on database research. Communications of the ACM, 52(6), 56–65.CrossRef
Zurück zum Zitat Alrammal, M, & Hains, G (2014). A research survey on large XML data: Streaming, selectivity estimation and parallelism. Inter-cooperative Collective Intelligence: Techniques and Applications Studies in Computational Intelligence, 495, 167–202.CrossRef Alrammal, M, & Hains, G (2014). A research survey on large XML data: Streaming, selectivity estimation and parallelism. Inter-cooperative Collective Intelligence: Techniques and Applications Studies in Computational Intelligence, 495, 167–202.CrossRef
Zurück zum Zitat Alrammal, M, Hains, G, & Zergaoui, M (2011). Path tree: Document synopsis for XPath query selectivity estimation. In Proceedings of the 5th International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS-2011), pp 321–328. Alrammal, M, Hains, G, & Zergaoui, M (2011). Path tree: Document synopsis for XPath query selectivity estimation. In Proceedings of the 5th International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS-2011), pp 321–328.
Zurück zum Zitat Bruno, N, Koudas, N, & Srivastava, D (2002). Holistic twig joins: optimal XML pattern matching. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD ’02, pp 310–321. Bruno, N, Koudas, N, & Srivastava, D (2002). Holistic twig joins: optimal XML pattern matching. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD ’02, pp 310–321.
Zurück zum Zitat Chu, Y, & Yu, J (2012). The research of database query optimization based on XML. Advanced Materials Research, 546-547, 519–525.CrossRef Chu, Y, & Yu, J (2012). The research of database query optimization based on XML. Advanced Materials Research, 546-547, 519–525.CrossRef
Zurück zum Zitat Drukh, N, Polyzotis, N, Garofalakis, M, & Matias, Y (2004). Fractional XSketch synopses for XML databases. In Bellahsne, Z, Milo, T, Rys, M, Suciu, D, & Unland, R (Eds.) Database and XML Technologies, Lecture Notes in Computer Science, (Vol. 3186 pp. 189–203): Springer Berlin Heidelberg. Drukh, N, Polyzotis, N, Garofalakis, M, & Matias, Y (2004). Fractional XSketch synopses for XML databases. In Bellahsne, Z, Milo, T, Rys, M, Suciu, D, & Unland, R (Eds.) Database and XML Technologies, Lecture Notes in Computer Science, (Vol. 3186 pp. 189–203): Springer Berlin Heidelberg.
Zurück zum Zitat Fisher, D, & Maneth, S (2007). Structural selectivity estimation for XML documents. In Proceedings of the IEEE 23rd International Conference on Data Engineering, ICDE, (Vol. 2007 pp. 626–635). Fisher, D, & Maneth, S (2007). Structural selectivity estimation for XML documents. In Proceedings of the IEEE 23rd International Conference on Data Engineering, ICDE, (Vol. 2007 pp. 626–635).
Zurück zum Zitat Gou, G, & Chirkova, R (2007). Efficiently querying large XML data repositories: A survey. IEEE Transactions on Knowledge and Data Engineering, 19(10), 1381–1403.CrossRef Gou, G, & Chirkova, R (2007). Efficiently querying large XML data repositories: A survey. IEEE Transactions on Knowledge and Data Engineering, 19(10), 1381–1403.CrossRef
Zurück zum Zitat Grün C (2010). Storing and querying large XML instances. PhD thesis. Grün C (2010). Storing and querying large XML instances. PhD thesis.
Zurück zum Zitat Hachicha, M, & Darmont, J (2013). A survey of XML tree patterns. IEEE Transactions on Knowledge and Data Engineering, 25(1), 29–46.CrossRef Hachicha, M, & Darmont, J (2013). A survey of XML tree patterns. IEEE Transactions on Knowledge and Data Engineering, 25(1), 29–46.CrossRef
Zurück zum Zitat Haw, SC, & Lee, CS (2011). Data storage practices and query processing in XML databases: A survey. Knowledge-Based Systems, 24(8), 1317–1340.CrossRef Haw, SC, & Lee, CS (2011). Data storage practices and query processing in XML databases: A survey. Knowledge-Based Systems, 24(8), 1317–1340.CrossRef
Zurück zum Zitat He, W, Lv, T, Meis, M, & Yan, P (2013). Visual evaluation of XPath queries. In IEEE Fifth International Conference on Computational and Information Sciences (ICCIS), pp 434–437. He, W, Lv, T, Meis, M, & Yan, P (2013). Visual evaluation of XPath queries. In IEEE Fifth International Conference on Computational and Information Sciences (ICCIS), pp 434–437.
Zurück zum Zitat Izadi, SK, Haghjoo, MS, & H?rder, T (2012). S3: Processing tree-pattern XML queries with all logical operators. Data & Knowledge Engineering 72:31–62. Izadi, SK, Haghjoo, MS, & H?rder, T (2012). S3: Processing tree-pattern XML queries with all logical operators. Data & Knowledge Engineering 72:31–62.
Zurück zum Zitat Karp, RM, & Rabin, MO (1987). Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2), 249–260.CrossRef Karp, RM, & Rabin, MO (1987). Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2), 249–260.CrossRef
Zurück zum Zitat Lee, ML, Li, H, Hsu, W, & Ooi, BC (2004). A statistical approach for XML query size estimation. In Proceedings of the 2004 International Conference on Current Trends in Database Technology, EDBT’04. Lee, ML, Li, H, Hsu, W, & Ooi, BC (2004). A statistical approach for XML query size estimation. In Proceedings of the 2004 International Conference on Current Trends in Database Technology, EDBT’04.
Zurück zum Zitat Li, H, Lee, ML, & Hsu, W (2005a). A histogram-based selectivity estimator for skewed xml data. In Database and Expert Systems Applications, Springer, pp 270–279. Li, H, Lee, ML, & Hsu, W (2005a). A histogram-based selectivity estimator for skewed xml data. In Database and Expert Systems Applications, Springer, pp 270–279.
Zurück zum Zitat Li, H, Lee, ML, & Hsu, W (2005b). A histogram-based selectivity estimator for skewed XML data. In Andersen, K, Debenham, J, & Wagner, R (Eds.) Database and Expert Systems Applications, Lecture Notes in Computer Science, vol 3588, Springer Berlin Heidelberg (pp. 270–279). Li, H, Lee, ML, & Hsu, W (2005b). A histogram-based selectivity estimator for skewed XML data. In Andersen, K, Debenham, J, & Wagner, R (Eds.) Database and Expert Systems Applications, Lecture Notes in Computer Science, vol 3588, Springer Berlin Heidelberg (pp. 270–279).
Zurück zum Zitat Lim L, Wang M, Padmanabhan S, Vitter JS, & Parr R (2002). Xpathlearner: An on-line self-tuning markov histogram for XML path selectivity estimation. In Proceedings of the 28th International Conference on Very Large Data Bases, pp 442–453. Lim L, Wang M, Padmanabhan S, Vitter JS, & Parr R (2002). Xpathlearner: An on-line self-tuning markov histogram for XML path selectivity estimation. In Proceedings of the 28th International Conference on Very Large Data Bases, pp 442–453.
Zurück zum Zitat Liu, X, Chen, L, Wan, C, Liu, D, & Xiong, N (2013). Exploiting structures in keyword queries for effective XML search. Information Sciences, 240, 56–71.CrossRef Liu, X, Chen, L, Wan, C, Liu, D, & Xiong, N (2013). Exploiting structures in keyword queries for effective XML search. Information Sciences, 240, 56–71.CrossRef
Zurück zum Zitat Lu, J, Ling, T, Bao, Z, & Wang, C (2011). Extended XML tree pattern matching: Theories and algorithms. IEEE Transactions on Knowledge and Data Engineering, 23(3), 402–416.CrossRef Lu, J, Ling, T, Bao, Z, & Wang, C (2011). Extended XML tree pattern matching: Theories and algorithms. IEEE Transactions on Knowledge and Data Engineering, 23(3), 402–416.CrossRef
Zurück zum Zitat Luo, C, Jiang, Z, Hou, WC, Yu, F, & Zhu, Q (2009). A sampling approach for xml query selectivity estimation. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pp 335–344. Luo, C, Jiang, Z, Hou, WC, Yu, F, & Zhu, Q (2009). A sampling approach for xml query selectivity estimation. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pp 335–344.
Zurück zum Zitat Madria, S, Chen, Y, Passi, K, & Bhowmick, S (2007). Efficient processing of XPath queries using indexes. Information Systems, 32(1), 131–159.CrossRef Madria, S, Chen, Y, Passi, K, & Bhowmick, S (2007). Efficient processing of XPath queries using indexes. Information Systems, 32(1), 131–159.CrossRef
Zurück zum Zitat Neoklis, P, & Minos, G (2006). Xcluster synopses for structured xml content. In Proceedings of the International Conference on Data Engineering. Neoklis, P, & Minos, G (2006). Xcluster synopses for structured xml content. In Proceedings of the International Conference on Data Engineering.
Zurück zum Zitat Phan, BV, Pardede, E, & Rahayu, W (2013). On the improvement of active XML (AXML) representation and query evaluation. Information Systems Frontiers, 15(2), 203–222.CrossRef Phan, BV, Pardede, E, & Rahayu, W (2013). On the improvement of active XML (AXML) representation and query evaluation. Information Systems Frontiers, 15(2), 203–222.CrossRef
Zurück zum Zitat Polyzotis, N, & Garofalakis, M (2002). Statistical synopses for graph-structured XML databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD ’02, pp 358–369. Polyzotis, N, & Garofalakis, M (2002). Statistical synopses for graph-structured XML databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD ’02, pp 358–369.
Zurück zum Zitat Polyzotis, N, & Garofalakis, M (2006). XSketch synopses for XML data graphs. ACM Transactions on Database Systems, 31(3), 1014–1063.CrossRef Polyzotis, N, & Garofalakis, M (2006). XSketch synopses for XML data graphs. ACM Transactions on Database Systems, 31(3), 1014–1063.CrossRef
Zurück zum Zitat Polyzotis, N, Garofalakis, M, & Ioannidis, Y (2004a). Approximate XML query answers. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD ’04. Polyzotis, N, Garofalakis, M, & Ioannidis, Y (2004a). Approximate XML query answers. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD ’04.
Zurück zum Zitat Polyzotis, N, Garofalakis, M, & Ioannidis, Y (2004b). Selectivity estimation for XML twigs. In Proceedings of the IEEE 20th International Conference on Data Engineering. Polyzotis, N, Garofalakis, M, & Ioannidis, Y (2004b). Selectivity estimation for XML twigs. In Proceedings of the IEEE 20th International Conference on Data Engineering.
Zurück zum Zitat Sakr, S. (2007). Cardinality-aware and purely relational implementation of an XQuery processor: PhD thesis, University of Konstanz. Sakr, S. (2007). Cardinality-aware and purely relational implementation of an XQuery processor: PhD thesis, University of Konstanz.
Zurück zum Zitat Sakr, S (2008). Algebra-based XQuery cardinality estimation. International Journal of Web Information Systems, 4(1), 7–46.CrossRef Sakr, S (2008). Algebra-based XQuery cardinality estimation. International Journal of Web Information Systems, 4(1), 7–46.CrossRef
Zurück zum Zitat Sakr, S (2010). Towards a comprehensive assessment for selectivity estimation approaches of XML queries. International Journal of Web Engineering and Technology, 6, 58–82.CrossRef Sakr, S (2010). Towards a comprehensive assessment for selectivity estimation approaches of XML queries. International Journal of Web Engineering and Technology, 6, 58–82.CrossRef
Zurück zum Zitat Sartiani, C (2003). A framework for estimating XML query cardinality. In WebDB, pp 43–48. Sartiani, C (2003). A framework for estimating XML query cardinality. In WebDB, pp 43–48.
Zurück zum Zitat Schmidt, A, Waas, F, Kersten, M, Carey, MJ, Manolescu, I, & Busse, R (2002). XMark: A benchmark for XML data management. In Proceedings of the 28th International Conference on Very Large Databases, VLDB’02, pp 974–985. Schmidt, A, Waas, F, Kersten, M, Carey, MJ, Manolescu, I, & Busse, R (2002). XMark: A benchmark for XML data management. In Proceedings of the 28th International Conference on Very Large Databases, VLDB’02, pp 974–985.
Zurück zum Zitat Teubner, J, Grust, T, Maneth, S, & Sakr, S (2008). Dependable cardinality forecasts for XQuery. Proceedings of the VLDB Endowment, 1(1), 463–477.CrossRef Teubner, J, Grust, T, Maneth, S, & Sakr, S (2008). Dependable cardinality forecasts for XQuery. Proceedings of the VLDB Endowment, 1(1), 463–477.CrossRef
Zurück zum Zitat Tian, P, Luo, D, Li, Y, & Gu, J (2014). XML multi-core query optimization based on task preemption and data partition. In Semantic Technology (pp. 294–305): Springer. Tian, P, Luo, D, Li, Y, & Gu, J (2014). XML multi-core query optimization based on task preemption and data partition. In Semantic Technology (pp. 294–305): Springer.
Zurück zum Zitat Wang, C, Parthasarathy, S, & Jin, R (2006). A decomposition-based probabilistic framework for estimating the selectivity of XML twig queries. In Advances in Database Technology, EDBT (pp. 533–551): Springer. Wang, C, Parthasarathy, S, & Jin, R (2006). A decomposition-based probabilistic framework for estimating the selectivity of XML twig queries. In Advances in Database Technology, EDBT (pp. 533–551): Springer.
Zurück zum Zitat Wang, W, Jiang, H, Lu, H, & Yu, JX (2004a). Bloom histogram: path selectivity estimation for XML data with updates. In Proceedings of the 30th International Conference on Very Large Databases, VLDB’04. Wang, W, Jiang, H, Lu, H, & Yu, JX (2004a). Bloom histogram: path selectivity estimation for XML data with updates. In Proceedings of the 30th International Conference on Very Large Databases, VLDB’04.
Zurück zum Zitat Wang, W, Jiang, H, Lu, H, & Yu, JX (2004b). Bloom histogram: Path selectivity estimation for XML data with updates. In Proceedings of the Thirtieth International Conference on Very Large Databases. Wang, W, Jiang, H, Lu, H, & Yu, JX (2004b). Bloom histogram: Path selectivity estimation for XML data with updates. In Proceedings of the Thirtieth International Conference on Very Large Databases.
Zurück zum Zitat Wang, Y, Wang, H, Meng, X, & Wang, S (2004c). Estimating the selectivity of XML path expression with predicates by histograms. In Li, Q, Wang, G, & Feng, L (Eds.) Advances in Web-Age Information Management, Lecture Notes in Computer Science, (Vol. 3129 pp. 409–418): Springer Berlin Heidelberg. Wang, Y, Wang, H, Meng, X, & Wang, S (2004c). Estimating the selectivity of XML path expression with predicates by histograms. In Li, Q, Wang, G, & Feng, L (Eds.) Advances in Web-Age Information Management, Lecture Notes in Computer Science, (Vol. 3129 pp. 409–418): Springer Berlin Heidelberg.
Zurück zum Zitat Wu, X, & Liu, G (2008). XML twig pattern matching using version tree. Data & Knowledge Engineering, 64 (3), 580–599.CrossRef Wu, X, & Liu, G (2008). XML twig pattern matching using version tree. Data & Knowledge Engineering, 64 (3), 580–599.CrossRef
Zurück zum Zitat Wu, X, Theodoratos, D, Wang, WH, & Sellis, T (2013). Optimizing XML queries: Bitmapped materialized views vs. indexes. Information Systems, 38(6), 863–884.CrossRef Wu, X, Theodoratos, D, Wang, WH, & Sellis, T (2013). Optimizing XML queries: Bitmapped materialized views vs. indexes. Information Systems, 38(6), 863–884.CrossRef
Zurück zum Zitat Wu, Y, Patel, JM, & Jagadish, H (2002). Estimating answer sizes for XML queries. In Jensen, C, Šaltenis, S, Jeffery, K, Pokorny, J, Bertino, E, B?hn, K, & Jarke, M (Eds.) Advances in Database Technology, Lecture Notes in Computer Science, (Vol. 2287 pp. 590–608): Springer Berlin Heidelberg. Wu, Y, Patel, JM, & Jagadish, H (2002). Estimating answer sizes for XML queries. In Jensen, C, Šaltenis, S, Jeffery, K, Pokorny, J, Bertino, E, B?hn, K, & Jarke, M (Eds.) Advances in Database Technology, Lecture Notes in Computer Science, (Vol. 2287 pp. 590–608): Springer Berlin Heidelberg.
Zurück zum Zitat Yang, LH, Lee, ML, Hsu, W, Huang, D, & Wong, L (2008). Efficient mining of frequent XML query patterns with repeating-siblings. Information and Software Technology, 50(5), 375–389.CrossRef Yang, LH, Lee, ML, Hsu, W, Huang, D, & Wong, L (2008). Efficient mining of frequent XML query patterns with repeating-siblings. Information and Software Technology, 50(5), 375–389.CrossRef
Zurück zum Zitat Zhang, C, Naughton, J, DeWitt, D, Luo, Q, & Lohman, G (2001). On supporting containment queries in relationaldatabase management systems. SIGMOD Rec, 30(2), 425–436 . doi:10.1145/376284.375722.CrossRef Zhang, C, Naughton, J, DeWitt, D, Luo, Q, & Lohman, G (2001). On supporting containment queries in relationaldatabase management systems. SIGMOD Rec, 30(2), 425–436 . doi:10.​1145/​376284.​375722.CrossRef
Zurück zum Zitat Zhang, N, Ozsu, MT, Aboulnaga, A, & Ilyas, If (2006). Xseed: Accurate and fast cardinality estimation for XPath queries. In Proceedings of the IEEE 22nd International Conference on Data Engineering, Washington, DC, USA, ICDE’06. Zhang, N, Ozsu, MT, Aboulnaga, A, & Ilyas, If (2006). Xseed: Accurate and fast cardinality estimation for XPath queries. In Proceedings of the IEEE 22nd International Conference on Data Engineering, Washington, DC, USA, ICDE’06.
Metadaten
Titel
XHQE: A hybrid system for scalable selectivity estimation of XML queries
verfasst von
E.-S. M. El-Alfy
S. Mohammed
A. F. Barradah
Publikationsdatum
01.12.2016
Verlag
Springer US
Erschienen in
Information Systems Frontiers / Ausgabe 6/2016
Print ISSN: 1387-3326
Elektronische ISSN: 1572-9419
DOI
https://doi.org/10.1007/s10796-015-9561-6

Weitere Artikel der Ausgabe 6/2016

Information Systems Frontiers 6/2016 Zur Ausgabe