nach oben

Information Systems Frontiers

Erschienen in:

01.12.2016

XHQE: A hybrid system for scalable selectivity estimation of XML queries

verfasst von: E.-S. M. El-Alfy, S. Mohammed, A. F. Barradah

Erschienen in: Information Systems Frontiers | Ausgabe 6/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

With the increasing popularity of XML applications in enterprise and big data systems, the use of efficient query optimizers is becoming very essential. The performance of an XML query optimizer depends heavily on the query selectivity estimators it uses to find the best possible query execution plan. In this work, we propose a novel selectivity estimator which is a hybrid of structural synopsis and statistics, called XHQE. The structural synopsis enhances the accuracy of estimation and the structural statistics makes it scalable to the allocated memory space. The structural synopsis is generated by labeling the nodes of the source XML dataset using a fingerprint function and merging subtrees with similar fingerprints (i.e. having similar structures). The generated structural synopsis and structural statistics are then used to estimate the selectivity of given queries. We studied the performance of the proposed approach using different types of queries and four benchmark datasets with different structural characteristics. We compared XHQE with existing algorithms such as Sampling, TreeSketch and one histogram-based algorithm. The experimental results showed that the XHQE is significantly better than other algorithms in terms of estimation accuracy and scalability for semi-uniform datasets. For non-uniform datasets, the proposed algorithm has comparable estimation accuracy to TreeSketch as the allocated memory size is highly reduced, yet the estimation data generation time of the proposed approach is much lower (e.g., TreeSketch took more than 50 times longer than that of the proposed approach for XMark dataset). Comparing to the histogram-based algorithm, our approach supports regular twig quires in addition to having higher accuracy when both run under similar memory constraints.

Vorheriger Artikel The development and application of e-Geoscience in China

Nächster Artikel Understanding governance, risk and compliance information systems (GRC IS): The experts view

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Aboulnaga, A, & Naughton, JF (2003). Building XML statistics for the hidden web. In Proceedings of the twelfth ACM International Conference on Information and Knowledge Management, pp 358–365.

Aboulnaga, A, Alameldeen, AR, & Naughton, JF (2001). Estimating the selectivity of XML path expressions for Internet scale applications. In Proceedings of the 27th International Conference on Very Large Data Bases, San Francisco, CA, USA, VLDB’01.

Agrawal, R, Ailamaki, A, Bernstein, PA, Brewer, EA, Carey, MJ, Chaudhuri, S, Doan, A, Florescu, D, Franklin, MJ, Garcia-Molina, H, & et al (2009). The claremont report on database research. Communications of the ACM, 52(6), 56–65.CrossRef

Alrammal, M, & Hains, G (2014). A research survey on large XML data: Streaming, selectivity estimation and parallelism. Inter-cooperative Collective Intelligence: Techniques and Applications Studies in Computational Intelligence, 495, 167–202.CrossRef

Alrammal, M, Hains, G, & Zergaoui, M (2011). Path tree: Document synopsis for XPath query selectivity estimation. In Proceedings of the 5th International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS-2011), pp 321–328.

Bosak, J (2014). verified April 2014 Shakespeare plays. http://www.ibiblio.org/xml/examples/shakespeare/.

Bray, TJ, Paoli, C, McQueen, S, & Maler, E (2000). Extensible markup language (XML) 1.0, Second Edition. Available: http://www.w3.org/TR/REC-xml.

Bruno, N, Koudas, N, & Srivastava, D (2002). Holistic twig joins: optimal XML pattern matching. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD ’02, pp 310–321.

Chu, Y, & Yu, J (2012). The research of database query optimization based on XML. Advanced Materials Research, 546-547, 519–525.CrossRef

Drukh, N, Polyzotis, N, Garofalakis, M, & Matias, Y (2004). Fractional XSketch synopses for XML databases. In Bellahsne, Z, Milo, T, Rys, M, Suciu, D, & Unland, R (Eds.) Database and XML Technologies, Lecture Notes in Computer Science, (Vol. 3186 pp. 189–203): Springer Berlin Heidelberg.

Fisher, D, & Maneth, S (2007). Structural selectivity estimation for XML documents. In Proceedings of the IEEE 23rd International Conference on Data Engineering, ICDE, (Vol. 2007 pp. 626–635).

Gou, G, & Chirkova, R (2007). Efficiently querying large XML data repositories: A survey. IEEE Transactions on Knowledge and Data Engineering, 19(10), 1381–1403.CrossRef

Grün C (2010). Storing and querying large XML instances. PhD thesis.

Hachicha, M, & Darmont, J (2013). A survey of XML tree patterns. IEEE Transactions on Knowledge and Data Engineering, 25(1), 29–46.CrossRef

Haw, SC, & Lee, CS (2011). Data storage practices and query processing in XML databases: A survey. Knowledge-Based Systems, 24(8), 1317–1340.CrossRef

He, W, Lv, T, Meis, M, & Yan, P (2013). Visual evaluation of XPath queries. In IEEE Fifth International Conference on Computational and Information Sciences (ICCIS), pp 434–437.

Izadi, SK, Haghjoo, MS, & H?rder, T (2012). S3: Processing tree-pattern XML queries with all logical operators. Data & Knowledge Engineering 72:31–62.

Karp, RM, & Rabin, MO (1987). Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2), 249–260.CrossRef

Lee, ML, Li, H, Hsu, W, & Ooi, BC (2004). A statistical approach for XML query size estimation. In Proceedings of the 2004 International Conference on Current Trends in Database Technology, EDBT’04.

Li, H, Lee, ML, & Hsu, W (2005a). A histogram-based selectivity estimator for skewed xml data. In Database and Expert Systems Applications, Springer, pp 270–279.

Li, H, Lee, ML, & Hsu, W (2005b). A histogram-based selectivity estimator for skewed XML data. In Andersen, K, Debenham, J, & Wagner, R (Eds.) Database and Expert Systems Applications, Lecture Notes in Computer Science, vol 3588, Springer Berlin Heidelberg (pp. 270–279).

Lim L, Wang M, Padmanabhan S, Vitter JS, & Parr R (2002). Xpathlearner: An on-line self-tuning markov histogram for XML path selectivity estimation. In Proceedings of the 28th International Conference on Very Large Data Bases, pp 442–453.

Liu, X, Chen, L, Wan, C, Liu, D, & Xiong, N (2013). Exploiting structures in keyword queries for effective XML search. Information Sciences, 240, 56–71.CrossRef

Lu, J, Ling, T, Bao, Z, & Wang, C (2011). Extended XML tree pattern matching: Theories and algorithms. IEEE Transactions on Knowledge and Data Engineering, 23(3), 402–416.CrossRef

Luo, C, Jiang, Z, Hou, WC, Yu, F, & Zhu, Q (2009). A sampling approach for xml query selectivity estimation. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pp 335–344.

Madria, S, Chen, Y, Passi, K, & Bhowmick, S (2007). Efficient processing of XPath queries using indexes. Information Systems, 32(1), 131–159.CrossRef

Mohammed, SA, El-Alfy, ESM, & Barradah, AF (2014). Improved selectivity estimator for XML queries based on structural synopsis. World Wide Web 10.1007/s11280-014-0311-3.

Neoklis, P, & Minos, G (2006). Xcluster synopses for structured xml content. In Proceedings of the International Conference on Data Engineering.

Phan, BV, Pardede, E, & Rahayu, W (2013). On the improvement of active XML (AXML) representation and query evaluation. Information Systems Frontiers, 15(2), 203–222.CrossRef

Polyzotis, N, & Garofalakis, M (2002). Statistical synopses for graph-structured XML databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD ’02, pp 358–369.

Polyzotis, N, & Garofalakis, M (2006). XSketch synopses for XML data graphs. ACM Transactions on Database Systems, 31(3), 1014–1063.CrossRef

Polyzotis, N, Garofalakis, M, & Ioannidis, Y (2004a). Approximate XML query answers. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD ’04.

Polyzotis, N, Garofalakis, M, & Ioannidis, Y (2004b). Selectivity estimation for XML twigs. In Proceedings of the IEEE 20th International Conference on Data Engineering.

Sakr, S. (2007). Cardinality-aware and purely relational implementation of an XQuery processor: PhD thesis, University of Konstanz.

Sakr, S (2008). Algebra-based XQuery cardinality estimation. International Journal of Web Information Systems, 4(1), 7–46.CrossRef

Sakr, S (2010). Towards a comprehensive assessment for selectivity estimation approaches of XML queries. International Journal of Web Engineering and Technology, 6, 58–82.CrossRef

Sartiani, C (2003). A framework for estimating XML query cardinality. In WebDB, pp 43–48.

Schmidt, A, Waas, F, Kersten, M, Carey, MJ, Manolescu, I, & Busse, R (2002). XMark: A benchmark for XML data management. In Proceedings of the 28th International Conference on Very Large Databases, VLDB’02, pp 974–985.

Teubner, J, Grust, T, Maneth, S, & Sakr, S (2008). Dependable cardinality forecasts for XQuery. Proceedings of the VLDB Endowment, 1(1), 463–477.CrossRef

Tian, P, Luo, D, Li, Y, & Gu, J (2014). XML multi-core query optimization based on task preemption and data partition. In Semantic Technology (pp. 294–305): Springer.

Verified April (2014). DBLP: Digital bibliography & library project. http://dblp.uni-trier.de/xml/.

Verified April (2014). UniProt. http://www.uniprot.org/.

Wang, C, Parthasarathy, S, & Jin, R (2006). A decomposition-based probabilistic framework for estimating the selectivity of XML twig queries. In Advances in Database Technology, EDBT (pp. 533–551): Springer.

Wang, W, Jiang, H, Lu, H, & Yu, JX (2004a). Bloom histogram: path selectivity estimation for XML data with updates. In Proceedings of the 30th International Conference on Very Large Databases, VLDB’04.

Wang, W, Jiang, H, Lu, H, & Yu, JX (2004b). Bloom histogram: Path selectivity estimation for XML data with updates. In Proceedings of the Thirtieth International Conference on Very Large Databases.

Wang, Y, Wang, H, Meng, X, & Wang, S (2004c). Estimating the selectivity of XML path expression with predicates by histograms. In Li, Q, Wang, G, & Feng, L (Eds.) Advances in Web-Age Information Management, Lecture Notes in Computer Science, (Vol. 3129 pp. 409–418): Springer Berlin Heidelberg.

Wu, X, & Liu, G (2008). XML twig pattern matching using version tree. Data & Knowledge Engineering, 64 (3), 580–599.CrossRef

Wu, X, Theodoratos, D, Wang, WH, & Sellis, T (2013). Optimizing XML queries: Bitmapped materialized views vs. indexes. Information Systems, 38(6), 863–884.CrossRef

Wu, Y, Patel, JM, & Jagadish, H (2002). Estimating answer sizes for XML queries. In Jensen, C, Šaltenis, S, Jeffery, K, Pokorny, J, Bertino, E, B?hn, K, & Jarke, M (Eds.) Advances in Database Technology, Lecture Notes in Computer Science, (Vol. 2287 pp. 590–608): Springer Berlin Heidelberg.

Yang, LH, Lee, ML, Hsu, W, Huang, D, & Wong, L (2008). Efficient mining of frequent XML query patterns with repeating-siblings. Information and Software Technology, 50(5), 375–389.CrossRef

Zhang, C, Naughton, J, DeWitt, D, Luo, Q, & Lohman, G (2001). On supporting containment queries in relationaldatabase management systems. SIGMOD Rec, 30(2), 425–436 . doi:10.1145/376284.375722.CrossRef

Zhang, N, Ozsu, MT, Aboulnaga, A, & Ilyas, If (2006). Xseed: Accurate and fast cardinality estimation for XPath queries. In Proceedings of the IEEE 22nd International Conference on Data Engineering, Washington, DC, USA, ICDE’06.

Titel: XHQE: A hybrid system for scalable selectivity estimation of XML queries
verfasst von: E.-S. M. El-Alfy
S. Mohammed
A. F. Barradah
Publikationsdatum: 01.12.2016
Verlag: Springer US
Erschienen in: Information Systems Frontiers / Ausgabe 6/2016
Print ISSN: 1387-3326
Elektronische ISSN: 1572-9419
DOI: https://doi.org/10.1007/s10796-015-9561-6

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Weitere Artikel der Ausgabe 6/2016

Understanding governance, risk and compliance information systems (GRC IS): The experts view

Competence management in knowledge intensive organizations using consensual knowledge and ontologies

The value of the CIO in the top management team on performance in the case of information security breaches

Towards a construction of an intelligent business process based on cloud services and driven by degree of similarity and QoS

Impact of object manipulation, customization and social loafing on competencies management in 3D Virtual Worlds

Employee knowledge profiles – a mixed-research methods approach