Published in: World Wide Web 6/2017

15.03.2017

A query refinement framework for xml keyword search

Authors: Zhifeng Bao, Yi Yu, Jian Shen, Zhangjie Fu


Abstract

Existing work on XML keyword search focuses on how to find relevant and meaningful data fragments for a query, assuming each keyword is intended as part of it. However, in XML keyword search, user queries usually contain irrelevant or mismatched terms, typos, etc., which may easily lead to empty or meaningless results. In this paper, we introduce the problem of content-aware XML keyword query refinement, where the search engine should judiciously decide whether a user query Q needs to be refined during the processing of Q, and find a list of promising refined query candidates that are guaranteed to have meaningful matching results over the XML data, without any user interaction or a second try. To achieve this goal, we build a novel content-aware XML keyword query refinement framework consisting of two core parts: (1) we build a query ranking model to evaluate the quality of a refined query RQ, which captures the morphological/semantic similarity between Q and RQ and the dependency of the keywords of RQ over the XML data; (2) we integrate the exploration of RQ candidates and the generation of their matching results into a single problem, which is solved optimally within a one-time scan of the related keyword inverted lists. Finally, an extensive empirical study verifies the efficiency and effectiveness of our framework.
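
To make the idea of content-aware refinement concrete, the following is a minimal Python sketch, not the framework described in the abstract. It assumes a toy inverted index mapping each keyword to a set of node ids, two refinement rules (term deletion at cost 2, the value the paper's footnotes use in examples, and spelling substitution at a cost equal to the edit distance, which is purely an assumption here), and it enumerates candidates exhaustively instead of using a one-time scan of the inverted lists. All names (`TOY_INDEX`, `refine`, `term_options`) are hypothetical. It only illustrates the requirement that every returned refined query has at least one matching result.

```python
from itertools import product

DELETE_COST = 2  # deletion score used in the paper's examples; other costs are assumed

# Hypothetical toy inverted index: keyword -> ids of XML nodes containing it.
TOY_INDEX = {
    "xml": {1, 2, 3},
    "keyword": {1, 2},
    "search": {1, 3},
    "query": {2},
}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def term_options(term, index, max_dist=1):
    """Return (replacement, cost) pairs for one term; None stands for deletion."""
    options = [(None, DELETE_COST)]
    for word in index:
        d = edit_distance(term, word)
        if d <= max_dist:
            options.append((word, d))
    return options


def refine(query, index, top_k=3):
    """Return up to top_k (cost, refined_query, matching_node_ids) triples.

    Only refined queries whose keywords co-occur in at least one node are kept,
    so every suggestion has a non-empty result.
    """
    results = []
    for combo in product(*(term_options(t, index) for t in query)):
        kept = [w for w, _ in combo if w is not None]
        if not kept:
            continue  # deleting every term leaves no query
        cost = sum(c for _, c in combo)
        matches = set.intersection(*(index[w] for w in kept))
        if matches:
            results.append((cost, kept, sorted(matches)))
    results.sort(key=lambda r: (r[0], r[1]))
    return results[:top_k]


if __name__ == "__main__":
    for cost, rq, nodes in refine(["xml", "keywrd", "serch"], TOY_INDEX):
        print(cost, rq, nodes)
```

In this sketch a lower cost means a refined query closer to the original; a query that already matches is returned unchanged at cost 0. The ranking model in the abstract additionally accounts for semantic similarity and keyword dependency over the data, which this toy cost does not attempt to reproduce.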


Footnotes
1
Basically, it considers a node type t of the XML data to be an entity if t is “*”-annotated in the DTD. However, this may cause a multi-valued attribute to be mistakenly identified as an entity, so it usually requires verification and a decision by database administrators.

2
Where no ambiguity arises, we use “refinement rule” instead of “refinement rule instance” in the rest of the paper.

3
To facilitate our discussion, the dissimilarity score of a single term deletion rule is 2 throughout all examples in this paper.

5
The URL is anonymized due to the double-blind review policy.

6
To facilitate the discussion, we call our refinement approach XRefine in the rest of the paper.

Metadata
Title
A query refinement framework for xml keyword search
Authors
Zhifeng Bao
Yi Yu
Jian Shen
Zhangjie Fu
Publication date
15.03.2017
Publisher
Springer US
Published in
World Wide Web / Issue 6/2017
Print ISSN: 1386-145X
Electronic ISSN: 1573-1413
DOI
https://doi.org/10.1007/s11280-017-0447-z
