2016 | OriginalPaper | Book chapter

Query Classification by Leveraging Explicit Concept Information

Written by: Fang Wang, Ze Yang, Zhoujun Li, Jianshe Zhou

Published in: Advanced Data Mining and Applications

Publisher: Springer International Publishing

Abstract

A key task in query understanding is interpreting user intent from the few words that a user submits to a search engine. Query classification (QC), which assigns queries to a set of target categories representing user search intents, has been widely studied for this purpose. QC is an important but difficult problem in information retrieval because queries are usually short, ambiguous, and noisy. As a result, traditional “bag-of-words” classification methods fail to achieve high accuracy on QC. In this paper, we propose mining explicit “concept” information to help resolve this problem. Specifically, we first leverage existing knowledge bases to enrich the short query at the concept level. We then discuss how to use the mined concept information and propose a novel language-model-based query classification method that takes both words and concepts into consideration. Experimental results show that the mined concepts are informative and effective in improving query classification.
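
The general idea of combining words and concepts in a language model can be illustrated with a small sketch. It is not the method from the chapter, whose details are not given in the abstract. Everything here is an assumption made for illustration: the toy term-to-concept table `TOY_CONCEPTS` (standing in for a real knowledge base), the class name `WordConceptClassifier`, the interpolation weight `lam` between the word and concept models, and the Jelinek-Mercer style smoothing weight `mu`.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy knowledge base mapping a term to its concepts.
# A real system would look concepts up in an existing knowledge base.
TOY_CONCEPTS = {
    "jaguar": ["animal", "car brand"],
    "tiger": ["animal"],
    "zoo": ["place"],
    "apple": ["company", "fruit"],
    "iphone": ["device"],
    "ipad": ["device"],
}


def words_of(query):
    """Lowercase whitespace tokenization (an assumed, minimal tokenizer)."""
    return query.lower().split()


def concepts_of(query):
    """Concepts of every query term found in the toy knowledge base."""
    return [c for w in words_of(query) for c in TOY_CONCEPTS.get(w, [])]


class WordConceptClassifier:
    """Unigram language models over words and over concepts, one pair per class."""

    def __init__(self, lam=0.5, mu=0.8):
        self.lam = lam  # weight of the word model; (1 - lam) goes to concepts
        self.mu = mu    # weight of the class model vs. the background model
        self.word_counts = defaultdict(Counter)
        self.concept_counts = defaultdict(Counter)
        self.class_sizes = Counter()

    def fit(self, queries, labels):
        for query, label in zip(queries, labels):
            self.word_counts[label].update(words_of(query))
            self.concept_counts[label].update(concepts_of(query))
            self.class_sizes[label] += 1
        return self

    def _log_prob(self, items, per_class, label):
        # Background model: counts pooled over all classes, add-one smoothed
        # so that unseen items still receive a non-zero probability.
        background = Counter()
        for counts in per_class.values():
            background.update(counts)
        vocab = len(background) + 1  # one extra slot for unseen items
        bg_total = sum(background.values())
        cls = per_class[label]
        cls_total = sum(cls.values())
        logp = 0.0
        for item in items:
            p_cls = cls[item] / cls_total if cls_total else 0.0
            p_bg = (background[item] + 1) / (bg_total + vocab)
            logp += math.log(self.mu * p_cls + (1 - self.mu) * p_bg)
        return logp

    def score(self, query, label):
        prior = math.log(self.class_sizes[label] / sum(self.class_sizes.values()))
        word_lp = self._log_prob(words_of(query), self.word_counts, label)
        concept_lp = self._log_prob(concepts_of(query), self.concept_counts, label)
        return prior + self.lam * word_lp + (1 - self.lam) * concept_lp

    def predict(self, query):
        return max(self.class_sizes, key=lambda label: self.score(query, label))


clf = WordConceptClassifier().fit(
    ["jaguar zoo tickets", "apple iphone price"],
    ["animals", "technology"],
)
print(clf.predict("tiger"))
print(clf.predict("ipad"))
```

Neither "tiger" nor "ipad" occurs in the training queries, so the word model alone cannot separate the classes. In this sketch the decision comes from the concepts they share with the training queries ("animal" and "device").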

Metadata
Title
Query Classification by Leveraging Explicit Concept Information
Written by
Fang Wang
Ze Yang
Zhoujun Li
Jianshe Zhou
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-49586-6_45