Published in: Discover Computing 4/2013

1 August 2013 | Search Intents and Diversification

Mining subtopics from text fragments for a web query

Authors: Qinglei Wang, Yanan Qian, Ruihua Song, Zhicheng Dou, Fan Zhang, Tetsuya Sakai, Qinghua Zheng

Abstract

Web search queries are often ambiguous or faceted, and the task of identifying the major underlying senses and facets of queries has received much attention in recent years. We refer to this task as query subtopic mining. In this paper, we propose to mine and rank subtopics using the text surrounding query terms in the top retrieved documents. We first extract text fragments containing query terms from different parts of documents. We then group similar text fragments into clusters and generate a readable subtopic for each cluster. Based on the cluster and a language model trained on a query log, we calculate three features and combine them into a relevance score for each subtopic. Finally, subtopics are ranked by balancing relevance and novelty. Our evaluation experiments with the NTCIR-9 INTENT Chinese Subtopic Mining test collection show that our method significantly outperforms a query-log-based method proposed by Radlinski et al. (2010) and a search-result-clustering-based method proposed by Zeng et al. (2004) in terms of precision, I-rec, D-nDCG and D#-nDCG, the official evaluation metrics used at the NTCIR-9 INTENT task. Moreover, our generated subtopics are significantly more readable than those generated by the search result clustering method.
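
The final ranking step, which balances relevance and novelty, can be illustrated with a small sketch. The abstract does not give the ranking formula, so the code below substitutes a generic greedy MMR-style rule (in the spirit of Carbonell and Goldstein, 1998) and should not be read as the authors' method. The function names, the Jaccard token-overlap similarity, the `lam` trade-off parameter, and the example relevance scores are all assumptions made for illustration. Whitespace tokenization is also an assumption; the Chinese test collection would need a proper word segmenter.

```python
from typing import Dict, List


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two subtopic strings (an assumed measure)."""
    ta, tb = set(a.split()), set(b.split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def rank_subtopics(relevance: Dict[str, float], lam: float = 0.7) -> List[str]:
    """Greedy MMR-style ranking (hypothetical helper, not the paper's formula).

    At each step, pick the subtopic that maximizes
    lam * relevance - (1 - lam) * (max similarity to subtopics already ranked).
    """
    remaining = dict(relevance)
    ranked: List[str] = []
    while remaining:
        def mmr(s: str) -> float:
            # Redundancy is zero while nothing has been ranked yet.
            redundancy = max((jaccard(s, r) for r in ranked), default=0.0)
            return lam * remaining[s] - (1.0 - lam) * redundancy

        best = max(remaining, key=mmr)
        ranked.append(best)
        del remaining[best]
    return ranked


# Made-up relevance scores for the ambiguous query "jaguar".
scores = {
    "jaguar car price": 0.9,
    "jaguar car dealers": 0.85,
    "jaguar animal habitat": 0.6,
}
ranking = rank_subtopics(scores, lam=0.5)
print(ranking)
```

With `lam=0.5`, the less relevant but more novel "jaguar animal habitat" is promoted above "jaguar car dealers", because the latter overlaps heavily with the top-ranked subtopic. Setting `lam=1.0` reduces the rule to a plain sort by relevance.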

References
Agrawal, R., Gollapudi, S., Halverson, A., & Ieong, S. (2009). Diversifying search results. In Proceedings of the second ACM international conference on web search and data mining, ACM, pp. 5–14.
Beeferman, D., & Berger, A. (2000). Agglomerative clustering of a search engine query log. In Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp. 407–416.
Carbonell, J., & Goldstein, J. (1998). The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval (pp. 335–336). SIGIR ’98. New York, NY: ACM, ISBN 1-58113-015-5. doi:http://doi.acm.org/10.1145/290941.291025.
Chandar, P., & Carterette, B. (2010). Diversification of search results using webgraphs. In Proceeding of the 33rd international ACM SIGIR conference on research and development in information retrieval, ACM, pp. 869–870.
Chen, H., & Dumais, S. (2000). Bringing order to the web: Automatically categorizing search results. In Proceedings of the SIGCHI conference on Human factors in computing systems, ACM, pp. 145–152.
Clarke, C. L., Craswell, N., & Soboroff, I. (2009). Overview of the TREC 2009 web track, technical report, DTIC document.
Clarke, C. L. A., Craswell, N., Soboroff, I., & Ashkan, A. (2011). A comparative analysis of cascade measures for novelty and diversity. In Proceedings of the fourth ACM international conference on Web search and data mining, ACM, pp. 75–84.
Clough, P., Sanderson, M., Abouammoh, M., Navarro, S., & Paramita, M. (2009). Multiple approaches to analysing query diversity. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, ACM, pp. 734–735.
Cutting, D. R., Karger, D. R., & Pedersen, J. O. (1993). Constant interaction-time scatter/gather browsing of very large document collections. In Proceedings of the 16th annual international ACM SIGIR conference on research and development in information retrieval, ACM, pp. 126–134.
Ferragina, P., & Gulli, A. (2008). A personalized search engine based on web-snippet hierarchical clustering. Software: Practice and Experience 38(2), 189–225.
Gale, W. A., & Sampson, G. (1995). Good-Turing frequency estimation without tears. Journal of Quantitative Linguistics 2(3), 217–237.
Geraci, F., Pellegrini, M., Pisati, P., & Sebastiani, F. (2006). A scalable algorithm for high-quality clustering of web snippets. In Proceedings of the 2006 ACM symposium on applied computing, ACM, pp. 1058–1062.
Gollapudi, S., & Sharma, A. (2009). An axiomatic approach for result diversification. In Proceedings of the 18th international conference on world wide web, ACM, pp. 381–390.
Hearst, M., Pedersen, J., & Karger, D. (1995). Scatter/gather as a tool for the analysis of retrieval results. In Working notes of the AAAI fall symposium on AI applications in knowledge navigation.
Hearst, M. A., & Pedersen, J. O. (1996). Reexamining the cluster hypothesis: Scatter/gather on retrieval results. In Proceedings of the 19th annual international ACM SIGIR conference on research and development in information retrieval, ACM, pp. 76–84.
Ji, X., & Bailey, J. (2007). An efficient technique for mining approximately frequent substring patterns. In Data mining workshops, 2007. ICDM workshops 2007. Seventh IEEE international conference on, IEEE, pp. 325–330.
Koshman, S., Spink, A., & Jansen, B. J. (2006). Web searching on the Vivisimo search engine. Journal of the American Society for Information Science and Technology 57(14), 1875–1887.
Leouski, A. V. (2005). An evaluation of techniques for clustering search results, technical report, DTIC document.
Leuski, A., & Allan, J. (2000). Improving interactive retrieval by combining ranked lists and clustering. In Proceedings of RIAO, vol. 2000.
Li, X., Wang, Y. Y., & Acero, A. (2008). Learning query intent from regularized click graphs. In Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval, ACM, pp. 339–346.
Osinski, S., & Weiss, D. (2005). A concept-driven algorithm for clustering search results. Intelligent Systems, IEEE 20(3), 48–54.
Radlinski, F., Szummer, M., & Craswell, N. (2010). Inferring query intent from reformulations and clicks. In Proceedings of the 19th international conference on world wide web, ACM, pp. 1171–1172.
Rafiei, D., Bharat, K., & Shukla, A. (2010). Diversifying web search results. In Proceedings of the 19th international conference on world wide web, ACM, pp. 781–790.
Robertson, S. E., & Jones, K. S. (1976). Relevance weighting of search terms. Journal of the American Society for Information Science 27(3), 129–146.
Sahami, M., & Heilman, T. D. (2006). A web-based kernel function for measuring the similarity of short text snippets. In Proceedings of the 15th international conference on world wide web, ACM, pp. 377–386.
Sakai, T. (2011). NTCIREVAL: A generic toolkit for information access evaluation. In Proceedings of the forum on information technology 2011, vol. 2, pp. 23–30.
Sakai, T., & Song, R. (2011). Evaluating diversified search results using per-intent graded relevance. In Proceedings of the 34th international ACM SIGIR conference on research and development in information retrieval, ACM, pp. 1043–1052.
Santos, R. L. T., Macdonald, C., & Ounis, I. (2010a). Exploiting query reformulations for web search result diversification. In Proceedings of the 19th international conference on world wide web, ACM, pp. 881–890.
Santos, R. L. T., Macdonald, C., & Ounis, I. (2010b). Selectively diversifying web search results. In Proceedings of the 19th ACM international conference on information and knowledge management, ACM, pp. 1179–1188.
Sibson, R. (1973). SLINK: An optimally efficient algorithm for the single-link cluster method. The Computer Journal 16(1), 30–34.
Song, R., Wen, J. R., Shi, S., Xin, G., Liu, T. Y., Qin, T., Zheng, X., Zhang, J., Xue, G., & Ma, W. Y. (2004). Microsoft Research Asia at web track and terabyte track of TREC 2004. In Proceedings of the thirteenth text retrieval conference proceedings (TREC-2004).
Song, R., Zhang, M., Sakai, T., Kato, M. P., Liu, Y., Sugimoto, M., Wang, Q., & Orii, N. (2011). Overview of the NTCIR-9 INTENT task. In Proceedings of the 9th NTCIR workshop meeting on evaluation of information access technologies.
Strohmaier, M., Kröll, M., & Körner, C. (2009). Intentional query suggestion: Making user goals more explicit during search. In Proceedings of the 2009 workshop on web search click data, ACM, pp. 68–74.
Wang, X., & Zhai, C. X. (2007). Learn from web search logs to organize search results. In Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval, ACM, pp. 87–94.
Zamir, O., & Etzioni, O. (1998). Web document clustering: A feasibility demonstration. In Proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval, ACM, pp. 46–54.
Zamir, O., & Etzioni, O. (1999). Grouper: A dynamic clustering interface to web search results. Computer Networks 31(11–16), 1361–1374.
Zeng, H. J., He, Q. C., Chen, Z., Ma, W. Y., & Ma, J. (2004). Learning to cluster web search results. In Proceedings of the 27th annual international ACM SIGIR conference on research and development in information retrieval, ACM, pp. 210–217.
Zhai, C. X., Cohen, W. W., & Lafferty, J. (2003). Beyond independent relevance: Methods and evaluation metrics for subtopic retrieval. In Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval, ACM, pp. 10–17.
Zheng, W., & Fang, H. (2010). University of Delaware at diversity task of web track 2010. In Proceedings of TREC, vol. 10.
Zheng, W., & Fang, H. (2011). A comparative study of search result diversification methods. In Proceedings of DDR 11.
Zheng, W., Wang, X., Fang, H., & Cheng, H. (2011). An exploration of pattern-based subtopic modeling for search result diversification. In Proceedings of JCDL, vol. 11.
Metadata
Title
Mining subtopics from text fragments for a web query
Authors
Qinglei Wang
Yanan Qian
Ruihua Song
Zhicheng Dou
Fan Zhang
Tetsuya Sakai
Qinghua Zheng
Publication date
1 August 2013
Publisher
Springer Netherlands
Published in
Discover Computing / Issue 4/2013
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI
https://doi.org/10.1007/s10791-013-9221-8
