ABSTRACT
Conventional search engines usually return a ranked list of web pages in response to a query, so users have to visit several pages to locate the relevant parts. A promising future search scenario should involve (1) understanding user intents and (2) providing relevant information directly to satisfy searchers' needs, rather than merely returning relevant pages. In this paper, we present a search paradigm that summarizes the information relevant to a query from different aspects. Query aspects can be aligned with user intents. The summaries generated for query aspects are expected to be both specific and informative, so that users can find relevant information easily and quickly. Specifically, we use a "Composite Query for Summarization" method, in which a set of component queries provides additional information for the original query. The system leverages the search engine to proactively gather information by submitting multiple component queries derived from the original query and its aspects. In this way, we obtain more relevant information for each query aspect and roughly classify that information by aspect. By comparatively mining the search results of different component queries, the system identifies query-dependent aspect words, which help to generate more specific and informative summaries. The experimental results on two data sets, Wikipedia and TREC ClueWeb2009, are encouraging: our method outperforms two baseline methods in generating informative summaries.
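The abstract does not specify how component queries are formed or how aspect words are scored, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes a hypothetical composition rule in which each component query is the original query followed by an aspect label; it takes pre-fetched result snippets instead of querying a live search engine; and it approximates "comparative mining" with a smoothed log ratio of a word's frequency in one component query's results versus the pooled results of the other component queries. The names `component_queries` and `mine_aspect_words` are hypothetical, and the scoring is a simple stand-in, not the method evaluated in the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def component_queries(query, aspects):
    """Hypothetical composition rule: one component query per aspect, '<query> <aspect>'."""
    return {aspect: f"{query} {aspect}" for aspect in aspects}


def mine_aspect_words(query, results_by_aspect, top_k=3, alpha=1.0):
    """Rank words over-represented in one aspect's results relative to the
    pooled results of all other aspects.

    Score = log(P(w | this aspect) / P(w | other aspects)) with add-alpha
    smoothing. Illustrative stand-in only, not the paper's scoring function.
    """
    # Query and aspect-label terms appear by construction, so exclude them.
    excluded = set(tokenize(query))
    for aspect in results_by_aspect:
        excluded.update(tokenize(aspect))

    counts = {
        aspect: Counter(w for doc in docs for w in tokenize(doc) if w not in excluded)
        for aspect, docs in results_by_aspect.items()
    }
    vocab_size = len(set().union(*counts.values()))

    mined = {}
    for aspect, own in counts.items():
        # Pool the word counts of every other aspect's results.
        rest = Counter()
        for other, other_counts in counts.items():
            if other != aspect:
                rest.update(other_counts)
        own_total = sum(own.values())
        rest_total = sum(rest.values())

        def log_ratio(w):
            p_own = (own[w] + alpha) / (own_total + alpha * vocab_size)
            p_rest = (rest[w] + alpha) / (rest_total + alpha * vocab_size)
            return math.log(p_own / p_rest)

        # Highest log ratio first; ties broken alphabetically for determinism.
        ranked = sorted(own, key=lambda w: (-log_ratio(w), w))
        mined[aspect] = ranked[:top_k]
    return mined


if __name__ == "__main__":
    query = "jaguar"
    print(component_queries(query, ["car", "animal"]))
    # Toy stand-in for snippets returned for each component query.
    results = {
        "car": ["jaguar car dealers sell the new sedan",
                "the jaguar car engine and sedan price"],
        "animal": ["the jaguar animal lives in the rainforest",
                   "jaguar animal diet and rainforest habitat"],
    }
    print(mine_aspect_words(query, results))
```

With this toy data, "sedan" ranks highest for the "car" aspect and "rainforest" for the "animal" aspect, because each occurs twice in one component query's results and never in the other's; words shared across aspects, such as "the" and "and", score low. A real system would also need stop-word handling and far more result text than this example uses.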
Index Terms
- Multi-aspect query summarization by composite query