ABSTRACT
Effective organization of search results is critical for improving the utility of any search engine. Clustering search results is one effective way to organize them, allowing a user to navigate to relevant documents quickly. However, two deficiencies keep this approach from always working well: (1) the clusters discovered do not necessarily correspond to the interesting aspects of a topic from the user's perspective; and (2) the cluster labels generated are not informative enough to allow a user to identify the right cluster. In this paper, we propose to address these two deficiencies by (1) learning "interesting aspects" of a topic from Web search logs and organizing search results accordingly; and (2) generating more meaningful cluster labels using past query words entered by users. We evaluate our proposed method on log data from a commercial search engine. Compared with traditional methods of clustering search results, our method gives better result organization and more meaningful labels.
Index Terms
- Learn from web search logs to organize search results