DOI: 10.1145/1031171.1031192
Article

Optimizing web search using web click-through data

Published: 13 November 2004

ABSTRACT

The performance of web search engines often deteriorates because of the diverse and noisy information contained in web pages. User click-through data can be used to introduce more accurate descriptions (metadata) for web pages and thereby improve search performance. However, mining user click-through logs faces three major challenges: noise and incompleteness, sparseness, and the volatility of web pages and queries. In this paper, we propose a novel iterative reinforced algorithm that uses click-through data to improve search performance. The algorithm fully explores the interrelations between queries and web pages, effectively finds "virtual queries" for web pages, and overcomes the challenges discussed above. Experimental results on a large set of MSN click-through log data show a significant improvement in search performance over both the naive query log mining algorithm and the baseline search engine.
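
The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea, not the authors' method. It assumes a click log of (query, page) pairs and swaps in a SimRank-style mutual reinforcement on the query-page click graph: two pages count as similar when similar queries led to them, and vice versa. The function names, the decay factor, the iteration count, and the threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_click_graph(click_log):
    """Map each query to its clicked pages and each page to its queries."""
    pages_of = defaultdict(set)
    queries_of = defaultdict(set)
    for query, page in click_log:
        pages_of[query].add(page)
        queries_of[page].add(query)
    return pages_of, queries_of


def _update(nodes, neighbors, other_sim, decay):
    """One SimRank-style step: two nodes are similar if their neighbors are."""
    sim = {}
    for a, b in combinations(nodes, 2):
        total = sum(
            1.0 if x == y else other_sim.get(frozenset((x, y)), 0.0)
            for x in neighbors[a]
            for y in neighbors[b]
        )
        norm = len(neighbors[a]) * len(neighbors[b])
        sim[frozenset((a, b))] = decay * total / norm
    return sim


def reinforce(click_log, decay=0.8, iterations=5):
    """Alternate query-similarity and page-similarity updates (assumed scheme)."""
    pages_of, queries_of = build_click_graph(click_log)
    query_sim, page_sim = {}, {}
    for _ in range(iterations):
        # Both updates read the previous iteration's scores.
        new_query_sim = _update(sorted(pages_of), pages_of, page_sim, decay)
        new_page_sim = _update(sorted(queries_of), queries_of, query_sim, decay)
        query_sim, page_sim = new_query_sim, new_page_sim
    return query_sim, page_sim, queries_of


def virtual_queries(page, page_sim, queries_of, threshold=0.1):
    """Queries never clicked for `page` but clicked for sufficiently similar pages."""
    scores = defaultdict(float)
    for other, queries in queries_of.items():
        if other == page:
            continue
        s = page_sim.get(frozenset((page, other)), 0.0)
        if s >= threshold:
            for q in queries - queries_of[page]:
                scores[q] = max(scores[q], s)
    return dict(scores)
```

Under these assumptions, a page that shares a clicked query with another page inherits that page's other queries as candidate "virtual queries", which is one plausible way to counter the sparseness of click logs. A naive log-mining baseline would instead use only `queries_of[page]` as the page's metadata.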


      • Published in

        CIKM '04: Proceedings of the thirteenth ACM international conference on Information and knowledge management
        November 2004
        678 pages
        ISBN:1581138741
        DOI:10.1145/1031171

        Copyright © 2004 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
