DOI: 10.1145/502585.502609
Article

Using navigation data to improve IR functions in the context of web search

Published: 05 October 2001

ABSTRACT

As part of the process of delivering content, devices like proxies and gateways log valuable information about the activities and navigation patterns of users on the Web. In this study, we consider how this navigation data can be used to improve Web search. A query posted to a search engine together with the set of pages accessed during a search task is known as a search session. We develop a mixture model for the observed set of search sessions, and propose variants of the classical EM algorithm for training. The model itself yields a type of navigation-based query clustering. By implicitly borrowing strength between related queries, the mixture formulation allows us to identify the "highly relevant" URLs for each query cluster. Next, we explore methods for incorporating existing labeled data (the Yahoo! directory, for example) to speed convergence and help resolve low-traffic clusters. Finally, the mixture formulation also provides for a simple, hierarchical display of search results based on the query clusters. The effectiveness of our approach is evaluated using proxy access logs for the outgoing Lucent proxy.
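The abstract does not spell out the model, so the following is only a minimal sketch of how EM training over search sessions could look, under an assumed generative story that is not taken from the paper: each session picks a hidden cluster, draws its query from a per-cluster query distribution, and draws each visited URL independently from a per-cluster URL distribution. The names `fit_session_mixture` and `top_urls`, the smoothing constant, and the example sessions are all hypothetical.

```python
import math
import random


def fit_session_mixture(sessions, n_clusters, n_iter=50, smooth=0.01, seed=0):
    """EM for a toy mixture over (query, [urls]) search sessions.

    Assumed model (not taken from the paper): a session picks a hidden
    cluster k with probability pi[k], then draws its query from p_q[k]
    and each visited URL independently from p_u[k].
    Requires n_iter >= 1 and smooth > 0.
    """
    rng = random.Random(seed)
    queries = sorted({q for q, _ in sessions})
    urls = sorted({u for _, visited in sessions for u in visited})
    ks = range(n_clusters)

    # Start from random soft assignments so the clusters are not identical.
    resp = []
    for _ in sessions:
        w = [rng.random() + 0.1 for _ in ks]
        total = sum(w)
        resp.append([x / total for x in w])

    for _ in range(n_iter):
        # M-step: re-estimate parameters from responsibility-weighted counts.
        pi, p_q, p_u = [], [], []
        for k in ks:
            mass = sum(r[k] for r in resp)
            pi.append((mass + smooth) / (len(sessions) + smooth * n_clusters))
            q_cnt = dict.fromkeys(queries, smooth)
            u_cnt = dict.fromkeys(urls, smooth)
            for (q, visited), r in zip(sessions, resp):
                q_cnt[q] += r[k]
                for u in visited:
                    u_cnt[u] += r[k]
            q_tot, u_tot = sum(q_cnt.values()), sum(u_cnt.values())
            p_q.append({q: c / q_tot for q, c in q_cnt.items()})
            p_u.append({u: c / u_tot for u, c in u_cnt.items()})

        # E-step: posterior cluster probabilities per session, in log space.
        resp = []
        for q, visited in sessions:
            logp = [
                math.log(pi[k]) + math.log(p_q[k][q])
                + sum(math.log(p_u[k][u]) for u in visited)
                for k in ks
            ]
            top = max(logp)
            w = [math.exp(v - top) for v in logp]
            total = sum(w)
            resp.append([x / total for x in w])

    return pi, p_q, p_u, resp


def top_urls(p_u, k, n=3):
    """URLs with the highest probability under cluster k."""
    return sorted(p_u[k], key=p_u[k].get, reverse=True)[:n]


# Made-up sessions: (query, pages visited during the search task).
sessions = [
    ("jaguar price", ["cars.example/models", "cars.example/dealers"]),
    ("jaguar dealer", ["cars.example/dealers"]),
    ("jaguar habitat", ["zoo.example/big-cats"]),
    ("big cats", ["zoo.example/big-cats", "zoo.example/jaguar"]),
]
pi, p_q, p_u, resp = fit_session_mixture(sessions, n_clusters=2)
for k in range(2):
    print(k, round(pi[k], 3), top_urls(p_u, k, n=2))
```

In this sketch the per-cluster query distributions play the role of the query clustering, and ranking URLs by their per-cluster probability stands in for picking out the "highly relevant" URLs. Because URL counts are pooled across every session assigned to a cluster, a rarely issued query can inherit URLs from related queries, which is one way to read "borrowing strength". EM only finds a local optimum, so the resulting clusters depend on the random start.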

          • Published in

            CIKM '01: Proceedings of the tenth international conference on Information and knowledge management
            October 2001
            616 pages
            ISBN:1581134363
            DOI:10.1145/502585

            Copyright © 2001 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
