DOI: 10.1145/1458082.1458176
research-article

Beyond the session timeout: automatic hierarchical segmentation of search topics in query logs

Published: 26 October 2008

ABSTRACT

Most analysis of web search relevance and performance takes a single query as the unit of search engine interaction. When studies attempt to group queries together by task or session, a timeout is typically used to identify the boundary. However, users query search engines in order to accomplish tasks at a variety of granularities, issuing multiple queries as they attempt to accomplish tasks. In this work we study real sessions manually labeled into hierarchical tasks, and show that timeouts, whatever their length, are of limited utility in identifying task boundaries, achieving a maximum precision of only 70%. We report on properties of this search task hierarchy, as seen in a random sample of user interactions from a major web search engine's log, annotated by human editors, learning that 17% of tasks are interleaved, and 20% are hierarchically organized. No previous work has analyzed or addressed automatic identification of interleaved and hierarchically organized search tasks. We propose and evaluate a method for the automated segmentation of users' query streams into hierarchical units. Our classifiers can improve on timeout segmentation, as well as other previously published approaches, bringing the accuracy up to 92% for identifying fine-grained task boundaries, and 89-97% for identifying pairs of queries from the same task when tasks are interleaved hierarchically. This is the first work to identify, measure and automatically segment sequences of user queries into their hierarchical structure. The ability to perform this kind of segmentation paves the way for evaluating search engines in terms of user task completion.
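
For concreteness, the timeout baseline that the abstract argues against can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `segment_by_timeout`, the 30-minute default, and the sample queries are all assumptions made here. The abstract does not commit to a single threshold and reports that timeouts of any length have limited utility.

```python
from datetime import datetime, timedelta


def segment_by_timeout(events, timeout=timedelta(minutes=30)):
    """Split time-ordered (timestamp, query) pairs into segments.

    A new segment starts whenever the gap since the previous query
    exceeds `timeout`. The 30-minute default is an illustrative
    assumption, not a value taken from the paper.
    """
    segments = []
    for ts, query in events:
        # Compare against the timestamp of the last query in the open segment.
        if segments and ts - segments[-1][-1][0] <= timeout:
            segments[-1].append((ts, query))
        else:
            segments.append([(ts, query)])
    return segments


# Hypothetical log for one user: two travel queries with an unrelated
# programming query interleaved between them, then a later follow-up.
log = [
    (datetime(2008, 10, 26, 9, 0), "cheap flights nyc"),
    (datetime(2008, 10, 26, 9, 2), "python csv module"),
    (datetime(2008, 10, 26, 9, 3), "nyc hotels"),
    (datetime(2008, 10, 26, 10, 15), "nyc weather"),
]

segments = segment_by_timeout(log)
segment_sizes = [len(s) for s in segments]
```

With this sample log, the first three queries fall into one segment even though they belong to two different tasks, and the later travel query is cut off from the earlier travel queries. A timeout sees only the gaps between queries, so it cannot represent interleaved or hierarchically organized tasks of the kind the abstract describes.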

    Published in

      CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
      October 2008
      1562 pages
      ISBN: 9781595939913
      DOI: 10.1145/1458082

      Copyright © 2008 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
