Beyond the session timeout: automatic hierarchical segmentation of search topics in query logs

Authors:
Rosie Jones, Yahoo! Research, Burbank, CA, USA
Kristina Lisa Klinkner, Carnegie Mellon University, Pittsburgh, PA, USA

CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management, October 2008, Pages 699–708. https://doi.org/10.1145/1458082.1458176

Published: 26 October 2008

ABSTRACT

Most analysis of web search relevance and performance takes a single query as the unit of search engine interaction. When studies attempt to group queries together by task or session, a timeout is typically used to identify the boundary. However, users query search engines in order to accomplish tasks at a variety of granularities, issuing multiple queries as they attempt to accomplish tasks. In this work we study real sessions manually labeled into hierarchical tasks, and show that timeouts, whatever their length, are of limited utility in identifying task boundaries, achieving a maximum precision of only 70%. We report on properties of this search task hierarchy, as seen in a random sample of user interactions from a major web search engine's log, annotated by human editors, learning that 17% of tasks are interleaved, and 20% are hierarchically organized. No previous work has analyzed or addressed automatic identification of interleaved and hierarchically organized search tasks. We propose and evaluate a method for the automated segmentation of users' query streams into hierarchical units. Our classifiers can improve on timeout segmentation, as well as other previously published approaches, bringing the accuracy up to 92% for identifying fine-grained task boundaries, and 89-97% for identifying pairs of queries from the same task when tasks are interleaved hierarchically. This is the first work to identify, measure and automatically segment sequences of user queries into their hierarchical structure. The ability to perform this kind of segmentation paves the way for evaluating search engines in terms of user task completion.
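The timeout baseline that the abstract argues against can be sketched in a few lines. The following is a minimal illustration, not the authors' method: the function name `segment_by_timeout`, the 30-minute default threshold, and the sample log are all assumptions made for this example.

```python
from datetime import datetime, timedelta


def segment_by_timeout(events, timeout=timedelta(minutes=30)):
    """Split a time-ordered list of (timestamp, query) pairs into sessions.

    A new session starts whenever the gap since the previous query
    exceeds `timeout`. The 30-minute default is an assumed value.
    """
    sessions = []
    previous_time = None
    for timestamp, query in events:
        if previous_time is None or timestamp - previous_time > timeout:
            sessions.append([])  # gap too long: open a new session
        sessions[-1].append(query)
        previous_time = timestamp
    return sessions


# Hypothetical log: a travel task interrupted by an unrelated query,
# then resumed about two hours later.
log = [
    (datetime(2008, 10, 26, 9, 0), "flights to boise"),
    (datetime(2008, 10, 26, 9, 2), "boise hotels"),
    (datetime(2008, 10, 26, 9, 3), "python csv module"),
    (datetime(2008, 10, 26, 11, 0), "boise weather"),
]
sessions = segment_by_timeout(log)
```

On this made-up log the timeout groups the unrelated programming query with the travel queries and separates the later travel query from its task, which is the kind of interleaving a purely time-based boundary cannot capture.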

Index Terms

1. Information systems
  1. Information retrieval
    1. Information retrieval query processing

Published in
CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
October 2008
1562 pages
ISBN: 9781595939913
DOI: 10.1145/1458082
General Chair:
James G. Shanahan, Church and Duncan Group Inc, USA
Program Chairs:
Sihem Amer-Yahia, Yahoo! Research, USA
Ioana Manolescu, INRIA, France
Yi Zhang, University of California, Santa Cruz, USA
David A. Evans, JustSystems Evans Research, USA
Alek Kolcz, Microsoft Live Labs, USA
Key-Sun Choi, KAIST, Korea
Abdur Chowdury, Twitter, USA
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
query log segmentation
query session
query session boundary detection
search goal
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%