research-article

A task level metric for measuring web search satisfaction and its application on improving relevance estimation

Authors:
Ahmed Hassan, Microsoft Research, Redmond, WA, USA
Yang Song, Microsoft Research, Redmond, WA, USA
Li-wei He, Microsoft, Redmond, WA, USA

CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management, October 2011, Pages 125–134
https://doi.org/10.1145/2063576.2063599

Published: 24 October 2011

ABSTRACT

Understanding the behavior of satisfied and unsatisfied Web search users is very important for improving users' search experience. Collecting labeled data that characterizes search behavior is a very challenging problem. Most previous work used a limited amount of data, either collected in lab studies or annotated by judges who lacked information about the actual intent. In this work, we performed a large-scale user study in which we collected explicit judgments of user satisfaction with the entire search task. Results were analyzed using sequence models that incorporate user behavior to predict whether the user ended up being satisfied with a search or not. We test our metric on millions of queries collected from real Web search traffic and show empirically that user behavior models trained using explicit judgments of user satisfaction outperform several other search quality metrics. The proposed model can also be used to optimize different search engine components. We propose a method that uses task-level success prediction to provide a better interpretation of clickthrough data. Clickthrough data has been widely used to improve relevance estimation. We use our user satisfaction model to distinguish between clicks that lead to satisfaction and clicks that do not. We show that adding new features derived from this metric allowed us to improve the estimation of document relevance.
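
The abstract does not say which sequence model is used or how the click features are built, so the following is only a minimal sketch under stated assumptions: one first-order Markov chain per class (satisfied, unsatisfied) with add-one smoothing, classification by log-likelihood difference, and click counts per (query, URL) pair split by the predicted task outcome. The action labels, the toy training sequences, and every function and class name here are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict

START, END = "<s>", "</s>"  # artificial boundary markers (assumption)


class MarkovSequenceModel:
    """First-order Markov chain over action labels with add-one smoothing."""

    def __init__(self, vocabulary):
        self.states = sorted(set(vocabulary) | {START, END})
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, sequences):
        # Count every observed transition, including start and end.
        for seq in sequences:
            path = [START, *seq, END]
            for prev, nxt in zip(path, path[1:]):
                self.counts[prev][nxt] += 1
        return self

    def log_likelihood(self, seq):
        path = [START, *seq, END]
        total = 0.0
        for prev, nxt in zip(path, path[1:]):
            row = self.counts.get(prev, {})
            numerator = row.get(nxt, 0) + 1
            denominator = sum(row.values()) + len(self.states)
            total += math.log(numerator / denominator)
        return total


def predict_satisfied(sat_model, unsat_model, seq, threshold=0.0):
    """Label a task satisfied when the satisfied-class model fits it better."""
    diff = sat_model.log_likelihood(seq) - unsat_model.log_likelihood(seq)
    return diff > threshold


def split_clicks(tasks, sat_model, unsat_model):
    """Count clicks per (query, url), split by the predicted task outcome."""
    counts = defaultdict(lambda: {"sat_clicks": 0, "unsat_clicks": 0})
    for task in tasks:
        satisfied = predict_satisfied(sat_model, unsat_model, task["actions"])
        key = "sat_clicks" if satisfied else "unsat_clicks"
        for query, url in task["clicks"]:
            counts[(query, url)][key] += 1
    return dict(counts)


# Hypothetical action vocabulary and toy training data, for illustration only.
ACTIONS = ["QUERY", "CLICK", "LONG_DWELL", "SHORT_DWELL"]
sat_train = [
    ["QUERY", "CLICK", "LONG_DWELL"],
    ["QUERY", "CLICK", "LONG_DWELL"],
    ["QUERY", "CLICK", "SHORT_DWELL", "CLICK", "LONG_DWELL"],
]
unsat_train = [
    ["QUERY", "CLICK", "SHORT_DWELL", "QUERY"],
    ["QUERY", "QUERY", "QUERY"],
    ["QUERY", "CLICK", "SHORT_DWELL", "QUERY", "CLICK", "SHORT_DWELL"],
]
sat_model = MarkovSequenceModel(ACTIONS).fit(sat_train)
unsat_model = MarkovSequenceModel(ACTIONS).fit(unsat_train)

tasks = [
    {"actions": ["QUERY", "CLICK", "LONG_DWELL"], "clicks": [("q1", "u1")]},
    {"actions": ["QUERY", "QUERY", "CLICK", "SHORT_DWELL"],
     "clicks": [("q1", "u1"), ("q1", "u2")]},
]
click_features = split_clicks(tasks, sat_model, unsat_model)
print(click_features)
```

In this sketch the two per-URL counts could be fed to a relevance estimator as separate features, so that a click inside a task predicted to be unsatisfied is not treated the same as a click inside a satisfied one. Whether the paper derives its features this way is not stated in the abstract.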

Index Terms

1. Information systems
  1. Information retrieval
    1. Information retrieval query processing

Published in
CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management
October 2011
2712 pages
ISBN: 9781450307178
DOI: 10.1145/2063576
Editors: Bettina Berendt; Arjen de Vries; Wenfei Fan; Craig Macdonald (University of Glasgow, UK); Iadh Ounis (University of Glasgow, UK); Ian Ruthven (University of Strathclyde, UK)
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
clickthrough data
search engine evaluation
user behavior models
web search success
Qualifiers
- research-article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

A task level metric for measuring web search satisfaction and its application on improving relevance estimation

CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management

ABSTRACT

References

Cited By

Index Terms

Recommendations

Measuring and Predicting Search Engine Users’ Satisfaction

Beyond DCG: user behavior as a predictor of a successful search

A model to estimate intrinsic document relevance from the clickthrough logs of a web search engine