ABSTRACT
Understanding the behavior of satisfied and unsatisfied Web search users is essential for improving users' search experience. Collecting labeled data that characterizes search behavior is a challenging problem. Most previous work used a limited amount of data, either collected in lab studies or annotated by judges who lacked information about the user's actual intent. In this work, we performed a large-scale user study in which we collected explicit judgments of user satisfaction with the entire search task. We analyzed the results using sequence models that incorporate user behavior to predict whether the user ended up satisfied with a search. We test our metric on millions of queries collected from real Web search traffic and show empirically that user behavior models trained on explicit judgments of user satisfaction outperform several other search quality metrics. The proposed model can also be used to optimize different search engine components. We propose a method that uses task-level success prediction to provide a better interpretation of clickthrough data. Clickthrough data has been widely used to improve relevance estimation. We use our user satisfaction model to distinguish between clicks that lead to satisfaction and clicks that do not. We show that adding new features derived from this metric improves the estimation of document relevance.
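The abstract does not specify the form of the sequence models, so the following is only a minimal sketch of one plausible instance: two first-order Markov chains over user actions, one trained on tasks labeled satisfied and one on tasks labeled unsatisfied, with a new task classified by comparing its log-likelihood under each. The action labels (`"Q"` for a query, `"C"` for a click), the smoothing constant `alpha`, the function names, and the toy training data are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import defaultdict

# Hypothetical boundary markers added around every action sequence.
START, END = "<s>", "</s>"


def train_markov(sequences, vocab, alpha=1.0):
    """Estimate add-alpha smoothed first-order transition log-probabilities."""
    states = list(vocab) + [START, END]
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        path = [START] + list(seq) + [END]
        for prev, nxt in zip(path, path[1:]):
            counts[prev][nxt] += 1.0
    logp = {}
    for prev in states:
        total = sum(counts[prev].values()) + alpha * len(states)
        logp[prev] = {
            nxt: math.log((counts[prev][nxt] + alpha) / total) for nxt in states
        }
    return logp


def log_likelihood(model, seq):
    """Sum transition log-probabilities along the sequence, including boundaries."""
    path = [START] + list(seq) + [END]
    return sum(model[prev][nxt] for prev, nxt in zip(path, path[1:]))


def predict_satisfied(sat_model, unsat_model, seq):
    """Return True when the satisfied-user model explains the sequence better."""
    return log_likelihood(sat_model, seq) > log_likelihood(unsat_model, seq)


# Toy, made-up training data: each list is the action sequence of one search task.
VOCAB = {"Q", "C"}
SATISFIED = [["Q", "C"], ["Q", "C"], ["Q", "Q", "C"]]
UNSATISFIED = [["Q", "Q", "Q"], ["Q", "C", "Q", "Q"], ["Q", "Q"]]

sat_model = train_markov(SATISFIED, VOCAB)
unsat_model = train_markov(UNSATISFIED, VOCAB)

if __name__ == "__main__":
    print(predict_satisfied(sat_model, unsat_model, ["Q", "C"]))
    print(predict_satisfied(sat_model, unsat_model, ["Q", "Q", "Q", "Q"]))
```

In this toy setup, a task that ends in a click scores higher under the satisfied model, while a run of repeated queries scores higher under the unsatisfied model. A per-task prediction of this kind could then be attached to each click in the task as an extra feature, which is one way to read the abstract's idea of separating clicks that lead to satisfaction from clicks that do not.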
Index Terms
- A task level metric for measuring web search satisfaction and its application on improving relevance estimation