ABSTRACT
Modern federated search engines aggregate heterogeneous types of results from multiple vertical search engines and compose a single search engine result page (SERP). The aggregator merges the results into one ranked list, constraining the vertical results to specific slots on the SERP.
The usual way to compare two ranking algorithms is to first fix their operating points (internal thresholds) and then run an online experiment lasting multiple weeks. Online user-engagement metrics are then compared to decide which algorithm is better. However, this method does not characterize or compare behavior over the entire span of operating points. Furthermore, this time-consuming approach is impractical if the experiment must be repeated over numerous operating points.
In this paper we propose a method of characterizing model performance that allows us to predict answers to "what if" questions about online user engagement, using click-logs, over the entire span of feasible operating points. We audition verticals at various slots on the SERP and generate click-logs. These logs are then used to create operating curves between variables of interest (for example, between result quality and click-through). The operating point for the system can then be chosen to achieve a specific trade-off between the variables. We apply this methodology to predict i) the online performance of two different models, ii) the impact of changing internal quality thresholds on click-through, iii) the behavior of introducing a new feature, iv) which machine-learning loss function will give better online engagement, and v) the impact of the sampling distribution of head and tail queries in the training process. The results are reported on a well-known federated search engine, and we validate the predictions with online experiments.
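The core idea above can be sketched in a few lines: sweep a candidate threshold over audition click-logs and, at each operating point, measure how often the vertical is shown and how often it is clicked when shown. The sketch below is illustrative only; the record layout, field names, and toy data are assumptions, not the paper's actual implementation.

```python
# Hedged sketch: derive an operating curve from (score, clicked) click-log
# records collected while auditioning a vertical at a fixed SERP slot.
# The log format and the toy numbers below are illustrative assumptions.

def operating_curve(log, thresholds):
    """For each candidate threshold (operating point), compute the fraction
    of logged impressions where the vertical would be shown (coverage) and
    the click-through rate among those shown impressions (ctr)."""
    curve = []
    for t in thresholds:
        shown = [r for r in log if r["score"] >= t]
        coverage = len(shown) / len(log)
        ctr = sum(r["clicked"] for r in shown) / len(shown) if shown else 0.0
        curve.append((t, coverage, ctr))
    return curve

# Toy audition log: each record is one impression of the vertical,
# with the model's quality score and whether the user clicked it.
log = [
    {"score": 0.9, "clicked": 1},
    {"score": 0.8, "clicked": 1},
    {"score": 0.6, "clicked": 0},
    {"score": 0.4, "clicked": 1},
    {"score": 0.2, "clicked": 0},
]

for t, cov, ctr in operating_curve(log, [0.1, 0.5, 0.7]):
    print(f"threshold={t:.1f} coverage={cov:.2f} ctr={ctr:.2f}")
```

Plotting coverage against CTR over many thresholds yields the kind of operating curve the paper uses: "what if" questions (e.g., raising the quality threshold) become reads off the curve rather than multi-week online experiments.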