DOI: 10.1145/1963405.1963419
Research article

Model characterization curves for federated search using click-logs: predicting user engagement metrics for the span of feasible operating points

Published: 28 March 2011

ABSTRACT

Modern-day federated search engines aggregate heterogeneous types of results from multiple vertical search engines and compose a single search engine result page (SERP). The engine merges these results into one ranked list, constraining the vertical results to specific slots on the SERP.

The usual way to compare two ranking algorithms is to first fix their operating points (internal thresholds) and then run an online experiment that lasts multiple weeks. Online user engagement metrics are then compared to decide which algorithm is better. However, this method does not characterize and compare the algorithms' behavior over the entire span of operating points. Furthermore, this time-consuming approach is impractical if the experiment has to be conducted at numerous operating points.

In this paper we propose a method of characterizing model performance that allows us to predict answers to "what if" questions about online user engagement, using click-logs, over the entire span of feasible operating points. We audition verticals at various slots on the SERP and generate click-logs. These logs are then used to create operating curves between variables of interest (for example, between result quality and click-through). The operating point for the system can then be chosen to achieve a specific trade-off between the variables. We apply this methodology to predict i) the online performance of two different models, ii) the impact of changing internal quality thresholds on click-through, iii) the effect of introducing a new feature, iv) which machine learning loss function will give better online engagement, and v) the impact of the sampling distribution of head and tail queries used in the training process. The results are reported on a well-known federated search engine, and we validate the predictions with online experiments.
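The paper itself does not include code, but the threshold-sweep idea behind an operating curve can be sketched. The snippet below is a minimal illustration under assumed conventions: it runs on synthetic data, and the log schema (a per-impression model score and click outcome) and the function name operating_curve are hypothetical, not taken from the paper. For each candidate internal threshold it estimates two engagement variables from an audition log, the trigger rate (how often the vertical would be shown) and the empirical click-through rate among triggered impressions, the kind of variable pair between which an operating curve is drawn.

```python
import numpy as np

# Hypothetical audition log: one row per impression in which the vertical was
# auditioned at a fixed SERP slot. Synthetic data stands in for real logs,
# which would supply the model score and the observed click for each query.
rng = np.random.default_rng(0)
scores = rng.uniform(0.0, 1.0, size=10_000)           # vertical-selection model scores
clicks = rng.random(10_000) < (0.05 + 0.25 * scores)  # synthetic click outcomes

def operating_curve(scores, clicks, thresholds):
    """Sweep candidate thresholds; at each one, assume the vertical is shown
    only when score >= threshold and estimate from the audition log:
      - trigger rate: fraction of impressions on which the vertical appears
      - CTR: empirical click-through rate among the triggered impressions
    """
    curve = []
    for t in thresholds:
        shown = scores >= t
        trigger_rate = shown.mean()
        ctr = clicks[shown].mean() if shown.any() else 0.0
        curve.append((t, trigger_rate, ctr))
    return curve

for t, rate, ctr in operating_curve(scores, clicks, np.linspace(0.0, 0.9, 10)):
    print(f"threshold={t:.2f}  trigger_rate={rate:.3f}  ctr={ctr:.3f}")
```

Choosing an operating point then amounts to picking the threshold whose (trigger rate, CTR) pair realizes the desired trade-off, rather than comparing algorithms at a single fixed threshold.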


Published in

WWW '11: Proceedings of the 20th International Conference on World Wide Web
March 2011, 840 pages
ISBN: 9781450306324
DOI: 10.1145/1963405
Copyright © 2011 ACM
Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 1,899 of 8,196 submissions, 23%
