Skip to main content
Top

2016 | OriginalPaper | Chapter

Living Labs for Online Evaluation: From Theory to Practice

Authors : Anne Schuth, Krisztian Balog

Published in: Advances in Information Retrieval

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Experimental evaluation has always been central to Information Retrieval research. The field is increasingly moving towards online evaluation, which involves experimenting with real, unsuspecting users in their natural task environments, a so-called living lab. Specifically, with the recent introduction of the Living Labs for IR Evaluation initiative at CLEF and the OpenSearch track at TREC, researchers can now have direct access to such labs. With these benchmarking platforms in place, we believe that online evaluation will be an exciting area to work on in the future. This half-day tutorial aims to provide a comprehensive overview of the underlying theory and complement it with practical guidance.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Balog, K., Kelly, L., Schuth, A.: Head first: living labs for ad-hoc search evaluation. In: CIKM 2014, pp. 1815–1818. ACM Press, New York, USA, November 2014 Balog, K., Kelly, L., Schuth, A.: Head first: living labs for ad-hoc search evaluation. In: CIKM 2014, pp. 1815–1818. ACM Press, New York, USA, November 2014
2.
go back to reference Belkin, N.J.: Salton award lecture: people, interacting with information. In: Proceedings of 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015, pp. 1–2. ACM (2015) Belkin, N.J.: Salton award lecture: people, interacting with information. In: Proceedings of 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015, pp. 1–2. ACM (2015)
3.
go back to reference Chuklin, A., Markov, I., de Rijke, M.: Click Models for Web Search. Synthesis Lectures on Information Concepts, Retrieval, and Services. Morgan & Claypool Publishers, San Rafael (2015) Chuklin, A., Markov, I., de Rijke, M.: Click Models for Web Search. Synthesis Lectures on Information Concepts, Retrieval, and Services. Morgan & Claypool Publishers, San Rafael (2015)
4.
go back to reference Cleverdon, C.W., Keen, M.: Aslib Cranfield research project-factors determining the performance of indexing systems; Volume 2, Test results, National Science Foundation (1966) Cleverdon, C.W., Keen, M.: Aslib Cranfield research project-factors determining the performance of indexing systems; Volume 2, Test results, National Science Foundation (1966)
5.
go back to reference Diaz, F., White, R., Buscher, G., Liebling, D.: Robust models of mouse movement on dynamic web search results pages. In: CIKM, pp. 1451–1460. ACM Press, October 2013 Diaz, F., White, R., Buscher, G., Liebling, D.: Robust models of mouse movement on dynamic web search results pages. In: CIKM, pp. 1451–1460. ACM Press, October 2013
6.
go back to reference Guo, Q., Agichtein, E.: Understanding “abandoned” ads: towards personalized commercial intent inference via mouse movement analysis. In: SIGIR-IRA (2008) Guo, Q., Agichtein, E.: Understanding “abandoned” ads: towards personalized commercial intent inference via mouse movement analysis. In: SIGIR-IRA (2008)
7.
go back to reference Guo, Q., Agichtein, E.: Towards predicting web searcher gaze position from mouse movements. In: CHI EA, 3601p, April 2010 Guo, Q., Agichtein, E.: Towards predicting web searcher gaze position from mouse movements. In: CHI EA, 3601p, April 2010
8.
go back to reference Hassan, A., Shi, X., Craswell, N., Ramsey, B.: Beyond clicks: query reformulation as a predictor of search satisfaction. In: CIKM (2013) Hassan, A., Shi, X., Craswell, N., Ramsey, B.: Beyond clicks: query reformulation as a predictor of search satisfaction. In: CIKM (2013)
9.
go back to reference He, J., Zhai, C., Li, X.: Evaluation of methods for relative comparison of retrieval systems based on clickthroughs. In: CIKM 2009, ACM (2009) He, J., Zhai, C., Li, X.: Evaluation of methods for relative comparison of retrieval systems based on clickthroughs. In: CIKM 2009, ACM (2009)
10.
go back to reference He, Y., Wang, K.: Inferring search behaviors using partially observable markov model with duration (POMD). In: WSDM (2011) He, Y., Wang, K.: Inferring search behaviors using partially observable markov model with duration (POMD). In: WSDM (2011)
11.
go back to reference Hersh, W., Turpin, A.H., Price, S., Chan, B., Kramer, D., Sacherek, L., Olson, D.: Do batch and user evaluations give the same results? In: SIGIR, pp. 17–24 (2000) Hersh, W., Turpin, A.H., Price, S., Chan, B., Kramer, D., Sacherek, L., Olson, D.: Do batch and user evaluations give the same results? In: SIGIR, pp. 17–24 (2000)
12.
go back to reference Hofmann, K., Whiteson, S., de Rijke, M.: A probabilistic method for inferring preferences from clicks. In: CIKM 2011, ACM (2011) Hofmann, K., Whiteson, S., de Rijke, M.: A probabilistic method for inferring preferences from clicks. In: CIKM 2011, ACM (2011)
13.
go back to reference Jeff, H., Thomas, L., Ryen, W.: No search result left behind. In: WSDM, 203p (2012) Jeff, H., Thomas, L., Ryen, W.: No search result left behind. In: WSDM, 203p (2012)
14.
go back to reference Joachims, T., Granka, L.A., Pan, B., Hembrooke, H., Radlinski, F., Gay, G.: Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search. ACM Trans. Inf. Syst. 25(2), 7 (2007)CrossRef Joachims, T., Granka, L.A., Pan, B., Hembrooke, H., Radlinski, F., Gay, G.: Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search. ACM Trans. Inf. Syst. 25(2), 7 (2007)CrossRef
15.
go back to reference Kim, Y., Hassan, A., White, R., Zitouni, I.: Modeling dwell time to predict click-level satisfaction. In: WSDM (2014) Kim, Y., Hassan, A., White, R., Zitouni, I.: Modeling dwell time to predict click-level satisfaction. In: WSDM (2014)
16.
go back to reference Kohavi, R.: Online controlled experiments: introduction, insights, scaling, and humbling statistics. In: Proceedings of UEO 2013 (2013) Kohavi, R.: Online controlled experiments: introduction, insights, scaling, and humbling statistics. In: Proceedings of UEO 2013 (2013)
17.
go back to reference Lagun, D., Hsieh, C.H., Webster D., Navalpakkam, V.: Towards better measurement of attention and satisfaction in mobile search. In: SIGIR (2014) Lagun, D., Hsieh, C.H., Webster D., Navalpakkam, V.: Towards better measurement of attention and satisfaction in mobile search. In: SIGIR (2014)
18.
go back to reference Li, J., Huffman, S., Tokuda, A.: Good abandonment in mobile and pc internet search. In: SIGIR 2009, pp. 43–50 (2009) Li, J., Huffman, S., Tokuda, A.: Good abandonment in mobile and pc internet search. In: SIGIR 2009, pp. 43–50 (2009)
19.
20.
go back to reference Radlinski, F., Craswell, N.: Optimized interleaving for online retrieval evaluation. In: WSDM 2013, ACM (2013) Radlinski, F., Craswell, N.: Optimized interleaving for online retrieval evaluation. In: WSDM 2013, ACM (2013)
21.
go back to reference Radlinski, F., Kurup, M., Joachims, T.: How does clickthrough data reflect retrieval quality? In: CIKM 2008, ACM (2008) Radlinski, F., Kurup, M., Joachims, T.: How does clickthrough data reflect retrieval quality? In: CIKM 2008, ACM (2008)
22.
go back to reference Sanderson, M.: Test collection based evaluation of information retrieval systems. Found. Trends Inf. Retrieval 4(4), 247–375 (2010)CrossRefMATH Sanderson, M.: Test collection based evaluation of information retrieval systems. Found. Trends Inf. Retrieval 4(4), 247–375 (2010)CrossRefMATH
23.
go back to reference Schuth, A., Balog, K., Kelly, L.: Overview of the living labs for information retrieval evaluation (ll4ir) clef lab. In: Mothe, J., Savoy, J., Kamps, J., Pinel-Sauvagnat, K., Jones, G.J.F., SanJuan, E., Cappellato, L., Ferro, N. (eds.) CLEF 2015. LNCS, pp. 484–496. Springer, Heidelberg (2015)CrossRef Schuth, A., Balog, K., Kelly, L.: Overview of the living labs for information retrieval evaluation (ll4ir) clef lab. In: Mothe, J., Savoy, J., Kamps, J., Pinel-Sauvagnat, K., Jones, G.J.F., SanJuan, E., Cappellato, L., Ferro, N. (eds.) CLEF 2015. LNCS, pp. 484–496. Springer, Heidelberg (2015)CrossRef
24.
go back to reference Schuth, A., Bruintjes, R.-J., Büttner, F., van Doorn, J., Groenland, C., Oosterhuis, H., Tran, C.-N., Veeling, B., van der Velde, J., Wechsler, R., Woudenberg, D., de Rijke, M.: Probabilistic multileave for online retrieval evaluation. In: Proceedings of SIGIR (2015) Schuth, A., Bruintjes, R.-J., Büttner, F., van Doorn, J., Groenland, C., Oosterhuis, H., Tran, C.-N., Veeling, B., van der Velde, J., Wechsler, R., Woudenberg, D., de Rijke, M.: Probabilistic multileave for online retrieval evaluation. In: Proceedings of SIGIR (2015)
25.
go back to reference Schuth, A., Hofmann, K., Radlinski, F.: Predicting search satisfaction metrics with interleaved comparisons. In: SIGIR 2015 (2015) Schuth, A., Hofmann, K., Radlinski, F.: Predicting search satisfaction metrics with interleaved comparisons. In: SIGIR 2015 (2015)
26.
go back to reference Schuth, A., Hofmann, K., Whiteson, S., de Rijke, M.: Lerot: an online learning to rank framework. In: LivingLab 2013, pp. 23–26. ACM Press, November 2013 Schuth, A., Hofmann, K., Whiteson, S., de Rijke, M.: Lerot: an online learning to rank framework. In: LivingLab 2013, pp. 23–26. ACM Press, November 2013
27.
go back to reference Schuth, A., Sietsma, F., Whiteson, S., Lefortier, D., de Rijke, M.: Multileaved comparisons for fast online evaluation. In: CIKM 2014 (2014) Schuth, A., Sietsma, F., Whiteson, S., Lefortier, D., de Rijke, M.: Multileaved comparisons for fast online evaluation. In: CIKM 2014 (2014)
28.
go back to reference Song, Y., Shi, X., White, R., Hassan, A.: Context-aware web search abandonment prediction. In: SIGIR (2014) Song, Y., Shi, X., White, R., Hassan, A.: Context-aware web search abandonment prediction. In: SIGIR (2014)
29.
go back to reference Teevan, J., Dumais, S., Horvitz, E.: The potential value of personalizing search. In: SIGIR, pp. 756–757 (2007) Teevan, J., Dumais, S., Horvitz, E.: The potential value of personalizing search. In: SIGIR, pp. 756–757 (2007)
30.
go back to reference Turpin, A., Hersh, W.: Why batch and user evaluations do not give the same results. In: SIGIR, pp. 225–231 (2001) Turpin, A., Hersh, W.: Why batch and user evaluations do not give the same results. In: SIGIR, pp. 225–231 (2001)
31.
go back to reference Turpin, A., Scholar, F.: User performance versus precision measures for simple search tasks. In: SIGIR, pp. 11–18 (2006) Turpin, A., Scholar, F.: User performance versus precision measures for simple search tasks. In: SIGIR, pp. 11–18 (2006)
32.
go back to reference Wang, K., Walker, T., Zheng, Z.: PSkip: estimating relevance ranking quality from web search clickthrough data. In: KDD, pp. 1355–1364 (2009) Wang, K., Walker, T., Zheng, Z.: PSkip: estimating relevance ranking quality from web search clickthrough data. In: KDD, pp. 1355–1364 (2009)
33.
go back to reference Wang, K., Gloy, N., Li, X.: Inferring search behaviors using partially observable Markov (POM) model. In: WSDM (2010) Wang, K., Gloy, N., Li, X.: Inferring search behaviors using partially observable Markov (POM) model. In: WSDM (2010)
34.
go back to reference Yilmaz, E., Verma, M., Craswell, N., Radlinski, F., Bailey, P.: Relevance and effort: an analysis of document utility. In: CIKM (2014) Yilmaz, E., Verma, M., Craswell, N., Radlinski, F., Bailey, P.: Relevance and effort: an analysis of document utility. In: CIKM (2014)
35.
go back to reference Yue, Y., Joachims, T.: Interactively optimizing information retrieval systems as a dueling bandits problem. In: ICML 2009, pp. 1201–1208 (2009) Yue, Y., Joachims, T.: Interactively optimizing information retrieval systems as a dueling bandits problem. In: ICML 2009, pp. 1201–1208 (2009)
Metadata
Title
Living Labs for Online Evaluation: From Theory to Practice
Authors
Anne Schuth
Krisztian Balog
Copyright Year
2016
Publisher
Springer International Publishing
DOI
https://doi.org/10.1007/978-3-319-30671-1_88