Online evaluation allows information retrieval (IR) techniques to be assessed by how real users respond to them. Because it is based directly on observed user behavior, it is a promising alternative to traditional evaluation, which relies on manual relevance assessments. In particular, online evaluation can enable comparisons in settings where reliable assessments are difficult to obtain (e.g., personalized search) or expensive (e.g., search by trained experts in specialized collections).
Despite its advantages and its successful use in commercial settings, online evaluation is rarely employed outside large commercial search engines, because it is perceived to be impractical at small scales. The goal of this tutorial is to show how online evaluation can be conducted in such settings, to demonstrate software that facilitates its use, and to promote further research in the area. We will also contrast online evaluation with standard offline evaluation and provide an overview of online evaluation approaches.