Skip to main content
Top

2015 | OriginalPaper | Chapter

A Comparison of Offline Evaluations, Online Evaluations, and User Studies in the Context of Research-Paper Recommender Systems

Authors : Joeran Beel, Stefan Langer

Published in: Research and Advanced Technology for Digital Libraries

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

The evaluation of recommender systems is key to the successful application of recommender systems in practice. However, recommender-systems evaluation has received too little attention in the recommender-system community, in particular in the community of research-paper recommender systems. In this paper, we examine and discuss the appropriateness of different evaluation methods, i.e. offline evaluations, online evaluations, and user studies, in the context of research-paper recommender systems. We implemented different content-based filtering approaches in the research-paper recommender system of Docear. The approaches differed by the features to utilize (terms or citations), by user model size, whether stop-words were removed, and several other factors. The evaluations show that results from offline evaluations sometimes contradict results from online evaluations and user studies. We discuss potential reasons for the non-predictive power of offline evaluations, and discuss whether results of offline evaluations might have some inherent value. In the latter case, results of offline evaluations were worth to be published, even if they contradict results of user studies and online evaluations. However, although offline evaluations theoretically might have some inherent value, we conclude that in practice, offline evaluations are probably not suitable to evaluate recommender systems, particularly in the domain of research paper recommendations. We further analyze and discuss the appropriateness of several online evaluation metrics such as click-through rate, link-through rate, and cite-through rate.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Footnotes
2
Registered users have a user account assigned to their email address. For users who want to receive recommendations, but do not want to register, an anonymous user account is automatically created. These accounts have a unique random ID and are bound to a user’s computer.
 
3
For this example we ignore the question how reputability is measured.
 
4
If users register, they have to reveal private information such as name and email address. If users are concerned about revealing this information, they probably tend to use Docear as anonymous user.
 
Literature
1.
go back to reference Ricci, F., Rokach, L., Shapira, B., Kantor, B.P. (eds.): Recommender systems handbook, pp. 1–35. Springer, Heidelberg (2011)CrossRefMATH Ricci, F., Rokach, L., Shapira, B., Kantor, B.P. (eds.): Recommender systems handbook, pp. 1–35. Springer, Heidelberg (2011)CrossRefMATH
2.
go back to reference Torres, R., McNee, S.M., Abel, M., Konstan, J.A., Riedl, J.: Enhancing digital libraries with TechLens +. In: Proceedings of the 4th ACM/IEEE-CS Joint Conference on Digital Libraries, pp. 228–236 (2004) Torres, R., McNee, S.M., Abel, M., Konstan, J.A., Riedl, J.: Enhancing digital libraries with TechLens +. In: Proceedings of the 4th ACM/IEEE-CS Joint Conference on Digital Libraries, pp. 228–236 (2004)
3.
go back to reference Küçüktunç, O., Saule, E., Kaya, K., Çatalyürek, Ü.V.: Recommendation on Academic Networks using Direction Aware Citation Analysis, pp. 1–10 (2012). arXiv preprint arXiv:1205.1143 Küçüktunç, O., Saule, E., Kaya, K., Çatalyürek, Ü.V.: Recommendation on Academic Networks using Direction Aware Citation Analysis, pp. 1–10 (2012). arXiv preprint arXiv:1205.1143
4.
go back to reference Gorrell, G., Ford, N., Madden, A., Holdridge, P., Eaglestone, B.: Countering method bias in questionnaire-based user studies. Journal of Documentation 67(3), 507–524 (2011)CrossRef Gorrell, G., Ford, N., Madden, A., Holdridge, P., Eaglestone, B.: Countering method bias in questionnaire-based user studies. Journal of Documentation 67(3), 507–524 (2011)CrossRef
5.
go back to reference Leroy, G.: Designing User Studies in Informatics. Springer, Heidelberg (2011)CrossRef Leroy, G.: Designing User Studies in Informatics. Springer, Heidelberg (2011)CrossRef
6.
go back to reference Ge, M., Delgado-Battenfeld, C., Jannach, D.: Beyond accuracy: evaluating recommender systems by coverage and serendipity. In: Proceedings of the Fourth ACM RecSys Conference, pp. 257–260 (2010) Ge, M., Delgado-Battenfeld, C., Jannach, D.: Beyond accuracy: evaluating recommender systems by coverage and serendipity. In: Proceedings of the Fourth ACM RecSys Conference, pp. 257–260 (2010)
7.
go back to reference McNee, S.M., Albert, I., Cosley, D., Gopalkrishnan, P., Lam, S.K., Rashid, A.M., Konstan, J.A., Riedl, J.: On the recommending of citations for research papers. In: Proceedings of the ACM Conference on Computer Supported Cooperative Work, pp. 116–125 (2002) McNee, S.M., Albert, I., Cosley, D., Gopalkrishnan, P., Lam, S.K., Rashid, A.M., Konstan, J.A., Riedl, J.: On the recommending of citations for research papers. In: Proceedings of the ACM Conference on Computer Supported Cooperative Work, pp. 116–125 (2002)
8.
go back to reference Turpin, A.H., Hersh, W.: Why batch and user evaluations do not give the same results. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 225–231 (2001) Turpin, A.H., Hersh, W.: Why batch and user evaluations do not give the same results. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 225–231 (2001)
9.
go back to reference McNee, S.M., Kapoor, N., Konstan, J.A.: Don’t look stupid: avoiding pitfalls when recommending research papers. In: Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work, pp. 171–180 (2006) McNee, S.M., Kapoor, N., Konstan, J.A.: Don’t look stupid: avoiding pitfalls when recommending research papers. In: Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work, pp. 171–180 (2006)
10.
go back to reference Jannach, D., Lerche, L., Gedikli, F., Bonnin, G.: What recommenders recommend – an analysis of accuracy, popularity, and sales diversity effects. In: Carberry, S., Weibelzahl, S., Micarelli, A., Semeraro, G. (eds.) UMAP 2013. LNCS, vol. 7899, pp. 25–37. Springer, Heidelberg (2013)CrossRef Jannach, D., Lerche, L., Gedikli, F., Bonnin, G.: What recommenders recommend – an analysis of accuracy, popularity, and sales diversity effects. In: Carberry, S., Weibelzahl, S., Micarelli, A., Semeraro, G. (eds.) UMAP 2013. LNCS, vol. 7899, pp. 25–37. Springer, Heidelberg (2013)CrossRef
11.
go back to reference Knijnenburg, B.P., Willemsen, M.C., Gantner, Z., Soncu, H., Newell, C.: Explaining the user experience of recommender systems. User Model. User-Adap. Inter. 22, 441–504 (2012)CrossRef Knijnenburg, B.P., Willemsen, M.C., Gantner, Z., Soncu, H., Newell, C.: Explaining the user experience of recommender systems. User Model. User-Adap. Inter. 22, 441–504 (2012)CrossRef
12.
go back to reference Said, A., Tikk, D., Shi, Y., Larson, M., Stumpf, K., Cremonesi, P.: Recommender systems evaluation: a 3D benchmark. In: ACM RecSys 2012 Workshop on Recommendation Utility Evaluation: Beyond RMSE, Dublin, Ireland, pp. 21–23 (2012) Said, A., Tikk, D., Shi, Y., Larson, M., Stumpf, K., Cremonesi, P.: Recommender systems evaluation: a 3D benchmark. In: ACM RecSys 2012 Workshop on Recommendation Utility Evaluation: Beyond RMSE, Dublin, Ireland, pp. 21–23 (2012)
13.
go back to reference Herlocker, J.L., Konstan, J.A., Terveen, L.G., Riedl, J.T.: Evaluating collaborative filtering recommender systems. ACM Trans. Inf. Syst. (TOIS) 22(1), 5–53 (2004)CrossRef Herlocker, J.L., Konstan, J.A., Terveen, L.G., Riedl, J.T.: Evaluating collaborative filtering recommender systems. ACM Trans. Inf. Syst. (TOIS) 22(1), 5–53 (2004)CrossRef
14.
go back to reference Jannach, D., Zanker, M., Ge, M., Gröning, M.: Recommender systems in computer science and information systems – a landscape of research. In: Huemer, C., Lops, P. (eds.) EC-Web 2012. LNBIP, vol. 123, pp. 76–87. Springer, Heidelberg (2012)CrossRef Jannach, D., Zanker, M., Ge, M., Gröning, M.: Recommender systems in computer science and information systems – a landscape of research. In: Huemer, C., Lops, P. (eds.) EC-Web 2012. LNBIP, vol. 123, pp. 76–87. Springer, Heidelberg (2012)CrossRef
15.
go back to reference Beel, J., Gipp, B., Breitinger, C.: Research paper recommender systems: a literature survey. Int. J. Digit. Libr., 2015, to appear Beel, J., Gipp, B., Breitinger, C.: Research paper recommender systems: a literature survey. Int. J. Digit. Libr., 2015, to appear
16.
go back to reference Beel, J., Langer, S., Genzmehr, M., Gipp, B., Breitinger, C., Nürnberger, A.: Research paper recommender system evaluation: a quantitative literature survey. In: Proceedings of the Workshop on Reproducibility and Replication in Recommender Systems Evaluation (RepSys) at the ACM RecSys Conference (RecSys), pp. 15–22 (2013) Beel, J., Langer, S., Genzmehr, M., Gipp, B., Breitinger, C., Nürnberger, A.: Research paper recommender system evaluation: a quantitative literature survey. In: Proceedings of the Workshop on Reproducibility and Replication in Recommender Systems Evaluation (RepSys) at the ACM RecSys Conference (RecSys), pp. 15–22 (2013)
17.
go back to reference Cremonesi, P., Garzotto, F., Turrin, R.: Investigating the persuasion potential of recommender systems from a quality perspective: An empirical study. ACM Trans. Interact. Intell. Syst. (TiiS) 2(2), 11 (2012) Cremonesi, P., Garzotto, F., Turrin, R.: Investigating the persuasion potential of recommender systems from a quality perspective: An empirical study. ACM Trans. Interact. Intell. Syst. (TiiS) 2(2), 11 (2012)
18.
go back to reference Zheng, H., Wang, D., Zhang, Q., Li, H., Yang, T.: Do clicks measure recommendation relevancy?: an empirical user study. In: Proceedings of the Fourth ACM RecSys Conference, pp. 249–252 (2010) Zheng, H., Wang, D., Zhang, Q., Li, H., Yang, T.: Do clicks measure recommendation relevancy?: an empirical user study. In: Proceedings of the Fourth ACM RecSys Conference, pp. 249–252 (2010)
19.
go back to reference Cremonesi, P., Garzotto, F., Negro, S., Papadopoulos, A.V., Turrin, R.: Looking for “Good” recommendations: a comparative evaluation of recommender systems. In: Campos, P., Graham, N., Jorge, J., Nunes, N., Palanque, P., Winckler, M. (eds.) INTERACT 2011, Part III. LNCS, vol. 6948, pp. 152–168. Springer, Heidelberg (2011)CrossRef Cremonesi, P., Garzotto, F., Negro, S., Papadopoulos, A.V., Turrin, R.: Looking for “Good” recommendations: a comparative evaluation of recommender systems. In: Campos, P., Graham, N., Jorge, J., Nunes, N., Palanque, P., Winckler, M. (eds.) INTERACT 2011, Part III. LNCS, vol. 6948, pp. 152–168. Springer, Heidelberg (2011)CrossRef
20.
go back to reference Hersh, W., Turpin, A., Price, S., Chan, B., Kramer, D., Sacherek, L., Olson, D.: Do batch and user evaluations give the same results? In: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, pp. 17–24 (2000) Hersh, W., Turpin, A., Price, S., Chan, B., Kramer, D., Sacherek, L., Olson, D.: Do batch and user evaluations give the same results? In: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, pp. 17–24 (2000)
21.
go back to reference Beel, J., Langer, S., Genzmehr, M., Gipp, B., Nürnberger, A.: A comparative analysis of offline and online evaluations and discussion of research paper recommender system evaluation. In: Proceedings of the Workshop on Reproducibility and Replication in Recommender Systems Evaluation (RepSys) at the ACM Recommender System Conference (RecSys), pp. 7–14 (2013) Beel, J., Langer, S., Genzmehr, M., Gipp, B., Nürnberger, A.: A comparative analysis of offline and online evaluations and discussion of research paper recommender system evaluation. In: Proceedings of the Workshop on Reproducibility and Replication in Recommender Systems Evaluation (RepSys) at the ACM Recommender System Conference (RecSys), pp. 7–14 (2013)
22.
go back to reference Beel, J., Gipp, B., Langer, S., Genzmehr, M.: Docear: an academic literature suite for searching, organizing and creating academic literature. In: Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries (JCDL), pp. 465–466 (2011) Beel, J., Gipp, B., Langer, S., Genzmehr, M.: Docear: an academic literature suite for searching, organizing and creating academic literature. In: Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries (JCDL), pp. 465–466 (2011)
23.
go back to reference Beel, J., Langer, S., Gipp, B., Nürnberger, A.: The architecture and datasets of docear’s research paper recommender system. D-Lib Mag. 20(11/12) (2014). doi:10.1045/ november14-beel Beel, J., Langer, S., Gipp, B., Nürnberger, A.: The architecture and datasets of docear’s research paper recommender system. D-Lib Mag. 20(11/12) (2014). doi:10.​1045/​ november14-beel
24.
go back to reference Beel, J., Langer, S., Genzmehr, M., Müller, C.: Docears PDF inspector: title extraction from PDF files. In: Proceedings of the 13th Joint Conference on Digital Libraries (JCDL 2013), pp. 443–444 (2013) Beel, J., Langer, S., Genzmehr, M., Müller, C.: Docears PDF inspector: title extraction from PDF files. In: Proceedings of the 13th Joint Conference on Digital Libraries (JCDL 2013), pp. 443–444 (2013)
25.
go back to reference Lipinski, M., Yao, K., Breitinger, C., Beel, J., Gipp, B.: Evaluation of header metadata extraction approaches and tools for scientific PDF documents. In: Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2013), pp. 385–386 (2013) Lipinski, M., Yao, K., Breitinger, C., Beel, J., Gipp, B.: Evaluation of header metadata extraction approaches and tools for scientific PDF documents. In: Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2013), pp. 385–386 (2013)
26.
go back to reference Beel, J., Langer, S., Genzmehr, M., Nürnberger, A.: Introducing docear’s research paper recommender system. In: Proceedings of the 13th Joint Conference on Digital Libraries (JCDL 2013), pp. 459–460 (2013) Beel, J., Langer, S., Genzmehr, M., Nürnberger, A.: Introducing docear’s research paper recommender system. In: Proceedings of the 13th Joint Conference on Digital Libraries (JCDL 2013), pp. 459–460 (2013)
27.
go back to reference Beel, J.: Towards effective research-paper recommender systems and user modeling based on mind maps. Ph.D. Thesis. Otto-von-Guericke Universität Magdeburg (2015) Beel, J.: Towards effective research-paper recommender systems and user modeling based on mind maps. Ph.D. Thesis. Otto-von-Guericke Universität Magdeburg (2015)
28.
go back to reference Beel, J., Langer, S., Kapitsaki, G., Breitinger, C., Gipp, B.: Exploring the potential of user modeling based on mind maps. In: Ricci, F., Bontcheva, K., Conlan, O., Lawless, S. (eds.) UMAP 2015. LNCS, vol. 9146, pp. 3–17. Springer, Heidelberg (2015)CrossRef Beel, J., Langer, S., Kapitsaki, G., Breitinger, C., Gipp, B.: Exploring the potential of user modeling based on mind maps. In: Ricci, F., Bontcheva, K., Conlan, O., Lawless, S. (eds.) UMAP 2015. LNCS, vol. 9146, pp. 3–17. Springer, Heidelberg (2015)CrossRef
29.
go back to reference Beel, J., Langer, S., Genzmehr, M., Gipp, B.: Utilizing mind-maps for information retrieval and user modelling. In: Dimitrova, V., Kuflik, T., Chin, D., Ricci, F., Dolog, P., Houben, G.-J. (eds.) UMAP 2014. LNCS, vol. 8538, pp. 301–313. Springer, Heidelberg (2014) Beel, J., Langer, S., Genzmehr, M., Gipp, B.: Utilizing mind-maps for information retrieval and user modelling. In: Dimitrova, V., Kuflik, T., Chin, D., Ricci, F., Dolog, P., Houben, G.-J. (eds.) UMAP 2014. LNCS, vol. 8538, pp. 301–313. Springer, Heidelberg (2014)
30.
go back to reference Rich, E.: User modeling via stereotypes. Cognitive science 3(4), 329–354 (1979)CrossRef Rich, E.: User modeling via stereotypes. Cognitive science 3(4), 329–354 (1979)CrossRef
31.
go back to reference MacRoberts, M.H., MacRoberts, B.: Problems of citation analysis. Scientometrics 36, 435–444 (1996)CrossRef MacRoberts, M.H., MacRoberts, B.: Problems of citation analysis. Scientometrics 36, 435–444 (1996)CrossRef
Metadata
Title
A Comparison of Offline Evaluations, Online Evaluations, and User Studies in the Context of Research-Paper Recommender Systems
Authors
Joeran Beel
Stefan Langer
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-24592-8_12

Premium Partner