Published in: Social Network Analysis and Mining 1/2020

01 December 2020 | Original Article

A comparative analysis of similarity measures akin to the Jaccard index in collaborative recommendations: empirical and theoretical perspective

Authors: Vijay Verma, Rajesh Kumar Aggarwal


Abstract

The Jaccard index, originally proposed by Jaccard (Bull Soc Vaudoise Sci Nat 37:241–272, 1901), measures the similarity (or dissimilarity) between two sample data objects. It is defined as the ratio of the intersection size to the union size of the two data samples, which makes it a simple and intuitive measure of similarity. This research examines measures that are akin to the Jaccard index and may be used to model affinity between users (or items) in collaborative recommendations. In particular, the simple matching coefficient (SMC), Sorensen–Dice coefficient (SDC), Salton’s cosine index (SCI), and overlap coefficient (OLC) are compared with the Jaccard index and analysed from both theoretical and empirical perspectives. Because these measures capture only structural similarity (overlap information) between the data samples, they are very useful when only the associations between users and items are available, such as the browsing or buying behaviour of users on an e-commerce portal (i.e. unary rating data, a special case of ratings). Furthermore, a theoretical relation among these measures has been established. We have also derived an equivalent expression for each measure, stated in data mining/machine learning terminology, so that it can be applied directly to binary data samples. To compare and validate the effectiveness of these structural similarity measures, several experiments were conducted on standardized benchmark datasets (MovieLens, FilmTrust, Epinions, Yahoo! Movies, and Yahoo! Music). The empirical results demonstrate that Salton’s cosine index (SCI) provides better accuracy (in terms of MAE, RMSE, and precision) for large datasets, whereas the overlap coefficient (OLC) yields more accurate recommendations for small datasets.
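
The five measures named in the abstract can be illustrated with a minimal Python sketch that uses their standard set-based textbook definitions. The function names, the example users `alice` and `bob`, and the catalogue size `N_ITEMS` are hypothetical and chosen only for illustration; the exact formulations and the equivalent binary-vector expressions derived in the paper may differ. Each user is represented by the set of items they interacted with (unary data). Returning 0.0 for empty inputs is an assumption of this sketch, not something the abstract specifies.

```python
from math import sqrt


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def sorensen_dice(a: set, b: set) -> float:
    """2|A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def salton_cosine(a: set, b: set) -> float:
    """|A ∩ B| / sqrt(|A| * |B|)."""
    denom = sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


def overlap(a: set, b: set) -> float:
    """|A ∩ B| / min(|A|, |B|)."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def simple_matching(a: set, b: set, n_items: int) -> float:
    """(co-presences + co-absences) / catalogue size.

    Unlike the other four measures, SMC also rewards items that
    neither user touched, so it needs the total number of items.
    """
    both = len(a & b)
    neither = n_items - len(a | b)
    return (both + neither) / n_items if n_items else 0.0


# Hypothetical unary data: items each user browsed or bought.
alice = {"i1", "i2", "i3", "i4"}
bob = {"i3", "i4", "i5"}
N_ITEMS = 10  # assumed catalogue size, needed only for SMC

scores = {
    "JI": jaccard(alice, bob),
    "SDC": sorensen_dice(alice, bob),
    "SCI": salton_cosine(alice, bob),
    "OLC": overlap(alice, bob),
    "SMC": simple_matching(alice, bob, N_ITEMS),
}

for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

For non-empty sets these definitions satisfy the well-known ordering JI ≤ SDC ≤ SCI ≤ OLC, which follows from the arithmetic-geometric mean inequality and from the intersection being no larger than the smaller set. Whether this is the same theoretical relation the paper establishes is not stated in the abstract.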

References
Aggarwal CC (2016) Recommender systems: the textbook, 1st edn. Springer, Berlin
Ahn HJ (2008) A new similarity measure for collaborative filtering to alleviate the new user cold-starting problem. Inf Sci (NY) 178(1):37–51
Al Hassanieh L, Jaoudeh CA, Abdo JB, Demerjian J (2018) Similarity measures for collaborative filtering recommender systems. In: 2018 IEEE Middle East North Africa communications conference MENACOMM, 2018, pp 1–5
Al-bashiri H, Abdulgabber MA, Romli A, Hujainah F (2017) Collaborative filtering similarity measures: revisiting. In: ACM international conference proceeding series, vol Part F1312, pp 195–200
Arsan T, Koksal E, Bozkus Z (2016) Comparison of collaborative filtering algorithms with various similarity measures for movie recommendation. Int J Comput Sci Eng Appl 6(3):1–20
Bag S, Kumar SK, Tiwari MK (2019) An efficient recommendation generation using relevant Jaccard similarity. Inf Sci (NY) 483:53–64
Balabanović M, Shoham Y (1997) Fab: content-based, collaborative recommendation. Commun ACM 40(3):66–72
Billsus D, Pazzani MJ (1998) Learning collaborative information filters. In: Proceedings of the fifteenth international conference on machine learning, vol 54, p 48
Billsus D, Pazzani MJ (2002) User modeling for adaptative news access. User Model User Adapt Interact 10:147–180
Bobadilla J, Serradilla F, Bernal J (2010) A new collaborative filtering metric that improves the behavior of recommender systems. Knowl-Based Syst 23(6):520–528
Bobadilla J, Ortega F, Hernando A, Arroyo Á (2012a) A balanced memory-based collaborative filtering similarity measure. Int J Intell Syst 27(10):939–946
Bobadilla J, Hernando A, Ortega F, Gutiérrez A (2012b) Collaborative filtering based on significances. Inf Sci (NY) 185(1):1–17
Bobadilla J, Ortega F, Hernando A (2012c) A collaborative filtering similarity measure based on singularities. Inf Process Manag 48(2):204–217
Breese JS, Heckerman D, Kadie C (1998) Empirical analysis of predictive algorithms for collaborative filtering. In: Proceedings of the 14th conference on uncertainty in artificial intelligence, vol 461, no 8, pp 43–52
Burke R (2002) Hybrid recommender systems: survey and experiments. User Model User-Adapted Interact
Chai T, Draxler RR (2014) Root mean square error (RMSE) or mean absolute error (MAE)? Arguments against avoiding RMSE in the literature. Geosci Model Dev 7(3):1247–1250
Dice LR (1945) Measures of the amount of ecologic association between species. Ecology 26(3):297–302
Ekstrand MD (2011) Collaborative filtering recommender systems. Found Trends Hum Comput Interact 4(2):81–173
Getoor L, Sahami M (1999) Using probabilistic relational models for collaborative filtering. Work. Web Usage Anal. User Profiling
Goldberg D, Nichols D, Oki BM, Terry D (1992) Using collaborative filtering to weave an information tapestry. Commun ACM 35(12):61–70
Guo G, Zhang J, Yorke-Smith N (2013) A novel Bayesian similarity measure for recommender systems. In: Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), 2013, pp 2619–2625
Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques, 3rd edn. Morgan Kaufmann, San Francisco
Harper FM, Konstan JA (2015) The MovieLens datasets. ACM Trans Interact Intell Syst 5(4):1–19
Herlocker JON, Riedl J (2002) An empirical analysis of design choices in neighborhood-based collaborative filtering algorithms. Inf Retr Boston 2002:287–310
Herlocker JL, Konstan JA, Borchers A, Riedl J (1999) An algorithmic framework for performing collaborative filtering. In: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval-SIGIR’99, 1999, pp 230–237
Hill W, Stead L, Rosenstein M, Furnas G (1995) Recommending and evaluating choices in a virtual community of use. In: Proceedings of the SIGCHI conference on Human factors in computing systems-CHI’95
Hofmann T (2003) Collaborative filtering via Gaussian probabilistic latent semantic analysis. In: Proceedings of the 26th annual international ACM SIGIR conference on Res. Dev. information Retr. - SIGIR’03, p 259
Jaccard P (1901) Distribution comparée de la flore alpine dans quelques régions des Alpes occidentales et orientales. Bull Soc Vaudoise Sci Nat 37:241–272
Joaquin D, Naohiro I (1999) Memory-based weighted-majority prediction for recommender systems. Res Dev Inf Retr
Konstan JA, Miller BN, Maltz D, Herlocker JL, Gordon LR, Riedl J (1997) GroupLens: applying collaborative filtering to Usenet news. Commun ACM 40(3):77–87
Laghmari K, Marsala C, Ramdani M (2018) An adapted incremental graded multi-label classification model for recommendation systems. Prog Artif Intell 7(1):15–29
Lang K (1995) NewsWeeder: learning to filter netnews (To appear in ML 95). In: Proceedings of the 12th international machine learning conference
Liu H, Hu Z, Mian A, Tian H, Zhu X (2014) A new user similarity model to improve the accuracy of collaborative filtering. Knowl-Based Syst 56:156–166
Marlin B (2003) Modeling user rating profiles for collaborative filtering. In: Proceedings of the 16th international conference on neural information processing systems, 2003, pp 627–634
Massa P, Avesani P (2007) Trust-aware recommender systems. In: Proceedings of the 2007 ACM conference on recommender systems, 2007, pp 17–24
Nakamura A, Abe N (1998) Collaborative filtering using weighted majority prediction algorithms. In: Proceedings of the fifteenth international conference on machine learning, 1998, pp 395–403
Ortega F, Zhu B, Bobadilla J, Hernando A (2018) CF4J: collaborative filtering for Java. Knowl-Based Syst 152:94–99
Owen S, Anil R, Dunning T, Friedman E (2011) Mahout in action. Manning Publications Co., Greenwich
Patra BK, Launonen R, Ollikainen V, Nandi S (2015) A new similarity measure using Bhattacharyya coefficient for collaborative filtering in sparse data. Knowl-Based Syst 82:163–177
Pavlov D, Pennock D (2002) A maximum entropy approach to collaborative filtering in dynamic, sparse, high-dimensional domains. Proc Neural Inf Process Syst 2002:1441–1448
Resnick P, Varian HR (1997) Recommender systems. Commun ACM 40(3):56–58
Resnick P, Iacovou N, Suchak M, Bergstrom P, Riedl J (1994) GroupLens: an open architecture for collaborative filtering of netnews. In: Proceedings of the 1994 ACM conference on computer supported cooperative work, 1994, pp 175–186
Ricci F, Rokach L, Shapira B, Kantor PB (2010) Recommender systems handbook, 1st edn. Springer, Berlin
Salton G, McGill M (1983) Introduction to modern information retrieval, pp 375–384
Sarwar B, Karypis G, Konstan J, Riedl J (2001) Item-based collaborative filtering recommendation algorithms. In: Proceedings of the tenth international conference world wide web-WWW’01, pp 285–295
Science C, Wnek J (1997) Learning and revising user profiles: the identification of interesting web sites. Mach Learn 331:313–331
Shardanand U, Maes P (1995) Social information filtering: algorithms for automating ‘word of mouth’. In: Proceedings of the SIGCHI conference on human factors in computing systems-CHI’95, pp 210–217
Shi Y, Larson M, Hanjalic A (2014) Collaborative filtering beyond the user-item matrix: a survey of the state of the art and future challenges. ACM Comput Surv 47(1):1–45
Sondur SD, Nayak S, Chigadani AP (2016) Similarity measures for recommender systems: a comparative study. Int J Sci Res Dev 2(3):76–80
Sorensen T (1948) A method of establishing groups of equal amplitude in plant sociology based on similarity of species content. Det Kong Danske Vidensk Selesk Biol Skr 5(1):1–34
Stephen SC, Xie H, Rai S (2017) Measures of similarity in memory-based collaborative filtering recommender system—a comparison. In: ACM international conference proceeding series, vol Part F1296
Su X, Khoshgoftaar TM (2009) A survey of collaborative filtering techniques. Adv Artif Intell 2009(Section 3):1–19
Suganeshwari G, Syed Ibrahim SP (2018) A comparison study on similarity measures in collaborative filtering algorithms for movie recommendation. Int J Pure Appl Math 119(15 Special Issue C):1495–1505
Sun SB et al (2017) Integrating triangle and Jaccard similarities for recommendation. PLoS ONE 12(8):1–16
Vijaymeena MK, Kavitha K (2016) A survey on similarity measures in text mining. Mach Learn Appl Int J 3(1):19–28
Metadata
Title
A comparative analysis of similarity measures akin to the Jaccard index in collaborative recommendations: empirical and theoretical perspective
Authors
Vijay Verma
Rajesh Kumar Aggarwal
Publication date
01 December 2020
Publisher
Springer Vienna
Published in
Social Network Analysis and Mining / Issue 1/2020
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-020-00660-9
