2016 | OriginalPaper | Book chapter

A Tale of Four Metrics

Written by: Richard Connor

Published in: Similarity Search and Applications

Publisher: Springer International Publishing

Abstract

There are many contexts where the definition of similarity in multivariate space needs to be based on the correlation, rather than the absolute value, of the variables. Examples include classic IR measurements such as TF/IDF and BM25, client similarity measures based on collaborative filtering, feature analysis of chemical molecules, and biodiversity contexts.
In such cases, it is almost standard for Cosine similarity to be used. More recently, Jensen-Shannon divergence has appeared in a proper metric form, and a related metric, Structural Entropic Distance (SED), has been investigated. A fourth metric, based on a little-known divergence function named Triangular Divergence, is also assessed here.
For these metrics, we study their properties in the context of similarity and metric search. We compare and contrast their semantics and performance. Our conclusion is that, despite Cosine Distance being an almost automatic choice in this context, Triangular Distance is most likely to be the best choice in terms of a compromise between semantics and performance.
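As a rough illustration of three of the distances named in the abstract, the following Python sketch may help. It is not taken from the chapter: the function names are hypothetical, the Cosine form shown (Euclidean distance between L2-normalised vectors) is one common metric variant and may differ from the one the author uses, and the factor 1/2 in the Triangular Distance is an assumption chosen so that the result lies in [0, 1]. SED is omitted because its definition does not appear on this page. Terms that would involve 0 log 0 or 0/0 are treated as 0, following footnote 1.

```python
import math


def _normalise(v):
    """Scale a non-negative vector so that its components sum to 1."""
    total = sum(v)
    if total <= 0:
        raise ValueError("vector must have a positive sum")
    return [x / total for x in v]


def cosine_distance(v, w):
    """Euclidean distance between the L2-normalised vectors: sqrt(2 - 2*cos).

    Assumption: this is one common metric form of Cosine distance.
    """
    dot = sum(a * b for a, b in zip(v, w))
    norm = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w))
    cos = min(1.0, dot / norm)  # clamp against floating-point overshoot
    return math.sqrt(max(0.0, 2.0 - 2.0 * cos))


def jensen_shannon_distance(v, w):
    """Square root of the Jensen-Shannon divergence (base-2 logarithm)."""
    p, q = _normalise(v), _normalise(w)
    jsd = 0.0
    for a, b in zip(p, q):
        m = (a + b) / 2.0
        if a > 0:  # a 0*log(0) term is treated as 0
            jsd += 0.5 * a * math.log2(a / m)
        if b > 0:
            jsd += 0.5 * b * math.log2(b / m)
    return math.sqrt(max(0.0, jsd))


def triangular_distance(v, w):
    """Square root of half the triangular divergence sum((p-q)^2 / (p+q)).

    Assumption: the 1/2 scaling is chosen here to bound the result by 1.
    """
    p, q = _normalise(v), _normalise(w)
    total = 0.0
    for a, b in zip(p, q):
        if a + b > 0:  # a 0/0 term is treated as 0
            total += (a - b) ** 2 / (a + b)
    return math.sqrt(0.5 * total)
```

All three functions ignore the overall magnitude of their inputs, so `[1, 2, 3]` and `[2, 4, 6]` are at distance (approximately) zero from each other, which is the correlation-based behaviour the abstract describes.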

Footnotes
1
Some functions are formally undefined in the presence of zero values, requiring either \(0\log 0\) or 0/0. In each case, there is in fact a good mathematical argument for treating these terms as 0 rather than undefined.
 
2
See [4] for an explanation of this constant.
 
3
In fact Shannon’s entropy raised to the power of the logarithm base, see [4] for details.
 
4
118 dimensions.
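The zero-value convention mentioned in footnote 1 can be motivated by a limit argument. The following is a brief sketch rather than the argument from the chapter itself; the symbols \(x\), \(p\) and \(q\) are assumptions that stand for non-negative vector components, and the 0/0 case is illustrated with a term of the form used by the triangular divergence.

```latex
\[
\lim_{x \to 0^{+}} x \log x
  = \lim_{x \to 0^{+}} \frac{\log x}{1/x}
  = \lim_{x \to 0^{+}} \frac{1/x}{-1/x^{2}}
  = \lim_{x \to 0^{+}} (-x)
  = 0
\]
\[
0 \;\le\; \frac{(p - q)^{2}}{p + q}
  \;\le\; \frac{(p + q)^{2}}{p + q}
  \;=\; p + q \;\longrightarrow\; 0
  \quad \text{as } p, q \to 0^{+}
\]
```

The first line applies L'Hôpital's rule; the second uses \(|p - q| \le p + q\) for non-negative \(p\) and \(q\). In both cases the term tends to 0, so setting it to 0 keeps the function continuous.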
 
References
1. Connor, R., Cardillo, F.A., Vadicamo, L., Rabitti, F.: Hilbert Exclusion: Improved Metric Search Through Finite Isometric Embeddings. ArXiv e-prints, accepted for publication ACM TOIS, April 2016
2. Connor, R., Cardillo, F.A., Vadicamo, L., Rabitti, F.: Supermetric Search with the Four-Point Property. Accepted for publication SISAP, Tokyo, Japan, October 2016
3. Connor, R., Moss, R.: A multivariate correlation distance for vector spaces. In: Navarro, G., Pestov, V. (eds.) SISAP 2012. LNCS, vol. 7404, pp. 209–225. Springer, Heidelberg (2012)
4. Connor, R., Simeoni, F., Iakovos, M., Moss, R.: A bounded distance metric for comparing tree structure. Inf. Syst. 36(4), 748–764 (2011)
6.
7. Fuglede, B., Topsoe, F.: Jensen-Shannon divergence and Hilbert space embedding. In: Proceedings of International Symposium on Information Theory, ISIT 2004, p. 31 (2004)
8. Jones, K.S., Walker, S., Robertson, S.E.: A probabilistic model of information retrieval: development and comparative experiments: part 2. Inf. Process. Manag. 36(6), 809–840 (2000)
10. Österreicher, F., Vajda, I.: A new class of metric divergences on probability spaces and its statistical applications. Ann. Inst. Stat. Math. 55, 639–653 (2003)
11. Singhal, A.: Modern information retrieval: a brief overview. IEEE Data Eng. Bull. 24(4), 35–43 (2001)
12. Topsoe, F.: Some inequalities for information divergence and related measures of discrimination. IEEE Trans. Inf. Theor. 46(4), 1602–1609 (2000)
13. Topsøe, F.: Jenson-Shannon divergence and norm-based measures of discrimination and variation. Preprint math.ku.dk (2003)
Metadata
Title
A Tale of Four Metrics
Written by
Richard Connor
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-46759-7_16