
2016 | OriginalPaper | Book chapter

Centering Versus Scaling for Hubness Reduction

Authors: Roman Feldbauer, Arthur Flexer

Published in: Artificial Neural Networks and Machine Learning – ICANN 2016

Publisher: Springer International Publishing


Abstract

Hubs and anti-hubs are points that appear very close to, or very far from, many other data points due to a problem with measuring distances in high-dimensional spaces. Hubness is an aspect of the curse of dimensionality that affects many machine learning tasks. We present the first large-scale empirical study comparing two competing hubness reduction techniques: scaling and centering. We show that scaling consistently reduces hubness and improves nearest neighbor classification, while centering yields rather mixed results. Support vector classification is mostly unaffected by centering-based hubness reduction.
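
The abstract names the two families of hubness reduction but does not spell them out. Below is a minimal pure-Python sketch showing how hubness is usually quantified and what one representative of each family does to a distance or similarity matrix. It is not the authors' scripts from the footnote.

Hubness is measured as the skewness S_k of the k-occurrence distribution N_k, where N_k(x) counts how often x appears in the k-nearest-neighbor lists of the other points. The two representatives are assumptions made for illustration:

- Scaling: local scaling, LS(d_xy) = 1 - exp(-d_xy^2 / (sigma_x sigma_y)), where sigma_x is the distance from x to its k-th nearest neighbor (as described by Schnitzer et al., 2012).
- Centering: subtracting the global data centroid before computing inner-product similarities (as described by Suzuki et al., 2013).

All function names are hypothetical, and the random data is purely illustrative.

```python
import math
import random

# Hypothetical helpers for illustration; not the paper's code.


def euclidean_matrix(X):
    """Pairwise Euclidean distances between all vectors in X (list of lists)."""
    n = len(X)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = math.dist(X[i], X[j])
    return D


def k_occurrence(M, k, larger_is_closer=False):
    """N_k(x): how often each point occurs in the k-NN lists of all other points.

    M is a distance matrix (smaller = closer) or, with larger_is_closer=True,
    a similarity matrix. A point is never counted as its own neighbor.
    """
    n = len(M)
    counts = [0] * n
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: M[i][j], reverse=larger_is_closer)
        for j in others[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Population skewness of N_k; large positive values indicate strong hubness."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    third = sum((v - mu) ** 3 for v in values) / n
    return third / var ** 1.5


def local_scaling(D, k):
    """Scaling (ASSUMED variant: local scaling).

    LS(d_xy) = 1 - exp(-d_xy^2 / (sigma_x * sigma_y)), where sigma_x is the
    distance from x to its k-th nearest neighbor. Assumes no duplicate points,
    so every sigma_x is positive.
    """
    n = len(D)
    sigma = [sorted(D[i][j] for j in range(n) if j != i)[k - 1] for i in range(n)]
    return [[0.0 if i == j else
             1.0 - math.exp(-D[i][j] ** 2 / (sigma[i] * sigma[j]))
             for j in range(n)] for i in range(n)]


def centered_similarity(X):
    """Centering (ASSUMED variant: global centroid, inner-product similarity).

    After subtracting the centroid c, every row of the similarity matrix sums
    to zero, so no point is similar to the data set "on average".
    """
    n, d = len(X), len(X[0])
    c = [sum(x[t] for x in X) / n for t in range(d)]
    Z = [[x[t] - c[t] for t in range(d)] for x in X]
    return [[sum(a * b for a, b in zip(zi, zj)) for zj in Z] for zi in Z]


if __name__ == "__main__":
    random.seed(0)
    # Illustrative i.i.d. Gaussian data in 50 dimensions (not from the paper).
    X = [[random.gauss(0.0, 1.0) for _ in range(50)] for _ in range(200)]
    k = 5
    D = euclidean_matrix(X)
    print("Euclidean      S_k =", round(skewness(k_occurrence(D, k)), 3))
    print("local scaling  S_k =",
          round(skewness(k_occurrence(local_scaling(D, k), k)), 3))
    print("centering      S_k =",
          round(skewness(k_occurrence(centered_similarity(X), k,
                                      larger_is_closer=True)), 3))
```

The script prints one skewness value per setting. The values depend on the data and on k, and the sketch implies no particular outcome: which technique actually reduces hubness on real data is the empirical question the study addresses.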


Footnotes
1
Python scripts for hubness analysis are available at: https://github.com/OFAI.
 
Metadata
DOI
https://doi.org/10.1007/978-3-319-44778-0_21