Skip to main content
Top

2019 | OriginalPaper | Chapter

A New Topology-Preserving Distance Metric with Applications to Multi-dimensional Data Clustering

Author : Konstantinos K. Delibasis

Published in: Artificial Intelligence Applications and Innovations

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

In many cases of high dimensional data analysis, data points may lie on manifolds of very complex shapes/geometries. Thus, the usual Euclidean distance may lead to suboptimal results when utilized in clustering or visualization operations. In this work, we introduce a new distance definition in multi-dimensional spaces that preserves the topology of the data point manifold. The parameters of the proposed distance are discussed and their physical meaning is explored through 2 and 3-dimensional synthetic datasets. A robust method for the parameterization of the algorithm is suggested. Finally, a modification of the well-known k-means clustering algorithm is introduced, to exploit the benefits of the proposed distance metric for data clustering. Comparative results including other established clustering algorithms are presented in terms of cluster purity and V-measure, for a number of well-known datasets.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data. Data Min. Knowl. Disc. 11(1), 5–33 (2005)MathSciNetCrossRef Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data. Data Min. Knowl. Disc. 11(1), 5–33 (2005)MathSciNetCrossRef
3.
go back to reference Kriegel, H.-P., Kröger, P., Zimek, A.: Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans. Knowl. Discov. Data 3(1), 1–58 (2009)CrossRef Kriegel, H.-P., Kröger, P., Zimek, A.: Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans. Knowl. Discov. Data 3(1), 1–58 (2009)CrossRef
4.
go back to reference Pavlidis, N.G., Hofmeyr, D.P., Tasoulis, S.K.: Minimum density hyperplanes. J. Mach. Learn. Res. 17(156), 1–33 (2016)MathSciNetMATH Pavlidis, N.G., Hofmeyr, D.P., Tasoulis, S.K.: Minimum density hyperplanes. J. Mach. Learn. Res. 17(156), 1–33 (2016)MathSciNetMATH
5.
go back to reference Schoelkopf, B., Smola, A., Mueller, K.-R.: Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 10(5), 1299–1319 (1998)CrossRef Schoelkopf, B., Smola, A., Mueller, K.-R.: Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 10(5), 1299–1319 (1998)CrossRef
6.
go back to reference Tasoulis, S.K., Tasoulis, D.K., Plagianakos, V.P.: Enhancing principal direction divisive clustering. Pattern Recogn. 43(10), 3391–3411 (2010)MATHCrossRef Tasoulis, S.K., Tasoulis, D.K., Plagianakos, V.P.: Enhancing principal direction divisive clustering. Pattern Recogn. 43(10), 3391–3411 (2010)MATHCrossRef
7.
go back to reference Tenenbaum, J.B., Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Sci. 290(5500), 2319–2323 (2000)CrossRef Tenenbaum, J.B., Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Sci. 290(5500), 2319–2323 (2000)CrossRef
9.
go back to reference Gonen, M., Margolin, A.A.: Localized data fusion for kernel k-means clustering with application to cancer biology. In: Advances in Neural Information Processing Systems 27 (NIPS 2014), Montrιal, Quιbec, Canada (2014) Gonen, M., Margolin, A.A.: Localized data fusion for kernel k-means clustering with application to cancer biology. In: Advances in Neural Information Processing Systems 27 (NIPS 2014), Montrιal, Quιbec, Canada (2014)
Metadata
Title
A New Topology-Preserving Distance Metric with Applications to Multi-dimensional Data Clustering
Author
Konstantinos K. Delibasis
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-19823-7_12

Premium Partner