
2015 | Original Paper | Book Chapter

Graph Based Kernel k-Means Using Representative Data Points as Initial Centers

Authors: Wuyi Yang, Liguo Tang

Published in: Intelligent Computing Theories and Methodologies

Publisher: Springer International Publishing

Abstract

The k-means algorithm is the most widely used data clustering algorithm owing to its relative simplicity, but it can only handle data that are linearly separable. Kernel k-means, a generalization of k-means, can handle data that are not linearly separable. Standard k-means and kernel k-means, however, share the disadvantage of being sensitive to the initial placement of the cluster centers. A novel kernel k-means algorithm is proposed in this paper. The proposed algorithm uses a graph-based kernel matrix and selects k data points as initial centers for kernel k-means. Since finding the optimal data points to use as initial centers is an NP-hard problem, the problem is relaxed to obtaining k representative data points, which then serve as the initial centers. A matching pursuit algorithm for multiple vectors is used to greedily find the k representative data points. The proposed algorithm is tested on synthetic and real-world datasets and compared with kernel k-means algorithms that use other initialization techniques. Our empirical study shows encouraging results for the proposed algorithm.
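
To make the pipeline described in the abstract concrete, the following is a minimal sketch, not the authors' implementation: an RBF kernel stands in for the paper's graph-based kernel, the k representative points are picked by a simple greedy, matching-pursuit-style column selection on the kernel matrix, and kernel k-means is then seeded with those points. The function names (rbf_kernel, greedy_representatives, kernel_kmeans) and the exact selection criterion are illustrative assumptions.

```python
# Minimal sketch of "representative data points as initial centers" for kernel k-means.
# Assumptions (not from the paper): RBF kernel instead of the graph-based kernel,
# and a greedy matching-pursuit-style column selection as the seeding criterion.
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Dense RBF kernel matrix; a stand-in for the paper's graph-based kernel."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)


def greedy_representatives(K, k):
    """Greedily pick k columns of K whose span best approximates K (matching-pursuit style)."""
    selected = []
    R = K.copy()  # residual of the kernel matrix after removing chosen columns' contribution
    for _ in range(k):
        diag = np.maximum(np.diag(R), 1e-12)
        scores = np.sum(R ** 2, axis=0) / diag      # energy explained per candidate column
        if selected:
            scores[selected] = -np.inf              # do not pick the same point twice
        j = int(np.argmax(scores))
        selected.append(j)
        # deflate the residual by the contribution of column j
        R = R - np.outer(R[:, j], R[j, :]) / max(R[j, j], 1e-12)
    return selected


def kernel_kmeans(K, init_idx, n_iter=100):
    """Kernel k-means seeded with the selected data points as initial centers."""
    n, k = K.shape[0], len(init_idx)
    # initial assignment: nearest initial center in feature space,
    # ||phi(x_i) - phi(x_j)||^2 = K_ii - 2 K_ij + K_jj
    d0 = np.diag(K)[:, None] - 2.0 * K[:, init_idx] + np.diag(K)[init_idx][None, :]
    labels = np.argmin(d0, axis=1)
    for _ in range(n_iter):
        dist = np.zeros((n, k))
        for c in range(k):
            idx = np.where(labels == c)[0]
            if len(idx) == 0:
                dist[:, c] = np.inf
                continue
            # distance to the implicit cluster mean, up to the constant K_ii term
            dist[:, c] = -2.0 * K[:, idx].mean(axis=1) + K[np.ix_(idx, idx)].mean()
        new_labels = np.argmin(dist, axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # three well-separated synthetic clusters in 2-D
    X = np.vstack([rng.normal(loc, 0.3, size=(50, 2)) for loc in (-2.0, 0.0, 2.0)])
    K = rbf_kernel(X, gamma=0.5)
    seeds = greedy_representatives(K, k=3)
    labels = kernel_kmeans(K, seeds)
    print("seed indices:", seeds)
    print("cluster sizes:", np.bincount(labels))
```

Seeding with points that already summarize the kernel matrix tends to make the subsequent kernel k-means iterations less dependent on random restarts, which is the sensitivity-to-initialization issue the abstract highlights.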

Metadata
Title
Graph Based Kernel k-Means Using Representative Data Points as Initial Centers
Authors
Wuyi Yang
Liguo Tang
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-22180-9_29