Skip to main content
Erschienen in: Journal of Combinatorial Optimization 1/2020

26.09.2019

The seeding algorithm for k-means problem with penalties

verfasst von: Min Li, Dachuan Xu, Jun Yue, Dongmei Zhang, Peng Zhang

Erschienen in: Journal of Combinatorial Optimization | Ausgabe 1/2020

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The k-means problem is a classic NP-hard problem in machine learning and computational geometry. And its goal is to separate the given set into k clusters according to the minimal squared distance. The k-means problem with penalties, as one generalization of k-means problem, allows that some point need not be clustered instead of being paid some penalty. In this paper, we study the k-means problem with penalties by using the seeding algorithm. We propose that the accuracy only involves the ratio of the maximal penalty value to the minimal one. When the penalty is uniform, the approximation factor reduces to the same one for the k-means problem. Moreover, our result generalizes the k-means++ for k-means problem to the penalty version. Numerical experiments show that our seeding algorithm is more effective than the one without using seeding.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Aggarwal A, Deshpande A, Kannan R (2009) Adaptive sampling for \(k\)-means clustering. In: Proceedings of APPROX and RANDOM, pp. 15–28 Aggarwal A, Deshpande A, Kannan R (2009) Adaptive sampling for \(k\)-means clustering. In: Proceedings of APPROX and RANDOM, pp. 15–28
Zurück zum Zitat Ahmadian S, Norouzi-Fard A, Svensson O, Ward J (2017) Better guarantees for \(k\)-means and Euclidean \(k\)-median by primal–dual algorithms. In: Proceedings of FOCS, pp. 61–72 Ahmadian S, Norouzi-Fard A, Svensson O, Ward J (2017) Better guarantees for \(k\)-means and Euclidean \(k\)-median by primal–dual algorithms. In: Proceedings of FOCS, pp. 61–72
Zurück zum Zitat Aloise D, Deshpande A, Hansen P, Popat P (2009) NP-hardness of Euclidean sum-of-squares clustering. Mach Learn 75:245–248CrossRef Aloise D, Deshpande A, Hansen P, Popat P (2009) NP-hardness of Euclidean sum-of-squares clustering. Mach Learn 75:245–248CrossRef
Zurück zum Zitat Arthur D, Vassilvitskii S (2007) \(k\)-means++: The advantages of careful seeding. In: Proceedings of SODA, pp. 1027–1035 Arthur D, Vassilvitskii S (2007) \(k\)-means++: The advantages of careful seeding. In: Proceedings of SODA, pp. 1027–1035
Zurück zum Zitat Awasthi P, Charikar M, Krishnaswamy R, Sinop AK (2015) The hardness of approximation of Euclidean \(k\)-means. In: Proceedings of SoCG, pp. 754–767 Awasthi P, Charikar M, Krishnaswamy R, Sinop AK (2015) The hardness of approximation of Euclidean \(k\)-means. In: Proceedings of SoCG, pp. 754–767
Zurück zum Zitat Bachem O, Lucic M, Hassani SH, Krause A (2016a) Approximate \(k\)-means++ in sublinear time. In: Proceedings of AAAI, pp. 1459–1467 Bachem O, Lucic M, Hassani SH, Krause A (2016a) Approximate \(k\)-means++ in sublinear time. In: Proceedings of AAAI, pp. 1459–1467
Zurück zum Zitat Bachem O, Lucic M, Hassani SH, Krause A (2016b) Fast and provably good seedings for \(k\)-means. In: Proceedings of NIPS, pp. 55–63 Bachem O, Lucic M, Hassani SH, Krause A (2016b) Fast and provably good seedings for \(k\)-means. In: Proceedings of NIPS, pp. 55–63
Zurück zum Zitat Bachem O, Lucic M, Krause A (2017) Distributed and provably good seedings for \(k\)-means in constant rounds. In: Proceedings of ICML, pp. 292–300 Bachem O, Lucic M, Krause A (2017) Distributed and provably good seedings for \(k\)-means in constant rounds. In: Proceedings of ICML, pp. 292–300
Zurück zum Zitat Bahmani B, Moseley B, Vattani A, Kumar R, Vassilvitskii S (2012) Scalable \(k\)-means++. In: Proceedings of the VLDB endowment, pp. 622–633 Bahmani B, Moseley B, Vattani A, Kumar R, Vassilvitskii S (2012) Scalable \(k\)-means++. In: Proceedings of the VLDB endowment, pp. 622–633
Zurück zum Zitat Blömer J, Lammersen C, Schmidt M, Sohler C (2016) Theoretical analysis of the \(k\)-means algorithm–a survey. In: Kliemann L, Sanders P (eds) Algorithm engineering, Springer, New York, pp. 81–116 Blömer J, Lammersen C, Schmidt M, Sohler C (2016) Theoretical analysis of the \(k\)-means algorithm–a survey. In: Kliemann L, Sanders P (eds) Algorithm engineering, Springer, New York, pp. 81–116
Zurück zum Zitat Chang XY, Wang Y, Li R, Xu Z (2014) Sparse \(k\)-means with \(l_\infty \)/\(l_0\) penalty for high-dimensional data clustering. arXiv:1403.7890v1 Chang XY, Wang Y, Li R, Xu Z (2014) Sparse \(k\)-means with \(l_\infty \)/\(l_0\) penalty for high-dimensional data clustering. arXiv:​1403.​7890v1
Zurück zum Zitat Drineas P, Frieze A, Kannan R, Vempala S, Vinay V (2004) Clustering large graphs via the singular value decomposition. Mach Learn 56:9–33CrossRef Drineas P, Frieze A, Kannan R, Vempala S, Vinay V (2004) Clustering large graphs via the singular value decomposition. Mach Learn 56:9–33CrossRef
Zurück zum Zitat Har-Peled S, Sadri B (2005) How fast is the \(k\)-means method? In: Proceedings of SODA, pp. 332–229 Har-Peled S, Sadri B (2005) How fast is the \(k\)-means method? In: Proceedings of SODA, pp. 332–229
Zurück zum Zitat Kanungo T, Mount DM, Netanyahu NS, Piatko CD, Silverma R, Wu AY (2004) A local search approximation algorithm for \(k\)-means clustering. Comput Geom Theory Appl 28:89–112MathSciNetCrossRef Kanungo T, Mount DM, Netanyahu NS, Piatko CD, Silverma R, Wu AY (2004) A local search approximation algorithm for \(k\)-means clustering. Comput Geom Theory Appl 28:89–112MathSciNetCrossRef
Zurück zum Zitat Lee E, Schmidt M, Wright J (2017) Improved and simplified inapproximability for \(k\)-means. Inf Process Lett 120:40–43MathSciNetCrossRef Lee E, Schmidt M, Wright J (2017) Improved and simplified inapproximability for \(k\)-means. Inf Process Lett 120:40–43MathSciNetCrossRef
Zurück zum Zitat Ostrovsky R, Rabani Y, Schulman L, Swamy C (2012) The effectiveness of Lloyd-type methods for the \(k\)-means problem. J ACM 59:28:1–28:22MathSciNetCrossRef Ostrovsky R, Rabani Y, Schulman L, Swamy C (2012) The effectiveness of Lloyd-type methods for the \(k\)-means problem. J ACM 59:28:1–28:22MathSciNetCrossRef
Zurück zum Zitat Rezaei M, Färnti P (2016) Set matching measures for external cluster validity. IEEE Trans Knowl Data Eng 28:2173–2186CrossRef Rezaei M, Färnti P (2016) Set matching measures for external cluster validity. IEEE Trans Knowl Data Eng 28:2173–2186CrossRef
Zurück zum Zitat Tseng GC (2007) Penalized and weighted \(k\)-means for clustering with scattered objects and prior information in high-throughput biological data. Bioinformatics 23:2247–2255CrossRef Tseng GC (2007) Penalized and weighted \(k\)-means for clustering with scattered objects and prior information in high-throughput biological data. Bioinformatics 23:2247–2255CrossRef
Zurück zum Zitat Xu D, Xu Y, Zhang D (2017) A survey on algorithm for \(k\)-means problem and its variants. Oper Res Trans 21:101–109CrossRef Xu D, Xu Y, Zhang D (2017) A survey on algorithm for \(k\)-means problem and its variants. Oper Res Trans 21:101–109CrossRef
Zurück zum Zitat Zhang D, Hao C, Wu C, Xu D, Zhang Z (2017) A local search approximation algorithm for the \(k\)-means problem with penalties. In: Proceedings of COCOON, pp. 568–574 Zhang D, Hao C, Wu C, Xu D, Zhang Z (2017) A local search approximation algorithm for the \(k\)-means problem with penalties. In: Proceedings of COCOON, pp. 568–574
Metadaten
Titel
The seeding algorithm for k-means problem with penalties
verfasst von
Min Li
Dachuan Xu
Jun Yue
Dongmei Zhang
Peng Zhang
Publikationsdatum
26.09.2019
Verlag
Springer US
Erschienen in
Journal of Combinatorial Optimization / Ausgabe 1/2020
Print ISSN: 1382-6905
Elektronische ISSN: 1573-2886
DOI
https://doi.org/10.1007/s10878-019-00450-w

Weitere Artikel der Ausgabe 1/2020

Journal of Combinatorial Optimization 1/2020 Zur Ausgabe

Premium Partner