Skip to main content
Erschienen in: Journal of Combinatorial Optimization 3/2022

15.04.2020

The seeding algorithm for spherical k-means clustering with penalties

verfasst von: Sai Ji, Dachuan Xu, Longkun Guo, Min Li, Dongmei Zhang

Erschienen in: Journal of Combinatorial Optimization | Ausgabe 3/2022

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Spherical k-means clustering as a known NP-hard variant of the k-means problem has broad applications in data mining. In contrast to k-means, it aims to partition a collection of given data distributed on a spherical surface into k sets so as to minimize the within-cluster sum of cosine dissimilarity. In the paper, we introduce spherical k-means clustering with penalties and give a \(2\max \{2,M\}(1+M)(\ln k+2)\)-approximation algorithm. Moreover, we prove that when against spherical k-means clustering with penalties but on separable instances, our algorithm is with an approximation ratio \(2\max \{3,M+1\}\) with high probability, where M is the ratio of the maximal and the minimal penalty cost of the given data set.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Ahmadian S, Norouzi-Fard A, Svensson O, Ward J (2017) Better guarantees for \(k\)-means and Euclidean \(k\)-median by primal–dual algorithms. In: Proceedings of the 58th annual IEEE symposium on foundations of computer science (FOCS), pp 61–72 Ahmadian S, Norouzi-Fard A, Svensson O, Ward J (2017) Better guarantees for \(k\)-means and Euclidean \(k\)-median by primal–dual algorithms. In: Proceedings of the 58th annual IEEE symposium on foundations of computer science (FOCS), pp 61–72
Zurück zum Zitat Aloise D, Deshpande A, Hansen P (2009) NP-hardness of Euclidean sum-of-squares clustering. Mach Learn 75(2):245–248CrossRef Aloise D, Deshpande A, Hansen P (2009) NP-hardness of Euclidean sum-of-squares clustering. Mach Learn 75(2):245–248CrossRef
Zurück zum Zitat Arthur D, Vassilvitskii S (2006) How slow is the k-means method? In: Proceedings of the 22th symposium on computational geometry (SoCG), pp 144-153 Arthur D, Vassilvitskii S (2006) How slow is the k-means method? In: Proceedings of the 22th symposium on computational geometry (SoCG), pp 144-153
Zurück zum Zitat Arthur D, Vassilvitskii S (2007) \(k\)-means++: the advantages of careful seeding, In: Proceedings of the 18th annual ACM-SIAM symposium on discrete algorithms (SODA), pp 1027–1035 Arthur D, Vassilvitskii S (2007) \(k\)-means++: the advantages of careful seeding, In: Proceedings of the 18th annual ACM-SIAM symposium on discrete algorithms (SODA), pp 1027–1035
Zurück zum Zitat Awasthi P, Charikar M, Krishnaswamy R, Sinop A (2015) The hardness of approximation of Euclidean \(k\)-means. In: Proceedings of the 31st symposium on computational geometry (SoCG), pp 754–767 Awasthi P, Charikar M, Krishnaswamy R, Sinop A (2015) The hardness of approximation of Euclidean \(k\)-means. In: Proceedings of the 31st symposium on computational geometry (SoCG), pp 754–767
Zurück zum Zitat Bahmani B, Moseley B, Vattani A, Kumar R, Vassilvitskii S (2012) Scalable k-means++. Proc VLDB Endow 5(7):622–633 Bahmani B, Moseley B, Vattani A, Kumar R, Vassilvitskii S (2012) Scalable k-means++. Proc VLDB Endow 5(7):622–633
Zurück zum Zitat Blömer J, Lammersen C, Schmidt M, Sohler C (2016) Theoretical analysis of the k-means algorithm – a survey. In: Kliemann L, Sanders P (eds) Algorithm engineering. Lecture notes in computer science, vol 9220. Springer, Cham, pp 81–116 Blömer J, Lammersen C, Schmidt M, Sohler C (2016) Theoretical analysis of the k-means algorithm – a survey. In: Kliemann L, Sanders P (eds) Algorithm engineering. Lecture notes in computer science, vol 9220. Springer, Cham, pp 81–116
Zurück zum Zitat Blömer J, Brauer S, Bujna K (2017) A theoretical analysis of the fuzzy \(k\)-means problem, In: Proceedings of the 16th IEEE international conference on data mining (ICDM), pp 805–810 Blömer J, Brauer S, Bujna K (2017) A theoretical analysis of the fuzzy \(k\)-means problem, In: Proceedings of the 16th IEEE international conference on data mining (ICDM), pp 805–810
Zurück zum Zitat Cohen-Addad V, Klein PN, Mathieu C (2019) Local search yields approximation schemes for \(k\)-means and \(k\)-median in Euclidean and minor-free metrics. SIAM J Comput 48(2):644–667MathSciNetCrossRef Cohen-Addad V, Klein PN, Mathieu C (2019) Local search yields approximation schemes for \(k\)-means and \(k\)-median in Euclidean and minor-free metrics. SIAM J Comput 48(2):644–667MathSciNetCrossRef
Zurück zum Zitat Dhillon I, Modha D (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42(1–2):143–175CrossRef Dhillon I, Modha D (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42(1–2):143–175CrossRef
Zurück zum Zitat Drineas P, Frieze A, Kannan R, Vempala V (2004) Clustering large graphs via the singular value decomposition. Mach Learn 56(1–3):9–33CrossRef Drineas P, Frieze A, Kannan R, Vempala V (2004) Clustering large graphs via the singular value decomposition. Mach Learn 56(1–3):9–33CrossRef
Zurück zum Zitat Endo Y, Miyamoto S (2015) Spherical \(k\)-means++ clustering. In: Proceedings of the 16th international conference on modeling decisions for artificial intelligence (MDAI), pp 103-114 Endo Y, Miyamoto S (2015) Spherical \(k\)-means++ clustering. In: Proceedings of the 16th international conference on modeling decisions for artificial intelligence (MDAI), pp 103-114
Zurück zum Zitat Gupta S, Kumar R, Lu K, Moseley B, Vassilvitskii S (2017) Local search methods for \(k\)-means with outliers. Proc VLDB Endow 10(7):757–768CrossRef Gupta S, Kumar R, Lu K, Moseley B, Vassilvitskii S (2017) Local search methods for \(k\)-means with outliers. Proc VLDB Endow 10(7):757–768CrossRef
Zurück zum Zitat Hornik K, Feinerer I, Kober M, Buchata M (2015) Spherical \(k\)-means clustering. J Stat Softw 50(10):1–22 Hornik K, Feinerer I, Kober M, Buchata M (2015) Spherical \(k\)-means clustering. J Stat Softw 50(10):1–22
Zurück zum Zitat Kanungo T, Mount D, Netanyahu N, Piatko C, Silverma R (2004) A local search approximation algorithm for \(k\)-means clustering. Comput Geom 28(2–3):89–112MathSciNetCrossRef Kanungo T, Mount D, Netanyahu N, Piatko C, Silverma R (2004) A local search approximation algorithm for \(k\)-means clustering. Comput Geom 28(2–3):89–112MathSciNetCrossRef
Zurück zum Zitat Li M, Xu D, Zhang D, Zou J (2019) The seeding algorithms for spherical \(k\)-means clustering. J Glob Optim 76(4): 695–708 Li M, Xu D, Zhang D, Zou J (2019) The seeding algorithms for spherical \(k\)-means clustering. J Glob Optim 76(4): 695–708
Zurück zum Zitat Li M, Xu D, Yue J, Zhang D, Zhang P (2020) The seeding slgorithm for \(k\)-means problem with penalties. J Comb Optim 39(1):15–32MathSciNetCrossRef Li M, Xu D, Yue J, Zhang D, Zhang P (2020) The seeding slgorithm for \(k\)-means problem with penalties. J Comb Optim 39(1):15–32MathSciNetCrossRef
Zurück zum Zitat Moriya T, Roth H, Nakamura S, Oda H, Kai N, Oda M (2018) Unsupervised pathology image segmentation using representation learning with spherical \(k\)-means. In: Proceeding SPIE 10581, Medical Imaging 2018: Digital Pathology, 1058111 Moriya T, Roth H, Nakamura S, Oda H, Kai N, Oda M (2018) Unsupervised pathology image segmentation using representation learning with spherical \(k\)-means. In: Proceeding SPIE 10581, Medical Imaging 2018: Digital Pathology, 1058111
Zurück zum Zitat Tunali V, Bilgin T, Camurcu A (2016) An improved clustering algorithm for text mining: multi-cluster spherical \(K\)-means. Int Arab J Inf Technol 13(1):12–19 Tunali V, Bilgin T, Camurcu A (2016) An improved clustering algorithm for text mining: multi-cluster spherical \(K\)-means. Int Arab J Inf Technol 13(1):12–19
Zurück zum Zitat Vattani A (2011) K-means requires exponentially many iterations even in the plane. Discrete Comput Geom 45(4):596–616MathSciNetCrossRef Vattani A (2011) K-means requires exponentially many iterations even in the plane. Discrete Comput Geom 45(4):596–616MathSciNetCrossRef
Zurück zum Zitat Xu J, Han J, Xiong K, Nie F (2016) Robust and sparse fuzzy \(k\)-means clustering. In: Proceedings 25th international joint conference on artificial intelligence (IJCAI), pp 2224–2230 Xu J, Han J, Xiong K, Nie F (2016) Robust and sparse fuzzy \(k\)-means clustering. In: Proceedings 25th international joint conference on artificial intelligence (IJCAI), pp 2224–2230
Zurück zum Zitat Xu D, Xu Y, Zhang D (2017) A survey on algorithm for \(k\)-means and its variants. Oper Res Trans 21:101–109 (in Chinese)MATH Xu D, Xu Y, Zhang D (2017) A survey on algorithm for \(k\)-means and its variants. Oper Res Trans 21:101–109 (in Chinese)MATH
Metadaten
Titel
The seeding algorithm for spherical k-means clustering with penalties
verfasst von
Sai Ji
Dachuan Xu
Longkun Guo
Min Li
Dongmei Zhang
Publikationsdatum
15.04.2020
Verlag
Springer US
Erschienen in
Journal of Combinatorial Optimization / Ausgabe 3/2022
Print ISSN: 1382-6905
Elektronische ISSN: 1573-2886
DOI
https://doi.org/10.1007/s10878-020-00569-1

Weitere Artikel der Ausgabe 3/2022

Journal of Combinatorial Optimization 3/2022 Zur Ausgabe

Premium Partner