International Journal of Machine Learning and Cybernetics

1 April 2013 | Original Article

A hybrid approach to speed-up the k-means clustering method

Authors: T. Hitendra Sarma, P. Viswanath, B. Eswara Reddy

Published in: International Journal of Machine Learning and Cybernetics | Issue 2/2013

Abstract

The k-means clustering method is an iterative, partition-based method which, for finite data-sets, converges to a solution in finite time. Its running time grows linearly with the size of the data-set. Many variants have been proposed to speed up the conventional k-means clustering method. In this paper, we propose a prototype-based hybrid approach to speed up the k-means clustering method. The proposed method first partitions the data-set into small clusters (grouplets) of varying sizes. Each grouplet is represented by a prototype. The set of prototypes is then partitioned into k clusters using a modified k-means method. The modified k-means method is similar to the conventional k-means method, but it avoids empty clusters (clusters to which no pattern is assigned) during the iterative process. In each cluster of prototypes, each prototype is replaced by its corresponding set of patterns (those that formed the grouplet) to derive a partition of the data-set. Since this partition can deviate from the partition obtained by running the conventional k-means method over the entire data-set, a correcting step is proposed. Both theoretically and experimentally, the conventional k-means method and the proposed hybrid method (augmented with the correcting step) are shown to yield the same result, provided the initial k seed points are the same. The proposed method, however, is much faster than the conventional one. Experimentally, the proposed method is compared with the conventional method and with other recent methods proposed to speed up the k-means method.
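
The pipeline described in the abstract can be sketched roughly as follows. This is only an illustrative sketch under several assumptions, not the authors' implementation: the abstract does not say how grouplets are formed, so a leader-style single pass with a hypothetical `threshold` parameter is assumed; each prototype is assumed to be the grouplet mean; an empty cluster simply keeps its previous center as a stand-in for the modified k-means method; and the correcting step is approximated by ordinary k-means iterations over the full data-set, started from the centers of the derived partition. All function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def make_grouplets(data, threshold):
    """Assumed grouping rule: a pattern joins the first grouplet whose
    leader lies within `threshold`, otherwise it starts a new grouplet."""
    leaders, groups = [], []
    for idx, p in enumerate(data):
        for g, leader in enumerate(leaders):
            if sq_dist(p, leader) <= threshold ** 2:
                groups[g].append(idx)
                break
        else:  # no leader was close enough
            leaders.append(p)
            groups.append([idx])
    return groups

def kmeans(points, centers, max_iter=100):
    """Plain k-means iterations from the given centers. A cluster that
    receives no points keeps its previous center (an assumed stand-in,
    not the paper's modified k-means)."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)),
                          key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = mean(members)
    return labels, centers

def hybrid_kmeans(data, seeds, threshold):
    # 1. Partition the data-set into grouplets, one prototype each.
    groups = make_grouplets(data, threshold)
    prototypes = [mean([data[i] for i in g]) for g in groups]
    # 2. Cluster the (much smaller) set of prototypes.
    proto_labels, _ = kmeans(prototypes, seeds)
    # 3. Replace each prototype by the patterns of its grouplet.
    labels = [0] * len(data)
    for g, l in zip(groups, proto_labels):
        for i in g:
            labels[i] = l
    # 4. Assumed correcting step: refine on the full data-set.
    centers = []
    for j, seed in enumerate(seeds):
        members = [p for p, l in zip(data, labels) if l == j]
        centers.append(mean(members) if members else seed)
    return kmeans(data, centers)

# Toy data: two well-separated groups of four points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
seeds = [(0.0, 0.0), (10.0, 10.0)]
hybrid_labels, hybrid_centers = hybrid_kmeans(data, seeds, threshold=1.5)
plain_labels, plain_centers = kmeans(data, seeds)
print(hybrid_labels == plain_labels)
```

On this toy data the hybrid run and the conventional run agree. In general, the sketch's refinement step only guarantees a converged k-means partition; the paper's own correcting step is what carries the claim of identical results for identical seeds.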

For the sake of simplicity, we assume that the patterns are from a Euclidean space and that Euclidean distance is used, although the proposed methods are applicable with any distance metric.


Title: A hybrid approach to speed-up the k-means clustering method
Authors: T. Hitendra Sarma, P. Viswanath, B. Eswara Reddy
Publication date: 1 April 2013
Publisher: Springer-Verlag
Published in: International Journal of Machine Learning and Cybernetics / Issue 2/2013
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI: https://doi.org/10.1007/s13042-012-0079-7
