2019 | OriginalPaper | Book chapter

A High Performance Modified K-Means Algorithm for Dynamic Data Clustering in Multi-core CPUs Based Environments

Written by: Giuliano Laccetti, Marco Lapegna, Valeria Mele, Diego Romano

Published in: Internet and Distributed Computing Systems

Publisher: Springer International Publishing

Abstract

The K-means algorithm is one of the most widely used methods in data mining and statistical data analysis for partitioning a set of objects into K distinct groups, called clusters, on the basis of their similarities. The main problem of this algorithm is that it requires the number of clusters as input, but in real-world applications it is very difficult to fix this value in advance. In this work we propose a parallel modified K-means algorithm in which the number of clusters is increased at run time in an iterative procedure until a given cluster quality metric is satisfied. To improve the performance of the procedure, at each iteration two new clusters are created by splitting only the cluster with the worst value of the quality metric. Furthermore, experiments in a multi-core CPU based environment are presented.
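The iterative scheme described in the abstract can be illustrated with a minimal, sequential sketch. It is not the authors' implementation: the abstract does not name the quality metric, the splitting rule, or the parallelization strategy. The sketch assumes the metric is the mean squared distance of a cluster's points to its centroid, that the worst cluster is split by a local 2-means run on its own points, and that iteration stops once every cluster is at or below a user-chosen `threshold`. All function names (`scatter`, `two_means`, `adaptive_kmeans`) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Component-wise mean of a set of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def scatter(points: Sequence[Point]) -> float:
    """ASSUMED quality metric: mean squared distance to the centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points) / len(points)


def two_means(points: Sequence[Point], rng: random.Random,
              iters: int = 20) -> Tuple[List[Point], List[Point]]:
    """Split one cluster in two with plain Lloyd iterations (K = 2)."""
    # Seed with a random point and the point farthest from it.
    first = rng.choice(points)
    second = max(points, key=lambda p: sq_dist(p, first))
    centers = (first, second)
    left: List[Point] = list(points)
    right: List[Point] = []
    for _ in range(iters):
        # Assign each point to the nearer center; ties go to the left group.
        left = [p for p in points
                if sq_dist(p, centers[0]) <= sq_dist(p, centers[1])]
        right = [p for p in points
                 if sq_dist(p, centers[0]) > sq_dist(p, centers[1])]
        if not right:  # degenerate case: all points coincide
            break
        new_centers = (centroid(left), centroid(right))
        if new_centers == centers:  # assignments have stabilized
            break
        centers = new_centers
    return left, right


def adaptive_kmeans(points: Sequence[Point], threshold: float,
                    max_k: int = 50, seed: int = 0) -> List[List[Point]]:
    """Grow the number of clusters until every cluster meets the metric."""
    rng = random.Random(seed)
    clusters: List[List[Point]] = [list(points)]
    while len(clusters) < max_k:
        # Find the cluster with the worst (largest) value of the metric.
        worst = max(range(len(clusters)), key=lambda i: scatter(clusters[i]))
        if scatter(clusters[worst]) <= threshold:
            break  # quality target reached for all clusters
        left, right = two_means(clusters[worst], rng)
        if not right:
            break  # cannot split further
        # Replace only the worst cluster with its two halves.
        clusters[worst:worst + 1] = [left, right]
    return clusters
```

Because only the worst cluster is touched in each step, the cost of an iteration depends on the size of that cluster rather than on the whole data set. A natural place to introduce multi-core parallelism in such a sketch would be the point-to-center assignment inside `two_means`, but that is likewise an assumption and is not shown here.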

Title: A High Performance Modified K-Means Algorithm for Dynamic Data Clustering in Multi-core CPUs Based Environments
Written by: Giuliano Laccetti, Marco Lapegna, Valeria Mele, Diego Romano
Publisher: Springer International Publishing
Book: Internet and Distributed Computing Systems
Print ISBN: 978-3-030-34913-4
Electronic ISBN: 978-3-030-34914-1
Copyright year: 2019
DOI: https://doi.org/10.1007/978-3-030-34914-1_9
