Soft Computing

12 May 2015 | Methodologies and Application

A study of large-scale data clustering based on fuzzy clustering

Authors: Yangyang Li, Guoli Yang, Haiyang He, Licheng Jiao, Ronghua Shang

Published in: Soft Computing | Issue 8/2016

Abstract

Large-scale data are any data that cannot be loaded into the main memory of an ordinary computer. This is not an objective definition of large-scale data, but it makes the notion easy to understand. We first introduce some existing algorithms for clustering large-scale data, including several data stream clustering algorithms based on the fuzzy c-means (FCM) algorithm. We then propose a new structure for clustering large-scale data, together with two new data stream clustering algorithms based on this structure, presented in Sects. 3 and 4. In our method, the objects in the dataset are loaded one by one. We set a membership threshold: if the membership of an object in a cluster exceeds the threshold, the object is assigned to that cluster and the location of the nearest cluster center is updated; otherwise, the object is placed in a temporary matrix that we call the pool. When the pool is full, we cluster the data in the pool and update the locations of the cluster centers. Both algorithms share this data stream structure and differ only in how the objects in the data are weighted. We test our algorithms on a handwritten digit image dataset and several large-scale UCI datasets and compare them with some existing algorithms. The experiments show that our algorithms are better suited to clustering large-scale datasets.
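
The threshold-and-pool procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names (`memberships`, `flush_pool`, `stream_cluster`), the running-mean center update, the unit starting weight per center, and the weighted FCM pass over the pool are all assumptions, since the abstract does not specify the update rule or either of the two weighting schemes. Only the standard FCM membership formula is taken as given.

```python
import math


def memberships(x, centers, m=2.0):
    """Standard FCM membership of point x in each cluster (fuzzifier m > 1)."""
    dists = [math.dist(x, c) for c in centers]
    for i, d in enumerate(dists):
        if d == 0.0:  # x coincides with a center: crisp membership
            return [1.0 if j == i else 0.0 for j in range(len(centers))]
    power = 2.0 / (m - 1.0)
    inv = [d ** -power for d in dists]
    total = sum(inv)
    return [v / total for v in inv]


def flush_pool(pool, centers, weights, m=2.0, iters=10):
    """Assumed pool step: weighted FCM in which the old centers take part
    as extra points that carry their accumulated weight."""
    points = [list(p) for p in pool] + [list(c) for c in centers]
    point_w = [1.0] * len(pool) + list(weights)
    dim = len(points[0])
    for _ in range(iters):
        u = [memberships(p, centers, m) for p in points]
        for k in range(len(centers)):
            coef = [w * row[k] ** m for w, row in zip(point_w, u)]
            total = sum(coef)
            if total > 0.0:  # skip the update if no point supports cluster k
                centers[k] = [
                    sum(c * p[d] for c, p in zip(coef, points)) / total
                    for d in range(dim)
                ]
    # Credit each cluster with the membership mass of the pooled objects.
    for row in (memberships(p, centers, m) for p in pool):
        for k, value in enumerate(row):
            weights[k] += value


def stream_cluster(stream, init_centers, threshold=0.8, pool_size=50, m=2.0):
    """Process objects one by one; returns (centers, weights)."""
    centers = [list(c) for c in init_centers]
    weights = [1.0] * len(centers)  # assumption: each center starts with weight 1
    pool = []
    for x in stream:
        u = memberships(x, centers, m)
        k = max(range(len(centers)), key=u.__getitem__)
        if u[k] > threshold:
            # Confident assignment: move the nearest center by a running mean
            # (assumed update rule).
            weights[k] += 1.0
            centers[k] = [c + (xi - c) / weights[k] for c, xi in zip(centers[k], x)]
        else:
            # Ambiguous object: defer it to the pool.
            pool.append(x)
            if len(pool) >= pool_size:
                flush_pool(pool, centers, weights, m)
                pool.clear()
    if pool:  # cluster whatever is left when the stream ends
        flush_pool(pool, centers, weights, m)
    return centers, weights
```

In this sketch only one object and the pool are held in memory at a time, which is the property that makes a scheme of this kind usable when the full dataset does not fit in main memory. The `threshold` and `pool_size` values shown are arbitrary placeholders.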

Title: A study of large-scale data clustering based on fuzzy clustering
Authors: Yangyang Li, Guoli Yang, Haiyang He, Licheng Jiao, Ronghua Shang
Publication date: 12 May 2015
Publisher: Springer Berlin Heidelberg
Published in: Soft Computing / Issue 8/2016
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI: https://doi.org/10.1007/s00500-015-1698-1
