2018 | OriginalPaper | Chapter

Ant Colony Based Fuzzy C-Means Clustering for Very Large Data

Authors: Dhruv Mullick, Ayush Garg, Arpit Bajaj, Ayush Garg, Swati Aggarwal

Published in: Advances in Fuzzy Logic and Technology 2017

Publisher: Springer International Publishing

Abstract

Fuzzy C-Means (FCM) is a popular technique for clustering data. It combines the concepts of the K-Means algorithm and fuzzy set theory. However, FCM can converge to a local optimum, and its results are sensitive to initialisation conditions. To address these problems, prior work has incorporated Ant Colony Optimisation (ACO) into the conventional FCM algorithm. The authors of this paper find that although the FCM-ACO algorithm is a definite improvement over traditional FCM, there is still scope for improving the scalability and accuracy of the system. The authors propose using a Multi Round Sampling (MRS) technique along with Ant Colony Optimisation. The proposed algorithm clusters the dataset without considering it in its entirety, which makes the system more space and time efficient, and therefore scalable and suitable for large datasets. Moreover, extensive experiments on several publicly available datasets, both large and small, show that the proposed algorithm, Multi Round Sampling of Ant Colony based Fuzzy C-Means (MRSA-FCM), gives superior clustering results compared with the FCM and FCM-ACO systems.
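
For orientation, the baseline that the abstract refers to can be sketched in a few lines. The block below is a minimal pure-Python sketch of textbook FCM (alternating weighted-mean center updates and membership updates), not the authors' FCM-ACO or MRSA-FCM, whose details are not given on this page. The function name `fuzzy_c_means`, the fuzzifier default `m=2.0`, the tolerance, and the seed are illustrative assumptions.

```python
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Textbook FCM on a list of equal-length float tuples.

    Returns (centers, memberships). All parameter defaults are
    illustrative assumptions, not values taken from the chapter.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Center update: mean of all points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / w_sum
                for d in range(dim)
            ))

        # Membership update from relative distances to every center.
        shift = 0.0
        for i in range(n):
            dist = [
                max(1e-12, sum((points[i][d] - centers[j][d]) ** 2
                               for d in range(dim)) ** 0.5)
                for j in range(c)
            ]
            new_row = [
                1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0))
                          for k in range(c))
                for j in range(c)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break
    return centers, u


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers, memberships = fuzzy_c_means(data, c=2)
    print(centers)
```

The random initial memberships are where the sensitivity to initialisation mentioned in the abstract enters: a different `seed` can lead the iteration to a different local optimum on harder data.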

Footnotes
1
2D15 clustering dataset can be found at [1].
 
2
MNIST clustering dataset can be found at [3].
 
3
Forest clustering dataset can be found at [2].
 
Literature
4.
Ayed, A.B., Halima, M.B., Alimi, A.M.: Survey on clustering methods: towards fuzzy clustering for big data. In: 2014 6th International Conference on Soft Computing and Pattern Recognition (SoCPaR), pp. 331–336. IEEE (2014)
5.
Bezdek, J.C., Ehrlich, R., Full, W.: FCM: the fuzzy c-means clustering algorithm. Comput. Geosci. 10(2–3), 191–203 (1984)
6.
Bharill, N., Tiwari, A., Malviya, A.: Fuzzy based scalable clustering algorithms for handling big data using apache spark. IEEE Trans. Big Data 2(4), 339–352 (2016)
7.
Chen, C.P., Zhang, C.Y.: Data-intensive applications, challenges, techniques and technologies: a survey on big data. Inform. Sci. 275, 314–347 (2014)
8.
Cox, M., Ellsworth, D.: Managing big data for scientific visualization. ACM Siggraph. 97, 146–162 (1997)
9.
Dorigo, M., Stützle, T.: The ant colony optimization metaheuristic: algorithms, applications, and advances. In: Handbook of Metaheuristics, pp. 250–285. Springer (2003)
10.
Dasgupta, S.: Experiments with random projection. In: Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, pp. 143–151. Morgan Kaufmann Publishers Inc. (2000)
11.
Demchenko, Y., Grosso, P., De Laat, C., Membrey, P.: Addressing big data issues in scientific data infrastructure. In: 2013 International Conference on Collaboration Technologies and Systems (CTS), pp. 48–55. IEEE (2013)
13.
Dorigo, M., Stützle, T.: Ant colony optimization: overview and recent advances, pp. 227–263. Springer US, Boston (2010)
14.
Esogbue, A.O.: Optimal clustering of fuzzy data via fuzzy dynamic programming. Fuzzy Sets Syst. 18(3), 283–298 (1986)
15.
Fahad, A., Alshatri, N., Tari, Z., Alamri, A., Khalil, I., Zomaya, A.Y., Foufou, S., Bouras, A.: A survey of clustering algorithms for big data: taxonomy and empirical analysis. IEEE Trans. Emer. Topics Comput. 2(3), 267–279 (2014)
16.
Fern, X.Z., Brodley, C.E.: Random projection for high dimensional data clustering: a cluster ensemble approach. ICML 3, 186–193 (2003)
17.
Flood, M.D., Jagadish, H., Raschid, L., et al.: Big data challenges and opportunities in financial stability monitoring. Financ. Stab. Rev. 20, 129–142 (2016)
18.
Havens, T.C., Bezdek, J.C., Leckie, C., Hall, L.O., Palaniswami, M.: Fuzzy c-means algorithms for very large data. IEEE Trans. Fuzzy Syst. 20(6), 1130–1146 (2012)
19.
Hung, M.C., Yang, D.L.: An efficient fuzzy c-means clustering algorithm. In: 2001 Proceedings IEEE International Conference on Data Mining, ICDM 2001, pp. 225–232. IEEE (2001)
20.
Joshi, K.R., Patil, S.S.: Processing big data applications using mapreduce. J. Data Min. Manage. 1(2) (2016)
21.
Kitchin, R., McArdle, G.: What makes big data, big data? Exploring the ontological characteristics of 26 datasets. Big Data Soc. 3(1), 2053951716631130 (2016)
22.
Krishna, K., Murty, M.N.: Genetic k-means algorithm. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 29(3), 433–439 (1999)
23.
Laney, D.: 3D data management: controlling data volume, velocity and variety. META Group Res. Note 6, 70 (2001)
24.
Liang, G.S., Chou, T.Y., Han, T.C.: Cluster analysis based on fuzzy equivalence relation. Europ. J. Oper. Res. 166(1), 160–171 (2005)
25.
Liao, T., Stützle, T., Montes de Oca, M., Dorigo, M.: A unified ant colony optimization algorithm for continuous optimization. Europ. J. Oper. Res. 234(3), 597–609 (2014)
26.
Liu, W., Jiang, L.: A clustering algorithm FCM-ACO for supplier base management. In: Proceedings of the 6th International Conference on Advanced Data Mining and Applications: Part I, ADMA 2010, pp. 106–113. Springer, Heidelberg (2010)
27.
Ludwig, S.A.: Mapreduce-based fuzzy c-means clustering algorithm: implementation and scalability. Int. J. Mach. Learn. Cybern. 6(6), 923–934 (2015)
28.
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, New York (2008)
29.
Maulik, U., Bandyopadhyay, S.: Genetic algorithm-based clustering technique. Pattern Recogn. 33(9), 1455–1465 (2000)
30.
Ramage, D., Heymann, P., Manning, C.D., Garcia-Molina, H.: Clustering the tagged web. In: Proceedings of the Second ACM International Conference on Web Search and Data Mining, pp. 54–63. ACM (2009)
31.
Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336), 846–850 (1971)
32.
del Río, S., Lopez, V., Benítez, J.M., Herrera, F.: A mapreduce approach to address big data classification problems based on the fusion of linguistic fuzzy rules. Int. J. Comput. Intell. Syst. 8(3), 422–437 (2015)
33.
Runkler, T.A.: Ant colony optimization of clustering models. Int. J. Intell. Syst. 20(12), 1233–1251 (2005)
34.
35.
Santos, J.M., Embrechts, M.: On the use of the adjusted rand index as a metric for evaluating supervised classification. In: International Conference on Artificial Neural Networks, pp. 175–184. Springer (2009)
36.
Shirkhorshidi, A.S., Aghabozorgi, S., et al.: Big data clustering: a review, pp. 707–720. Springer International Publishing, Cham (2014)
38.
Sun, J., Xie, Y., Zhang, H., Faloutsos, C.: Less is more: compact matrix decomposition for large sparse graphs. In: Proceedings of the 2007 SIAM International Conference on Data Mining, pp. 366–377. SIAM (2007)
39.
Tong, H., Kang, U.: Big data clustering (2013)
40.
Tong, H., Papadimitriou, S., Sun, J., Yu, P.S., Faloutsos, C.: Colibri: fast mining of large static and dynamic graphs. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 686–694. ACM (2008)
41.
Wu, Z., Leahy, R.: An optimal graph theoretic approach to data clustering: theory and its application to image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 15(11), 1101–1113 (1993)
42.
Xu, R., Wunsch, D.: Survey of clustering algorithms. IEEE Trans. Neural Netw. 16(3), 645–678 (2005)
43.
Yeung, K.Y., Ruzzo, W.L.: Details of the adjusted rand index and clustering algorithms, supplement to the paper an empirical study on principal component analysis for clustering gene expression data. Bioinformatics 17(9), 763–774 (2001)
Metadata
Title
Ant Colony Based Fuzzy C-Means Clustering for Very Large Data
Authors
Dhruv Mullick
Ayush Garg
Arpit Bajaj
Ayush Garg
Swati Aggarwal
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-66824-6_51