
2022 | OriginalPaper | Chapter

An Effective Machine Learning Approach for Clustering Categorical Data with High Dimensions

Authors: Syed Umar, Tadele Debisa Deressa, Tariku Birhanu Yadesa, Gemechu Boche Beshan, Endal Kachew Mosisa, Nilesh T. Gole

Published in: Artificial Intelligence and Speech Technology

Publisher: Springer International Publishing

Abstract

With advances in database technology, many modern real-world databases include redundant quantities of categorical data that contribute to data processing and efficient decision-making. However, most clustering algorithms are devised only for numerical data, because they rely on distance measurements. A great deal of work is being done on clustering categorical data using similarity measures defined specifically for such data. The dynamic issue in real-world domains, which do not clearly take a predictive form, is the inner function. The function is based on both unseen and transonic perspectives. This then offers a detailed and inventive collaboration with categorical results. The paper describes CAIS, a stratified, immune-based approach for clustering categorical data that uses a new similarity metric in order to reduce the distance function. To explore clusters over categorical data successfully, CAIS adopts an immunology-focused approach. It also selects subsistent nomadic characteristics as representative entities and organizes them into clusters that quantify affinity. To minimize database throughput, CAIS is segmented into several attributes. The analytical findings show that the proposed solution yields better mining performance on different categorical datasets and outperforms EM on them.
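
The abstract does not define the CAIS similarity metric, so the following is only a minimal sketch of the general idea of grouping categorical records by similarity to representative entities. It assumes the simple overlap (matching) measure, a common baseline that is not the paper's metric. The function names, the toy records, and the fixed representatives are all hypothetical.

```python
from typing import List, Sequence


def overlap_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of attributes on which two categorical records agree.

    Assumption: this overlap measure stands in for the paper's
    similarity metric, which the abstract does not specify.
    """
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def assign_to_representatives(
    records: Sequence[Sequence[str]],
    representatives: Sequence[Sequence[str]],
) -> List[int]:
    """Return, for each record, the index of its most similar representative."""
    return [
        max(
            range(len(representatives)),
            key=lambda k: overlap_similarity(rec, representatives[k]),
        )
        for rec in records
    ]


# Hypothetical toy data: three records with three categorical attributes each.
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "square"),
]
# Hypothetical representatives, fixed by hand for illustration.
representatives = [("red", "small", "round"), ("blue", "large", "square")]

labels = assign_to_representatives(records, representatives)
print(labels)
```

In this sketch the representatives are fixed by hand. An immune-inspired method such as the one the abstract describes would instead select and refine them during the search.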

Metadata

Copyright Year: 2022
DOI: https://doi.org/10.1007/978-3-030-95711-7_39
