2022 | OriginalPaper | Chapter

An Effective Machine Learning Approach for Clustering Categorical Data with High Dimensions

Authors: Syed Umar, Tadele Debisa Deressa, Tariku Birhanu Yadesa, Gemechu Boche Beshan, Endal Kachew Mosisa, Nilesh T. Gole

Published in: Artificial Intelligence and Speech Technology

Publisher: Springer International Publishing

Abstract

Many modern real-world databases contain large quantities of categorical data, which, with advances in database technology, contribute to data processing and efficient decision-making. However, because they rely on distance measurements, most clustering algorithms are devised only for numerical data. A great deal of work is being done on clustering categorical data using similarity measures defined specifically for such data. The underlying issue in real-world domains, which do not clearly take a predictive form, is therefore the inner function, which is based on both an unseen and a transonic perspective. This in turn offers a detailed and inventive way of working with categorical data. The paper describes CAIS, a stratified, immune-based approach for clustering categorical data that uses a new similarity metric in order to reduce the distance function. To explore clusters over categorical data successfully, CAIS adopts an immunology-focused approach. It also selects subsistent nomadic characteristics as representative entities and organizes them into clusters that quantify affinity. To minimize database throughput, CAIS is segmented into several attributes. The analytical findings show that the proposed solution yields better mining performance on various categorical datasets and outperforms EM on them.
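The abstract does not define the CAIS similarity metric or its immune-based selection step, so neither can be reproduced here. As a generic illustration only, the sketch below shows one common way to compare categorical records without a numerical distance: the simple matching (overlap) measure, followed by assigning each record to its most similar representative. The function names `overlap_similarity` and `assign_to_representatives`, the choice of measure, and the sample records are all assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Record = Tuple[str, ...]


def overlap_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of attributes on which two categorical records agree.

    This is the simple matching measure, used here as a stand-in;
    it is NOT the similarity metric proposed for CAIS.
    """
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def assign_to_representatives(
    records: List[Record], representatives: List[Record]
) -> Dict[int, List[Record]]:
    """Group each record under the index of its most similar representative.

    Ties go to the representative with the lowest index.
    """
    clusters: Dict[int, List[Record]] = defaultdict(list)
    for rec in records:
        best = max(
            range(len(representatives)),
            key=lambda i: overlap_similarity(rec, representatives[i]),
        )
        clusters[best].append(rec)
    return dict(clusters)


if __name__ == "__main__":
    # Hypothetical toy data: (colour, size, shape)
    data = [
        ("red", "small", "round"),
        ("blue", "large", "square"),
        ("red", "small", "square"),
    ]
    reps = [("red", "small", "round"), ("blue", "large", "square")]
    print(assign_to_representatives(data, reps))
```

In an affinity-driven scheme such as the one the abstract outlines, a measure of this kind would play the role of the affinity between a record and a representative; how CAIS actually chooses and updates its representatives is not stated in the abstract.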

Title: An Effective Machine Learning Approach for Clustering Categorical Data with High Dimensions
Authors: Syed Umar, Tadele Debisa Deressa, Tariku Birhanu Yadesa, Gemechu Boche Beshan, Endal Kachew Mosisa, Nilesh T. Gole
Publisher: Springer International Publishing
Book: Artificial Intelligence and Speech Technology
Print ISBN: 978-3-030-95710-0
Electronic ISBN: 978-3-030-95711-7
Copyright Year: 2022
DOI: https://doi.org/10.1007/978-3-030-95711-7_39
