2022 | OriginalPaper | Book chapter

An Effective Machine Learning Approach for Clustering Categorical Data with High Dimensions

Authors: Syed Umar, Tadele Debisa Deressa, Tariku Birhanu Yadesa, Gemechu Boche Beshan, Endal Kachew Mosisa, Nilesh T. Gole

Published in: Artificial Intelligence and Speech Technology

Publisher: Springer International Publishing

Abstract

Many modern real-world databases contain redundant quantities of categorical data, which, with advances in database technology, contribute to data processing and efficient decision-making. However, because they are tied to numerical measurements, most clustering algorithms are devised only for numerical data. A great deal of work has been done on clustering categorical data using similarity measures defined specifically for such data. The difficulty in real-world domains, which do not clearly take a predictive form, lies in the inner function, which is based on both unseen and transonic perspectives; this in turn allows a detailed and inventive treatment of categorical data. This paper describes CAIS, a stratified, immune-based approach for clustering categorical data that uses a new similarity metric in order to reduce the distance function. To explore clusters over categorical data successfully, CAIS adopts an immunology-focused approach. It also selects subsistent nomadic characteristics as representative entities and organizes them into clusters that quantify affinity. To minimize database throughput, CAIS is segmented into several attributes. The analytical findings show that the proposed approach yields better mining performance on different categorical datasets and outperforms EM on them.
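The abstract does not define the CAIS similarity metric, so it cannot be reproduced here. As a rough illustration of what a similarity measure over categorical data can look like, the sketch below uses the well-known simple matching (overlap) measure as a stand-in, and assigns each record to its most similar representative. The function names `overlap_similarity` and `assign_to_representatives` and the sample records are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def overlap_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of attributes on which two categorical records agree.

    This is the generic simple matching measure, NOT the CAIS metric.
    """
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def assign_to_representatives(
    records: Sequence[Sequence[str]],
    representatives: Sequence[Sequence[str]],
) -> List[int]:
    """Return, for each record, the index of its most similar representative.

    Ties go to the representative with the lowest index.
    """
    return [
        max(
            range(len(representatives)),
            key=lambda k: overlap_similarity(record, representatives[k]),
        )
        for record in records
    ]


# Hypothetical toy data: three attributes per record.
records = [
    ("red", "small", "round"),
    ("blue", "large", "square"),
    ("red", "large", "round"),
]
representatives = [records[0], records[1]]
labels = assign_to_representatives(records, representatives)
```

In this toy run the third record shares two attribute values with the first representative and one with the second, so it joins the first cluster. An immune-inspired method such as the one the abstract describes would presumably also update the representatives themselves, which this sketch does not attempt.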

Metadata
Copyright year
2022
DOI
https://doi.org/10.1007/978-3-030-95711-7_39
