Published in: Knowledge and Information Systems 4/2024

30-12-2023 | Regular Paper

A smart intelligent approach based on hybrid group search and pelican optimization algorithm for data stream clustering

Authors: Swathi Agarwal, C. R. K. Reddy

Abstract

Big data applications generate large volumes of evolving, real-time, high-dimensional streaming data. Clustering such data streams efficiently and effectively is a major challenge in data mining. Several clustering techniques have been implemented for stream data, but most of them handle cluster dynamics only in restricted ways. A data stream is a sequence of data arriving over time, and clustering it involves several factors that do not arise in classical clustering. The stream is mostly unbounded, and every data point has to be evaluated at least once, which leads to higher processing time and additional memory requirements. In addition, the clusters and their statistical properties vary over time, and streams can be noisy. To address these challenges, this research work implements a novel data stream clustering approach built on a hybrid meta-heuristic model. First, a data stream is collected and micro-clusters are formed by the K-Means Clustering (KMC) technique. The micro-clusters are then merged and sorted, with cluster optimization performed by Hybrid Group Search Pelican Optimization (HGSPO). The objective of the clustering is to maximize accuracy through radius, distance and similarity measures, and the thresholds of these metrics are optimized. In the training phase, a clustering threshold is fixed for each cluster. When new data enters the stream clustering model, its output is compared with the output of the training data, and the new data is forwarded to the appropriate cluster based on the assigned threshold with minimum similarity. A performance analysis against various clustering and heuristic algorithms confirms the clustering quality of the recommended system on standard performance metrics.
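
The two-phase idea in the abstract (micro-clusters from k-means, then threshold-based routing of new data) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the HGSPO threshold optimization and the merge/sort step are omitted, and the `slack` factor, the farthest-first seeding, the dictionary layout and all function names are assumptions made here for illustration. The threshold is simply the cluster radius times a fixed factor.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns one list of member points per cluster."""
    # Deterministic farthest-first seeding (an assumption, for repeatability).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centers[j]))
            groups[nearest].append(p)
        # Keep the old centre if a cluster ends up empty.
        centers = [mean(g) if g else centers[j] for j, g in enumerate(groups)]
    return [g for g in groups if g]


def build_micro_clusters(points, k, slack=1.5):
    """Training phase: summarise each cluster as centre, threshold and count.

    The threshold is the cluster radius (largest member-to-centre distance)
    times a fixed, hypothetical `slack` factor. The paper optimizes its
    thresholds with HGSPO; this sketch does not.
    """
    clusters = []
    for g in kmeans(points, k):
        c = mean(g)
        radius = max(dist(p, c) for p in g)
        clusters.append({"centre": c, "threshold": slack * radius, "n": len(g)})
    return clusters


def assign(point, clusters):
    """Route a new point to the nearest micro-cluster, or open a new one."""
    j = min(range(len(clusters)), key=lambda i: dist(point, clusters[i]["centre"]))
    c = clusters[j]
    if dist(point, c["centre"]) <= c["threshold"]:
        # Incremental mean update, so earlier points need not be stored.
        n = c["n"] + 1
        c["centre"] = tuple(m + (x - m) / n for m, x in zip(c["centre"], point))
        c["n"] = n
        return j
    # Outside every threshold: start a new micro-cluster. Reusing the nearest
    # cluster's threshold is an assumption of this sketch.
    clusters.append({"centre": tuple(point), "threshold": c["threshold"], "n": 1})
    return len(clusters) - 1


train = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
clusters = build_micro_clusters(train, k=2)
near_first = assign((0.6, 0.4), clusters)     # falls inside the first cluster
near_second = assign((10.4, 10.6), clusters)  # falls inside the second cluster
far_away = assign((5.0, 5.0), clusters)       # outside both thresholds
```

Storing only a centre, a count and a threshold per micro-cluster keeps memory bounded as the stream grows, which is the constraint the abstract highlights. A real system would also need a decay or merge policy for stale micro-clusters.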

Metadata
Title
A smart intelligent approach based on hybrid group search and pelican optimization algorithm for data stream clustering
Authors
Swathi Agarwal
C. R. K. Reddy
Publication date
30-12-2023
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 4/2024
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-023-02002-5
