
2022 | OriginalPaper | Book chapter

5. Research Intention Towards Incremental Clustering

Authors: Sanjay Chakraborty, SK Hafizul Islam, Debabrata Samanta

Published in: Data Classification and Incremental Clustering in Data Mining and Machine Learning

Publisher: Springer International Publishing


Abstract

Incremental clustering is the process of grouping newly arriving (incremental) data into classes or clusters, placing each new data item in the cluster of items most similar to it. The standard K-means and DBSCAN clustering algorithms are inefficient on large dynamic databases because they rerun the entire algorithm after every change to the database, which takes a long time just to cluster the newly arrived data. Applying the existing algorithms repeatedly to a frequently updated database can therefore be too costly, so the standard K-means algorithm is not well suited to a dynamic environment. To overcome these challenges, our work introduces incremental versions of K-means and DBSCAN. Instead of rerunning the whole clustering procedure, incremental K-means computes the new cluster centers simply by measuring the distance of each new data point from the means of the current clusters, and incremental DBSCAN follows a similar approach. Finally, the work identifies the amount of change (delta change) in the original database for which incremental K-means or DBSCAN clustering outperforms the earlier techniques.
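The idea of updating cluster centers from the distance of new data to the current means can be illustrated with a minimal sketch. It assumes that initial centroids and cluster sizes are already available from one full K-means run on the original database; every new point is assigned to its nearest centroid (Euclidean distance), and only that centroid is updated with a running-mean formula. The names `IncrementalKMeans`, `euclidean`, `add` and `add_batch` are hypothetical, and this generic sequential update is not necessarily the chapter's exact algorithm: in particular, it never revisits the assignments of earlier points. The incremental DBSCAN variant is not shown.

```python
import math
from typing import Iterable, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class IncrementalKMeans:
    """Hypothetical sketch of incremental K-means.

    Assumption: `centroids` and `counts` come from an earlier full K-means
    run on the original database. New points are then absorbed one at a
    time without rerunning K-means; old assignments are never revisited.
    """

    def __init__(self, centroids: List[List[float]], counts: List[int]):
        if len(centroids) != len(counts):
            raise ValueError("need exactly one count per centroid")
        self.centroids = [list(c) for c in centroids]
        self.counts = list(counts)

    def nearest(self, point: Sequence[float]) -> int:
        """Index of the centroid closest to `point`."""
        return min(range(len(self.centroids)),
                   key=lambda j: euclidean(point, self.centroids[j]))

    def add(self, point: Sequence[float]) -> int:
        """Assign one new point and update only its cluster's mean."""
        j = self.nearest(point)
        self.counts[j] += 1
        n = self.counts[j]
        # Running-mean update: c_j <- c_j + (x - c_j) / n_j, which equals the
        # mean of all points assigned to cluster j so far.
        self.centroids[j] = [c + (x - c) / n
                             for c, x in zip(self.centroids[j], point)]
        return j

    def add_batch(self, points: Iterable[Sequence[float]]) -> List[int]:
        """Process a delta of new points in arrival order."""
        return [self.add(p) for p in points]


if __name__ == "__main__":
    model = IncrementalKMeans([[0.0, 0.0], [10.0, 10.0]], counts=[2, 2])
    print(model.add_batch([[1.0, 1.0], [9.0, 11.0]]))  # [0, 1]
    print(model.centroids)
```

Each new point costs one distance computation per cluster plus one centroid update, rather than a full pass over the whole database; the trade-off is that the result can drift from what a fresh batch run would produce as the delta grows, which is consistent with the abstract's point that incremental clustering pays off only up to some amount of change.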


Metadata
Copyright year
2022
DOI
https://doi.org/10.1007/978-3-030-93088-2_5
