
2022 | OriginalPaper | Book chapter

1. Introduction to Data Mining and Knowledge Discovery

Written by: Sanjay Chakraborty, SK Hafizul Islam, Debabrata Samanta

Published in: Data Classification and Incremental Clustering in Data Mining and Machine Learning

Publisher: Springer International Publishing


Abstract

Data mining is the process of discovering useful hidden patterns in large volumes of data, which may be stored across multiple heterogeneous sources. Business executives use it widely to make strategic decisions based on what the analysis reveals. Data mining is one of the steps in the knowledge discovery process. A data mining system consists of a data warehouse, a database server, a data mining engine, a pattern analysis module, and a graphical user interface. Data mining techniques include frequent pattern mining, association rule learning, and sequence analysis. These techniques can be applied on top of various kinds of data storage systems, such as data warehouses, and they provide analysis processes that support strategic decisions. A data mining system faces various issues and challenges when working with large databases, which makes the field a fertile area for data researchers and developers. Classification is a data mining task carried out by examining training data (i.e., objects whose class labels are predefined). From a set of previously seen objects with known class labels, it builds a model that can predict the class of an object whose label is unknown. Classification models fall into a variety of categories, including the Bayesian model, decision tree, neural network, random forest, nearest neighbor, and support vector machine (SVM). The K-Nearest Neighbor (KNN) technique predicts the class of an unlabeled object by taking the most common class among its k closest samples. It is an easy-to-use strategy that makes no assumption about the underlying data distribution and often yields solid classification results. The Naive Bayes classifier performs classification by applying Bayes' theorem. It is one of the fastest classification algorithms, capable of efficiently handling real-world discrete data.
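
The KNN idea described in the abstract (predict the label of an unlabeled object from the most common class among its k closest samples) can be illustrated with a minimal Python sketch. This is not the chapter's own implementation. The function name `knn_predict`, the use of Euclidean distance, the default `k=3`, and the toy training set are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train: list of (features, label) pairs, where features is a tuple of numbers.
    query: tuple of numbers with the same length as each features tuple.
    Distance metric (Euclidean) and default k are illustrative assumptions.
    """
    # Rank training samples by Euclidean distance to the query point
    # and keep only the k closest ones.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count how often each class label occurs among those neighbors.
    votes = Counter(label for _, label in neighbors)
    # The predicted class is the most frequent label.
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups labelled "A" and "B".
train = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"),
    ((5.2, 4.9), "B"),
    ((4.8, 5.1), "B"),
]
```

Calling `knn_predict(train, (1.1, 1.0))` classifies a point near the first group, and `knn_predict(train, (5.0, 5.1))` a point near the second. The sketch scans every training sample for each query, which is fine for small data but would need an index structure or approximation for the large databases the abstract has in mind.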


Metadata
Title
Introduction to Data Mining and Knowledge Discovery
Written by
Sanjay Chakraborty
SK Hafizul Islam
Debabrata Samanta
Copyright year
2022
DOI
https://doi.org/10.1007/978-3-030-93088-2_1
