2022 | OriginalPaper | Book chapter

1. Introduction to Data Mining and Knowledge Discovery

Authors: Sanjay Chakraborty, SK Hafizul Islam, Debabrata Samanta

Published in: Data Classification and Incremental Clustering in Data Mining and Machine Learning

Publisher: Springer International Publishing

Abstract

Data mining is the process of discovering useful hidden patterns in large volumes of data that may be stored across multiple heterogeneous sources. Business executives use it widely to make strategic decisions after analyzing what the data reveal. Data mining is one of the steps in the knowledge discovery process. A data mining system consists of a data warehouse, a database server, a data mining engine, a pattern analysis module, and a graphical user interface. Data mining techniques include frequent pattern mining, association rule learning, and sequence analysis. These techniques can be applied on top of various kinds of intelligent data storage systems, such as data warehouses, where they provide analyses that support strategic decisions. A data mining system faces various issues and challenges when working with large databases. The field offers ample scope for data researchers and developers. Classification is a data mining task that is carried out by examining training data (i.e., objects whose class labels are predefined). Using a set of previously seen objects with known class labels, it builds a model that can predict the class of an object whose label is unknown. Classification models fall into several families, including nearest neighbor, Bayesian models, decision trees, random forests, neural networks, and support vector machines (SVMs). The K-Nearest Neighbor (KNN) technique predicts the class of an unlabeled object by taking the most common class among its k closest samples. It is an easy-to-use technique that makes no assumption about the underlying data distribution and often yields solid classification results. The Naive Bayes classifier, which is based on Bayes' theorem, also performs classification. It is one of the fastest classification algorithms and handles real-world discrete data efficiently.
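The KNN idea described in the abstract (predict the most common class among the k closest labeled samples) can be illustrated with a minimal Python sketch. This is not the chapter authors' implementation. The function name `knn_predict`, the toy `training_data`, the choice of Euclidean distance, and the default `k=3` are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest samples.

    `train` is a list of (features, label) pairs; `features` and `query`
    are equal-length tuples of numbers. Euclidean distance is an assumption;
    other distance measures can be substituted.
    """
    # Rank the labeled samples by distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the class labels among those neighbors.
    votes = Counter(label for _, label in neighbors)
    # Return the most common label (ties go to the label counted first).
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups labeled "A" and "B".
training_data = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((0.9, 1.1), "A"),
    ((4.0, 4.2), "B"),
    ((4.1, 3.9), "B"),
    ((3.8, 4.0), "B"),
]

print(knn_predict(training_data, (1.1, 1.0)))  # query near the "A" group
print(knn_predict(training_data, (4.0, 4.0)))  # query near the "B" group
```

The sketch sorts the whole training set for every query, which is fine for small data. For the large databases the abstract mentions, a spatial index or a library implementation would usually be a better choice.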

Title: Introduction to Data Mining and Knowledge Discovery
Authors: Sanjay Chakraborty, SK Hafizul Islam, Debabrata Samanta
Publisher: Springer International Publishing
Book: Data Classification and Incremental Clustering in Data Mining and Machine Learning
Print ISBN: 978-3-030-93087-5
Electronic ISBN: 978-3-030-93088-2
Copyright year: 2022
DOI: https://doi.org/10.1007/978-3-030-93088-2_1
