Published in: Cluster Computing

4 September 2017

A hybrid approach for mismatch data reduction in datasets and guide data mining

Authors: R. Dhanalakshmi, T. Sethukarasi

Published in: Cluster Computing | Special Issue 5/2019

Abstract

An outlier is a data point, or a set of data points, that differs distinctly from the rest of the data in a dataset, which is regarded as normal. Outlier detection is an active area of research in data mining. Clustering-based methods focus on the elements that lie outside the clusters and detect those as outliers. This is not sufficient, however, because a few unknown elements can become part of a cluster. To exclude irrelevant data from the dataset completely, it is therefore necessary to identify and eliminate the data that have merged into the clusters. An efficient hybrid approach is proposed to reduce the number of outliers. It employs two algorithms adopted for data mining, multilayer neural networks (MLN) and weighted K-means, to identify outliers in a data group, and it guides cluster formation towards better clusters. Each element in the dataset is provided as input to the MLN after weights have been assigned by weighted K-means. The MLN is trained to reproduce the normal input data (inliers) and ensures that the groups formed by weighted K-means consist of inliers only. Among the outlier detection methods presented in the data mining literature, the proposed method belongs to those based on integrating semantic knowledge. It identifies a data point as an outlier when its behaviour differs from that of the other data elements belonging to the same cluster or class. The principal aim of this research is to reduce the number of outliers by enhancing the performance of clustering or classification techniques, which in turn improves accuracy and reduces the mean square error. The test results provide evidence of the superiority of the proposed strategy in reducing outliers.
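The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the weights are assigned or how the network is built, so the fixed per-feature weights, the single hidden unit, the synthetic data, and the mean-plus-three-standard-deviations threshold are all assumptions made for illustration. The function names `weighted_kmeans` and `train_replicator` are hypothetical.

```python
import numpy as np

def weighted_kmeans(X, k, w, n_iter=100, seed=0):
    """Lloyd's algorithm with a feature-weighted squared Euclidean distance.
    The fixed feature weights w are an assumption of this sketch."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # dist[i, j] = sum_f w[f] * (X[i, f] - centers[j, f]) ** 2
        dist = (((X[:, None, :] - centers[None, :, :]) ** 2) * w).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            if np.any(labels == j):
                new_centers[j] = X[labels == j].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def train_replicator(Z, hidden=1, lr=0.05, epochs=2000, seed=0):
    """Train a one-hidden-layer network to reproduce its own input
    (full-batch gradient descent on the squared reconstruction error)."""
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(Z @ W1 + b1)             # hidden activations
        err = H @ W2 + b2 - Z                # reconstruction error
        dH = (err @ W2.T) * (1.0 - H ** 2)   # backpropagate through tanh
        W2 -= lr * (H.T @ err) / n
        b2 -= lr * err.mean(axis=0)
        W1 -= lr * (Z.T @ dH) / n
        b1 -= lr * dH.mean(axis=0)
    return lambda A: np.tanh(A @ W1 + b1) @ W2 + b2

# Synthetic data (assumed): two tight groups plus one planted outlier (last row).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([0.0, 0.0], 0.3, (50, 2)),
    rng.normal([5.0, 5.0], 0.3, (50, 2)),
    [[5.0, 0.0]],
])

# Step 1: weighted K-means absorbs every point, including the outlier, into a cluster.
labels, centers = weighted_kmeans(X, k=2, w=np.array([0.6, 0.4]))

# Step 2: the network learns to reproduce the (standardized) data; points it
# reproduces poorly are treated as outliers hidden inside the clusters.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
reconstruct = train_replicator(Z)
scores = ((reconstruct(Z) - Z) ** 2).sum(axis=1)
flagged = scores > scores.mean() + 3.0 * scores.std()  # assumed threshold

# Step 3: drop flagged points so each cluster keeps inliers only.
clean_clusters = {j: X[(labels == j) & ~flagged] for j in range(2)}
print("flagged indices:", np.flatnonzero(flagged))
```

The single hidden unit acts as a bottleneck: the network can only reproduce points close to the dominant structure of the data, so a point that K-means has merged into a cluster but that does not follow that structure receives a high reconstruction error. Other network sizes or thresholds may suit real data better.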

Title: A hybrid approach for mismatch data reduction in datasets and guide data mining
Authors: R. Dhanalakshmi, T. Sethukarasi
Publication date: 4 September 2017
Publisher: Springer US
Published in: Cluster Computing, Special Issue 5/2019
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI: https://doi.org/10.1007/s10586-017-1137-4