
25-04-2021 | Original Article

Data reduction based on NN-kNN measure for NN classification and regression

Authors: Shuang An, Qinghua Hu, Changzhong Wang, Ge Guo, Piyu Li

Published in: International Journal of Machine Learning and Cybernetics | Issue 3/2022

Abstract

Data reduction is designed not only to reduce the amount of data but also to suppress noise interference. In this study, we focus on sample reduction algorithms for classification and regression data. We propose a sample quality evaluation measure, denoted NN-kNN, that is inspired by human social behavior. The measure is a local evaluation method that can accurately assess sample quality under uneven and irregular data distributions. It is also easy to understand and applies to both supervised and unsupervised data. Based on the NN-kNN measure, we then develop sample reduction algorithms for classification data and for regression data, respectively. Experiments are carried out to verify the proposed quality evaluation measure and data reduction algorithms. The results show that NN-kNN evaluates data quality effectively, and that the high-quality samples selected by the reduction algorithms yield high classification and prediction performance. The robustness of the sample reduction algorithms is also validated.
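To make the idea concrete, the following is a minimal, hypothetical sketch in Python of an NN-kNN-style sample quality score and a reduction step for kNN classification. It assumes the score of a sample reflects how many of its k nearest neighbours share its label and also count it among their own k nearest neighbours (a mutual, reverse-kNN style count); the exact NN-kNN definition and the paper's reduction algorithms differ in detail, and the names nn_knn_score and reduce_by_quality are illustrative only.

```python
# Hypothetical sketch of an NN-kNN-style quality score and reduction step.
# The scoring rule here is an assumption, not the paper's exact definition.

import numpy as np
from sklearn.neighbors import NearestNeighbors, KNeighborsClassifier


def nn_knn_score(X, y, k=5):
    """Score each sample by the fraction of its k nearest neighbours that
    share its label and also include it in their own k-NN lists."""
    n = X.shape[0]
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    # Drop the first column: each sample is its own nearest neighbour.
    knn_idx = nn.kneighbors(X, return_distance=False)[:, 1:]
    knn_sets = [set(row) for row in knn_idx]
    scores = np.zeros(n)
    for i in range(n):
        mutual_same_class = sum(
            1 for j in knn_idx[i] if y[j] == y[i] and i in knn_sets[j]
        )
        scores[i] = mutual_same_class / k
    return scores


def reduce_by_quality(X, y, k=5, keep_ratio=0.7):
    """Keep the highest-scoring samples; a simple stand-in for a
    classification-oriented sample reduction algorithm."""
    scores = nn_knn_score(X, y, k=k)
    n_keep = max(1, int(keep_ratio * len(y)))
    keep = np.argsort(scores)[::-1][:n_keep]
    return X[keep], y[keep]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two noisy Gaussian classes as toy data.
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
    y = np.array([0] * 100 + [1] * 100)
    Xr, yr = reduce_by_quality(X, y, k=5, keep_ratio=0.6)
    clf = KNeighborsClassifier(n_neighbors=3).fit(Xr, yr)
    print("reduced set size:", len(yr), "accuracy on full data:", clf.score(X, y))
```

A regression-oriented variant of this sketch would replace the label equality check with a tolerance on the difference between neighbouring target values, in line with the paper's separate treatment of classification and regression data.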

Metadata
Title
Data reduction based on NN-kNN measure for NN classification and regression
Authors
Shuang An
Qinghua Hu
Changzhong Wang
Ge Guo
Piyu Li
Publication date
25-04-2021
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 3/2022
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-021-01327-3
