Published in: International Journal of Machine Learning and Cybernetics 9/2018

07-04-2017 | Original Article

A study on unstable cuts and its application to sample selection

Authors: Sheng Xing, Zhong Ming


Abstract

An unstable cuts-based sample selection (UCBSS) method is proposed. It addresses the main drawbacks of traditional distance-based sample selection methods when compressing large datasets, namely their heavy time requirements and computational complexity. The core idea is that the extreme value of a convex function is attained at a boundary point. The method measures the boundary extent of each sample by marking unstable cuts, counting the number of unstable cuts and applying a threshold, and thereby obtains unstable subsets. Experimental results show that the method is well suited to compressing large datasets with a high imbalance ratio. Compared with the traditional condensed nearest neighbour (CNN) method, it achieves similar compression ratios and higher G-mean values on datasets with a high imbalance ratio. When the discriminant function of the classifier is a convex function, it achieves similar accuracy and higher compression ratios on datasets with significant noise. In addition, its run time shows a clear advantage.
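The abstract's core procedure — mark unstable cuts, count them per sample, threshold the counts — can be illustrated with a minimal sketch. This is not the authors' exact UCBSS algorithm; it only shows the common notion of an unstable cut (a cut point on a sorted attribute that falls between two samples of different classes, as in Fayyad–Irani boundary points) and the function name and threshold parameter are hypothetical:

```python
import numpy as np

def select_boundary_samples(X, y, threshold=1):
    """Keep samples adjacent to at least `threshold` unstable cuts.

    For each feature, samples are sorted by value; the midpoint between
    two adjacent samples with different class labels is an 'unstable
    cut'. Samples bordering many unstable cuts lie near the class
    boundary, so thresholding the per-sample count yields a compressed,
    boundary-preserving subset. No pairwise distance matrix is needed,
    which is the claimed run-time advantage over distance-based methods.
    """
    n_samples, n_features = X.shape
    counts = np.zeros(n_samples, dtype=int)
    for j in range(n_features):
        order = np.argsort(X[:, j], kind="stable")
        labels = y[order]
        # Adjacent pairs with different labels straddle an unstable cut.
        unstable = labels[:-1] != labels[1:]
        counts[order[:-1][unstable]] += 1
        counts[order[1:][unstable]] += 1
    return np.flatnonzero(counts >= threshold)

# Tiny example: one feature, classes change between x=1 and x=2,
# so only the two samples straddling that cut are selected.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
print(select_boundary_samples(X, y, threshold=1))  # -> [1 2]
```

Counting per feature costs O(n log n) per attribute for the sort, versus the O(n²) distance computations that CNN-style condensation performs, which is consistent with the run-time advantage reported in the abstract.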

Metadata
Title
A study on unstable cuts and its application to sample selection
Authors
Sheng Xing
Zhong Ming
Publication date
07-04-2017
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 9/2018
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-017-0663-y
