Skip to main content
Erschienen in: International Journal of Machine Learning and Cybernetics 1/2016

01.02.2016 | Original Article

Large symmetric margin instance selection algorithm

verfasst von: Javad Hamidzadeh, Reza Monsefi, Hadi Sadoghi Yazdi

Erschienen in: International Journal of Machine Learning and Cybernetics | Ausgabe 1/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In instance-based classifiers, there is a need for storing a large number of samples as a training set. In this paper, we propose a large symmetric margin instance selection algorithm, namely LAMIS. LAMIS removes non-border (interior) instances and keeps border ones. This paper presents an instance selection process through formulating it as a constrained binary optimization problem and solves it by employment filled function algorithm. Instance-based learning algorithms are often confronted with the problem of deciding which instances must be stored for use during an actual test. Storing too many instances can result in large memory requirements and slow execution. In LAMIS, the core of instance selection process is based on keeping the hyperplane that separates a two-class data, to provide large margin separation. LAMIS selects the most representative instances, satisfying both objectives: high accuracy and reduction rates. The performance has been evaluated on real world data sets from UCI repository by the ten-fold cross-validation method. The results of experiments have been compared with state-of-the-art methods, where the overall results, show the superiority of the proposed method in terms of classification accuracy and reduction percentage.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Weitere Produktempfehlungen anzeigen
Literatur
1.
Zurück zum Zitat Olvera-López JA, Carrasco-Ochoa JA, Martínez-Trinidad JF (2010) A new fast prototype selection method based on clustering. Pattern Anal Appl 13(2):131–141MathSciNetCrossRef Olvera-López JA, Carrasco-Ochoa JA, Martínez-Trinidad JF (2010) A new fast prototype selection method based on clustering. Pattern Anal Appl 13(2):131–141MathSciNetCrossRef
2.
Zurück zum Zitat Czarnowski I (2012) Cluster-based instance selection for machine classification. Knowl Inf Syst 30(1):113–133CrossRef Czarnowski I (2012) Cluster-based instance selection for machine classification. Knowl Inf Syst 30(1):113–133CrossRef
3.
Zurück zum Zitat Cheng H, Shan J, Ju W, Guo Y, Zhang L (2010) Automated breast cancer detection and classification using ultrasound images: a survey. Pattern Recogn 43:299–317MATHCrossRef Cheng H, Shan J, Ju W, Guo Y, Zhang L (2010) Automated breast cancer detection and classification using ultrasound images: a survey. Pattern Recogn 43:299–317MATHCrossRef
4.
Zurück zum Zitat Rosset S, Perlich C, Swirszcz G, Melville P, Liu Y (2010) Medical data mining: insights from winning two competitions. Data Min Knowl Disc 20:439–468MathSciNetCrossRef Rosset S, Perlich C, Swirszcz G, Melville P, Liu Y (2010) Medical data mining: insights from winning two competitions. Data Min Knowl Disc 20:439–468MathSciNetCrossRef
5.
Zurück zum Zitat Liu H, Liu L, Zhang H (2010) Ensemble gene selection for cancer classification. Pattern Recogn 43:2763–2772CrossRef Liu H, Liu L, Zhang H (2010) Ensemble gene selection for cancer classification. Pattern Recogn 43:2763–2772CrossRef
6.
Zurück zum Zitat Twala B, Phorah M (2010) Predicting incomplete gene microarray data with the use of supervised learning algorithms. Pattern Recogn Lett 31:2061–2069CrossRef Twala B, Phorah M (2010) Predicting incomplete gene microarray data with the use of supervised learning algorithms. Pattern Recogn Lett 31:2061–2069CrossRef
7.
Zurück zum Zitat Dhurandhar A, Dobra A (2013) Probabilistic characterization of nearest neighbor classifier. Int J Mach Learn Cybernet 4:259–272CrossRef Dhurandhar A, Dobra A (2013) Probabilistic characterization of nearest neighbor classifier. Int J Mach Learn Cybernet 4:259–272CrossRef
8.
9.
Zurück zum Zitat Tomašev N, Radovanović M, Mladenić D, Ivanović M (2012) Hubness-based fuzzy measures for high-dimensional k-nearest neighbor classification. Int J Mach Learn Cybernet. doi:10.1007/s13042-012-0137-1 Tomašev N, Radovanović M, Mladenić D, Ivanović M (2012) Hubness-based fuzzy measures for high-dimensional k-nearest neighbor classification. Int J Mach Learn Cybernet. doi:10.​1007/​s13042-012-0137-1
10.
Zurück zum Zitat Hamidzadeh J, Monsefi R, Sadoghi Yazdi H (2012) DDC: distance-based decision classifier. Neural Comput Appl 21:1697–1707CrossRef Hamidzadeh J, Monsefi R, Sadoghi Yazdi H (2012) DDC: distance-based decision classifier. Neural Comput Appl 21:1697–1707CrossRef
11.
Zurück zum Zitat Small K, Roth D (2010) Margin-based active learning for structured predictions. Int J Mach Learn Cybernet 1:3–25CrossRef Small K, Roth D (2010) Margin-based active learning for structured predictions. Int J Mach Learn Cybernet 1:3–25CrossRef
12.
Zurück zum Zitat Wang XZ, Dong LC, Yan JH (2012) Maximum ambiguity based sample selection in fuzzy decision tree induction. IEEE Trans Knowl Data Eng 24(8):1491–1505CrossRef Wang XZ, Dong LC, Yan JH (2012) Maximum ambiguity based sample selection in fuzzy decision tree induction. IEEE Trans Knowl Data Eng 24(8):1491–1505CrossRef
13.
Zurück zum Zitat Wilson DR, Martinez TR (2000) Reduction techniques for instance-based learning algorithms. Mach Learn 38:257–286MATHCrossRef Wilson DR, Martinez TR (2000) Reduction techniques for instance-based learning algorithms. Mach Learn 38:257–286MATHCrossRef
14.
Zurück zum Zitat Chien-Hsing C, Bo-Han K, Fu C (2006) The generalized condensed nearest neighbor rule as a data reduction method. In: Proceedings of the 18th international conference on pattern recognition, IEEE Computer Society, Hong-Kong, pp 556–559 Chien-Hsing C, Bo-Han K, Fu C (2006) The generalized condensed nearest neighbor rule as a data reduction method. In: Proceedings of the 18th international conference on pattern recognition, IEEE Computer Society, Hong-Kong, pp 556–559
15.
Zurück zum Zitat Lam W, Keung CK, Liu D (2002) Discovering useful concept prototypes for classification based on filtering and abstraction. IEEE Trans Pattern Anal Mach Intell 24(8):1075–1090CrossRef Lam W, Keung CK, Liu D (2002) Discovering useful concept prototypes for classification based on filtering and abstraction. IEEE Trans Pattern Anal Mach Intell 24(8):1075–1090CrossRef
16.
Zurück zum Zitat Veenman CJ, Reinders MJT (2005) The nearest subclass classifier: a compromise between the nearest mean and nearest neighbor classifier. IEEE Trans Pattern Mach Intell 27(9):1417–1429CrossRef Veenman CJ, Reinders MJT (2005) The nearest subclass classifier: a compromise between the nearest mean and nearest neighbor classifier. IEEE Trans Pattern Mach Intell 27(9):1417–1429CrossRef
17.
Zurück zum Zitat García S, Derrac J, Cano JR, Herrera F (2012) Prototype selection for nearest neighbor classification: taxonomy and empirical study. IEEE Trans Pattern Mach Intell 34(3):417–435CrossRef García S, Derrac J, Cano JR, Herrera F (2012) Prototype selection for nearest neighbor classification: taxonomy and empirical study. IEEE Trans Pattern Mach Intell 34(3):417–435CrossRef
18.
Zurück zum Zitat Olvera-Lopez AJ, Carrasco-Ochoa JF, Martinez-Trinidad JA, Kittler J (2010) A review of instance selection methods. Artif Intell Rev 34:133–143CrossRef Olvera-Lopez AJ, Carrasco-Ochoa JF, Martinez-Trinidad JA, Kittler J (2010) A review of instance selection methods. Artif Intell Rev 34:133–143CrossRef
19.
Zurück zum Zitat Herrero JR, Navarro JJ (2007) Exploiting computer resources for fast nearest neighbor classification. Pattern Anal Appl 10(4):265–275MathSciNetCrossRef Herrero JR, Navarro JJ (2007) Exploiting computer resources for fast nearest neighbor classification. Pattern Anal Appl 10(4):265–275MathSciNetCrossRef
20.
Zurück zum Zitat Hart P (1968) The condensed nearest neighbor rule. IEEE Trans Inf Theory 14:515–516CrossRef Hart P (1968) The condensed nearest neighbor rule. IEEE Trans Inf Theory 14:515–516CrossRef
21.
Zurück zum Zitat Gates GW (1972) The reduced nearest neighbor rule. IEEE Trans Inf Theory 18(3):431–433CrossRef Gates GW (1972) The reduced nearest neighbor rule. IEEE Trans Inf Theory 18(3):431–433CrossRef
22.
Zurück zum Zitat Wilson D (1972) Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans Syst Man Cybernet 2:408–421MATHCrossRef Wilson D (1972) Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans Syst Man Cybernet 2:408–421MATHCrossRef
24.
Zurück zum Zitat Lowe DG (1995) Similarity metric learning for a variable-kernel classifier. Neural Comput 7(1):72–85CrossRef Lowe DG (1995) Similarity metric learning for a variable-kernel classifier. Neural Comput 7(1):72–85CrossRef
25.
Zurück zum Zitat Brighton H, Mellish C (2002) Advances in instance selection for instance-based learning algorithms. Data Min Knowl Disc 6(2):153–172MATHMathSciNetCrossRef Brighton H, Mellish C (2002) Advances in instance selection for instance-based learning algorithms. Data Min Knowl Disc 6(2):153–172MATHMathSciNetCrossRef
26.
Zurück zum Zitat Lumini A, Nanni L (2006) A clustering method for automatic biometric template selection. Pattern Recogn 39:495–497MATHCrossRef Lumini A, Nanni L (2006) A clustering method for automatic biometric template selection. Pattern Recogn 39:495–497MATHCrossRef
27.
Zurück zum Zitat Raicharoen T, Lursinsap C (2005) A divide-and-conquer approach to the pairwise opposite class-nearest neighbor (POC-NN) algorithm. Pattern Recogn Lett 26(10):1554–1567CrossRef Raicharoen T, Lursinsap C (2005) A divide-and-conquer approach to the pairwise opposite class-nearest neighbor (POC-NN) algorithm. Pattern Recogn Lett 26(10):1554–1567CrossRef
28.
Zurück zum Zitat Fayed HA, Atiya AF (2009) A novel template, reduction approach for the K-nearest neighbor method. IEEE Trans Neural Netw 20(5):890–896CrossRef Fayed HA, Atiya AF (2009) A novel template, reduction approach for the K-nearest neighbor method. IEEE Trans Neural Netw 20(5):890–896CrossRef
29.
Zurück zum Zitat Marchiori E (2008) Hit miss networks with applications to instance selection. J Mach Learn Res 9:997–1017MATHMathSciNet Marchiori E (2008) Hit miss networks with applications to instance selection. J Mach Learn Res 9:997–1017MATHMathSciNet
30.
Zurück zum Zitat Marchiori E (2010) Class conditional nearest neighbor for large margin instance selection. IEEE Trans Pattern Anal Mach Intell 32(2):364–370CrossRef Marchiori E (2010) Class conditional nearest neighbor for large margin instance selection. IEEE Trans Pattern Anal Mach Intell 32(2):364–370CrossRef
31.
Zurück zum Zitat Nikolaidis K, Goulermasn JY, Wu QH (2011) A class boundary preserving algorithm for data condensation. Pattern Recogn 44:704–715MATHCrossRef Nikolaidis K, Goulermasn JY, Wu QH (2011) A class boundary preserving algorithm for data condensation. Pattern Recogn 44:704–715MATHCrossRef
32.
Zurück zum Zitat Vallejo CG, Troyano JA, Ortega FJ (2010) InstanceRank: bringing order to datasets. Pattern Recogn Lett 31:133–142CrossRef Vallejo CG, Troyano JA, Ortega FJ (2010) InstanceRank: bringing order to datasets. Pattern Recogn Lett 31:133–142CrossRef
33.
Zurück zum Zitat Hernandez-Leal P, Carrasco-Ochoaa JA, Martinez-Trinidada JF, Olvera-Lopez JA (2013) InstanceRank based on borders for instance selection. Pattern Recogn 46:365–375CrossRef Hernandez-Leal P, Carrasco-Ochoaa JA, Martinez-Trinidada JF, Olvera-Lopez JA (2013) InstanceRank based on borders for instance selection. Pattern Recogn 46:365–375CrossRef
34.
Zurück zum Zitat Kuncheva L (1995) Editing for the k-nearest neighbors rule by a genetic algorithm. Pattern Recogn Lett 16:809–814CrossRef Kuncheva L (1995) Editing for the k-nearest neighbors rule by a genetic algorithm. Pattern Recogn Lett 16:809–814CrossRef
35.
Zurück zum Zitat Kuncheva LI (1997) Fitness functions in editing k-NN referent set by genetic algorithms. Pattern Recogn 30:1041–1049CrossRef Kuncheva LI (1997) Fitness functions in editing k-NN referent set by genetic algorithms. Pattern Recogn 30:1041–1049CrossRef
36.
Zurück zum Zitat Cano JR, Herrera F, Lozano M (2003) Using evolutionary algorithms as instance selection for data reduction in KDD: an experimental study. IEEE Trans Evol Comput 7(6):561–575CrossRef Cano JR, Herrera F, Lozano M (2003) Using evolutionary algorithms as instance selection for data reduction in KDD: an experimental study. IEEE Trans Evol Comput 7(6):561–575CrossRef
37.
Zurück zum Zitat Garcia S, Cano JR, Herera F (2008) A Memetic algorithm for evolutionary prototype selection: a scaling up approach. Pattern Recogn 41:2693–2709MATHCrossRef Garcia S, Cano JR, Herera F (2008) A Memetic algorithm for evolutionary prototype selection: a scaling up approach. Pattern Recogn 41:2693–2709MATHCrossRef
38.
39.
Zurück zum Zitat Reeves CR, Bush DR (2001) Using genetic algorithms for training data selection in RBF networks. In: Instance selection and construction for data mining. Kluwer Academic Publishers, pp 339–356 Reeves CR, Bush DR (2001) Using genetic algorithms for training data selection in RBF networks. In: Instance selection and construction for data mining. Kluwer Academic Publishers, pp 339–356
40.
Zurück zum Zitat Angiulli F, Astorino A (2010) Scaling up support vector machines using nearest neighbor condensation. IEEE Trans Neural Netw 21(2):351–357CrossRef Angiulli F, Astorino A (2010) Scaling up support vector machines using nearest neighbor condensation. IEEE Trans Neural Netw 21(2):351–357CrossRef
41.
Zurück zum Zitat Li Y, Maguire L (2011) Selecting critical patterns based on local geometrical and statistical information. IEEE Trans Pattern Anal Mach Intell 33(6):1189–1201CrossRef Li Y, Maguire L (2011) Selecting critical patterns based on local geometrical and statistical information. IEEE Trans Pattern Anal Mach Intell 33(6):1189–1201CrossRef
42.
Zurück zum Zitat Smith-Miles KA (2008) Cross-disciplinary perspectives on meta-learning for algorithm selection. ACM Comput Surv 41:6–25CrossRef Smith-Miles KA (2008) Cross-disciplinary perspectives on meta-learning for algorithm selection. ACM Comput Surv 41:6–25CrossRef
43.
Zurück zum Zitat Smith-Miles K, Islam R (2010) Meta-learning for data summarization based on instance selection method. In: 2010 IEEE congress on evolutionary computation (CEC). Barcelona, Spain, pp 1–8 Smith-Miles K, Islam R (2010) Meta-learning for data summarization based on instance selection method. In: 2010 IEEE congress on evolutionary computation (CEC). Barcelona, Spain, pp 1–8
44.
Zurück zum Zitat Caises Y, González A, Leyva E, Pérez R (2011) Combining instance selection methods based on data characterization: an approach to increase their effectiveness. Inf Sci 181(20):4780–4798CrossRef Caises Y, González A, Leyva E, Pérez R (2011) Combining instance selection methods based on data characterization: an approach to increase their effectiveness. Inf Sci 181(20):4780–4798CrossRef
45.
Zurück zum Zitat Leyva E, González A, Pérez R (2013) Knowledge-based instance selection: a compromise between efficiency and versatility. Knowl Based Syst 47:65–76CrossRef Leyva E, González A, Pérez R (2013) Knowledge-based instance selection: a compromise between efficiency and versatility. Knowl Based Syst 47:65–76CrossRef
46.
Zurück zum Zitat Wu ZY, Bai FS, Lee HWJ, Yang YJ (2007) A filled function method for constrained global optimization. J Glob Optim 39:495–507MATHMathSciNetCrossRef Wu ZY, Bai FS, Lee HWJ, Yang YJ (2007) A filled function method for constrained global optimization. J Glob Optim 39:495–507MATHMathSciNetCrossRef
47.
Zurück zum Zitat Ge RP (1990) A filled function method for finding a global minimizer of a function of several variables. Math Progr 46:191–204MATHCrossRef Ge RP (1990) A filled function method for finding a global minimizer of a function of several variables. Math Progr 46:191–204MATHCrossRef
48.
Zurück zum Zitat Shang YL, Zhang LS (2008) Finding discrete global minima with a filled function for integer programming. Eur J Oper Res 189:31–40MATHMathSciNetCrossRef Shang YL, Zhang LS (2008) Finding discrete global minima with a filled function for integer programming. Eur J Oper Res 189:31–40MATHMathSciNetCrossRef
49.
50.
Zurück zum Zitat Ling AF, Xu CX, Xu F-M (2009) A discrete filled function algorithm embedded with continuous approximation for solving max-cut problems. Eur J Oper Res 197:519–531MATHMathSciNetCrossRef Ling AF, Xu CX, Xu F-M (2009) A discrete filled function algorithm embedded with continuous approximation for solving max-cut problems. Eur J Oper Res 197:519–531MATHMathSciNetCrossRef
51.
Zurück zum Zitat Zhang Y, Xu Y, Zhang L (2009) A filled function method applied to nonsmooth constrained global optimization. J Comput Appl Math 232:415–426MATHMathSciNetCrossRef Zhang Y, Xu Y, Zhang L (2009) A filled function method applied to nonsmooth constrained global optimization. J Comput Appl Math 232:415–426MATHMathSciNetCrossRef
52.
Zurück zum Zitat Wang C, Yang Y, Li J (2009) A new filled function method for unconstrained global optimization. J Comput Appl Math 225:68–79MATHMathSciNetCrossRef Wang C, Yang Y, Li J (2009) A new filled function method for unconstrained global optimization. J Comput Appl Math 225:68–79MATHMathSciNetCrossRef
53.
Zurück zum Zitat Ma S, Yang Y, Liu H (2010) A parameter free filled function for unconstrained global optimization. Appl Math Comput 215:3610–3619MATHMathSciNetCrossRef Ma S, Yang Y, Liu H (2010) A parameter free filled function for unconstrained global optimization. Appl Math Comput 215:3610–3619MATHMathSciNetCrossRef
54.
Zurück zum Zitat Jie L (2011) A new filled function algorithm for constrained global optimization problems. In: Seventh International conference on computational intelligence and security, pp 38–41 Jie L (2011) A new filled function algorithm for constrained global optimization problems. In: Seventh International conference on computational intelligence and security, pp 38–41
55.
Zurück zum Zitat Shuqing J (2012) A filled function method with one parameter for box constraint. In: Eighth International conference on computational intelligence and security, pp 1–4 Shuqing J (2012) A filled function method with one parameter for box constraint. In: Eighth International conference on computational intelligence and security, pp 1–4
56.
57.
Zurück zum Zitat Wang W, Shang Y (2012) A quasi-filled function approach for nonlinear global integer optimization. In: Fifth International joint conference on computational sciences and optimization, pp 359–361 Wang W, Shang Y (2012) A quasi-filled function approach for nonlinear global integer optimization. In: Fifth International joint conference on computational sciences and optimization, pp 359–361
58.
Zurück zum Zitat Antczak T (2009) Exact penalty functions method for mathematical programming problems involving index functions. Eur J Oper Res 198:29–36MATHMathSciNetCrossRef Antczak T (2009) Exact penalty functions method for mathematical programming problems involving index functions. Eur J Oper Res 198:29–36MATHMathSciNetCrossRef
60.
Zurück zum Zitat Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30MATHMathSciNet Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30MATHMathSciNet
61.
Zurück zum Zitat García S, Fernández A, Luengo J, Herrera F (2010) Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power. Inf Sci 180:2044–2064CrossRef García S, Fernández A, Luengo J, Herrera F (2010) Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power. Inf Sci 180:2044–2064CrossRef
Metadaten
Titel
Large symmetric margin instance selection algorithm
verfasst von
Javad Hamidzadeh
Reza Monsefi
Hadi Sadoghi Yazdi
Publikationsdatum
01.02.2016
Verlag
Springer Berlin Heidelberg
Erschienen in
International Journal of Machine Learning and Cybernetics / Ausgabe 1/2016
Print ISSN: 1868-8071
Elektronische ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-014-0239-z

Weitere Artikel der Ausgabe 1/2016

International Journal of Machine Learning and Cybernetics 1/2016 Zur Ausgabe

Neuer Inhalt