
21-05-2018 | Regular Paper

Instance reduction for one-class classification

Authors: Bartosz Krawczyk, Isaac Triguero, Salvador García, Michał Woźniak, Francisco Herrera

Published in: Knowledge and Information Systems | Issue 3/2019

Abstract

Instance reduction techniques are data preprocessing methods originally developed to enhance the nearest neighbor rule for standard classification. They reduce the training data by selecting or generating representative examples of a given problem. These algorithms have been designed and widely analyzed for multi-class problems, providing very competitive results. However, this issue has rarely been addressed in the context of one-class classification. In this specific domain, reducing the training set may not only decrease classification time and classifier complexity, but also allow us to handle internal noisy data and simplify the data description boundary. We propose two methods for achieving this goal. The first is a flexible framework that adapts any instance reduction method to the one-class scenario by introducing meaningful artificial outliers. The second is a novel modification of an evolutionary instance reduction technique that is based on differential evolution and uses a consistency measure for model evaluation in filter or wrapper modes. It is a powerful native one-class solution that does not require access to counterexamples. Both of the proposed algorithms can be applied to any type of one-class classifier. On the basis of extensive computational experiments, we show that the proposed methods are highly efficient at reducing complexity and improving classification performance in one-class scenarios.
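The first idea described in the abstract can be illustrated with a minimal sketch: sample artificial outliers around the target class, apply any off-the-shelf multi-class instance selection method to the resulting two-class problem, and keep only the retained target instances for training the one-class classifier. The code below is an assumption-laden illustration, not the authors' implementation; the helper names (generate_uniform_outliers, reduce_one_class) and the choice of edited nearest neighbour as the stand-in selector are hypothetical.

# Minimal sketch (assumed, not the paper's code): adapting a standard
# instance selection method to one-class data via artificial outliers.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import OneClassSVM

def generate_uniform_outliers(X_target, n_outliers, margin=0.1, seed=None):
    # Sample artificial outliers uniformly from a slightly enlarged
    # bounding box around the target-class data.
    rng = np.random.default_rng(seed)
    lo, hi = X_target.min(axis=0), X_target.max(axis=0)
    span = hi - lo
    return rng.uniform(lo - margin * span, hi + margin * span,
                       size=(n_outliers, X_target.shape[1]))

def edited_nn_selection(X, y, k=3):
    # Stand-in for any multi-class instance selection method:
    # keep points whose k nearest neighbours agree with their label (ENN).
    keep = np.zeros(len(X), dtype=bool)
    for i in range(len(X)):
        mask = np.arange(len(X)) != i            # leave-one-out
        knn = KNeighborsClassifier(n_neighbors=k).fit(X[mask], y[mask])
        keep[i] = knn.predict(X[i:i + 1])[0] == y[i]
    return keep

def reduce_one_class(X_target, outlier_ratio=1.0, seed=0):
    # Add artificial outliers, run the selector on the two-class problem,
    # and return only the retained target instances.
    X_art = generate_uniform_outliers(X_target,
                                      int(outlier_ratio * len(X_target)),
                                      seed=seed)
    X = np.vstack([X_target, X_art])
    y = np.hstack([np.ones(len(X_target)), np.zeros(len(X_art))])
    keep = edited_nn_selection(X, y)
    return X_target[keep[:len(X_target)]]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_target = rng.normal(size=(200, 2))
    X_reduced = reduce_one_class(X_target)
    occ = OneClassSVM(nu=0.1).fit(X_reduced)   # any one-class classifier
    print(f"kept {len(X_reduced)} of {len(X_target)} target instances")

In the same spirit, the second proposal could replace the fixed selector with a differential-evolution search over instance subsets, scored either by a consistency measure alone (filter mode) or by the resulting one-class classifier (wrapper mode); the sketch above only covers the artificial-outlier framework.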

Metadata
Title
Instance reduction for one-class classification
Authors
Bartosz Krawczyk
Isaac Triguero
Salvador García
Michał Woźniak
Francisco Herrera
Publication date
21-05-2018
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 3/2019
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-018-1220-z
