
01-09-2012 | Special Issue

Efficient recurrent local search strategies for semi- and unsupervised regularized least-squares classification

Authors: Fabian Gieseke, Oliver Kramer, Antti Airola, Tapio Pahikkala

Published in: Evolutionary Intelligence | Issue 3/2012

Abstract

Binary classification tasks are among the most important ones in the field of machine learning. One prominent approach to such tasks is the support vector machine, which aims at finding a hyperplane that separates the two classes well, such that the induced distance between the hyperplane and the patterns is maximized. In general, sufficient labeled data is needed for such classification settings to obtain reasonable models. However, labeled data is often rare in real-world learning scenarios, while unlabeled data can be obtained easily. For this reason, the concept of support vector machines has also been extended to semi- and unsupervised settings: in the unsupervised case, one aims at finding a partition of the data into two classes such that a subsequent application of a support vector machine leads to the best overall result. Similarly, given both a labeled and an unlabeled part, semi-supervised support vector machines favor decision hyperplanes that lie in a low-density area induced by the unlabeled training patterns, while still taking the labeled part of the data into account. The associated optimization problems for both the semi- and the unsupervised case, however, are of combinatorial nature and, hence, difficult to solve. In this work, we present efficient implementations of simple local search strategies for (variants of) both cases that are based on matrix update schemes for the intermediate candidate solutions. We evaluate the performance of the resulting approaches on a variety of artificial and real-world data sets. The results indicate that our approaches can successfully incorporate unlabeled data. (The unsupervised case was originally proposed by Gieseke et al. (2009). The derivations presented in this work are new and subsume the old ones (for the unsupervised setting) as a special case.)
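
To make the local search idea concrete, the following Python/NumPy sketch flips candidate labels of the unlabeled patterns one at a time and keeps a flip whenever it lowers the regularized least-squares objective. All names (rls_objective, local_search) are illustrative and ours; the sketch recomputes the least-squares solution from scratch for every flip and therefore omits both the balance constraint and the efficient matrix update schemes developed in the paper. It is a minimal sketch of the search strategy, not the authors' implementation.

```python
import numpy as np

def rls_objective(K, y, lam):
    """Regularized least-squares objective for a fixed label vector y:
    min_c ||y - K c||^2 + lam * c^T K c, evaluated at the optimal c."""
    n = K.shape[0]
    c = np.linalg.solve(K + lam * np.eye(n), y)   # optimal coefficients
    f = K @ c                                     # fitted values
    return np.sum((y - f) ** 2) + lam * (c @ K @ c)

def local_search(K, y_labeled, n_unlabeled, lam=1.0, sweeps=10, rng=None):
    """Flip one candidate label at a time; keep flips that improve the objective."""
    rng = np.random.default_rng(rng)
    y = np.concatenate([np.asarray(y_labeled, dtype=float),
                        rng.choice([-1.0, 1.0], size=n_unlabeled)])
    best = rls_objective(K, y, lam)
    l = len(y_labeled)
    for _ in range(sweeps):
        improved = False
        for i in range(l, len(y)):                # only unlabeled entries may change
            y[i] = -y[i]
            val = rls_objective(K, y, lam)
            if val < best:
                best, improved = val, True        # accept the flip
            else:
                y[i] = -y[i]                      # undo the flip
        if not improved:
            break
    return y, best
```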


Appendix
Available only for authorised users
Footnotes
1
Note that semi-supervised support vector machines do not necessarily lead to better classification models. In general, a low-density area indicating the classification boundaries is required. In the literature, this requirement is called the cluster assumption [8, 39].
 
2
For the sake of simplicity, the offset term \({b \in {\mathbb R}}\) is omitted in the latter formulation. From both a theoretical and a practical point of view, the additional term does not yield any known advantages for kernel functions like the RBF kernel [25, 30]. However, for the linear kernel, the offset term can make a difference since it addresses translated data. If such an offset effect is needed for a particular learning task, one can add a dimension of ones to the input data to obtain a (regularized) offset term [25].
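For the linear kernel, this augmentation can be realized by appending a constant feature of ones, as in the following minimal sketch (names are ours):

```python
import numpy as np

def add_offset_dimension(X):
    """Append a constant feature of ones so that the corresponding weight
    acts as a (regularized) offset term b (cf. footnote 2)."""
    ones = np.ones((X.shape[0], 1))
    return np.hstack([X, ones])

# The linear kernel on the augmented data equals the plain linear kernel plus 1.
X = np.array([[0.0, 1.0], [2.0, -1.0]])
X_aug = add_offset_dimension(X)
K_aug = X_aug @ X_aug.T
```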
 
3
The random generation of an initial candidate solution takes the class ratio given by the balance constraint into account, i.e., for an initial candidate solution \({\bf y}\) we have \(y_i = 1\) with probability \(b_c\) and \(y_i = -1\) with probability \(1 - b_c\) for \(i = l+1, \ldots, n\).
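A minimal sketch of such an initialization (variable names are ours; b_c denotes the class ratio from the balance constraint, l the number of labeled and n the total number of patterns):

```python
import numpy as np

def initial_candidate(y_labeled, n, b_c, rng=None):
    """Random initial candidate solution y: the first l entries are the given
    labels; entries i = l+1, ..., n are +1 with probability b_c, -1 otherwise."""
    rng = np.random.default_rng(rng)
    l = len(y_labeled)
    y_unlabeled = np.where(rng.random(n - l) < b_c, 1.0, -1.0)
    return np.concatenate([np.asarray(y_labeled, dtype=float), y_unlabeled])
```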
 
4
If K is invertible, then
$$ \begin{aligned} - 2 {({\bf D} {\bf K})}^{\rm T}({\bf D} {\bf y}-{\bf D} {\bf K} {\bf c}) + 2 \lambda {\bf K} {\bf c} &= {\bf 0} \\ \Leftrightarrow {({\bf D} {\bf K})}^{\rm T} ({{\bf D} {\bf K}{\bf D}} + \lambda {\bf I}) {\bf D}^{-1}{\bf c} &= {({\bf D} {\bf K})}^{\rm T} {\bf D} {\bf y} \\ \Leftrightarrow {\bf c} &= {\bf D} {\bf G} {\bf D} {\bf y} \\ \end{aligned} $$
If \({\bf K}\) is not invertible, the latter equation can be used as well since we only need a single solution (if \({\bf c} = {\bf D}{\bf G}{\bf D}{\bf y}\), then \({({\bf D}{\bf K})}^{\rm T}{\bf G}^{-1}{\bf D}^{-1}{\bf c} = {({\bf D}{\bf K})}^{\rm T}{\bf D}{\bf y}\) holds as well).
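A small numerical sanity check of this closed form, assuming, as the derivation above suggests, that \({\bf G} = ({\bf D}{\bf K}{\bf D} + \lambda {\bf I})^{-1}\) (our reading of the derivation, not a definition quoted from the main text):

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam = 6, 0.5
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)              # symmetric positive definite kernel matrix
D = np.diag(rng.uniform(0.5, 2.0, n))    # diagonal, invertible matrix
y = rng.choice([-1.0, 1.0], n)

G = np.linalg.inv(D @ K @ D + lam * np.eye(n))
c = D @ G @ D @ y                        # candidate closed-form solution

# The gradient of ||D y - D K c||^2 + lam * c^T K c should vanish at c:
grad = -2 * (D @ K).T @ (D @ y - D @ K @ c) + 2 * lam * K @ c
assert np.allclose(grad, 0.0, atol=1e-8)
```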
 
5
As mentioned, various other update schemes are possible. Another update scheme, for instance, consists in updating the terms in (10) individually, i.e., handling the first term by updating the vector \({\bf D}{\bf K}{\bf c}^* = {\bf D}{\bf K}{\bf D}{\bf G}{\bf D}{\bf y}\) and therefore \(({\bf D}{\bf y} - {\bf D}{\bf K}{\bf c}^*)^{\rm T}({\bf D}{\bf y} - {\bf D}{\bf K}{\bf c}^*)\) in linear time. Similarly, the second term can be handled in linear time (by first updating \({\bf K}{\bf c}^* = {\bf K}{\bf D}{\bf G}{\bf D}{\bf y}\) separately). Note that one can also consider one of the predecessors of Eq. (13).
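As an illustration of the linear-time evaluation mentioned here (a sketch with our own names, not the paper's update routine, and assuming the second term is the regularizer \(\lambda\,{{\bf c}^*}^{\rm T}{\bf K}{\bf c}^*\)): once the vectors \({\bf D}{\bf K}{\bf c}^*\) and \({\bf K}{\bf c}^*\) have been updated, both terms reduce to inner products computable in O(n).

```python
import numpy as np

def objective_from_updated_vectors(Dy, DKc, Kc, c, lam):
    """Evaluate the objective in O(n) from precomputed/updated vectors:
    first term  (D y - D K c*)^T (D y - D K c*),
    second term lam * c*^T (K c*)."""
    residual = Dy - DKc
    first = residual @ residual    # linear-time inner product
    second = lam * (c @ Kc)        # linear-time inner product
    return first + second
```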
 
6
Naturally, if the selected subset is too small, the performance of the final model can be poor.
 
7
Similar data sets are often used in related experimental evaluations; see, e.g., [8].
 
9
For instance, [8] propose to make use of the test set to select the model parameters: “This allowed for finding hyperparameter values by minimizing the test error, which is not possible in real applications; however, the results of this procedure can be useful to judge the potential of a method. To obtain results that are indicative of real world performance, the model selection has to be performed using only the small set of labeled points.”
 
10
As a side note, we would like to point out that this special type exhibits classification performance very similar to that of the more general setup with μ = 5 and ν = 25 (provided that sufficient restarts are performed).
 
Literature
1.
Bennett KP, Demiriz A (1998) Semi-supervised support vector machines. In: Kearns MJ, Solla SA, Cohn DA (eds) Advances in neural information processing systems 11, MIT Press, pp 368–374
3.
Bie TD, Cristianini N (2003) Convex methods for transduction. In: Advances in neural information processing systems 16, MIT Press, pp 73–80
4.
Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, New York
6.
Chapelle O, Zien A (2005) Semi-supervised classification by low density separation. In: Proceedings of the tenth international workshop on artificial intelligence and statistics, pp 57–64
7.
Chapelle O, Chi M, Zien A (2006) A continuation method for semi-supervised SVMs. In: Proceedings of the international conference on machine learning, pp 185–192
8.
Chapelle O, Schölkopf B, Zien A (eds) (2006) Semi-supervised learning. MIT Press, Cambridge, MA
9.
Chapelle O, Sindhwani V, Keerthi SS (2008) Optimization techniques for semi-supervised support vector machines. J Mach Learn Res 9:203–233
10.
Collobert R, Sinz F, Weston J, Bottou L (2006) Trading convexity for scalability. In: Proceedings of the international conference on machine learning, pp 201–208
11.
13.
Fogel DB (1966) Artificial intelligence through simulated evolution. Wiley, New York
14.
Fung G, Mangasarian OL (2001) Semi-supervised support vector machines for unlabeled data classification. Optim Methods Softw 15:29–44
15.
Gieseke F, Pahikkala T, Kramer O (2009) Fast evolutionary maximum margin clustering. In: Proceedings of the international conference on machine learning, pp 361–368
16.
Golub GH, Van Loan C (1989) Matrix computations, 2nd edn. Johns Hopkins University Press, Baltimore and London
17.
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning. Springer, New York
18.
Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor
19.
Horn R, Johnson CR (1985) Matrix analysis. Cambridge University Press, Cambridge
20.
Joachims T (1999) Transductive inference for text classification using support vector machines. In: Proceedings of the international conference on machine learning, pp 200–209
21.
Mierswa I (2009) Non-convex and multi-objective optimization in data mining. PhD thesis, Technische Universität Dortmund
22.
Nene S, Nayar S, Murase H (1996) Columbia object image library (COIL-100). Tech. rep.
23.
Rechenberg I (1973) Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog, Stuttgart
24.
Rifkin R, Yeo G, Poggio T (2003) Regularized least-squares classification. In: Advances in learning theory: methods, models and applications, IOS Press, pp 131–154
25.
Rifkin RM (2002) Everything old is new again: a fresh look at historical approaches in machine learning. PhD thesis, MIT
26.
Schölkopf B, Herbrich R, Smola AJ (2001) A generalized representer theorem. In: Proceedings of the 14th annual conference on computational learning theory and 5th European conference on computational learning theory. Springer, London, pp 416–426
27.
Schwefel HP (1977) Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie. Birkhäuser, Basel
28.
Silva C, Santos JS, Wanner EF, Carrano EG, Takahashi RHC (2009) Semi-supervised training of least squares support vector machine using a multiobjective evolutionary algorithm. In: Proceedings of the eleventh conference on congress on evolutionary computation, IEEE Press, Piscataway, NJ, USA, pp 2996–3002
29.
Sindhwani V, Keerthi S, Chapelle O (2006) Deterministic annealing for semi-supervised kernel machines. In: Proceedings of the international conference on machine learning, pp 841–848
30.
Steinwart I, Christmann A (2008) Support vector machines. Springer, New York
31.
Suykens JAK, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9(3):293–300
32.
Valizadegan H, Jin R (2007) Generalized maximum margin clustering and unsupervised kernel learning. In: Advances in neural information processing systems, MIT Press, vol 19, pp 1417–1424
33.
Vapnik V (1998) Statistical learning theory. Wiley, New York
34.
Xu L, Schuurmans D (2005) Unsupervised and semi-supervised multi-class support vector machines. In: Proceedings of the national conference on artificial intelligence, pp 904–910
35.
Xu L, Neufeld J, Larson B, Schuurmans D (2005) Maximum margin clustering. In: Advances in neural information processing systems, vol 17, pp 1537–1544
36.
Zhang K, Tsang IW, Kwok JT (2007) Maximum margin clustering made practical. In: Proceedings of the international conference on machine learning, pp 1119–1126
37.
Zhao B, Wang F, Zhang C (2008a) Efficient maximum margin clustering via cutting plane algorithm. In: Proceedings of the SIAM international conference on data mining, pp 751–762
38.
Zhao B, Wang F, Zhang C (2008b) Efficient multiclass maximum margin clustering. In: Proceedings of the international conference on machine learning, pp 1248–1255
39.
Zhu X, Goldberg AB (2009) Introduction to semi-supervised learning. Morgan and Claypool, Seattle
Metadata
Title
Efficient recurrent local search strategies for semi- and unsupervised regularized least-squares classification
Authors
Fabian Gieseke
Oliver Kramer
Antti Airola
Tapio Pahikkala
Publication date
01-09-2012
Publisher
Springer-Verlag
Published in
Evolutionary Intelligence / Issue 3/2012
Print ISSN: 1864-5909
Electronic ISSN: 1864-5917
DOI
https://doi.org/10.1007/s12065-012-0068-5
