
01-09-2012 | Special Issue

Efficient recurrent local search strategies for semi- and unsupervised regularized least-squares classification

Authors: Fabian Gieseke, Oliver Kramer, Antti Airola, Tapio Pahikkala

Published in: Evolutionary Intelligence | Issue 3/2012

Abstract

Binary classification tasks are among the most important ones in the field of machine learning. One prominent approach to such tasks is the support vector machine, which aims at finding a hyperplane that separates the two classes well, such that the induced distance between the hyperplane and the patterns is maximized. In general, sufficient labeled data is needed for such classification settings to obtain reasonable models. However, labeled data is often rare in real-world learning scenarios, while unlabeled data can be obtained easily. For this reason, the concept of support vector machines has also been extended to semi- and unsupervised settings: in the unsupervised case, one aims at finding a partition of the data into two classes such that a subsequent application of a support vector machine leads to the best overall result. Similarly, given both a labeled and an unlabeled part, semi-supervised support vector machines favor decision hyperplanes that lie in a low-density area induced by the unlabeled training patterns, while still taking the labeled part of the data into account. The associated optimization problems for both the semi- and the unsupervised case, however, are of combinatorial nature and, hence, difficult to solve. In this work, we present efficient implementations of simple local search strategies for (variants of) both cases that are based on matrix update schemes for the intermediate candidate solutions. We evaluate the performance of the resulting approaches on a variety of artificial and real-world data sets. The results indicate that our approaches can successfully incorporate unlabeled data. (The unsupervised case was originally proposed by Gieseke et al. (2009). The derivations presented in this work are new and subsume the old ones (for the unsupervised setting) as a special case.)
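
To make the local search idea concrete, the following Python/NumPy sketch flips candidate labels of the unlabeled patterns one at a time and keeps a flip whenever it lowers the regularized least-squares objective. All names (rls_objective, local_search) are illustrative and ours; the sketch recomputes the least-squares solution from scratch for every flip and therefore omits both the balance constraint and the efficient matrix update schemes developed in the paper. It is a minimal sketch of the search strategy, not the authors' implementation.

```python
import numpy as np

def rls_objective(K, y, lam):
    """Regularized least-squares objective for a fixed label vector y:
    min_c ||y - K c||^2 + lam * c^T K c, evaluated at the optimal c."""
    n = K.shape[0]
    c = np.linalg.solve(K + lam * np.eye(n), y)   # optimal coefficients
    f = K @ c                                     # fitted values
    return np.sum((y - f) ** 2) + lam * (c @ K @ c)

def local_search(K, y_labeled, n_unlabeled, lam=1.0, sweeps=10, rng=None):
    """Flip one candidate label at a time; keep flips that improve the objective."""
    rng = np.random.default_rng(rng)
    y = np.concatenate([np.asarray(y_labeled, dtype=float),
                        rng.choice([-1.0, 1.0], size=n_unlabeled)])
    best = rls_objective(K, y, lam)
    l = len(y_labeled)
    for _ in range(sweeps):
        improved = False
        for i in range(l, len(y)):                # only unlabeled entries may change
            y[i] = -y[i]
            val = rls_objective(K, y, lam)
            if val < best:
                best, improved = val, True        # accept the flip
            else:
                y[i] = -y[i]                      # undo the flip
        if not improved:
            break
    return y, best
```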


Appendix
Available only for authorised users
Footnotes
1
Note that semi-supervised support vector machines do not necessarily lead to better classification models. In general, a low-density area indicating the classification boundaries is required. In the literature, this requirement is called the cluster assumption [8, 39].
 
2
For the sake of simplicity, the offset term \({b \in {\mathbb R}}\) is omitted in the latter formulation. From both a theoretical and a practical point of view, the additional term does not yield any known advantages for kernel functions like the RBF kernel [25, 30]. However, for the linear kernel, the offset term can make a difference since it addresses translated data. If such an offset effect is needed for a particular learning task, one can add a dimension of ones to the input data to obtain a (regularized) offset term [25].
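For the linear kernel, this augmentation can be realized by appending a constant feature of ones, as in the following minimal sketch (names are ours):

```python
import numpy as np

def add_offset_dimension(X):
    """Append a constant feature of ones so that the corresponding weight
    acts as a (regularized) offset term b (cf. footnote 2)."""
    ones = np.ones((X.shape[0], 1))
    return np.hstack([X, ones])

# The linear kernel on the augmented data equals the plain linear kernel plus 1.
X = np.array([[0.0, 1.0], [2.0, -1.0]])
X_aug = add_offset_dimension(X)
K_aug = X_aug @ X_aug.T
```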
 
3
The random generation of an initial candidate solution takes the class ratio given by the balance constraint into account, i.e., for an initial candidate solution \({\bf y}\) we have \(y_i = 1\) with probability \(b_c\) and \(y_i = -1\) with probability \(1 - b_c\) for \(i = l+1, \ldots, n\).
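A minimal sketch of such an initialization (variable names are ours; b_c denotes the class ratio from the balance constraint, l the number of labeled and n the total number of patterns):

```python
import numpy as np

def initial_candidate(y_labeled, n, b_c, rng=None):
    """Random initial candidate solution y: the first l entries are the given
    labels; entries i = l+1, ..., n are +1 with probability b_c, -1 otherwise."""
    rng = np.random.default_rng(rng)
    l = len(y_labeled)
    y_unlabeled = np.where(rng.random(n - l) < b_c, 1.0, -1.0)
    return np.concatenate([np.asarray(y_labeled, dtype=float), y_unlabeled])
```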
 
4
If K is invertible, then
$$ \begin{aligned} - 2 {({\bf D} {\bf K})}^{\rm T}({\bf D} {\bf y}-{\bf D} {\bf K} {\bf c}) + 2 \lambda {\bf K} {\bf c} &= {\bf 0} \\ \Leftrightarrow {({\bf D} {\bf K})}^{\rm T} ({{\bf D} {\bf K}{\bf D}} + \lambda {\bf I}) {\bf D}^{-1}{\bf c} &= {({\bf D} {\bf K})}^{\rm T} {\bf D} {\bf y} \\ \Leftrightarrow {\bf c} &= {\bf D} {\bf G} {\bf D} {\bf y} \\ \end{aligned} $$
If \({\bf K}\) is not invertible, the latter equation can be used as well since we only need a single solution (if \({\bf c} = {\bf D}{\bf G}{\bf D}{\bf y}\), then \({({\bf D}{\bf K})}^{\rm T}{\bf G}^{-1}{\bf D}^{-1}{\bf c} = {({\bf D}{\bf K})}^{\rm T}{\bf D}{\bf y}\) holds as well).
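A small numerical sanity check of this closed form, assuming, as the derivation above suggests, that \({\bf G} = ({\bf D}{\bf K}{\bf D} + \lambda {\bf I})^{-1}\) (our reading of the derivation, not a definition quoted from the main text):

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam = 6, 0.5
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)              # symmetric positive definite kernel matrix
D = np.diag(rng.uniform(0.5, 2.0, n))    # diagonal, invertible matrix
y = rng.choice([-1.0, 1.0], n)

G = np.linalg.inv(D @ K @ D + lam * np.eye(n))
c = D @ G @ D @ y                        # candidate closed-form solution

# The gradient of ||D y - D K c||^2 + lam * c^T K c should vanish at c:
grad = -2 * (D @ K).T @ (D @ y - D @ K @ c) + 2 * lam * K @ c
assert np.allclose(grad, 0.0, atol=1e-8)
```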
 
5
As mentioned, various other update schemes are possible. Another update scheme, for instance, consists in updating the terms in (10) individually, i.e., handling the first term by updating the vector \({\bf D}{\bf K}{\bf c}^* = {\bf D}{\bf K}{\bf D}{\bf G}{\bf D}{\bf y}\) and therefore \(({\bf D}{\bf y} - {\bf D}{\bf K}{\bf c}^*)^{\rm T}({\bf D}{\bf y} - {\bf D}{\bf K}{\bf c}^*)\) in linear time. Similarly, the second term can be handled in linear time (by first updating \({\bf K}{\bf c}^* = {\bf K}{\bf D}{\bf G}{\bf D}{\bf y}\) separately). Note that one can also consider one of the predecessors of Eq. (13).
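As an illustration of the linear-time evaluation mentioned here (a sketch with our own names, not the paper's update routine, and assuming the second term is the regularizer \(\lambda\,{{\bf c}^*}^{\rm T}{\bf K}{\bf c}^*\)): once the vectors \({\bf D}{\bf K}{\bf c}^*\) and \({\bf K}{\bf c}^*\) have been updated, both terms reduce to inner products computable in O(n).

```python
import numpy as np

def objective_from_updated_vectors(Dy, DKc, Kc, c, lam):
    """Evaluate the objective in O(n) from precomputed/updated vectors:
    first term  (D y - D K c*)^T (D y - D K c*),
    second term lam * c*^T (K c*)."""
    residual = Dy - DKc
    first = residual @ residual    # linear-time inner product
    second = lam * (c @ Kc)        # linear-time inner product
    return first + second
```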
 
6
Naturally, if the selected subset is too small, the performance of the final model can be poor.
 
7
Similar data sets are often used in related experimental evaluations; see, e.g., [8].
 
9
For instance, [8] propose to make use of the test set to select the model parameters: “This allowed for finding hyperparameter values by minimizing the test error, which is not possible in real applications; however, the results of this procedure can be useful to judge the potential of a method. To obtain results that are indicative of real world performance, the model selection has to be performed using only the small set of labeled points.”
 
10
As a side note, we would like to point out that this special type exhibits classification performance very similar to that of the more general setup with μ = 5 and ν = 25 (provided that sufficient restarts are performed).
 
Literature
1.
Bennett KP, Demiriz A (1998) Semi-supervised support vector machines. In: Kearns MJ, Solla SA, Cohn DA (eds) Advances in neural information processing systems 11, MIT Press, pp 368–374
3.
Bie TD, Cristianini N (2003) Convex methods for transduction. In: Advances in neural information processing systems 16, MIT Press, pp 73–80
4.
Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, New York
6.
Chapelle O, Zien A (2005) Semi-supervised classification by low density separation. In: Proceedings of the tenth international workshop on artificial intelligence and statistics, pp 57–64
7.
Chapelle O, Chi M, Zien A (2006) A continuation method for semi-supervised SVMs. In: Proceedings of the international conference on machine learning, pp 185–192
8.
Chapelle O, Schölkopf B, Zien A (eds) (2006) Semi-supervised learning. MIT Press, Cambridge, MA
9.
Chapelle O, Sindhwani V, Keerthi SS (2008) Optimization techniques for semi-supervised support vector machines. J Mach Learn Res 9:203–233
10.
Collobert R, Sinz F, Weston J, Bottou L (2006) Trading convexity for scalability. In: Proceedings of the international conference on machine learning, pp 201–208
11.
13.
Fogel DB (1966) Artificial intelligence through simulated evolution. Wiley, New York
14.
Fung G, Mangasarian OL (2001) Semi-supervised support vector machines for unlabeled data classification. Optim Methods Softw 15:29–44
15.
Gieseke F, Pahikkala T, Kramer O (2009) Fast evolutionary maximum margin clustering. In: Proceedings of the international conference on machine learning, pp 361–368
16.
Golub GH, Van Loan C (1989) Matrix computations, 2nd edn. Johns Hopkins University Press, Baltimore and London
17.
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning. Springer, New York
18.
Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor
19.
Horn R, Johnson CR (1985) Matrix analysis. Cambridge University Press, Cambridge
20.
Joachims T (1999) Transductive inference for text classification using support vector machines. In: Proceedings of the international conference on machine learning, pp 200–209
21.
Mierswa I (2009) Non-convex and multi-objective optimization in data mining. PhD thesis, Technische Universität Dortmund
22.
Nene S, Nayar S, Murase H (1996) Columbia object image library (COIL-100). Tech. rep.
23.
Rechenberg I (1973) Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog, Stuttgart
24.
Rifkin R, Yeo G, Poggio T (2003) Regularized least-squares classification. In: Advances in learning theory: methods, models and applications, IOS Press, pp 131–154
25.
Rifkin RM (2002) Everything old is new again: a fresh look at historical approaches in machine learning. PhD thesis, MIT
26.
Schölkopf B, Herbrich R, Smola AJ (2001) A generalized representer theorem. In: Proceedings of the 14th annual conference on computational learning theory and 5th European conference on computational learning theory. Springer, London, pp 416–426
27.
Schwefel HP (1977) Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie. Birkhäuser, Basel
28.
Silva C, Santos JS, Wanner EF, Carrano EG, Takahashi RHC (2009) Semi-supervised training of least squares support vector machine using a multiobjective evolutionary algorithm. In: Proceedings of the eleventh conference on congress on evolutionary computation, IEEE Press, Piscataway, NJ, USA, pp 2996–3002
29.
Sindhwani V, Keerthi S, Chapelle O (2006) Deterministic annealing for semi-supervised kernel machines. In: Proceedings of the international conference on machine learning, pp 841–848
30.
Steinwart I, Christmann A (2008) Support vector machines. Springer, New York
31.
Suykens JAK, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9(3):293–300
32.
Valizadegan H, Jin R (2007) Generalized maximum margin clustering and unsupervised kernel learning. In: Advances in neural information processing systems, MIT Press, vol 19, pp 1417–1424
33.
Vapnik V (1998) Statistical learning theory. Wiley, New York
34.
Xu L, Schuurmans D (2005) Unsupervised and semi-supervised multi-class support vector machines. In: Proceedings of the national conference on artificial intelligence, pp 904–910
35.
Xu L, Neufeld J, Larson B, Schuurmans D (2005) Maximum margin clustering. In: Advances in neural information processing systems, vol 17, pp 1537–1544
36.
Zhang K, Tsang IW, Kwok JT (2007) Maximum margin clustering made practical. In: Proceedings of the international conference on machine learning, pp 1119–1126
37.
Zhao B, Wang F, Zhang C (2008a) Efficient maximum margin clustering via cutting plane algorithm. In: Proceedings of the SIAM international conference on data mining, pp 751–762
38.
Zhao B, Wang F, Zhang C (2008b) Efficient multiclass maximum margin clustering. In: Proceedings of the international conference on machine learning, pp 1248–1255
39.
Zhu X, Goldberg AB (2009) Introduction to semi-supervised learning. Morgan and Claypool, Seattle
Metadata
Title
Efficient recurrent local search strategies for semi- and unsupervised regularized least-squares classification
Authors
Fabian Gieseke
Oliver Kramer
Antti Airola
Tapio Pahikkala
Publication date
01-09-2012
Publisher
Springer-Verlag
Published in
Evolutionary Intelligence / Issue 3/2012
Print ISSN: 1864-5909
Electronic ISSN: 1864-5917
DOI
https://doi.org/10.1007/s12065-012-0068-5
