Published in: International Journal of Machine Learning and Cybernetics 6/2021

24.01.2021 | Original Article

Sample-based online learning for bi-regular hinge loss

Authors: Wei Xue, Ping Zhong, Wensheng Zhang, Gaohang Yu, Yebin Chen


Abstract

Support vector machine (SVM), a state-of-the-art classifier for supervised classification tasks, is famous for the strong generalization guarantees derived from its max-margin property. In this paper, we focus on the maximum-margin classification problem cast by SVM and study the bi-regular hinge loss model, which not only performs feature selection but also tends to select highly correlated features together. To solve this model, we propose an online learning algorithm that tackles the resulting non-smooth minimization problem via an alternating iterative mechanism. Basically, the proposed algorithm alternates between intrusion-sample detection and iterative optimization, and at each iteration it obtains a closed-form solution to the model. In theory, we prove that the proposed algorithm achieves an \(O(1/\sqrt{T})\) convergence rate under some mild conditions, where T is the number of training samples received in online learning. Experimental results on synthetic data and benchmark datasets demonstrate the effectiveness and performance of our approach in comparison with several popular algorithms, such as LIBSVM, SGD, PEGASOS, and SVRG.
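As a minimal sketch of the model being optimized, assuming the bi-regular term is the elastic-net combination \(\lambda_1\|w\|_1 + \frac{\lambda_2}{2}\|w\|_2^2\) in the spirit of [38, 49], the Python snippet below applies a generic online proximal subgradient update with the \(1/\sqrt{t}\) step size matching the stated \(O(1/\sqrt{T})\) rate. It is a stand-in for, not a reproduction of, the authors' alternating algorithm, and all names in it are hypothetical.

```python
import numpy as np

def prox_elastic_net(w, lam1, lam2, eta):
    # Closed-form proximal step for eta * (lam1*||w||_1 + (lam2/2)*||w||_2^2):
    # soft-thresholding for the l1 part, then shrinkage for the l2 part.
    return np.sign(w) * np.maximum(np.abs(w) - eta * lam1, 0.0) / (1.0 + eta * lam2)

def online_biregular_hinge(stream, dim, lam1=0.01, lam2=0.01):
    # stream yields (x, y) with x an R^dim feature vector and y in {-1, +1}.
    w = np.zeros(dim)
    for t, (x, y) in enumerate(stream, start=1):
        eta = 1.0 / np.sqrt(t)                    # step size matching the O(1/sqrt(T)) rate
        if y * w.dot(x) < 1.0:                    # hinge loss active: subgradient is -y*x
            w = w + eta * y * x
        w = prox_elastic_net(w, lam1, lam2, eta)  # closed-form regularizer step
    return w
```

Feeding it a stream such as zip(X, y) for a NumPy matrix X and a label vector y follows the online protocol of footnote 2: one sample per round, one update per sample.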

Footnotes
1
A problem is said to be NP-hard if its solution is required to be over the integers [16].
 
2
The learner in an online learning model needs to make predictions about a sequence of samples, one after the other, and then receives a loss after each prediction. The goal is to minimize the accumulated losses [30, 31]. The first explicit models of online learning were proposed by Angluin [2] and Littlestone [28].
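In code, the protocol this footnote describes is just a loop; predict, loss, and update below are hypothetical placeholders for whatever learner is plugged in:

```python
# One pass of the online protocol: predict first, then observe the loss.
total_loss = 0.0
for x_t, y_t in stream:                  # samples arrive one after the other
    y_hat = predict(model, x_t)          # learner commits to a prediction
    total_loss += loss(y_hat, y_t)       # loss is revealed only after predicting
    model = update(model, x_t, y_t)      # learner may adjust before the next round
```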
 
5
CVX is free Matlab software for disciplined convex programming, which can be downloaded from http://cvxr.com/cvx.
 
6
The objective function values here include the hinge loss and the bi-regular term.
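For concreteness, a plausible form of the tracked objective, assuming the bi-regular term is the elastic-net penalty of [38, 49] (the weights \(\lambda_1, \lambda_2\) are tuning parameters and an assumption here):

```latex
\min_{\mathbf{w}} \; \frac{1}{T}\sum_{t=1}^{T}
  \max\bigl(0,\ 1 - y_t\,\mathbf{w}^{\top}\mathbf{x}_t\bigr)
  \;+\; \lambda_1 \lVert \mathbf{w} \rVert_1
  \;+\; \frac{\lambda_2}{2}\,\lVert \mathbf{w} \rVert_2^2
```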
 
References
1.
Akbari M, Gharesifard B, Linder T (2019) Individual regret bounds for the distributed online alternating direction method of multipliers. IEEE Trans Autom Control 64(4):1746–1752
2.
4.
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2010) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3(1):1–122
5.
Bühlmann P, van de Geer S (2011) Statistics for high-dimensional data: methods, theory and applications. Springer, Berlin
6.
Candès EJ, Wakin MB, Boyd SP (2008) Enhancing sparsity by reweighted \(l_1\) minimization. J Fourier Anal Appl 14(5):877–905
7.
Chang KW, Hsieh CJ, Lin CJ (2008) Coordinate descent method for large-scale l2-loss linear support vector machines. J Mach Learn Res 9:1369–1398
8.
Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):27:1–27:27
9.
Chauhan VK, Dahiya K, Sharma A (2019) Problem formulations and solvers in linear SVM: a review. Artif Intell Rev 52:803–855
10.
Chauhan VK, Sharma A, Dahiya K (2020) Stochastic trust region inexact Newton method for large-scale machine learning. Int J Mach Learn Cybern 11:1541–1555
11.
Cohen K, Nedić A, Srikant R (2017) On projected stochastic gradient descent algorithm with weighted averaging for least squares regression. IEEE Trans Autom Control 62(11):5974–5981
12.
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
13.
De Mol C, De Vito E, Rosasco L (2009) Elastic-net regularization in learning theory. J Complex 25(2):201–230
14.
Duchi JC, Shalev-Shwartz S, Singer Y, Tewari A (2010) Composite objective mirror descent. In: Proceedings of the 23rd annual conference on learning theory, pp 14–26
15.
Gabay D, Mercier B (1976) A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput Math Appl 2(1):17–40
16.
Garey MR, Johnson DS (1979) Computers and intractability: a guide to the theory of NP-completeness. Freeman, New York
17.
Glowinski R, Marrocco A (1975) Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires. Rev Fr Automat Infor 9:41–76
18.
Gong Y, Xu W (2007) Machine learning for multimedia content analysis. Springer Science & Business Media, New York
19.
Hajewski J, Oliveira S, Stewart D (2018) Smoothed hinge loss and l1 support vector machines. In: Proceedings of the 2018 IEEE international conference on data mining workshops, pp 1217–1223
20.
He B, Yuan X (2012) On the \(O(1/n)\) convergence rate of the Douglas-Rachford alternating direction method. SIAM J Numer Anal 50(2):700–709
21.
Hsieh CJ, Chang KW, Lin CJ, Keerthi SS, Sundararajan S (2008) A dual coordinate descent method for large-scale linear SVM. In: Proceedings of the 25th international conference on machine learning, pp 408–415
22.
Huang F, Chen S, Huang H (2019) Faster stochastic alternating direction method of multipliers for nonconvex optimization. In: Proceedings of the 36th international conference on machine learning, pp 2839–2848
23.
Joachims T (2006) Training linear SVMs in linear time. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp 217–226
24.
Johnson R, Zhang T (2013) Accelerating stochastic gradient descent using predictive variance reduction. In: Advances in neural information processing systems, pp 315–323
25.
Khan ZA, Zubair S, Alquhayz H, Azeem M, Ditta A (2019) Design of momentum fractional stochastic gradient descent for recommender systems. IEEE Access 7:179575–179590
26.
Lin CJ, Weng RC, Keerthi SS (2008) Trust region Newton method for large-scale logistic regression. J Mach Learn Res 9:627–650
27.
Liu Y, Shang F, Cheng J (2017) Accelerated variance reduced stochastic ADMM. In: Proceedings of the 31st AAAI conference on artificial intelligence, pp 2287–2293
28.
Littlestone N (1988) Learning quickly when irrelevant attributes abound: a new linear-threshold algorithm. Mach Learn 2(4):285–318
29.
Nalepa J, Kawulok M (2019) Selecting training sets for support vector machines: a review. Artif Intell Rev 52(2):857–900
30.
Sammut C, Webb GI (2011) Encyclopedia of machine learning. Springer Science & Business Media, New York
31.
Shalev-Shwartz S (2012) Online learning and online convex optimization. Found Trends Mach Learn 4(2):107–194
32.
Shalev-Shwartz S, Singer Y, Srebro N, Cotter A (2011) Pegasos: primal estimated sub-gradient solver for SVM. Math Program 127(1):3–30
33.
Singla M, Shukla KK (2020) Robust statistics-based support vector machine and its variants: a survey. Neural Comput Appl 32:11173–11194
34.
Song T, Li D, Liu Z, Yang W (2019) Online ADMM-based extreme learning machine for sparse supervised learning. IEEE Access 7:64533–64544
35.
Suzuki T (2013) Dual averaging and proximal gradient descent for online alternating direction multiplier method. In: Proceedings of the 30th international conference on machine learning, pp 392–400
36.
Tan C, Ma S, Dai YH, Qian Y (2016) Barzilai-Borwein step size for stochastic gradient descent. In: Advances in neural information processing systems, pp 685–693
37.
Vapnik V (1995) The nature of statistical learning theory. Springer, New York
38.
Wang L, Zhu J, Zou H (2006) The doubly regularized support vector machine. Stat Sinica 16:589–615
39.
Wang L, Zhu J, Zou H (2008) Hybrid huberized support vector machines for microarray classification and gene selection. Bioinformatics 24(3):412–419
40.
Wang Z, Hu R, Wang S, Jiang J (2014) Face hallucination via weighted adaptive sparse regularization. IEEE Trans Circuits Syst Video Technol 24(5):802–813
41.
Xiao L (2009) Dual averaging methods for regularized stochastic learning and online optimization. In: Advances in neural information processing systems, pp 2116–2124
42.
Xie Z, Li Y (2019) Large-scale support vector regression with budgeted stochastic gradient descent. Int J Mach Learn Cybern 10(6):1529–1541
43.
Xu Y, Akrotirianakis I, Chakraborty A (2016) Proximal gradient method for huberized support vector machine. Pattern Anal Appl 19(4):989–1005
44.
Xue W, Zhang W (2017) Learning a coupled linearized method in online setting. IEEE Trans Neural Netw Learn Syst 28(2):438–450
45.
Zamora E, Sossa H (2017) Dendrite morphological neurons trained by stochastic gradient descent. Neurocomputing 260:420–431
46.
Zhao P, Zhang T (2015) Stochastic optimization with importance sampling for regularized loss minimization. In: Proceedings of the 32nd international conference on machine learning
47.
Zhu J, Rosset S, Hastie T, Tibshirani R (2004) 1-norm support vector machines. In: Advances in neural information processing systems, pp 49–56
48.
Zinkevich M (2003) Online convex programming and generalized infinitesimal gradient ascent. In: Proceedings of the 20th international conference on machine learning, pp 928–936
49.
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc B 67(2):301–320
50.
Zou H, Zhang H (2009) On the adaptive elastic-net with a diverging number of parameters. Ann Stat 37(4):1733–1751
Metadata
Title
Sample-based online learning for bi-regular hinge loss
Authors
Wei Xue
Ping Zhong
Wensheng Zhang
Gaohang Yu
Yebin Chen
Publication date
24.01.2021
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 6/2021
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-020-01272-7
