26.07.2023

Optimality of Robust Online Learning

Authors: Zheng-Chu Guo, Andreas Christmann, Lei Shi

Published in: Foundations of Computational Mathematics | Issue 5/2024


Abstract

In this paper, we study an online learning algorithm with a robust loss function \(\mathcal{L}_{\sigma}\) for regression over a reproducing kernel Hilbert space (RKHS). The loss function \(\mathcal{L}_{\sigma}\), which involves a scaling parameter \(\sigma>0\), covers a wide range of commonly used robust losses. The proposed algorithm is thus a robust alternative to online least squares regression for estimating the conditional mean function. For properly chosen \(\sigma\) and step size, we show that the last iterate of this online algorithm achieves the optimal capacity-independent convergence rate in the mean square distance. Moreover, if additional information on the underlying function space is known, we also establish optimal capacity-dependent rates for strong convergence in the RKHS. To the best of our knowledge, both results are new to the literature on online learning.
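
To make the scheme concrete, the following is a minimal Python sketch of the kind of update the abstract describes: one-pass stochastic gradient descent in an RKHS with a scaled robust loss. The RBF kernel, the Welsch (correntropy-type) instance of \(\mathcal{L}_{\sigma}\), the polynomially decaying step size, and all function names are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def gaussian_kernel(X, z, gamma=1.0):
    # RBF kernel K(x, z) = exp(-gamma * ||x - z||^2), evaluated
    # for each row of X against a single point z.
    return np.exp(-gamma * np.sum((X - z) ** 2, axis=-1))

def robust_loss_derivative(u, sigma):
    # Derivative of the Welsch / correntropy-induced loss
    # L_sigma(u) = (sigma^2 / 2) * (1 - exp(-u^2 / sigma^2)):
    # close to the squared loss for small residuals, but with
    # bounded influence for large (outlying) residuals.
    return u * np.exp(-(u ** 2) / sigma ** 2)

def online_robust_regression(stream, sigma=1.0, gamma=1.0, eta0=0.5, theta=0.5):
    # One pass over the data stream; the iterate f_t lives in the
    # RKHS and is stored through its kernel expansion
    #     f_t = sum_i coefs[i] * K(centers[i], .).
    # Update rule (a standard online RKHS gradient step, assumed here):
    #     f_{t+1} = f_t - eta_t * L'_sigma(f_t(x_t) - y_t) * K(x_t, .)
    # with step size eta_t = eta0 * t^(-theta).
    centers, coefs = [], []
    for t, (x, y) in enumerate(stream, start=1):
        if centers:
            k = gaussian_kernel(np.asarray(centers), x, gamma)
            f_x = float(np.dot(coefs, k))  # evaluate f_t(x_t)
        else:
            f_x = 0.0                      # initial iterate f_1 = 0
        eta_t = eta0 * t ** (-theta)
        centers.append(x)
        coefs.append(-eta_t * robust_loss_derivative(f_x - y, sigma))
    return np.asarray(centers), np.asarray(coefs)

# Toy usage: regression with heavy-tailed noise and a few gross outliers.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_t(df=2, size=500)
y[::50] += 5.0  # inject outliers the robust loss should downweight
centers, coefs = online_robust_regression(zip(X, y), sigma=0.5, gamma=5.0)
```

Here the decay \(\eta_t = \eta_0 t^{-\theta}\) stands in for the "properly chosen step size" of the abstract, and the returned expansion represents the last iterate, the object whose convergence the paper analyzes; the concrete values of \(\eta_0\), \(\theta\), \(\sigma\), and \(\gamma\) above are arbitrary placeholders.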


Metadata
Title
Optimality of Robust Online Learning
Authors
Zheng-Chu Guo
Andreas Christmann
Lei Shi
Publication date
26.07.2023
Publisher
Springer US
Published in
Foundations of Computational Mathematics / Issue 5/2024
Print ISSN: 1615-3375
Electronic ISSN: 1615-3383
DOI
https://doi.org/10.1007/s10208-023-09616-9