Top

Published in:

2015 | OriginalPaper | Chapter

25. Robust Algorithms via PAC-Bayes and Laplace Distributions

Authors : Asaf Noy, Koby Crammer

Published in: Measures of Complexity

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

Laplace random variables are commonly used to model extreme noise in many fields, while systems trained to deal with such noise are often characterized by robustness properties. We introduce new learning algorithms that minimize objectives derived directly from PAC-Bayes generalization bounds, incorporating Laplace distributions. The resulting algorithms are regulated by the Huber loss function which is considered relatively robust to large noise. We analyze the convexity properties of the objective, and propose a few bounds which are fully convex, two of which are jointly convex in the mean and standard deviation under certain conditions. We derive new algorithms analogous to recent boosting algorithms, providing novel relations between boosting and PAC-Bayes analysis. Experiments show that our algorithms outperform AdaBoost (Freund and Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, 1995), L1-LogBoost (Duchi and Singer, Boostingwith structural sparsity, 2009), and RobustBoost (Freund, A more robust boosting algorithm, 2009) in a wide range of noise.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

previous chapter Lower Bounds for Sparse Coding

The family of distributions that are based on the \(\ell _2\) distance from the mean is called multivariate Laplace. Hence we use the different name: Laplace-like family.

Generalization of the following for an arbitrary vector \(\varvec{\sigma }_{Q}\) is straightforward by replacing each example \(\varvec{x}\) with \(\varvec{x}' = \left( {\sigma _{Q,1} x_1 , \ldots ,\sigma _{Q,d} x_d}\right) \).

Notice that if \(x_{k}=0\) the random variable \(\omega _k x_{k}\) equals zero too, therefore we assume without loss of generality that \(x_{k}\ne 0\).

The CDF is also well-defined and can be calculated when \(\left| {x_{i,j} }\right| =\left| {x_{i,k} }\right| \), by taking a limit and getting a distribution which is a mix of the one above with the Bilateral Gamma distribution family.

Ambroladze, A., Parrado-Hernández, E., Shawe-Taylor, J.: Tighter PAC-Bayes bounds. In: Advances in Neural Information Processing Systems 19, pp. 9–16. MIT Press, Cambridge (2006)

Aravkin, A.Y., Bell, B.M., Burke, J.V., Pillonetto, G.: An \(\ell _1\)-Laplace robust Kalman smoother. IEEE Trans. Autom. Control 56(12), 2898–2911 (2011)MathSciNetCrossRef

Buhlmann, P., Hothorn, T.: Boosting algorithms: regularization, prediction and model fitting. Stat. Sci. 22(4) (2007)

Catoni, O.: PAC-Bayesian Surpervised Classication: The Thermodynamics of Statistical Learning, IMS Lecture Notes Monograph Series, vol. 56. Institute of Mathematical Statistics (2007). arxiv.org/abs/0712.0248

Collins, M., Schapire, R.E., Singer, Y.: Logistic regression AdaBoost and Bregman distances. Mach. Learn. 48(1–3), 253–285 (2002)MATHCrossRef

Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 279–297 (1995)

Crammer, K., Mohri, M., Pereira, F.: Gaussian margin machines. J. Mach. Learn. Res. Proc. Track 5, 105–112 (2009)

Crammer, K., Wagner, T.: Volume regularization for binary classification. In: Bartlett, P. et al. (ed.) Advances in Neural Information Processing Systems 25, pp. 341–349 (2013)

Dekel, O., Shalev-Shwartz, S., Singer, Y.: Smooth e-intensive regression by loss symmetrization. In: Learning Theory and Kernel Machines. COLT. Lecture Notes in Computer Science, vol. 2777, pp. 433–447. Springer, Berlin (2003)

10.

Duchi, J., Singer, Y.: Boosting with structural sparsity. In: International Conference on Machine Learning (ICML 26) pp. 297–304 (2009)

11.

Freund, Y.: A more robust boosting algorithm (2009). arXiv preprint arXiv:0905.2138

12.

Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. In: 2nd European Conference on Computational Learning Theory. Lecture notes in Computer Science, vol. 904, pp. 23–37. Springer, Berlin (1995)

13.

Germain, P., Lacasse, A., Laviolette, F., Marchand, M.: A PAC-Bayes risk bound for general loss functions. In: Advances in Neural Information Processing Systems 19, pp. 449-456 (2006)

14.

Germain, P., Lacasse, A., Laviolette, F., Marchand, M.: PAC-Bayesian learning of linear classifiers. In: International Conference on Machine Learning, ICML 2009, pp. 353–360 (2009)

15.

Germain, P., Lacasse, A., Laviolette, F., Marchand, M., Shanian, S.: From PAC-Bayes bounds to KL regularization. In: Advances in Neural Information Processing Systems 22, pp. 603–610. MIT Press, Cambridge (2009)

16.

Herbrich, R., Graepel, T.: A PAC-Bayesian margin bound for linear classifiers. IEEE Trans. Inf. Theory 48(12), 3140–3150 (2006)MathSciNetCrossRef

17.

Horn, R.A., Johnson, C.R.: Topics in Matrix Analysis. Cambridge University Press, Cambridge (1991)MATHCrossRef

18.

Huber, P.: Robust Statistics. Wiley, New York (1974)

19.

Huber, P.J.: Robust estimation of a location parameter. Ann. Math. Stat. 35(1), 73–101 (1964)MATHCrossRef

20.

Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. 106(4) (1957)

21.

Keshet, J., McAllester, D.A., Hazan, T.: PAC-Bayesian approach for minimization of phoneme error rate. In: Acoustics, Speech, and Signal Processing (ICASSP), pp. 2224–2227 (2011)

22.

Langford, J., Seeger, M.: Bounds for averaging classifiers. Technical Report, Carnegie Mellon University, Department of Computer Science (2001)

23.

Langford, J., Shawe-Taylor, J.: PAC-Bayes and margins. In: Advances in Neural Information Processing Systems 15, pp. 439–446. MIT Press, Cambridge (2002)

24.

Long, P.M., Servedio, R.A.: Random classification noise defeats all convex potential boosters. Mach. Learn. 78(3), 287–304 (2010)MathSciNetCrossRef

25.

McAllester, D.: PAC-Bayesian model averaging. In: Proceedings of COLT, pp. 164–170. ACM, New York (1999)

26.

McAllester, D.: PAC-Bayesian stochastic model selection. Mach. Learn. 51, 5–21 (2003)MATHCrossRef

27.

McDonald, R.A., Hand, D.J., Eckley, I.: An empirical comparison of three boosting algorithms on real data sets with artificial class noise. In: Multiple Classifier Systems. Lecture Notes in Computer Science, vol. 2709, pp. 35–44. Springer, Berlin (2003)

28.

Noy, A., Crammer, K.: Robust forward algorithms via PAC-Bayes and Laplace distributions. J. Mach. Learn. Res. 33, 678–686 (2014)

29.

Roy, J.F., Marchand, M., Laviolette, F.: From PAC-Bayes bounds to quadratic programs for majority votes. In: Proceedings of the 29th International Conference on Machine Learning (ICML), pp. 649–656 (2011)

30.

Schapire, R.E., Singer, Y.: Improved boosting algorithms using confidence-rated predictions. Mach. Learn. 37(3), 297–336 (1999)MATHCrossRef

31.

Seeger, M.: PAC-Bayesian generalization error bounds for Gaussian processes. J. Mach. Learn. Res. 3, 233–269 (2002)MathSciNetCrossRef

32.

Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: The 25th International Conference on Machine learning, pp. 1096–1103 (2008)

33.

Wang, Y.G., Lin, X., Min, Z., Bai, Z.: Robust estimation using the Huber function with a data-dependent tuning constant. J. Comput. Graph. Stat. 16(2), 468–481 (2007)CrossRef

Title: Robust Algorithms via PAC-Bayes and Laplace Distributions
Authors: Asaf Noy
Koby Crammer
Publisher: Springer International Publishing
Book: Measures of Complexity
Print ISBN: 978-3-319-21851-9

Electronic ISBN: 978-3-319-21852-6

Copyright Year: 2015
DOI: https://doi.org/10.1007/978-3-319-21852-6_25

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Premium Partner