Skip to main content
Top

2015 | OriginalPaper | Chapter

25. Robust Algorithms via PAC-Bayes and Laplace Distributions

Authors : Asaf Noy, Koby Crammer

Published in: Measures of Complexity

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Laplace random variables are commonly used to model extreme noise in many fields, while systems trained to deal with such noise are often characterized by robustness properties. We introduce new learning algorithms that minimize objectives derived directly from PAC-Bayes generalization bounds, incorporating Laplace distributions. The resulting algorithms are regulated by the Huber loss function which is considered relatively robust to large noise. We analyze the convexity properties of the objective, and propose a few bounds which are fully convex, two of which are jointly convex in the mean and standard deviation under certain conditions. We derive new algorithms analogous to recent boosting algorithms, providing novel relations between boosting and PAC-Bayes analysis. Experiments show that our algorithms outperform AdaBoost (Freund and Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, 1995), L1-LogBoost (Duchi and Singer, Boostingwith structural sparsity, 2009), and RobustBoost (Freund, A more robust boosting algorithm, 2009) in a wide range of noise.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Footnotes
1
The family of distributions that are based on the \(\ell _2\) distance from the mean is called multivariate Laplace. Hence we use the different name: Laplace-like family.
 
2
Generalization of the following for an arbitrary vector \(\varvec{\sigma }_{Q}\) is straightforward by replacing each example \(\varvec{x}\) with \(\varvec{x}' = \left( {\sigma _{Q,1} x_1 , \ldots ,\sigma _{Q,d} x_d}\right) \).
 
3
Notice that if \(x_{k}=0\) the random variable \(\omega _k x_{k}\) equals zero too, therefore we assume without loss of generality that \(x_{k}\ne 0\).
 
4
The CDF is also well-defined and can be calculated when \(\left| {x_{i,j} }\right| =\left| {x_{i,k} }\right| \), by taking a limit and getting a distribution which is a mix of the one above with the Bilateral Gamma distribution family.
 
Literature
1.
go back to reference Ambroladze, A., Parrado-Hernández, E., Shawe-Taylor, J.: Tighter PAC-Bayes bounds. In: Advances in Neural Information Processing Systems 19, pp. 9–16. MIT Press, Cambridge (2006) Ambroladze, A., Parrado-Hernández, E., Shawe-Taylor, J.: Tighter PAC-Bayes bounds. In: Advances in Neural Information Processing Systems 19, pp. 9–16. MIT Press, Cambridge (2006)
2.
go back to reference Aravkin, A.Y., Bell, B.M., Burke, J.V., Pillonetto, G.: An \(\ell _1\)-Laplace robust Kalman smoother. IEEE Trans. Autom. Control 56(12), 2898–2911 (2011)MathSciNetCrossRef Aravkin, A.Y., Bell, B.M., Burke, J.V., Pillonetto, G.: An \(\ell _1\)-Laplace robust Kalman smoother. IEEE Trans. Autom. Control 56(12), 2898–2911 (2011)MathSciNetCrossRef
3.
go back to reference Buhlmann, P., Hothorn, T.: Boosting algorithms: regularization, prediction and model fitting. Stat. Sci. 22(4) (2007) Buhlmann, P., Hothorn, T.: Boosting algorithms: regularization, prediction and model fitting. Stat. Sci. 22(4) (2007)
4.
go back to reference Catoni, O.: PAC-Bayesian Surpervised Classication: The Thermodynamics of Statistical Learning, IMS Lecture Notes Monograph Series, vol. 56. Institute of Mathematical Statistics (2007). arxiv.org/abs/0712.0248 Catoni, O.: PAC-Bayesian Surpervised Classication: The Thermodynamics of Statistical Learning, IMS Lecture Notes Monograph Series, vol. 56. Institute of Mathematical Statistics (2007). arxiv.​org/​abs/​0712.​0248
5.
go back to reference Collins, M., Schapire, R.E., Singer, Y.: Logistic regression AdaBoost and Bregman distances. Mach. Learn. 48(1–3), 253–285 (2002)MATHCrossRef Collins, M., Schapire, R.E., Singer, Y.: Logistic regression AdaBoost and Bregman distances. Mach. Learn. 48(1–3), 253–285 (2002)MATHCrossRef
6.
go back to reference Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 279–297 (1995) Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 279–297 (1995)
7.
go back to reference Crammer, K., Mohri, M., Pereira, F.: Gaussian margin machines. J. Mach. Learn. Res. Proc. Track 5, 105–112 (2009) Crammer, K., Mohri, M., Pereira, F.: Gaussian margin machines. J. Mach. Learn. Res. Proc. Track 5, 105–112 (2009)
8.
go back to reference Crammer, K., Wagner, T.: Volume regularization for binary classification. In: Bartlett, P. et al. (ed.) Advances in Neural Information Processing Systems 25, pp. 341–349 (2013) Crammer, K., Wagner, T.: Volume regularization for binary classification. In: Bartlett, P. et al. (ed.) Advances in Neural Information Processing Systems 25, pp. 341–349 (2013)
9.
go back to reference Dekel, O., Shalev-Shwartz, S., Singer, Y.: Smooth e-intensive regression by loss symmetrization. In: Learning Theory and Kernel Machines. COLT. Lecture Notes in Computer Science, vol. 2777, pp. 433–447. Springer, Berlin (2003) Dekel, O., Shalev-Shwartz, S., Singer, Y.: Smooth e-intensive regression by loss symmetrization. In: Learning Theory and Kernel Machines. COLT. Lecture Notes in Computer Science, vol. 2777, pp. 433–447. Springer, Berlin (2003)
10.
go back to reference Duchi, J., Singer, Y.: Boosting with structural sparsity. In: International Conference on Machine Learning (ICML 26) pp. 297–304 (2009) Duchi, J., Singer, Y.: Boosting with structural sparsity. In: International Conference on Machine Learning (ICML 26) pp. 297–304 (2009)
12.
go back to reference Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. In: 2nd European Conference on Computational Learning Theory. Lecture notes in Computer Science, vol. 904, pp. 23–37. Springer, Berlin (1995) Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. In: 2nd European Conference on Computational Learning Theory. Lecture notes in Computer Science, vol. 904, pp. 23–37. Springer, Berlin (1995)
13.
go back to reference Germain, P., Lacasse, A., Laviolette, F., Marchand, M.: A PAC-Bayes risk bound for general loss functions. In: Advances in Neural Information Processing Systems 19, pp. 449-456 (2006) Germain, P., Lacasse, A., Laviolette, F., Marchand, M.: A PAC-Bayes risk bound for general loss functions. In: Advances in Neural Information Processing Systems 19, pp. 449-456 (2006)
14.
go back to reference Germain, P., Lacasse, A., Laviolette, F., Marchand, M.: PAC-Bayesian learning of linear classifiers. In: International Conference on Machine Learning, ICML 2009, pp. 353–360 (2009) Germain, P., Lacasse, A., Laviolette, F., Marchand, M.: PAC-Bayesian learning of linear classifiers. In: International Conference on Machine Learning, ICML 2009, pp. 353–360 (2009)
15.
go back to reference Germain, P., Lacasse, A., Laviolette, F., Marchand, M., Shanian, S.: From PAC-Bayes bounds to KL regularization. In: Advances in Neural Information Processing Systems 22, pp. 603–610. MIT Press, Cambridge (2009) Germain, P., Lacasse, A., Laviolette, F., Marchand, M., Shanian, S.: From PAC-Bayes bounds to KL regularization. In: Advances in Neural Information Processing Systems 22, pp. 603–610. MIT Press, Cambridge (2009)
16.
go back to reference Herbrich, R., Graepel, T.: A PAC-Bayesian margin bound for linear classifiers. IEEE Trans. Inf. Theory 48(12), 3140–3150 (2006)MathSciNetCrossRef Herbrich, R., Graepel, T.: A PAC-Bayesian margin bound for linear classifiers. IEEE Trans. Inf. Theory 48(12), 3140–3150 (2006)MathSciNetCrossRef
17.
go back to reference Horn, R.A., Johnson, C.R.: Topics in Matrix Analysis. Cambridge University Press, Cambridge (1991)MATHCrossRef Horn, R.A., Johnson, C.R.: Topics in Matrix Analysis. Cambridge University Press, Cambridge (1991)MATHCrossRef
18.
go back to reference Huber, P.: Robust Statistics. Wiley, New York (1974) Huber, P.: Robust Statistics. Wiley, New York (1974)
19.
go back to reference Huber, P.J.: Robust estimation of a location parameter. Ann. Math. Stat. 35(1), 73–101 (1964)MATHCrossRef Huber, P.J.: Robust estimation of a location parameter. Ann. Math. Stat. 35(1), 73–101 (1964)MATHCrossRef
20.
go back to reference Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. 106(4) (1957) Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. 106(4) (1957)
21.
go back to reference Keshet, J., McAllester, D.A., Hazan, T.: PAC-Bayesian approach for minimization of phoneme error rate. In: Acoustics, Speech, and Signal Processing (ICASSP), pp. 2224–2227 (2011) Keshet, J., McAllester, D.A., Hazan, T.: PAC-Bayesian approach for minimization of phoneme error rate. In: Acoustics, Speech, and Signal Processing (ICASSP), pp. 2224–2227 (2011)
22.
go back to reference Langford, J., Seeger, M.: Bounds for averaging classifiers. Technical Report, Carnegie Mellon University, Department of Computer Science (2001) Langford, J., Seeger, M.: Bounds for averaging classifiers. Technical Report, Carnegie Mellon University, Department of Computer Science (2001)
23.
go back to reference Langford, J., Shawe-Taylor, J.: PAC-Bayes and margins. In: Advances in Neural Information Processing Systems 15, pp. 439–446. MIT Press, Cambridge (2002) Langford, J., Shawe-Taylor, J.: PAC-Bayes and margins. In: Advances in Neural Information Processing Systems 15, pp. 439–446. MIT Press, Cambridge (2002)
24.
go back to reference Long, P.M., Servedio, R.A.: Random classification noise defeats all convex potential boosters. Mach. Learn. 78(3), 287–304 (2010)MathSciNetCrossRef Long, P.M., Servedio, R.A.: Random classification noise defeats all convex potential boosters. Mach. Learn. 78(3), 287–304 (2010)MathSciNetCrossRef
25.
go back to reference McAllester, D.: PAC-Bayesian model averaging. In: Proceedings of COLT, pp. 164–170. ACM, New York (1999) McAllester, D.: PAC-Bayesian model averaging. In: Proceedings of COLT, pp. 164–170. ACM, New York (1999)
26.
27.
go back to reference McDonald, R.A., Hand, D.J., Eckley, I.: An empirical comparison of three boosting algorithms on real data sets with artificial class noise. In: Multiple Classifier Systems. Lecture Notes in Computer Science, vol. 2709, pp. 35–44. Springer, Berlin (2003) McDonald, R.A., Hand, D.J., Eckley, I.: An empirical comparison of three boosting algorithms on real data sets with artificial class noise. In: Multiple Classifier Systems. Lecture Notes in Computer Science, vol. 2709, pp. 35–44. Springer, Berlin (2003)
28.
go back to reference Noy, A., Crammer, K.: Robust forward algorithms via PAC-Bayes and Laplace distributions. J. Mach. Learn. Res. 33, 678–686 (2014) Noy, A., Crammer, K.: Robust forward algorithms via PAC-Bayes and Laplace distributions. J. Mach. Learn. Res. 33, 678–686 (2014)
29.
go back to reference Roy, J.F., Marchand, M., Laviolette, F.: From PAC-Bayes bounds to quadratic programs for majority votes. In: Proceedings of the 29th International Conference on Machine Learning (ICML), pp. 649–656 (2011) Roy, J.F., Marchand, M., Laviolette, F.: From PAC-Bayes bounds to quadratic programs for majority votes. In: Proceedings of the 29th International Conference on Machine Learning (ICML), pp. 649–656 (2011)
30.
go back to reference Schapire, R.E., Singer, Y.: Improved boosting algorithms using confidence-rated predictions. Mach. Learn. 37(3), 297–336 (1999)MATHCrossRef Schapire, R.E., Singer, Y.: Improved boosting algorithms using confidence-rated predictions. Mach. Learn. 37(3), 297–336 (1999)MATHCrossRef
31.
go back to reference Seeger, M.: PAC-Bayesian generalization error bounds for Gaussian processes. J. Mach. Learn. Res. 3, 233–269 (2002)MathSciNetCrossRef Seeger, M.: PAC-Bayesian generalization error bounds for Gaussian processes. J. Mach. Learn. Res. 3, 233–269 (2002)MathSciNetCrossRef
32.
go back to reference Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: The 25th International Conference on Machine learning, pp. 1096–1103 (2008) Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: The 25th International Conference on Machine learning, pp. 1096–1103 (2008)
33.
go back to reference Wang, Y.G., Lin, X., Min, Z., Bai, Z.: Robust estimation using the Huber function with a data-dependent tuning constant. J. Comput. Graph. Stat. 16(2), 468–481 (2007)CrossRef Wang, Y.G., Lin, X., Min, Z., Bai, Z.: Robust estimation using the Huber function with a data-dependent tuning constant. J. Comput. Graph. Stat. 16(2), 468–481 (2007)CrossRef
Metadata
Title
Robust Algorithms via PAC-Bayes and Laplace Distributions
Authors
Asaf Noy
Koby Crammer
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-21852-6_25

Premium Partner