Skip to main content

2015 | OriginalPaper | Buchkapitel

25. Robust Algorithms via PAC-Bayes and Laplace Distributions

verfasst von : Asaf Noy, Koby Crammer

Erschienen in: Measures of Complexity

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Laplace random variables are commonly used to model extreme noise in many fields, while systems trained to deal with such noise are often characterized by robustness properties. We introduce new learning algorithms that minimize objectives derived directly from PAC-Bayes generalization bounds, incorporating Laplace distributions. The resulting algorithms are regulated by the Huber loss function which is considered relatively robust to large noise. We analyze the convexity properties of the objective, and propose a few bounds which are fully convex, two of which are jointly convex in the mean and standard deviation under certain conditions. We derive new algorithms analogous to recent boosting algorithms, providing novel relations between boosting and PAC-Bayes analysis. Experiments show that our algorithms outperform AdaBoost (Freund and Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, 1995), L1-LogBoost (Duchi and Singer, Boostingwith structural sparsity, 2009), and RobustBoost (Freund, A more robust boosting algorithm, 2009) in a wide range of noise.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
The family of distributions that are based on the \(\ell _2\) distance from the mean is called multivariate Laplace. Hence we use the different name: Laplace-like family.
 
2
Generalization of the following for an arbitrary vector \(\varvec{\sigma }_{Q}\) is straightforward by replacing each example \(\varvec{x}\) with \(\varvec{x}' = \left( {\sigma _{Q,1} x_1 , \ldots ,\sigma _{Q,d} x_d}\right) \).
 
3
Notice that if \(x_{k}=0\) the random variable \(\omega _k x_{k}\) equals zero too, therefore we assume without loss of generality that \(x_{k}\ne 0\).
 
4
The CDF is also well-defined and can be calculated when \(\left| {x_{i,j} }\right| =\left| {x_{i,k} }\right| \), by taking a limit and getting a distribution which is a mix of the one above with the Bilateral Gamma distribution family.
 
Literatur
1.
Zurück zum Zitat Ambroladze, A., Parrado-Hernández, E., Shawe-Taylor, J.: Tighter PAC-Bayes bounds. In: Advances in Neural Information Processing Systems 19, pp. 9–16. MIT Press, Cambridge (2006) Ambroladze, A., Parrado-Hernández, E., Shawe-Taylor, J.: Tighter PAC-Bayes bounds. In: Advances in Neural Information Processing Systems 19, pp. 9–16. MIT Press, Cambridge (2006)
2.
Zurück zum Zitat Aravkin, A.Y., Bell, B.M., Burke, J.V., Pillonetto, G.: An \(\ell _1\)-Laplace robust Kalman smoother. IEEE Trans. Autom. Control 56(12), 2898–2911 (2011)MathSciNetCrossRef Aravkin, A.Y., Bell, B.M., Burke, J.V., Pillonetto, G.: An \(\ell _1\)-Laplace robust Kalman smoother. IEEE Trans. Autom. Control 56(12), 2898–2911 (2011)MathSciNetCrossRef
3.
Zurück zum Zitat Buhlmann, P., Hothorn, T.: Boosting algorithms: regularization, prediction and model fitting. Stat. Sci. 22(4) (2007) Buhlmann, P., Hothorn, T.: Boosting algorithms: regularization, prediction and model fitting. Stat. Sci. 22(4) (2007)
4.
Zurück zum Zitat Catoni, O.: PAC-Bayesian Surpervised Classication: The Thermodynamics of Statistical Learning, IMS Lecture Notes Monograph Series, vol. 56. Institute of Mathematical Statistics (2007). arxiv.org/abs/0712.0248 Catoni, O.: PAC-Bayesian Surpervised Classication: The Thermodynamics of Statistical Learning, IMS Lecture Notes Monograph Series, vol. 56. Institute of Mathematical Statistics (2007). arxiv.​org/​abs/​0712.​0248
5.
Zurück zum Zitat Collins, M., Schapire, R.E., Singer, Y.: Logistic regression AdaBoost and Bregman distances. Mach. Learn. 48(1–3), 253–285 (2002)MATHCrossRef Collins, M., Schapire, R.E., Singer, Y.: Logistic regression AdaBoost and Bregman distances. Mach. Learn. 48(1–3), 253–285 (2002)MATHCrossRef
6.
Zurück zum Zitat Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 279–297 (1995) Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 279–297 (1995)
7.
Zurück zum Zitat Crammer, K., Mohri, M., Pereira, F.: Gaussian margin machines. J. Mach. Learn. Res. Proc. Track 5, 105–112 (2009) Crammer, K., Mohri, M., Pereira, F.: Gaussian margin machines. J. Mach. Learn. Res. Proc. Track 5, 105–112 (2009)
8.
Zurück zum Zitat Crammer, K., Wagner, T.: Volume regularization for binary classification. In: Bartlett, P. et al. (ed.) Advances in Neural Information Processing Systems 25, pp. 341–349 (2013) Crammer, K., Wagner, T.: Volume regularization for binary classification. In: Bartlett, P. et al. (ed.) Advances in Neural Information Processing Systems 25, pp. 341–349 (2013)
9.
Zurück zum Zitat Dekel, O., Shalev-Shwartz, S., Singer, Y.: Smooth e-intensive regression by loss symmetrization. In: Learning Theory and Kernel Machines. COLT. Lecture Notes in Computer Science, vol. 2777, pp. 433–447. Springer, Berlin (2003) Dekel, O., Shalev-Shwartz, S., Singer, Y.: Smooth e-intensive regression by loss symmetrization. In: Learning Theory and Kernel Machines. COLT. Lecture Notes in Computer Science, vol. 2777, pp. 433–447. Springer, Berlin (2003)
10.
Zurück zum Zitat Duchi, J., Singer, Y.: Boosting with structural sparsity. In: International Conference on Machine Learning (ICML 26) pp. 297–304 (2009) Duchi, J., Singer, Y.: Boosting with structural sparsity. In: International Conference on Machine Learning (ICML 26) pp. 297–304 (2009)
12.
Zurück zum Zitat Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. In: 2nd European Conference on Computational Learning Theory. Lecture notes in Computer Science, vol. 904, pp. 23–37. Springer, Berlin (1995) Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. In: 2nd European Conference on Computational Learning Theory. Lecture notes in Computer Science, vol. 904, pp. 23–37. Springer, Berlin (1995)
13.
Zurück zum Zitat Germain, P., Lacasse, A., Laviolette, F., Marchand, M.: A PAC-Bayes risk bound for general loss functions. In: Advances in Neural Information Processing Systems 19, pp. 449-456 (2006) Germain, P., Lacasse, A., Laviolette, F., Marchand, M.: A PAC-Bayes risk bound for general loss functions. In: Advances in Neural Information Processing Systems 19, pp. 449-456 (2006)
14.
Zurück zum Zitat Germain, P., Lacasse, A., Laviolette, F., Marchand, M.: PAC-Bayesian learning of linear classifiers. In: International Conference on Machine Learning, ICML 2009, pp. 353–360 (2009) Germain, P., Lacasse, A., Laviolette, F., Marchand, M.: PAC-Bayesian learning of linear classifiers. In: International Conference on Machine Learning, ICML 2009, pp. 353–360 (2009)
15.
Zurück zum Zitat Germain, P., Lacasse, A., Laviolette, F., Marchand, M., Shanian, S.: From PAC-Bayes bounds to KL regularization. In: Advances in Neural Information Processing Systems 22, pp. 603–610. MIT Press, Cambridge (2009) Germain, P., Lacasse, A., Laviolette, F., Marchand, M., Shanian, S.: From PAC-Bayes bounds to KL regularization. In: Advances in Neural Information Processing Systems 22, pp. 603–610. MIT Press, Cambridge (2009)
16.
Zurück zum Zitat Herbrich, R., Graepel, T.: A PAC-Bayesian margin bound for linear classifiers. IEEE Trans. Inf. Theory 48(12), 3140–3150 (2006)MathSciNetCrossRef Herbrich, R., Graepel, T.: A PAC-Bayesian margin bound for linear classifiers. IEEE Trans. Inf. Theory 48(12), 3140–3150 (2006)MathSciNetCrossRef
17.
Zurück zum Zitat Horn, R.A., Johnson, C.R.: Topics in Matrix Analysis. Cambridge University Press, Cambridge (1991)MATHCrossRef Horn, R.A., Johnson, C.R.: Topics in Matrix Analysis. Cambridge University Press, Cambridge (1991)MATHCrossRef
18.
Zurück zum Zitat Huber, P.: Robust Statistics. Wiley, New York (1974) Huber, P.: Robust Statistics. Wiley, New York (1974)
19.
Zurück zum Zitat Huber, P.J.: Robust estimation of a location parameter. Ann. Math. Stat. 35(1), 73–101 (1964)MATHCrossRef Huber, P.J.: Robust estimation of a location parameter. Ann. Math. Stat. 35(1), 73–101 (1964)MATHCrossRef
20.
Zurück zum Zitat Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. 106(4) (1957) Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. 106(4) (1957)
21.
Zurück zum Zitat Keshet, J., McAllester, D.A., Hazan, T.: PAC-Bayesian approach for minimization of phoneme error rate. In: Acoustics, Speech, and Signal Processing (ICASSP), pp. 2224–2227 (2011) Keshet, J., McAllester, D.A., Hazan, T.: PAC-Bayesian approach for minimization of phoneme error rate. In: Acoustics, Speech, and Signal Processing (ICASSP), pp. 2224–2227 (2011)
22.
Zurück zum Zitat Langford, J., Seeger, M.: Bounds for averaging classifiers. Technical Report, Carnegie Mellon University, Department of Computer Science (2001) Langford, J., Seeger, M.: Bounds for averaging classifiers. Technical Report, Carnegie Mellon University, Department of Computer Science (2001)
23.
Zurück zum Zitat Langford, J., Shawe-Taylor, J.: PAC-Bayes and margins. In: Advances in Neural Information Processing Systems 15, pp. 439–446. MIT Press, Cambridge (2002) Langford, J., Shawe-Taylor, J.: PAC-Bayes and margins. In: Advances in Neural Information Processing Systems 15, pp. 439–446. MIT Press, Cambridge (2002)
24.
Zurück zum Zitat Long, P.M., Servedio, R.A.: Random classification noise defeats all convex potential boosters. Mach. Learn. 78(3), 287–304 (2010)MathSciNetCrossRef Long, P.M., Servedio, R.A.: Random classification noise defeats all convex potential boosters. Mach. Learn. 78(3), 287–304 (2010)MathSciNetCrossRef
25.
Zurück zum Zitat McAllester, D.: PAC-Bayesian model averaging. In: Proceedings of COLT, pp. 164–170. ACM, New York (1999) McAllester, D.: PAC-Bayesian model averaging. In: Proceedings of COLT, pp. 164–170. ACM, New York (1999)
26.
Zurück zum Zitat McAllester, D.: PAC-Bayesian stochastic model selection. Mach. Learn. 51, 5–21 (2003)MATHCrossRef McAllester, D.: PAC-Bayesian stochastic model selection. Mach. Learn. 51, 5–21 (2003)MATHCrossRef
27.
Zurück zum Zitat McDonald, R.A., Hand, D.J., Eckley, I.: An empirical comparison of three boosting algorithms on real data sets with artificial class noise. In: Multiple Classifier Systems. Lecture Notes in Computer Science, vol. 2709, pp. 35–44. Springer, Berlin (2003) McDonald, R.A., Hand, D.J., Eckley, I.: An empirical comparison of three boosting algorithms on real data sets with artificial class noise. In: Multiple Classifier Systems. Lecture Notes in Computer Science, vol. 2709, pp. 35–44. Springer, Berlin (2003)
28.
Zurück zum Zitat Noy, A., Crammer, K.: Robust forward algorithms via PAC-Bayes and Laplace distributions. J. Mach. Learn. Res. 33, 678–686 (2014) Noy, A., Crammer, K.: Robust forward algorithms via PAC-Bayes and Laplace distributions. J. Mach. Learn. Res. 33, 678–686 (2014)
29.
Zurück zum Zitat Roy, J.F., Marchand, M., Laviolette, F.: From PAC-Bayes bounds to quadratic programs for majority votes. In: Proceedings of the 29th International Conference on Machine Learning (ICML), pp. 649–656 (2011) Roy, J.F., Marchand, M., Laviolette, F.: From PAC-Bayes bounds to quadratic programs for majority votes. In: Proceedings of the 29th International Conference on Machine Learning (ICML), pp. 649–656 (2011)
30.
Zurück zum Zitat Schapire, R.E., Singer, Y.: Improved boosting algorithms using confidence-rated predictions. Mach. Learn. 37(3), 297–336 (1999)MATHCrossRef Schapire, R.E., Singer, Y.: Improved boosting algorithms using confidence-rated predictions. Mach. Learn. 37(3), 297–336 (1999)MATHCrossRef
31.
Zurück zum Zitat Seeger, M.: PAC-Bayesian generalization error bounds for Gaussian processes. J. Mach. Learn. Res. 3, 233–269 (2002)MathSciNetCrossRef Seeger, M.: PAC-Bayesian generalization error bounds for Gaussian processes. J. Mach. Learn. Res. 3, 233–269 (2002)MathSciNetCrossRef
32.
Zurück zum Zitat Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: The 25th International Conference on Machine learning, pp. 1096–1103 (2008) Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: The 25th International Conference on Machine learning, pp. 1096–1103 (2008)
33.
Zurück zum Zitat Wang, Y.G., Lin, X., Min, Z., Bai, Z.: Robust estimation using the Huber function with a data-dependent tuning constant. J. Comput. Graph. Stat. 16(2), 468–481 (2007)CrossRef Wang, Y.G., Lin, X., Min, Z., Bai, Z.: Robust estimation using the Huber function with a data-dependent tuning constant. J. Comput. Graph. Stat. 16(2), 468–481 (2007)CrossRef
Metadaten
Titel
Robust Algorithms via PAC-Bayes and Laplace Distributions
verfasst von
Asaf Noy
Koby Crammer
Copyright-Jahr
2015
DOI
https://doi.org/10.1007/978-3-319-21852-6_25