
2018 | OriginalPaper | Chapter

PAC-Bayesian Aggregation of Affine Estimators

Authors : L. Montuelle, E. Le Pennec

Published in: Nonparametric Statistics

Publisher: Springer International Publishing


Abstract

Aggregating estimators using exponential weights that depend on their risk is known to be optimal in expectation but not in probability. We use a slight overpenalization to obtain an oracle inequality in probability for such an explicit aggregation procedure. We focus on the fixed-design regression framework and the aggregation of linear estimators, and obtain results for a large family of linear estimators under a not necessarily independent sub-Gaussian noise assumption.
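The aggregation scheme described in the abstract can be sketched numerically. The toy example below is an illustration only, not the paper's exact procedure: it aggregates a family of projection estimators in fixed-design regression using exponential weights computed from a Mallows-type unbiased risk estimate, with a small additional penalty standing in for the paper's overpenalization. The temperature `beta` and the penalty `pen` are illustrative choices, not the calibration derived in the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed-design regression: observe y = f + noise at n design points.
n = 50
f = np.sin(np.linspace(0.0, 3.0, n))        # unknown regression values
sigma = 0.3
y = f + sigma * rng.normal(size=n)

# A small family of linear estimators f_hat_j = A_j y: here, projections
# onto the span of the first k columns of a cosine design (illustrative).
def projection_matrix(k):
    basis = np.cos(np.outer(np.arange(n), np.arange(k)) * np.pi / n)
    q, _ = np.linalg.qr(basis)              # orthonormal basis of the span
    return q @ q.T

ks = [1, 2, 4, 8, 16]
A = [projection_matrix(k) for k in ks]
f_hats = [Aj @ y for Aj in A]

# Unbiased risk estimate for a projection estimator (Mallows' C_p type):
#   r_j = ||y - A_j y||^2 + 2 sigma^2 tr(A_j) - n sigma^2
risks = np.array([
    np.sum((y - fh) ** 2) + 2 * sigma**2 * np.trace(Aj) - n * sigma**2
    for Aj, fh in zip(A, f_hats)
])

# Slightly overpenalized exponential weights (choices below are assumptions):
beta = 4 * sigma**2                          # temperature
pen = 0.5 * sigma**2 * np.array([np.trace(Aj) for Aj in A])
logw = -(risks + pen) / beta
logw -= logw.max()                           # numerical stability
w = np.exp(logw)
w /= w.sum()

# Aggregate estimator: convex combination of the family.
f_agg = sum(wj * fh for wj, fh in zip(w, f_hats))
print("weights:", np.round(w, 3))
print("aggregate risk:", np.sum((f_agg - f) ** 2))
```

The oracle inequalities in the chapter control the risk of `f_agg` against the best estimator in the family; the overpenalization term is what upgrades the guarantee from expectation to probability.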


DOI
https://doi.org/10.1007/978-3-319-96941-1_9