
2018 | OriginalPaper | Chapter

PAC-Bayesian Aggregation of Affine Estimators

Authors : L. Montuelle, E. Le Pennec

Published in: Nonparametric Statistics

Publisher: Springer International Publishing


Abstract

Aggregating estimators using exponential weights that depend on their risk is known to be optimal in expectation but not in probability. We use a slight overpenalization to obtain an oracle inequality in probability for such an explicit aggregation procedure. We focus on the fixed-design regression framework and the aggregation of linear estimators, and obtain results for a large family of linear estimators under a not necessarily independent sub-Gaussian noise assumption.
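The aggregation scheme described in the abstract can be sketched numerically. The toy example below is an illustration only, not the paper's exact procedure: it aggregates a family of projection estimators in fixed-design regression using exponential weights computed from a Mallows-type unbiased risk estimate, with a small additional penalty standing in for the paper's overpenalization. The temperature `beta` and the penalty `pen` are illustrative choices, not the calibration derived in the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed-design regression: observe y = f + noise at n design points.
n = 50
f = np.sin(np.linspace(0.0, 3.0, n))        # unknown regression values
sigma = 0.3
y = f + sigma * rng.normal(size=n)

# A small family of linear estimators f_hat_j = A_j y: here, projections
# onto the span of the first k columns of a cosine design (illustrative).
def projection_matrix(k):
    basis = np.cos(np.outer(np.arange(n), np.arange(k)) * np.pi / n)
    q, _ = np.linalg.qr(basis)              # orthonormal basis of the span
    return q @ q.T

ks = [1, 2, 4, 8, 16]
A = [projection_matrix(k) for k in ks]
f_hats = [Aj @ y for Aj in A]

# Unbiased risk estimate for a projection estimator (Mallows' C_p type):
#   r_j = ||y - A_j y||^2 + 2 sigma^2 tr(A_j) - n sigma^2
risks = np.array([
    np.sum((y - fh) ** 2) + 2 * sigma**2 * np.trace(Aj) - n * sigma**2
    for Aj, fh in zip(A, f_hats)
])

# Slightly overpenalized exponential weights (choices below are assumptions):
beta = 4 * sigma**2                          # temperature
pen = 0.5 * sigma**2 * np.array([np.trace(Aj) for Aj in A])
logw = -(risks + pen) / beta
logw -= logw.max()                           # numerical stability
w = np.exp(logw)
w /= w.sum()

# Aggregate estimator: convex combination of the family.
f_agg = sum(wj * fh for wj, fh in zip(w, f_hats))
print("weights:", np.round(w, 3))
print("aggregate risk:", np.sum((f_agg - f) ** 2))
```

The oracle inequalities in the chapter control the risk of `f_agg` against the best estimator in the family; the overpenalization term is what upgrades the guarantee from expectation to probability.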


DOI
https://doi.org/10.1007/978-3-319-96941-1_9