Published in: Foundations of Computational Mathematics 5/2019

05-08-2019

Mean Estimation and Regression Under Heavy-Tailed Distributions: A Survey

Authors: Gábor Lugosi, Shahar Mendelson

Abstract

We survey some of the recent advances in mean estimation and regression function estimation. In particular, we describe sub-Gaussian mean estimators for possibly heavy-tailed data in both the univariate and multivariate settings. We focus on estimators based on median-of-means techniques, but other methods such as the trimmed-mean and Catoni’s estimators are also reviewed. We give detailed proofs for the cornerstone results. We dedicate a section to statistical learning problems—in particular, regression function estimation—in the presence of possibly heavy-tailed data.
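To make the central construction concrete, here is a minimal sketch of the median-of-means estimator described in the survey: split the sample into blocks, average within each block, and report the median of the block means. The function name and the choice of the number of blocks are ours for illustration; the theory ties the number of blocks k to the desired confidence level δ (roughly k on the order of log(1/δ)).

```python
import numpy as np

def median_of_means(x, k):
    """Median-of-means estimate of the mean.

    Splits the sample into k equal-size blocks (discarding the
    remainder), averages within each block, and returns the median
    of the k block means.
    """
    x = np.asarray(x, dtype=float)
    m = len(x) // k                      # block size
    blocks = x[: m * k].reshape(k, m)    # k rows of m observations
    return float(np.median(blocks.mean(axis=1)))

# Example: a heavy-tailed sample (Student's t with 2.5 degrees of
# freedom has finite variance but infinite third moment; its mean is 0).
rng = np.random.default_rng(0)
sample = rng.standard_t(df=2.5, size=100_000)
print(median_of_means(sample, k=32))
```

The median step is what buys robustness: a few wildly large observations can corrupt at most a few block means, and the median ignores them, which is how the estimator achieves sub-Gaussian deviations under only a second-moment assumption.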


Footnotes
1
As we explain in what follows, it suffices to ensure that the comparison is correct between \(\mu \) and any point that is not too close to \(\mu \).
 
2
In the proof of Theorem 8, “well-behaved” means that (3.5) holds for a majority of the blocks.
 
3
The case \(q=3\) is the standard Berry–Esseen theorem, while for \(2<q<3\) one may use generalized Berry–Esseen bounds, see [71].
 
4
Note that one has the freedom to select a function \(\widehat{f}\) that does not belong to \({{\mathcal {F}}}\).
 
Literature
1. N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 58:137–147, 2002.
2. G. Aloupis. Geometric measures of data depth. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 72:147–158, 2006.
3. M. Anthony and P. L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
4. J.-Y. Audibert and O. Catoni. Robust linear least squares regression. The Annals of Statistics, 39:2766–2794, 2011.
5. Y. Baraud and L. Birgé. Rho-estimators revisited: General theory and applications. The Annals of Statistics, 46(6B):3767–3804, 2018.
6. Y. Baraud, L. Birgé, and M. Sart. A new method for estimation and model selection: \(\rho \)-estimation. Inventiones Mathematicae, 207(2):425–517, 2017.
7. P. L. Bartlett, O. Bousquet, and S. Mendelson. Localized Rademacher complexities. Annals of Statistics, 33:1497–1537, 2005.
8.
9. A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. Learnability and the Vapnik–Chervonenkis dimension. Journal of the ACM, 36:929–965, 1989.
10. S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.
11. C. Brownlees, E. Joly, and G. Lugosi. Empirical risk minimization for heavy-tailed losses. Annals of Statistics, 43:2507–2536, 2015.
12. S. Bubeck, N. Cesa-Bianchi, and G. Lugosi. Bandits with heavy tail. IEEE Transactions on Information Theory, 59:7711–7717, 2013.
13. P. Bühlmann and S. van de Geer. Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer Series in Statistics. Springer, Heidelberg, 2011.
14. O. Catoni. Challenging the empirical mean and empirical variance: a deviation study. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 48(4):1148–1185, 2012.
15. O. Catoni and I. Giulini. Dimension-free PAC-Bayesian bounds for matrices, vectors, and linear least squares regression. arXiv preprint arXiv:1712.02747, 2017.
16. O. Catoni and I. Giulini. Dimension-free PAC-Bayesian bounds for the estimation of the mean of a random vector. arXiv preprint arXiv:1802.04308, 2018.
17. M. Charikar, J. Steinhardt, and G. Valiant. Learning from untrusted data. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 47–60. ACM, 2017.
18. Y. Cherapanamjeri, N. Flammarion, and P. Bartlett. Fast mean estimation with sub-Gaussian rates. arXiv preprint arXiv:1902.01998, 2019.
19. M. Chichignoud and J. Lederer. A robust, adaptive M-estimator for pointwise estimation in heteroscedastic regression. Bernoulli, 20(3):1560–1599, 2014.
20. M. B. Cohen, Y. T. Lee, G. Miller, J. Pachocki, and A. Sidford. Geometric median in nearly linear time. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, pages 9–21. ACM, 2016.
21. L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer-Verlag, New York, 1996.
22. L. Devroye, M. Lerasle, G. Lugosi, and R. I. Oliveira. Sub-Gaussian mean estimators. Annals of Statistics, 2016.
23. I. Diakonikolas, G. Kamath, D. M. Kane, J. Li, A. Moitra, and A. Stewart. Robust estimators in high dimensions without the computational intractability. In Foundations of Computer Science (FOCS), 2016 IEEE 57th Annual Symposium on, pages 655–664. IEEE, 2016.
24. I. Diakonikolas, G. Kamath, D. M. Kane, J. Li, A. Moitra, and A. Stewart. Being robust (in high dimensions) can be practical. In Proceedings of the 34th International Conference on Machine Learning (ICML 2017), 2017.
25. I. Diakonikolas, G. Kamath, D. M. Kane, J. Li, A. Moitra, and A. Stewart. Robustly learning a Gaussian: Getting optimal error, efficiently. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2683–2702. Society for Industrial and Applied Mathematics, 2018.
26. I. Diakonikolas, D. M. Kane, and A. Stewart. Efficient robust proper learning of log-concave distributions. arXiv preprint arXiv:1606.03077, 2016.
27. I. Diakonikolas, W. Kong, and A. Stewart. Efficient algorithms and lower bounds for robust linear regression. arXiv preprint arXiv:1806.00040, 2018.
28. J. Fan, Q. Li, and Y. Wang. Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(1):247–265, 2017.
29. L. Györfi, M. Kohler, A. Krzyżak, and H. Walk. A Distribution-Free Theory of Nonparametric Regression. Springer-Verlag, New York, 2002.
30. F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel. Robust Statistics: The Approach Based on Influence Functions. Wiley, 1986.
31. Q. Han and J. A. Wellner. A sharp multiplier inequality with applications to heavy-tailed regression problems. arXiv preprint arXiv:1706.02410, 2017.
32. W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13–30, 1963.
33. S. B. Hopkins. Sub-Gaussian mean estimation in polynomial time. Annals of Statistics, 2019, to appear.
34. S. B. Hopkins and J. Li. Mixture models, robustness, and sum of squares proofs. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1021–1034. ACM, 2018.
36. D. Hsu and S. Sabato. Loss minimization and parameter estimation with heavy tails. Journal of Machine Learning Research, 17:1–40, 2016.
37. M. Huber. An optimal (\(\epsilon \), \(\delta \))-randomized approximation scheme for the mean of random variables with bounded relative variance. Random Structures & Algorithms, 2019.
38. P. J. Huber. Robust estimation of a location parameter. The Annals of Mathematical Statistics, 35(1):73–101, 1964.
39. P. J. Huber and E. M. Ronchetti. Robust Statistics. Wiley, New York, second edition, 2009.
40. M. Jerrum, L. Valiant, and V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoretical Computer Science, 43:186–188, 1986.
41. E. Joly, G. Lugosi, and R. I. Oliveira. On the estimation of the mean of a random vector. Electronic Journal of Statistics, 11:440–451, 2017.
42. A. Klivans, P. K. Kothari, and R. Meka. Efficient algorithms for outlier-robust regression. In Proceedings of the 31st Annual Conference on Learning Theory (COLT 2018), 2018.
43. V. Koltchinskii. Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems, volume 2033 of Lecture Notes in Mathematics. Springer, Heidelberg, 2011. Lectures from the 38th Probability Summer School held in Saint-Flour, 2008.
44. P. K. Kothari, J. Steinhardt, and D. Steurer. Robust moment estimation and improved clustering via sum of squares. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1035–1046. ACM, 2018.
45. K. A. Lai, A. B. Rao, and S. Vempala. Agnostic estimation of mean and covariance. In Foundations of Computer Science (FOCS), 2016 IEEE 57th Annual Symposium on, pages 665–674. IEEE, 2016.
46.
47. G. Lecué and M. Lerasle. Robust machine learning by median-of-means: theory and practice. Annals of Statistics, 2019, to appear.
48.
49. G. Lecué and S. Mendelson. Learning subgaussian classes: Upper and minimax bounds. In S. Boucheron and N. Vayatis, editors, Topics in Learning Theory. Société Mathématique de France, 2016.
50. G. Lecué and S. Mendelson. Performance of empirical risk minimization in linear aggregation. Bernoulli, 22(3):1520–1534, 2016.
51. M. Ledoux. The Concentration of Measure Phenomenon. American Mathematical Society, Providence, RI, 2001.
52. M. Ledoux and M. Talagrand. Probability in Banach Spaces. Springer-Verlag, New York, 1991.
54. P.-L. Loh and X. L. Tan. High-dimensional robust precision matrix estimation: Cellwise corruption under \(\epsilon \)-contamination. Electronic Journal of Statistics, 12(1):1429–1467, 2018.
55. G. Lugosi and S. Mendelson. Robust multivariate mean estimation: the optimality of trimmed mean. Manuscript, 2019.
56. G. Lugosi and S. Mendelson. Sub-Gaussian estimators of the mean of a random vector. Annals of Statistics, 47:783–794, 2019.
57. G. Lugosi and S. Mendelson. Near-optimal mean estimators with respect to general norms. Probability Theory and Related Fields, 2019, to appear.
58. G. Lugosi and S. Mendelson. Regularization, sparse recovery, and median-of-means tournaments. Bernoulli, 2019, to appear.
59. G. Lugosi and S. Mendelson. Risk minimization by median-of-means tournaments. Journal of the European Mathematical Society, 2019, to appear.
60. P. Massart. Concentration Inequalities and Model Selection. École d'Été de Probabilités de Saint-Flour 2003. Lecture Notes in Mathematics. Springer, 2006.
63. S. Mendelson. Learning without concentration for general loss functions. Probability Theory and Related Fields, 171(1–2):459–502, 2018.
64. S. Mendelson and N. Zhivotovskiy. Robust covariance estimation under \({L}_4-{L}_2\) norm equivalence. arXiv preprint arXiv:1809.10462, 2018.
66. S. Minsker. Sub-Gaussian estimators of the mean of a random matrix with heavy-tailed entries. The Annals of Statistics, 46(6A):2871–2903, 2018.
68. S. Minsker and N. Strawn. Distributed statistical estimation and rates of convergence in normal approximation. arXiv preprint arXiv:1704.02658, 2017.
69. A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization. 1983.
70. R. I. Oliveira and P. Orenstein. The sub-Gaussian property of trimmed means estimators. Technical report, IMPA, 2019.
71. V. V. Petrov. Limit Theorems of Probability Theory: Sequences of Independent Random Variables. Oxford University Press, New York, 1995.
72. I. G. Shevtsova. On the absolute constants in the Berry–Esseen-type inequalities. Doklady Mathematics, 89:378–381, 2014.
73. C. G. Small. A survey of multidimensional medians. International Statistical Review, pages 263–277, 1990.
74.
75. B. S. Tsirelson, I. A. Ibragimov, and V. N. Sudakov. Norm of Gaussian sample function. In Proceedings of the 3rd Japan–USSR Symposium on Probability Theory, volume 550 of Lecture Notes in Mathematics, pages 20–41. Springer-Verlag, Berlin, 1976.
76. A. B. Tsybakov. Introduction to Nonparametric Estimation. Springer Series in Statistics. Springer, New York, 2009.
77. J. W. Tukey. Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians, Vancouver, 1975, volume 2, pages 523–531, 1975.
78. J. W. Tukey and D. H. McLaughlin. Less vulnerable confidence and significance procedures for location based on a single sample: Trimming/Winsorization 1. Sankhyā: The Indian Journal of Statistics, Series A, 25:331–352, 1963.
79. L. G. Valiant. A theory of the learnable. Communications of the ACM, 27:1134–1142, 1984.
80. S. van de Geer. Applications of Empirical Process Theory, volume 6 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2000.
81. A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. Springer, 1996.
82. V. N. Vapnik and A. Ya. Chervonenkis. Theory of Pattern Recognition. Nauka, Moscow, 1974. (In Russian.) German translation: Theorie der Zeichenerkennung, Akademie Verlag, Berlin, 1979.
83. R. Vershynin. Lectures in Geometric Functional Analysis. 2009.
Metadata
Title: Mean Estimation and Regression Under Heavy-Tailed Distributions: A Survey
Authors: Gábor Lugosi, Shahar Mendelson
Publication date: 05-08-2019
Publisher: Springer US
Published in: Foundations of Computational Mathematics, Issue 5/2019
Print ISSN: 1615-3375
Electronic ISSN: 1615-3383
DOI: https://doi.org/10.1007/s10208-019-09427-x

