Big Data Analysis
High Dimensional Probability, Statistics, Optimization, and Inference
- 2025
- Book
- Author
- Junwei Lu
- Publisher
- Springer Nature Switzerland
About this book
This book covers the methods and theory of high dimensional probability, statistics, large-scale optimization, and inference. We aim to quickly bring readers to the frontier and interdisciplinary areas of statistics, optimization, probability, and machine learning. This book covers topics in:
High dimensional probability, Concentration inequality, Sub-Gaussian random variables, Chernoff bounds, Hoeffding's inequality, Maximal inequalities, High dimensional linear regression, Ordinary least squares, Compressed sensing, Lasso, Variations of Lasso including group lasso, fused lasso, adaptive lasso, etc., General high dimensional M-estimators, Variable selection consistency, High dimensional optimization, Convex geometry, Lagrange duality, Gradient descent, Proximal gradient descent, LARS, ADMM, Mirror descent, Stochastic optimization, Large-scale inference, Linear model hypothesis testing, High dimensional inference, Chi-square test, Maximal test, Higher criticism, False discovery rate control.
Table of Contents
-
Frontmatter
-
Foundations of Big Data Analysis
-
Frontmatter
-
Chapter 1. Introduction
Junwei Lu
This chapter delves into the world of big data, defining it as high-dimensional data with vast sample sizes and feature dimensions. It highlights the three key features of big data: volume, velocity, and variety, and outlines the typical protocol for data analysis, which includes building high-dimensional statistical models, developing fast algorithms, and making statistical inferences. The chapter introduces the four cornerstones of modern big data analysis: probability, statistical learning, optimization, and inference. It also explores three major principles that guide big data analysis: the concentration principle, which emphasizes the convergence of random observations to the population truth as sample size increases; the parsimonious principle, which posits that only a small proportion of features in high-dimensional data are significant; and the Taylor principle, which suggests that most functions are 'almost' quadratic. These principles are illustrated through examples such as the sparse linear model and the additive model, providing a practical understanding of their application in big data analysis.
AI Generated
This summary of the content was generated with the help of AI.
Abstract
This book aims to solve two major questions:
1. How to analyze big data? (Method)
2. Why does it work? (Theory)
To clarify the questions above, we need to define what “big data” is. In this book, big data is almost a synonym of “high-dimensional data.” The dataset, usually denoted as \(\mathbb{X}\), is an \(n \times d\) matrix, where \(n\) is the sample size and \(d\) is the number of features (or feature dimension).
-
Chapter 2. Preliminaries in Probability
Junwei Lu
This chapter delves into the fundamentals of probability theory, starting with the basics of statistical models and random samples. It clarifies the distinction between random samples and data, using the example of rolling a die to illustrate the concepts of random variables and their distributions. The chapter also explores distribution functions, including cumulative distribution functions (cdf), probability density functions (pdf), and probability mass functions (pmf). It further discusses key statistical measures like expectation and variance, and introduces the concept of statistics and their sampling distributions. The chapter concludes with an examination of asymptotic theory, covering topics such as convergence in probability and distribution, consistent estimators, and statistical rates. This comprehensive overview provides a solid foundation for understanding the language of uncertainty and its applications in data analysis.
AI Generated
Abstract
When we roll a die, the small cube with six different numbers on its six faces will never give us a deterministic answer before it comes to rest. Einstein thought even “God does not throw dice.” However, probabilists have to (as well as statisticians!). Actually, they have developed a dedicated language to describe the world of uncertainty. We will begin by reviewing several fundamental terms in probability theory.
-
Chapter 3. Preliminaries in Linear Algebra
Junwei Lu
This chapter delves into the basics of linear algebra, covering essential topics such as matrices, their operations, and properties. It introduces the concepts of eigenvalues and eigenvectors, explaining their significance through the eigenvalue decomposition theorem. The chapter also explores the variational form of eigenvalues, providing a unique perspective on these mathematical constructs. Furthermore, it discusses the singular value decomposition, a generalization of eigenvalue decomposition applicable to non-square matrices. The chapter concludes with a visualization of the eigenvalue decomposition process, illustrating how a matrix transforms canonical unit vectors through a series of steps. This comprehensive overview equips readers with a solid foundation in linear algebra, enabling them to apply these concepts in their respective fields.
AI Generated
Abstract
Linearity is the simplest structure in mathematics. Let’s review the basic notation and terminology in linear algebra.
-
-
High-Dimensional Probability
-
Frontmatter
-
Chapter 4. Concentration Inequalities
Junwei Lu
This chapter delves into the world of concentration inequalities, focusing on the distinction between asymptotic and non-asymptotic approaches. It begins by discussing the Law of Large Numbers and the Central Limit Theorem, which are fundamental asymptotic results in probability theory. These theorems describe the behavior of the sample mean as the number of samples grows to infinity. However, the chapter highlights two major problems with asymptotic properties: they lack information about the convergence rate and may fail in high-dimensional settings. To address these issues, the chapter introduces non-asymptotic concentration inequalities, which provide bounds on the tail probabilities of random variables for any fixed sample size and dimension. The chapter then explores sub-Gaussian random variables, which exhibit tail probabilities similar to those of Gaussian distributions. It presents several key results, including the Markov inequality, Chebyshev inequality, and Chernoff bound, which offer increasingly tighter bounds on tail probabilities. The chapter also discusses the concentration of sample means of sub-Gaussian random variables, culminating in the Hoeffding inequality. Through practical examples and clear explanations, this chapter demonstrates the importance of non-asymptotic concentration inequalities in modern data analysis and statistical learning.
AI Generated
Abstract
In the previous chapter, we discussed the concentration principle. It states that the more samples we have, the closer the random observations are to the population truth. In particular, two important theorems in probability theory describe this phenomenon.
-
Chapter 5. Sub-exponential Random Variables
Junwei Lu
This chapter delves into the concentration principle, extending beyond the sample average to encompass general statistics. It begins by introducing the general concentration principle, which states that a random variable concentrates to its mean under certain conditions. The McDiarmid inequality is a key focus, with a detailed proof and an example of its application in bounding the uniform rate of a kernel density estimator. The chapter also explores sub-exponential random variables, their moment-generating functions, and tail probabilities. It concludes with a theorem on the sample average of sub-exponential random variables. The text provides a comprehensive overview of these topics, making it an essential read for professionals seeking to understand the theoretical underpinnings of concentration inequalities in statistics and machine learning.
AI Generated
Abstract
In the previous chapter, we showed the concentration of the sample average in the Hoeffding inequality. Asymptotic results like the law of large numbers and the central limit theorem are also about the sample average. If we look into the proofs of these results, we find that they rely on the additive form of the sample mean. So we have the impression that the concentration principle works for the average, but does it cover other statistics? In fact, we have many nonlinear estimators in statistics and machine learning. Can we expect that a general statistic \(f(X_1, \ldots, X_n)\) concentrates around its expectation? The answer is yes. The sample mean is not special.
-
Chapter 6. Bernstein and Maximal Inequalities
Junwei Lu
This chapter delves into the Bernstein inequality, a powerful tool in probability theory that provides a stronger concentration inequality for sub-exponential random variables. The text begins by defining the Bernstein condition, which is crucial for understanding the inequality. It then proceeds to prove the Bernstein inequality, demonstrating its superiority over the Hoeffding inequality under certain conditions. The chapter also explores the maximal inequality, which is essential for studying the uniform performance of multiple estimators, especially in high-dimensional scenarios. The text employs the discretization trick to bridge the gap between finite and infinite sets of random variables, providing a novel approach to controlling the tail probability. The chapter concludes with an example of implementing the discretization trick to the maximal inequality, offering practical insights into its application.
AI Generated
Abstract
In the previous chapter, we showed that the sample mean of independent sub-exponential random variables \(X_1, \ldots, X_n\) with the parameter \(\alpha\) has the tail probability
$$\displaystyle \mathbb{P} (|\bar{X}_n - \mathbb{E} X| > t) \le 2e^{- \frac{n}{2} \left( \frac{t^2}{\alpha^2} \wedge \frac{t}{\alpha} \right)}, $$
where \(x \wedge y = \min(x, y)\) and \(x \vee y = \max(x, y)\). Therefore, with probability at least \(1 - \delta\),
$$\displaystyle \lvert \bar{X}_n - \mathbb{E} X \rvert \le \sqrt{\frac{\alpha^2}{n} \log \Big(\frac{2}{\delta}\Big)} \vee \left( \frac{\alpha}{n} \log \Big(\frac{2}{\delta}\Big) \right). $$
We can see that the two types of sub-exponential tail probability give us two types of rate: \(O(\alpha / \sqrt{n})\) and \(O(\alpha / n)\). Although the second term is dominated by the first term, it implies the possibility of giving two types of rates in the concentration inequality. We are going to show a stronger concentration inequality of such type.
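The passage from the tail bound to the two rates can be checked numerically. The sketch below is only an illustration, not code from the book: the function names `tail_bound` and `radius` are hypothetical, and `radius` solves the displayed tail bound for \(t\) exactly, which yields the same \(O(\alpha/\sqrt{n})\) and \(O(\alpha/n)\) terms as the abstract up to a constant factor inside the logarithmic term.

```python
import math

def tail_bound(t, n, alpha):
    """Right-hand side of the sub-exponential tail bound for the sample mean."""
    u = t / alpha
    return 2.0 * math.exp(-0.5 * n * min(u * u, u))

def radius(delta, n, alpha):
    """Smallest t for which tail_bound(t, n, alpha) equals delta (closed form)."""
    c = 2.0 * math.log(2.0 / delta) / n
    # The sqrt(c) term is the 1/sqrt(n) rate; the c term is the 1/n rate.
    return alpha * max(math.sqrt(c), c)

# The 1/sqrt(n) term dominates once n is large relative to log(2/delta).
for n in (1, 10, 100, 10_000):
    t = radius(0.05, n, alpha=1.0)
    print(n, t, tail_bound(t, n, 1.0))
```

For small \(n\) the linear term \(t/\alpha\) is the active one in the minimum, which is why the \(O(\alpha/n)\) rate appears at all; for large \(n\) the quadratic term takes over.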
-
-
High-Dimensional Statistics
-
Frontmatter
-
Chapter 7. Ordinary Least Squares
Junwei Lu
This chapter delves into the Ordinary Least Squares (OLS) method, a cornerstone of linear regression. It begins by defining the linear regression model and introducing matrix notations for the design matrix, response vector, and noise vector. The text explores two primary goals in linear regression: prediction accuracy and parameter estimation, with a focus on the fixed design setting. The OLS estimator is presented with a closed-form formula and a geometric interpretation, illustrating how OLS finds the closest point in the space spanned by the design matrix to the response vector. The chapter also provides proofs for the geometric meaning of each entry in the OLS solution and discusses the statistical rate of mean squared error for OLS. Additionally, it covers the projection matrix and its role in projecting vectors onto the linear space spanned by the columns of the design matrix. The chapter concludes with a theorem on the mean squared error of least squares, providing insights into the statistical properties of OLS.
AI Generated
Abstract
Given the outcome \(Y_i\) and the covariates \(X_i\) for \(i = 1, \ldots, n\), a regression model assumes
$$\displaystyle Y_i = f(X_i) + \varepsilon_i, \text{ for all } i = 1, \ldots, n, $$
where \(\varepsilon_i\) is the error/noise. We typically assume that the error terms satisfy \(\mathbb{E} \varepsilon_i = 0\) and \(\varepsilon_1, \ldots, \varepsilon_n\) are independent.
-
Chapter 8. Compressive Sensing
Junwei Lu
This chapter delves into the intricacies of high-dimensional linear models, focusing on the challenges posed by ordinary least squares in sparse data settings. It introduces the concept of sparse linear models, where only a few features are non-zero, and explores the limitations of traditional estimation methods. The text then presents the Lasso estimator, a powerful tool for variable selection and regularization, and discusses its geometric interpretation. Additionally, it covers compressive sensing, a technique for efficient signal recovery in high-dimensional spaces. The chapter also provides insights into the cone condition, which ensures perfect recovery in basis pursuit. Through clear explanations and illustrative figures, this chapter offers a comprehensive overview of these advanced topics, making it an invaluable resource for professionals seeking to understand and apply these methods in their work.
AI Generated
Abstract
In the high-dimensional setting, we’re essentially looking at the same linear model \(Y = \mathbb{X} \beta + \varepsilon\) with \(\mathbb{X} \in \mathbb{R}^{n \times d}\). However, we now expect the number of features \(d\) to be much larger than the sample size \(n\). In this setting, the ordinary least squares estimator runs into trouble. If \(\mathbb{X}\) has full row rank, i.e., \(\mathrm{rank}(\mathbb{X}) = n\), then \(\mathbb{X} \widehat{\beta}^{\mathrm{LS}} = P_{\mathbb{X}} Y = Y\), i.e., ordinary least squares will overfit. Therefore, we need to invoke the parsimonious principle and introduce the following sparse linear model.
-
Chapter 9. Restricted Isometry Property
Junwei Lu
This chapter delves into the Restricted Isometry Property (RIP), a key concept in compressive sensing that ensures perfect signal recovery. It addresses the challenges of constructing matrices that satisfy the RIP condition and explores how this property enables efficient signal compression. The chapter provides a detailed proof of perfect recovery under RIP and discusses the practical implications of using random matrices to meet the RIP criteria. Additionally, it offers a concrete method for constructing matrices that satisfy the 3s-RIP condition, answering critical questions about signal recovery and compression efficiency. By the end of the chapter, readers will understand how RIP simplifies the process of signal recovery and how it can be applied to compress high-dimensional signals effectively.
AI Generated
Abstract
In the previous chapter, we introduced the problem of compressive sensing: how to find the sparse truth \(\beta^*\) from the linear equation \(Y = \mathbb{X}\beta^*\). Recall that we listed three major questions for compressive sensing:
1. What is the algorithm to recover \(\beta^*\)?
2. What kind of matrix \(\mathbb{X}\) can guarantee the recovery?
3. How efficiently can we compress \(\beta^*\), i.e., how small can \(n\) be with respect to \(d\)?
The first question is solved by the basis pursuit estimator \(\widehat\beta = \operatorname*{\text{arg min}}_{\beta} \|\beta\|_1\) such that \(Y = \mathbb{X}\beta\). The second question is partially answered in Theorem 6.6 of Chap. 6, where we show that the cone condition \(\mathbb{C}(S) \cap \mathrm{Null}(\mathbb{X}) = \{0\}\) is a necessary and sufficient condition for the perfect recovery of basis pursuit. However, the cone condition is not easy to use in practice. It is not straightforward to construct \(\mathbb{X}\) starting from the cone condition. In this chapter, we will discuss another sufficient condition for perfect recovery, called the restricted isometry property, which is stronger but easier to implement. We will talk about how to construct \(\mathbb{X}\) based on this property and answer the third question.
-
Chapter 10. Statistical Properties of Lasso
Junwei Lu
This chapter delves into the statistical properties of the Lasso estimator, a method for estimating high-dimensional linear models. It begins by revisiting the sparse linear model and introducing the restricted eigenvalue (RE) condition, a crucial concept for understanding Lasso's performance. The chapter compares the RE condition with the restricted isometry property (RIP), highlighting that the RE condition is less restrictive and more appropriate for Lasso. It provides a detailed explanation of why the RE condition is necessary for analyzing Lasso, supported by visual representations of the least squares loss landscape and the Hessian matrix. The chapter also presents the statistical rate of the Lasso estimator, discussing how the rate depends on the curvature of the loss function and the choice of the tuning parameter. It concludes with a concrete example of a design matrix satisfying the RE condition, demonstrating the practical implications of the theoretical analysis.
AI Generated
Abstract
In this chapter, we return to the noisy linear regression. Recall the sparse linear model \(Y = \mathbb{X} \beta^* + \varepsilon\), where \(\mathbb{X} \in \mathbb{R}^{n \times d}\) and \(\|\beta^*\|_0 \le s\). We estimate the high-dimensional linear model via the Lasso estimator
$$\displaystyle \widehat\beta = \operatorname*{\text{arg min}}_{\beta} \frac{1}{2n} \|Y - \mathbb{X}\beta\|_2^2 + \lambda \|\beta\|_1. $$
In this chapter, we will study the statistical properties of the Lasso estimator. Like the RIP condition for the basis pursuit, we also need conditions for Lasso.
-
Chapter 11. Variations of Lasso
Junwei Lu
This chapter delves into the limitations of the Lasso estimator and explores its extensions in high-dimensional statistics. It begins by outlining the key limitations of Lasso, including its restriction to linear models, bias, and sensitivity to tuning parameters. The chapter then discusses various extensions of Lasso, such as the generalized Lasso for high-dimensional models, high-dimensional classification models like logistic regression and linear discriminant analysis, and graphical models for network visualization. It also introduces innovative approaches to overcome Lasso's biases and sensitivities, including the adaptive Lasso, SCAD penalty, and square-root Lasso. The chapter concludes with a discussion on the application of Lasso and its extensions to heavy-tailed noises using quantile regression. Throughout the chapter, concrete examples and practical applications are provided, making it a valuable resource for professionals seeking to enhance their understanding of Lasso and its extensions.
AI Generated
Abstract
In the previous chapter, we studied the high-dimensional linear model \(Y = \mathbb{X}\beta^* + \varepsilon\), with \(\mathbb{X} \in \mathbb{R}^{n \times d}\) and \(\|\beta^*\|_0 \le s\). We proposed to estimate \(\beta^*\) via the Lasso estimator
$$\displaystyle \widehat\beta^{\mathrm{Lasso}} = \operatorname*{\text{arg min}}_{\beta} \frac{1}{2n}\|Y - \mathbb{X}\beta\|_2^2 + \lambda \|\beta\|_1. $$
We considered two assumptions: (1) the design matrix satisfies the restricted eigenvalue condition and (2) the noises \(\varepsilon\) are independent sub-Gaussians with variance proxy \(\sigma^2\). If we choose \(\lambda = C\sigma \sqrt{\log d/n}\) for some sufficiently large constant \(C\), we showed that the Lasso estimator has the statistical rate \(\|\widehat\beta^{\mathrm{Lasso}} - \beta^*\|_2 = O_P(\sqrt{s\log d/n})\).
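To get a feel for the scaling in this abstract, the tuning parameter and the rate can be evaluated for a few sample sizes. This is a rough sketch under stated assumptions: the helper names `lasso_lambda` and `lasso_rate` are hypothetical, the value `C=2.0` is an arbitrary placeholder for the unspecified "sufficiently large constant", and the rate is reported with constants dropped.

```python
import math

def lasso_lambda(sigma, d, n, C=2.0):
    """Tuning parameter C * sigma * sqrt(log(d) / n); C is an assumed placeholder."""
    return C * sigma * math.sqrt(math.log(d) / n)

def lasso_rate(s, d, n):
    """Order of the l2 estimation error, sqrt(s * log(d) / n), constants dropped."""
    return math.sqrt(s * math.log(d) / n)

# Quadrupling n halves both quantities; d only enters through log(d).
for n in (100, 400, 1600):
    print(n, lasso_lambda(sigma=1.0, d=10_000, n=n), lasso_rate(s=5, d=10_000, n=n))
```

The dependence on the dimension is only logarithmic, which is why the estimator remains meaningful when \(d\) is much larger than \(n\), as long as the sparsity \(s\) is small.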
-
-
High-Dimensional Optimization
-
Frontmatter
-
Chapter 12. Convexity and Subgradient
Junwei Lu
This chapter delves into the world of convex optimization, a crucial concept in solving high-dimensional optimization problems. It begins by defining convex sets and functions, illustrating their properties with geometric interpretations. The chapter then introduces the concept of subgradients, which are essential for handling non-smooth convex functions. It explores the first-order methods, which are particularly efficient in high-dimensional optimization due to their reliance on gradients rather than Hessian matrices. The chapter also discusses the optimality conditions for both unconstrained and constrained convex optimization problems, providing a clear understanding of when a solution is indeed the global minimum. Furthermore, it highlights the practical challenges posed by high-dimensional data, such as storage and computation issues, and offers insights into how to tackle these challenges effectively. The chapter concludes with a discussion on the importance of convexity in ensuring that local minima are also global minima, a property that greatly simplifies the optimization process.
AI Generated
Abstract
From the previous chapters, we can see that many estimators can be formulated as optimization problems.
-
Chapter 13. Gradient Descent
Junwei Lu
This chapter delves into the design and application of gradient descent algorithms for solving convex optimization problems. It begins with the unconstrained problem, explaining how the steepest descent direction is determined and how the gradient descent algorithm iteratively minimizes the objective function. The concept of L-smoothness is introduced to ensure good convergence properties, and the convergence rate of gradient descent is thoroughly analyzed. The chapter then extends to constrained problems, introducing the Frank-Wolfe algorithm, which ensures that the solution remains within the feasible set. Practical examples, such as the power iteration for finding the leading eigenvector of a matrix and the constrained Lasso problem, illustrate the application of these algorithms. Finally, the chapter explores accelerated gradient descent, which exploits the history of the trajectory to achieve faster convergence. The accelerated gradient descent algorithm is compared to the standard gradient descent, highlighting its advantages and the conditions under which it outperforms the traditional method. Throughout the chapter, detailed proofs and visual aids provide a comprehensive understanding of the algorithms' behavior and convergence properties.
AI Generated
Abstract
In this chapter, we will start designing algorithms to solve the convex optimization
$$\displaystyle \min_{x \in M} f(x), \text{ where } f \text{ and } M \text{ are convex}. $$
Our goal is to find the minimizer \(x^* = \operatorname*{\text{arg min}}_{x \in M} f(x)\). Let us start with the unconstrained problem first with \(M = \mathbb{R}^d\). If we start our search for \(x^*\) at some value \(x_0\), we aim to move to the next point such that the value of \(f(x)\) becomes smaller.
-
Chapter 14. Proximal Gradient Descent
Junwei Lu
This chapter delves into the proximal gradient descent algorithm, a powerful tool for optimizing composite loss functions. It begins by revisiting the gradient descent algorithm and its convergence rates for smooth objective functions. The focus then shifts to handling non-smooth penalty terms, which are common in high-dimensional M-estimators like Lasso. The proximal gradient descent algorithm is introduced as a solution to maintain fast convergence rates despite the non-smoothness of the objective function. The chapter provides a new perspective on gradient descent, viewing it as minimizing a local quadratic approximation of the objective function. It then modifies this perspective to derive the proximal gradient descent algorithm. Practical examples, such as constrained optimization and Lasso, are provided to illustrate the algorithm's application. The chapter also explores the accelerated proximal gradient descent algorithm, which combines Nesterov's acceleration idea with the proximal gradient descent. The convergence rates of these algorithms are thoroughly analyzed, and proofs are provided to support the theoretical claims. The chapter concludes with a discussion on the Lyapunov function, which is used to prove the convergence rate of the accelerated proximal gradient descent algorithm.
AI Generated
Abstract
In the previous chapter, we introduced the gradient descent and accelerated gradient descent algorithms to solve unconstrained optimization problems. We showed the convergence rates of these two algorithms when the objective function is smooth. However, in Lasso \(\min_{\beta} \frac{1}{2}\|Y - \mathbb{X}\beta\|_2^2 + \lambda \|\beta\|_1\), the \(\ell_1\)-norm penalty term is not smooth.
-
Chapter 15. Mirror Descent
Junwei Lu
This chapter delves into the world of optimization algorithms, focusing on Mirror Descent and Bregman Divergence. It begins by introducing the proximal perspective of gradient descent and the concept of Bregman Divergence, which is a generalization of the quadratic norm. The text explains how Bregman Divergence can lead to more efficient algorithms by better fitting the geometry of the problem. It also discusses the Mirror Descent algorithm, which uses Bregman Divergence in the proximal term, and compares it with other algorithms like the Frank-Wolfe algorithm and Projected Gradient Descent. The chapter provides practical examples, such as the Probability Simplex, to illustrate how to choose the proper Bregman Divergence under specific constraints. Furthermore, it explores Nesterov’s Smoothing, a technique to approximate non-smooth functions with smooth ones, and discusses its application in optimization problems. The text concludes with a theorem that shows the convergence rate using Nesterov’s smoothing idea. Throughout the chapter, the text uses clear explanations and visualizations to make complex concepts more understandable.
AI Generated
Abstract
In the previous chapter, we introduced the proximal perspective of gradient descent.
-
Chapter 16. Duality and ADMM
Junwei Lu
This chapter delves into the concept of duality in optimization and its application to solve composite objective functions. It begins by reviewing duality, highlighting its importance in converting primal problems into dual problems, and illustrating this with the Lasso problem. The chapter then introduces the Alternating Direction Method of Multipliers (ADMM), a powerful algorithm for solving composite optimization problems. It demonstrates the application of ADMM to various problems, including the fused Lasso, graphical Lasso, and consensus optimization for massive data. The chapter concludes with a discussion on the distributed nature of ADMM, making it a valuable tool for large-scale optimization tasks. Readers will gain insights into the power of duality and ADMM, and how these methods can be applied to efficiently solve complex optimization problems.
AI Generated
Abstract
In the previous chapters, we introduced proximal gradient descent to solve the optimization problem \(\min_x f(x) + g(x)\), where \(f\) is smooth but \(g\) is not differentiable.
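As an illustration of the composite problem \(\min_x f(x) + g(x)\), the following is a minimal pure-Python sketch of proximal gradient descent applied to the Lasso, with the smooth part \(\frac{1}{2n}\|Y - \mathbb{X}\beta\|_2^2\) and the non-smooth part \(\lambda\|\beta\|_1\). It is not the book's code: the function names are hypothetical, and the step size uses the assumption that \(\|\mathbb{X}\|_F^2 / n\) upper-bounds the smoothness constant of the smooth part, which is a conservative but safe choice.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * |.|: shrink v toward zero by tau."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0

def lasso_objective(X, y, beta, lam):
    """(1 / 2n) * ||y - X beta||_2^2 + lam * ||beta||_1."""
    n = len(y)
    resid = [yi - sum(x * b for x, b in zip(row, beta)) for row, yi in zip(X, y)]
    return sum(r * r for r in resid) / (2 * n) + lam * sum(abs(b) for b in beta)

def proximal_gradient_lasso(X, y, lam, n_iter=500):
    """Gradient step on the smooth part, then soft-thresholding on the l1 part."""
    n, d = len(X), len(X[0])
    # Step size 1/L with L = ||X||_F^2 / n, an upper bound on the smoothness constant.
    step = n / sum(x * x for row in X for x in row)
    beta = [0.0] * d
    for _ in range(n_iter):
        resid = [sum(x * b for x, b in zip(row, beta)) - yi for row, yi in zip(X, y)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(d)]
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return beta

# Toy problem with an orthogonal design, where the minimizer is known in closed form.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, 0.1]
beta_hat = proximal_gradient_lasso(X, y, lam=0.5)
print(beta_hat, lasso_objective(X, y, beta_hat, lam=0.5))
```

In this toy problem the coordinates decouple, so the minimizer is \((2, 0)\): the large coefficient is shrunk by \(n\lambda = 1\) and the small one is set exactly to zero, which is the sparsity effect the \(\ell_1\) penalty is meant to produce.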
-
-
High-Dimensional Inference
-
Frontmatter
-
Chapter 17. High Dimensional Inference
Junwei Lu
This chapter delves into the intricacies of high-dimensional statistical inference, focusing on estimation and inference problems. It begins by introducing the major goals of estimation, such as finding estimators and understanding their convergence rates. The text then shifts to inference, emphasizing the importance of uncertainty assessment, confidence intervals, and hypothesis testing. A significant portion of the chapter is dedicated to high-dimensional inference, where the parameters of interest are typically larger than the sample sizes. The chapter discusses the challenges of multiple hypothesis testing, the family-wise error rate, and the false discovery rate. It also reviews important theoretical results, such as the central limit theorem and Slutsky's theorem, which are crucial for understanding the asymptotic normality of least squares. The chapter concludes with a detailed derivation of the asymptotic normality for ordinary least squares, providing a comprehensive overview of the topic.
AI Generated
Abstract
We start the part on high-dimensional inference by introducing the problems in statistical inference. Given the statistical model \(\{\mathbb{P}_{\theta} \mid \theta \in \Theta\}\), we observe \(X_1, \ldots, X_n \stackrel{iid}{\sim} \mathbb{P}_{\theta^*}\), where \(\theta^*\) is the truth. Here are the major goals of estimation and inference.
-
Chapter 18. Debiased Lasso
Junwei Lu
This chapter delves into the debiased Lasso method for conducting inference in high-dimensional linear models. It begins by deriving confidence intervals for the Lasso estimator, decomposing it into a bias, a leading term, and a remainder term. The text then proves the asymptotic normality of the debiased Lasso, demonstrating that the estimator converges in distribution to a normal distribution under certain conditions. The chapter also explores the feasibility of the CLIME estimator and its role in satisfying the necessary conditions for the debiased Lasso. Additionally, it generalizes the debiasing method to general high-dimensional M-estimators, discussing the assumptions required for asymptotic normality. The chapter concludes with a comparison of Lasso and debiased Lasso, highlighting the stronger assumptions needed for the latter. This detailed exploration provides valuable insights into the debiased Lasso method and its applications in statistical inference.
AI Generated
Abstract
In this chapter, we aim to conduct inference for the high-dimensional linear model. Recall the sparse linear model \(Y = \mathbb{X} \beta^* + \varepsilon\), where \(\mathbb{X} \in \mathbb{R}^{n \times d}\) and \(\|\beta^*\|_0 \le s\).
-
Chapter 19. Multiple Hypotheses
Junwei Lu
This chapter delves into the intricacies of conformal inference and multiple hypotheses testing, offering a robust framework for building confidence intervals without relying on unnecessary assumptions. The text begins by explaining the concept of conformal inference, which aims to construct confidence intervals for predictions using i.i.d. random pairs. It highlights the importance of symmetry and the uniform distribution in constructing these intervals, ultimately leading to a method that avoids overfitting and can be generalized to more sophisticated frameworks. The chapter then shifts its focus to multiple hypotheses testing, particularly in scenarios where the number of hypotheses, N, can be very large, such as in genome-wide association studies (GWAS). It discusses the challenges of controlling the family-wise error rate (FWER) and introduces the Bonferroni correction, which, while conservative, provides a straightforward method for controlling FWER. The text also explores the maximal statistic as an alternative to the Bonferroni correction, offering a more efficient way to utilize p-values and control FWER. The chapter concludes by outlining the next steps in estimating the quantile of maximal statistics, setting the stage for further advancements in this field. Throughout the chapter, the text provides a detailed and practical approach to these statistical methods, making it an invaluable resource for professionals seeking to enhance their understanding of conformal inference and multiple hypotheses testing.
AI Generated
This summary of the content was generated with the help of AI.
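As a rough illustration of the conformal idea summarized above, here is a minimal sketch of split conformal prediction, one common variant that fits the predictor on separate data to avoid overfitting. It may not be the exact construction used in the chapter. The function name, the `predict` callable, and the default `alpha` are assumptions made for this example.

```python
import math

def split_conformal_interval(predict, x_calib, y_calib, x_new, alpha=0.1):
    """Prediction interval for the response at x_new with target coverage 1 - alpha.

    `predict` is any point predictor fitted on data that is independent of the
    calibration pairs (x_calib, y_calib); this is an assumption of the sketch.
    """
    # Absolute residuals of the predictor on the held-out calibration pairs.
    scores = sorted(abs(y - predict(x)) for x, y in zip(x_calib, y_calib))
    m = len(scores)
    # By exchangeability, the rank of a new residual among the m + 1 residuals
    # is uniform, so the ceil((1 - alpha) * (m + 1))-th smallest score suffices.
    k = math.ceil((1 - alpha) * (m + 1))
    # With too few calibration points the only valid interval is the whole line.
    radius = scores[k - 1] if k <= m else math.inf
    center = predict(x_new)
    return center - radius, center + radius
```

The predictor is passed in as a callable so that any model can be plugged in; the coverage argument relies only on the calibration pairs and the new pair being exchangeable.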
Abstract
Conformal inference aims to build confidence intervals for predictions without unnecessary assumptions, especially assumptions about the model.
-
Chapter 20. False Discovery Rate
Junwei Lu
This chapter delves into sophisticated statistical techniques for managing false discovery rates (FDR) and family-wise error rates (FWER) in hypothesis testing. It begins by discussing the Gaussian multiplier bootstrap method, which is used to estimate the quantile of the maximal statistic, particularly when statistics are dependent and only asymptotically normal. The chapter provides a detailed procedure for estimating the quantile of the maximal statistic, including handling cases where the covariance matrix is unknown. Additionally, it explores the Benjamini-Hochberg procedure for controlling the false discovery rate, particularly when p-values are independent. The chapter includes a proof that demonstrates the effectiveness of the Benjamini-Hochberg procedure in controlling FDR. Throughout, the text offers practical examples and theoretical insights, making it a comprehensive guide for professionals looking to refine their statistical analysis skills.
AI Generated
This summary of the content was generated with the help of AI.
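The Benjamini-Hochberg procedure mentioned in the summary can be sketched as the standard step-up rule below. This is a minimal illustration rather than the chapter's own presentation; the function name and the default level `q` are assumptions made for this example, and the FDR guarantee referred to in the summary is for independent p-values.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the sorted indices of hypotheses rejected by the BH step-up rule."""
    n = len(p_values)
    # Indices of the hypotheses ordered from smallest to largest p-value.
    order = sorted(range(n), key=lambda i: p_values[i])
    # Find the largest rank k whose ordered p-value satisfies p_(k) <= q * k / n.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / n:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values (possibly none).
    return sorted(order[:k_max])
```

Because the rule is step-up, a hypothesis can be rejected even when its own p-value exceeds its threshold, as long as a larger ordered p-value falls below its threshold. For example, with p-values `[0.03, 0.035, 0.02, 0.9]` and `q=0.05`, only the third-smallest p-value passes its threshold, yet the three smallest are all rejected.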
Abstract
We will continue the discussion on controlling the family-wise error rate via the maximal statistic. Given hypotheses \(\{H_{0j}\}_{j=1}^N\), for each \(H_{0j}\) we have a statistic \(T_j\) such that, when testing a single hypothesis, we reject \(H_{0j}\) if \(|T_j| > q_{\alpha}\), where \(q_{\alpha} = \inf\{t : \mathbb{P}_{H_0}(|T_j| > t) \le \alpha\}\).
-
Chapter 21. Knock-Off
Junwei Lu
This chapter delves into the challenging task of controlling the false discovery rate (FDR) when dealing with dependent p-values in statistical hypothesis testing. It begins by revisiting the definition of FDR and the case of independent p-values before exploring the more complex scenario of dependent p-values. The text introduces a framework for selecting features related to a response variable, such as phenotypes or SNPs, and discusses the use of permutation tests for controlling the FDR. However, it highlights the limitations of permutation tests through a counterexample and proposes the knock-off approach as an alternative. The knock-off method involves constructing dummy variables and defining a knock-off score with specific properties. The chapter provides a detailed description of the knock-off procedure, including a proof of its validity using martingales and the optimal stopping theorem. It also discusses the estimation of the false discovery proportion and the conditions under which the knock-off procedure is effective. The chapter concludes with a discussion of the knock-off approach's advantages and its potential applications in various fields.
AI Generated
This summary of the content was generated with the help of AI.
Abstract
We will continue the discussion of controlling the false discovery rate (FDR). When testing null hypotheses \(\{H_{0j}\}_{j=1}^d\), recall that the FDR is defined as
$$\text{FDR} = \mathbb{E}\bigg(\frac{\#\text{False Positives}}{\#\text{Rejected Hypotheses}}\bigg).$$
In the previous chapter, we discussed the case where the p-values corresponding to the \(\{H_{0j}\}_{j=1}^d\) were independent. Here, we consider the more challenging case where the p-values are dependent.
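The ratio inside the expectation is the false discovery proportion (FDP) of a single experiment. A minimal sketch of how it could be computed in a simulation where the true nulls are known is given below. The function name and arguments are hypothetical, and the convention that the ratio equals 0 when nothing is rejected is the usual one but is an assumption here, since the formula above does not state it.

```python
def false_discovery_proportion(rejected, true_nulls):
    """Fraction of rejected hypotheses that are in fact true nulls.

    `rejected` and `true_nulls` are collections of hypothesis indices.
    """
    rejected = set(rejected)
    # Assumed convention: no rejections means no false discoveries, so FDP = 0.
    if not rejected:
        return 0.0
    # A false positive is a rejected hypothesis whose null is actually true.
    false_positives = len(rejected & set(true_nulls))
    return false_positives / len(rejected)
```

Averaging this quantity over many independent simulated data sets gives a Monte Carlo estimate of the FDR for whichever procedure produced the rejections.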
-
Backmatter
- Electronic ISBN
- 978-3-032-03161-7
- Print ISBN
- 978-3-032-03160-0
- DOI
- https://doi.org/10.1007/978-3-032-03161-7
PDF files of this book have been created in accordance with the PDF/UA-1 standard to enhance accessibility, including screen reader support, described non-text content (images, graphs), bookmarks for easy navigation, keyboard-friendly links and forms, and searchable, selectable text. We recognize the importance of accessibility, and we welcome queries about accessibility for any of our products. If you have a question or an access need, please get in touch with us at accessibilitysupport@springernature.com.