Top

Empirical Economics

Published in:

Open Access 29-01-2023

Hotelling tubes, confidence bands and conformal inference

Author: Roger Koenker

Published in: Empirical Economics | Issue 6/2023

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Patentsearch

Off

Abstract

Stochastic frontier models and methods as pioneered by Peter Schmidt in Aigner et al. (J Econom 6:21–37, 1977), Horrace and Schmidt (J Product Anal 7:257–282, 1996), Amsler et al. (J Econom 190:280–288, 2016) constitute a rare departure from the usual econometric obsession with models for conditional means. They also provided an early stimulus for the development of quantile regression methods. After a brief tutorial on Hotelling tube methods for constructing confidence bands for nonparametric quantile regression, strengthened performance guarantees for such bands are described based on recent developments in conformal inference. These methods may be considered to be a rather idiosyncratic new approach to nonparametric inference for stochastic frontier models.

Version: January 14, 2023. Code to reproduce the numerical results reported here is available from http://www.econ.uiuc.edu/~roger/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

One of my indelible memories of Peter Schmidt was a conversation we had in my kitchen at a party for Midwest Econometrics Group participants in 1993 about the uneasy relationship between statistics and econometrics. “If a statistical tree falls in the forest, but no econometrician sees it,” Peter said matter-of-factly, “then it never happened.” In 1939 Harold Hotelling, arguably one of the most eminent statisticians and econometricians of the twentieth century witnessed such an event and wrote about in Hotelling (1939). The paper inspired Hermann Weyl to write a highly influential paper, Weyl (1939) generalizing it. Hotelling’s idea has attracted a small coterie of admirers in statistics, but it is fair to say that it remains almost unknown in econometrics.

My quixotic aim in this paper is to rescue Hotelling’s idea from econometric obscurity. I will begin by describing a simple setting in which the idea can be employed to construct a confidence interval for a scalar parameter that enters awkwardly in a standard regression problem. Then, I will describe how it can be used to construct uniform confidence bands for nonparametric regression using penalty methods, and finally I will compare performance with confidence bands constructed with recently developed methods of conformal inference.

2 Hotelling’s regression problem

Consider the nonlinear regression model

$$\begin{aligned} Y_i=x_i^\top \alpha + \lambda _i(\tau )\beta + \varepsilon _i \end{aligned}$$

where $\alpha , \beta , \tau $ are unknown parameters, $\lambda _i(\cdot )$ are known functions and $\varepsilon _i\sim {{\mathcal {N}}}(0, \sigma ^2)$. For the sake of concreteness, we might interpret $\lambda _i (\tau )$ as a Box-Cox transformation of another covariate, say $(z_i^\tau -1)/\tau $. We would like to test $H_0: \beta =0$. Under the null, the Box-Cox parameter $\tau $ is not identified, so we need to consider strategies that properly account for this.¹

By the familiar (Frisch and Waugh 1933) trickery, we can eliminate the $\alpha $ effect.² Redefining the notation and assuming for convenience that $\sigma ^2 = 1$, we are left with the likelihood ratio statistic

$$\begin{aligned} L = \inf _\tau \sum (Y_i-{\hat{\beta _\tau }} \lambda _i(\tau ))^2/\sum Y_i^2 \end{aligned}$$

Now, denoting the n-vectors, $Y = (Y_i)$, $\lambda = (\lambda _i)$, and the Euclidean norm by $\Vert \cdot \Vert $, ${\hat{\beta _\tau }} = Y^\top \lambda (\tau )/ \Vert \lambda (\tau ) \Vert ^2$ so we can rewrite,

$$\begin{aligned} L= & {} \inf _\tau \Vert Y \Vert ^{-2} ( \Vert Y \Vert ^2 - 2(Y^\top \lambda )^2/ \Vert \lambda \Vert ^2 + (Y^\top \lambda )^2/ \Vert \lambda \Vert ^2)\\= & {} 1 - \sup _\tau \left( \frac{Y^\top \lambda (\tau )}{ \Vert \lambda (\tau ) \Vert \Vert Y \Vert }\right) ^2\\\equiv & {} 1 - \sup _\tau (\gamma (\tau )^\top U )^2 \end{aligned}$$

Now $ U =Y/ \Vert Y \Vert $ is uniformly distributed on the sphere $S^{n-1}$ and $\gamma (\tau ) = \lambda (\tau )/ \Vert \lambda (\tau ) \Vert $ is a curve in $S^{n-1}$. Thus, the test rejects when $W=\sup _\tau \gamma (\tau )^\top U $ exceeds some value $w=\cos \theta $ which is equivalent to

$$\begin{aligned} U \in \gamma ^\theta= & {} \{u\in S^{n-1}: \sup _t u^\top \gamma (t)\ge \cos \theta \}\\= & {} \{ u\in S^{n-1}: \text {d}(u, \gamma ) \le (2(1-w))^{1/2}\}. \end{aligned}$$

Note that the original definition of L is such that we reject for small values, so $L<c, $ implies we reject for $\sup _\tau \gamma (\tau )^\top U > w = \cos \theta $ for some critical value of $\theta $. This is illustrated in Fig. 1 of Johansen and Johnstone (1990), reproduced here as Fig. 1. They call this the “angular or geodesic radius $\theta $ about $\gamma $:”

$$\begin{aligned} d^2(u, \gamma )= & {} \sin ^2(\theta ) + (1-\cos (\theta ))^2\\= & {} 1- 2\cos \theta + \cos ^2\theta + \sin ^2\theta \\= & {} 2(1-\cos \theta ). \end{aligned}$$

So when the distance $d(u,\gamma )$ is small, U falls inside tube, and we reject. This may seem a bit counter-intuitive, but is nonetheless correct. There are probably many ways to it sound more intuitive. Here is one possibility. Since it all boils down to a cosine, that is the simple correlation between $\lambda (\tau )$ and Y, we want to reject $H_0: \beta =0$ if this correlation/cosine is too large, but Y’s that make it too large are the $Y's$ that fall inside the tube.

So how do we compute the critical w or equivalently the critical $\theta $? Since $W>w \equiv \cos \theta $ is equivalent to U being in the tube, we need the volume of the tube. Let $| \gamma |$ denote the length of the arc $\gamma (\tau )$ on the sphere. This can be approximated by the finite difference formula,

$$\begin{aligned} | \gamma | = \int \Vert \dot{\gamma }(\tau ) \Vert \text {d} \tau \approx \sum _{i=2}^m \Vert (\gamma (\tau _i) - \gamma (\tau _{i-1}))\Vert , \end{aligned}$$

and the $\tau $’s are chosen on some relatively fine grid of m points. Note that in the finite difference approximation the $\tau _i - \tau _{i-1}$ that would normally appear in the denominator of the difference quotient inside the norm expression cancels with the contribution of the $\text {d}\tau $.

Theorem 1

If $\gamma $ is a non-closed regular curve in $S^{d-1}$ then for w near 1,

$$\begin{aligned} {{\mathcal {P}}}(W\ge w) = \frac{| \gamma |}{2\pi }(1-w^2)^{\frac{d-2}{2}} +\frac{1}{2} {{\mathcal {P}}} (B\left( \frac{1}{2}, \frac{d-1}{2}\right) \ge w^2) \end{aligned}$$

(1)

where $B(1/2, (d-1)/2) $ is a beta random variable. If $\gamma $ is closed, i.e., forms a closed loop without end points, then the second “cap” term is omitted.

We ignore pathological complications involving self-intersections of the curve, $\gamma $. This follows from a result of Hotelling (1939), as does the next theorem.

Theorem 2

Let $\gamma $ be a regular closed curve in $S^{d-1}$ with length $|\gamma |$. And

$$\begin{aligned} \gamma ^\theta= & {} \{u \in S^{d-1}\big | \sup _t u^\top \gamma (t) \ge \theta \}\\= & {} \{u\in S^{d-1} \big | \text {d}(u, \gamma ) \le (2(1-w))^{1/2}\} \end{aligned}$$

where $w=\cos \theta $. If $\theta $ is sufficiently small, then the volume of the tube $V(\gamma ^\theta )$ is given by

$$\begin{aligned} V(\gamma ^\theta ) = |\gamma | \Omega _{d-2} \sin ^{d-2} \theta \end{aligned}$$

(2)

where $\Omega _{d-2} = \pi ^{(d-2)/2}\Gamma (d/2)$ is the volume of the unit ball in $R^{d-2}.$

Heuristically, the formula is,

$$\begin{aligned} V(\gamma ^\theta ) = (\text{ length } \text{ of } \text{ tube}) \cdot (\text{ Volume } \text{ of } \text{ unit } \text{ ball}) \cdot \text{ radius}^{d-2} \end{aligned}$$

Recall that the volume of the unit ball in dimension d is $V=\pi ^{d/2}/\Gamma ((d+2)/2)$. When $\theta $ is larger, or $\gamma $ is twisty, then the tube may intersect itself and the formula would need some refinement. Figure 2 is a crude attempt to depict tube on the 2-sphere, those with enhanced geometric imagination may try to visualize a three dimensional tube on the 3-sphere embedded in 4-space.

When the curve is not closed, then it needs “caps” on each end. These caps are given by

$$\begin{aligned} w_{d-2}\int _{\cos \theta }^1 (1-z^2)^{(d-3)/2}\text {d}z \end{aligned}$$

where $w_{d-2} = 2\pi ^{(d-1)/2} /\Gamma ((d-1)/2)$ is the $(d-2)$-volume of $S^{d-2}$. Note that the volume of the sphere, $V(S^{d-1}) = 2 \pi ^{d/2} /\Gamma (d/2)$, is not the same as the volume of the ball. Note also that $(1-z^2)^{1/2}$ is again the radius and integrating out the $r^{d-3}$ yields a $d-2$ dimensional volume. A useful reference for this sort of geometry is Kendall (1961).

How do we get from (2) to (1)? Recall that U is uniform on the $(d-1)$ sphere so we need to divide by the volume of that sphere to evaluate the probability of being in the tube, so for closed curves,

$$\begin{aligned} \frac{V(\text{ tube})}{V(\text{ sphere})}= & {} \frac{ | \gamma | \Omega _{d-2}\sin ^{d-2}\theta }{2\pi ^{d/2}/\Gamma (d/2)}\\= & {} \frac{ | \gamma | (\pi ^{(d-2)/2}/\Gamma (d/2))\sin ^{d-2}\theta }{2\pi (\pi ^{(d-2)/2}/\Gamma (d/2))}\\= & {} \frac{ | \gamma | }{2\pi }(1-w^2)^{(d-2)/2}. \end{aligned}$$

To include caps, we also need to divide by the volume of the sphere. Note that

$$\begin{aligned} \mathbb {P}(B_{1/2, \frac{d-1}{2}} \ge w^2)= & {} \int _{w^2}^1 \left[ x^{1/2-1}(1-x)^{\frac{d-1}{2}-1}/B(1/2, \frac{d-1}{2})\right] \text {d}x\\= & {} \int _{w^2}^1 \left[ x^{-1/2} (1-x)^{\frac{d-3}{2}}/B\right] \text {d}x. \end{aligned}$$

Changing variables $x\rightarrow y^2$, we have

$$\begin{aligned}= & {} \int _{y_0}^1\left[ y^{-1}(1-y^2)^{\frac{d-3}{2}}/B\right] 2y \text {d}y\\= & {} 2\int _{y_0}^1 B^{-1}(1-y^2)^{\frac{d-3}{2}}\text {d}y. \end{aligned}$$

It remains to show that $B^{-1} = w_{d-2}/\text{V(sphere) }$, which follows after a little simplification and recalling that $\Gamma (1/2) = \sqrt{\pi }$.

To check how the Hotelling tube procedure performs in moderate sample sizes, Table 1 reports results of a small simulation experiment. Data are generated with iid $x_i$ standard log-normal and

$$\begin{aligned} y_i = \beta _n (x_i^\tau - 1)/\tau + \epsilon _i, \; u \sim \mathcal {N}(0,1). \end{aligned}$$

Three values of $\tau $ are considered $\tau \in \{ -0.5, 0, 0.5 \}$. Local alternatives, $\beta _n = \beta _0/\sqrt{n}$, are considered with $\beta _0 \in \{ 0, 1, 2\}$. The nominal level of the Hotelling test is taken to be 0.05. and 1000 replications of the experiment are made for each parametric setting. When $\beta = 0$ so the null is true, the test delivers quite accurate size for all of the sample sizes considered, and power is respectable when $\beta $ deviates from zero.

3 Uniform confidence bands for nonparametric regression

Consider the series expansion model

$$\begin{aligned} Y_i=\sum _{j=1}^d \beta _j a_j(t_i) + \varepsilon _i \end{aligned}$$

with $\varepsilon _i\sim {{\mathcal {N}}}(0, \sigma ^2)$ as before and $t\in I \subset {\mathbb {R}}$. Our objective is to find a positive c such that

$$\begin{aligned} P_{\beta ,\sigma , \Sigma }(|\beta ^\top a(t) - {\hat{\beta ^\top }} a(t)| \le c\sigma (a(t)^\top \Sigma a(t))^{1/2} \; \forall \; t\in I) \approx 1-\alpha , \end{aligned}$$

uniformly in $\beta , \sigma .$ Johansen and Johnstone (1990) write this as, $ P_{\beta ,\sigma , \Sigma } (T < c\sigma ) $ where

$$\begin{aligned} T=\sup _{a\in C} \frac{a^\top ({\hat{\beta }} - \beta )}{\sqrt{a^\top \Sigma a}} \end{aligned}$$

Table 1

Rejection frequencies for the Hotelling likelihood ratio test for a simple Box-Cox example

	$\beta _0 = $ 0			$\beta _0 = $ 1			$\beta _0 = $2
	$\tau = $ −0.5	$\tau = $ 0	$\tau = $ 0.5	$\tau = $ −0.5	$\tau = $ 0	$\tau = $ 0.5	$\tau = $ −0.5	$\tau = $ 0	$\tau = $ 0.5
n = 20	0.056	0.058	0.049	0.313	0.193	0.182	0.781	0.459	0.380
n = 50	0.049	0.051	0.057	0.275	0.225	0.342	0.639	0.577	0.782
n = 100	0.063	0.048	0.056	0.350	0.261	0.281	0.840	0.637	0.704
n = 500	0.048	0.052	0.055	0.298	0.243	0.288	0.747	0.612	0.735
n = 1000	0.063	0.046	0.047	0.299	0.218	0.250	0.724	0.549	0.667

Tests are nominal level $\alpha = 0.05$. Local alternatives are employed of the form: $\beta _n = \beta _0 / \sqrt{n}$

Now consider $X\sim {{\mathcal {N}}}(\xi , \Sigma )$, so X plays the role of ${\hat{\beta }}$ and $\xi $ of $\beta .$ We’d like to make a confidence statement about $\{a^\top \xi |a\in C\}$ and C is some sort of “curve.” So now we write,

$$\begin{aligned} T=T(X, \xi ) = \sup _{a\in C} \frac{a^\top (X-\xi )}{\sqrt{a^\top \Sigma a}}. \end{aligned}$$

We want the distribution of T, so we can obtain the confidence set

$$\begin{aligned} R_x=\{ \{a^\top \xi \}_{a\in C} | T(X, \xi ) < c_{1-\varepsilon }\} \end{aligned}$$

where $P_{\xi , \Sigma } (T< c_{1-\varepsilon }) = 1-\varepsilon .$ Write $T=RW$ where,

$$\begin{aligned} R^2=(X-\xi )^\top \Sigma ^{-1}(X-\xi ) \sim \chi _d^2, \end{aligned}$$

and

$$\begin{aligned} W = \sup _{a\in C} \frac{a^\top (X-\xi )}{\sqrt{a^\top \Sigma a}\sqrt{(X-\xi )\Sigma ^{-1}(X-\xi )}} = \sup _{a\in C} \frac{(\Sigma ^{1/2}a)^\top \Sigma ^{-1/2} (X-\xi )}{ | \Sigma ^{1/2} a | | \Sigma ^{-1/2}(X-\xi ) | }. \end{aligned}$$

Now to put things back into the earlier framework of $\gamma $ and U we set,

$$\begin{aligned} \gamma (a)= & {} \frac{\Sigma ^{1/2}a}{ | \Sigma ^{1/2} a | }\\ U= & {} \Sigma ^{-1/2} (X-\xi )/ | \Sigma ^{-1/2} (X-\xi ) |. \end{aligned}$$

So as before, $\gamma =\gamma (C) \subset S^{d-1}$, and U is uniform on $S^{d-1}$. R and W do not depend on $\xi , \Sigma $ or they do, but only via $\gamma $. $R^2$ is independent of W and $R^2\sim \chi _d^2$ so,

$$\begin{aligned} \mathbb {P}(T>c)=\int _c^\infty \mathbb {P}(W>c/r) \mathbb {P}(R\in dr). \end{aligned}$$

The random variable W has the same form as in the simple example so,

$$\begin{aligned} \mathbb {P}(W>w) = \frac{ | \gamma | }{2\pi }(1-w^2)^{(d-2)/2} + \frac{1}{2}{{\mathcal {P}}}(B\ge w^2) \equiv b_\gamma (w). \end{aligned}$$

(Naiman 1986) bounds this probability by,

$$\begin{aligned} \mathbb {P}(T>c) \le \int _c^\infty \min \{b_\gamma (c/r), 1\} {\mathcal P}(R\in \text {d}r), \end{aligned}$$

and Knowles (1987) suggests ignoring the $b_\gamma <1$ constraint and then integrates the bound to obtain,

$$\begin{aligned} \mathbb {P}(T>c) \le \frac{ | \gamma | }{2\pi } e^{-c^2/2} + 1-\Phi (c). \end{aligned}$$

This integration may appear somewhat miraculous, but does actually work out provided that one carefully observes the ${{\mathcal {P}}}(R\in \text {d}r)$ term. Since $R^2\sim \chi _d^2$, letting F denote the distribution function of $\chi _d^2$, we have,

$$\begin{aligned} \mathbb {P}(R\le r) = {{\mathcal {P}}}(R^2\le r^2) = F(r^2) \end{aligned}$$

so the corresponding density of R is

$$\begin{aligned} f_{\scriptscriptstyle R}(r) = 2rF'(r^2) = 2rf_{{\scriptscriptstyle R}^2}(r^2). \end{aligned}$$

Once one has this bound then various other things fall into place. For example,

$$\begin{aligned} {{\mathcal {P}}}(|T|>c) \le 2\mathbb {P}(T>c). \end{aligned}$$

(Johansen and Johnstone 1990) give further details on the accuracy of the bounds and applications.

4 Additive models for total variation penalized nonparametric quantile regression

In Koenker (2011), I have described a general approach to estimation and inference for additive nonparametric quantile regression models of the form,

$$\begin{aligned} Q_{{\scriptscriptstyle Y}_i|x_i, z_i} (\tau |x_i, z_i) = x_i^\top \theta _0 + \sum _{j=1}^J g_j (z_{ij}). \end{aligned}$$

The components $g = (g_1, \cdots , g_J)$ can be univariate or bivariate. Their smoothness can be controlled by penalizing total variation of the functions themselves or their gradients. Estimation is carried out by solving the linear program,

$$\begin{aligned} \min _{(\theta _0, g)} \sum \rho _\tau (y_i-x_i^\top \theta _0 - \sum g_j(z_{ij})) + \lambda _0 \Vert \theta _0 \Vert _1 + \sum _{j=1}^{\scriptscriptstyle J} \lambda _j \bigvee (\nabla g_j) \end{aligned}$$

(2)

where $\rho _\tau (u) = u (\tau - \mathbb {1}(u < 0))$ is the usual quantile objective function, $\Vert \theta _0 \Vert _1 = \sum _{k=1}^{\scriptscriptstyle K} |\theta _{0k}|$ and $\bigvee (\nabla g_j)$ denotes the total variation of the derivative or gradient of the function g. Recall that for g with absolutely continuous derivative $g'$ we can express the total variation of $g':{\mathcal R} \rightarrow {{\mathcal {R}}}$ as

$$\begin{aligned} \bigvee (g'(z)) = \int |g''(z) |\text {d}z \end{aligned}$$

while for $g:{\mathbb {R}}^2 \rightarrow {\mathbb {R}}$ with absolutely continuous gradient,

$$\begin{aligned} \bigvee (\nabla g) = \int \Vert \nabla ^2g(z)\Vert \text {d}z \end{aligned}$$

where $\nabla ^2g(z)$ denotes the Hessian of g and $\Vert \cdot \Vert $ denotes the Hilbert–Schmidt norm for matrices. In contrast, total variation penalization of the component functions themselves yields piecewise constant solutions.

Adapting the Hotelling tube idea to construct uniform confidence bands for these components is also described in Koenker (2011), as is selection of the smoothing parameters $\lambda _j, j = 0, 1, \cdots J$. It should be stressed that all of this machinery relies on the validity of Gaussian approximations for the fitted parameters and estimated functions and is conditional on selected tuning parameters. This is in accord with a large strand of earlier literature including (Wahba 1983; Nychka 1983), and Krivobokova et al. (2010); however, there are inevitable questions that can be raised about both aspects. To explore this, we consider some recent proposals for strengthening coverage guarantees based on conformal inference in the next section.

5 Conformal quantile regression

Conformal prediction, and conformal inference more generally, has grown out of work by Vladimir Volk and colleagues, see, e.g., Shafer and Vovk (2008) for an overview. It has emerged as an essential tool in uncertainty quantification throughout statistics and machine learning. An essential feature of the conformal inference approach in regression is a sample splitting device that allows one to adjust a confidence band constructed with training data based on its performance on a validation sample. Strong finite sample performance guarantees can be proven based on seemingly rather weak exchangeability assumptions. In regression settings, early work presumed conventional iid error structure when constructing the initial bands from the training data; however, (Romano et al. 2019) noted that in more heterogeneous settings narrower bands could be constructed using quantile regression methods. This approach has been further developed in Lei and Candès (2022). In high-dimensional regression, this typically would involve some form of random forest or neural network model for the initial bands, but the same methods can be used in simpler models like the additive models described above.

Construction of conformal prediction bands for additive quantile regression models can be described briefly as follows:

Note that the conformal adjustment of the initial band can make it wider or narrower. When $Q<0$, it indicates that the validation sample fell well inside the initial band indicating that it is safe to shrink the width of the initial band.

There are several potential difficulties with the foregoing recipe.

Predictions based on the training sample typically are not equipped to extrapolate beyond the empirical support of the training data, so if the validation data, or new data requiring a conformal interval, lie outside that support some accommodation must be made.
Performance guarantees are based on marginal coverage of the band, so it may happen that in certain regions of design space there may be failures of coverage that are compensated by satisfactory coverage elsewhere. As shown by Foygel Barber et al. (2020), conditional coverage is not achievable in any generality.
All of the familiar challenges of penalty methods for regression smoothing persist, so choice of smoothing parameters, in particular, can cause headaches, even though poor $\lambda $ selection can in principle be ameliorated by the conformal adjustment.

We conclude this section by illustrating the use of the conformal method in an artificial data example taken from Romano et al. (2019). Simulated data are generated as,

https://static-content.springer.com/image/art%3A10.1007%2Fs00181-022-02344-z/MediaObjects/181_2022_2344_Equ35_HTML.png

There are 7000 observations plotted in grey. The Poisson contribution to the response produces a banded structure to the scatterplot with pronounced heteroscedasticity. There are a small number of extreme outliers many of which lie outside the frame of the figure; such outliers are harmless since we are estimating conditional quantile functions. Penalizing total variation of $g'$ yields a piecewise linear fit that does not fit the scatter as well as the piecewise constant estimate obtained by penalizing the total variation of g itself. It is striking here that the conformal adjustment in both figures is almost imperceptible. Thus, if interest focuses on prediction intervals for the response, the initial estimates provided by the penalized quantile regression estimates are fine, even though they are based on only half the original sample.

Prediction bands for Y are fine as far as they go, but what if we wanted confidence bands for the conditional quantile functions? Some might argue, e.g., Geiser (1993); Clarke and Clarke (2018), that it is pointless to predict quantities that can never be observed, but I subscribe to the principle: every decent estimate deserves a standard error. Figure 5 illustrates confidence bands for the lower, $\tau = 0.05$ and upper, $\tau = 0.95$ conditional quantile functions as estimated using penalization of $g'$. The dark grey bands are the pointwise bands, while the lighter grey bands are those based on the Hotelling tube approach. Note that the bands for the 0.05 estimate are extremely narrow since the data are very concentrated in this region, so the $\tau = 0.05$ conditional quantile is very precisely estimated.

6 Discussion

The large literature in econometrics about stochastic frontier models is mostly concerned with parametric models of the tail behavior of the response “near the production frontier.” Nonparametric quantile regression offers yet another perspective on estimating such models. It would be extremely foolish to make any claims for alternative methodology described here on the basis of the flimsy evidence offered, let me conclude simply by saying that it might be worthy of further consideration.

Declarations

Conflict of interest

The author has had no funding support or other potential conflict of interest.

Human and animal rights

Nor does the study involve any human or animal participants; nor have any vegetables been harmed in the preparation of this work.

Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

previous article Proportional incremental cost probability functions and their frontiers

next article Indirect inference estimation of stochastic production frontier models with skew-normal noise

There is of course a large literature on such problems, notably: Davies (1977, 1987); Andrews and Ploberger (1994); Hansen (1996), none of whom mention Hotelling. An exception that justifies the qualified “almost unknown” above is Kim et al. (1998). I do not claim that the Hotelling approach is “best” in any sense, only that it is worthy of further consideration. To this end, software to compute the confidence bands described below is available in the R package quantreg, Koenker (1999) for a general class of total variation penalized, additive, nonparametric quantile regression models.

Hotelling obviously knew all about how to do this, and one doubts that he learned it from Frisch, but this would probably be hard to establish.

Aigner DJ, Lovell CAK, Schmidt P (1977) Formulation and estimation of stochastic Frontier production function models. J Econom 6:21–37CrossRef

Amsler C, Prokhovov A, Schmidt P (2016) Endogeneity in stochastic Frontier models. J Econom 190:280–288CrossRef

Andrews D, Ploberger W (1994) Optimal tests when a nuisance parameter is present only under the alternative. Econometrica 62:1383–1414CrossRef

Clarke BS, Clarke JL (2018) Predictive statistics: analysis and inference beyond models. Cambridge University press, CambridgeCrossRef

Davies R (1977) Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 64:247–254CrossRef

Davies R (1987) Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 74:33–43

Foygel Barber R, Candès EJ, Ramdas A, Tibshirani RJ (2020) The limits of distribution-free conditional predictive inference. Inf Inference A J IMA 10:455–482CrossRef

Frisch R, Waugh FV (1933) Partial time regressions as compared with individual trends. Econometrica 1:387–401CrossRef

Geiser S (1993) Predictive inference: an introduction. Chapman and Hall, LondonCrossRef

Hansen B (1996) Inference when a nuisance parameter is not identified under the null hypothesis. Econometrica 64:413–430CrossRef

Horrace W, Schmidt P (1996) Confidence statements for efficiency estimates from stochastic Frontier models. J Product Anal 7:257–282CrossRef

Hotelling H (1939) Tubes and Spheres in $n$-spaces, and a class of statistical problems. Am J Math 61:440–460CrossRef

Johansen S, Johnstone IM (1990) Hotelling’s theorem on the volume of tubes: some illustrations in simultaneous inference and data analysis. Ann Stat 18:652–684CrossRef

Kendall M (1961) Course in the geometry of n dimensions. Hafner, Ohio

Kim P, Naiman D, Li Q, Stengos T (1998) On Hotelling’s approach to hypothesis testing when a nuisance parameter is present only under the alternative. J Econom Theory Econom 4:105–130

Knowles M (1987) Simultaneous confidence bands for random functions. Ph.D. thesis, Stanford University

Koenker R (1999) Quantreg: an R package for quantile regression methods. https://cran.r-project.org/web/packages/quantreg/index.html

Koenker R (2011) Additive models for quantile regression: model selection and confidence bandaids. Braz J Probab Stat 25:239–262CrossRef

Krivobokova T, Kneib T, Claeskens G (2010) Simultaneous confidence bands for penalized spline estimators. J Am Stat Assoc 105:852–863CrossRef

Lei L, Candès E (2022) Conformal inference of counterfactuals and individual treatment effects. J Royal Stat Assoc (B) Forthcom 34:1660

Naiman D (1986) Conservative confidence bands in curvilinear regression. Ann Stat 14:896–906CrossRef

Nychka D (1983) Bayesian confidence intervals for smoothing splines. J Am Stat Assoc 83:1134–43CrossRef

Romano Y, Patterson E, Candès E J (2019) Conformalized quantile regression. In: 33rd Conference on neural information processing systems Vancouver, Canada

Shafer G, Vovk V (2008) A tutorial on conformal prediction. J Mach Learn Res 9:371–421

Wahba G (1983) Bayesian confidence intervals for the cross-validated smoothing spline. J Royal Stat Assoc (B) 45:133–50

Weyl H (1939) On the volume of tubes. Am J Math 61:461–472CrossRef

Title: Hotelling tubes, confidence bands and conformal inference
Author: Roger Koenker
Publication date: 29-01-2023
Publisher: Springer Berlin Heidelberg
Published in: Empirical Economics / Issue 6/2023
Print ISSN: 0377-7332
Electronic ISSN: 1435-8921
DOI: https://doi.org/10.1007/s00181-022-02344-z

Springer Professional

Hotelling tubes, confidence bands and conformal inference

Abstract

Publisher's Note

1 Introduction

2 Hotelling’s regression problem

3 Uniform confidence bands for nonparametric regression

4 Additive models for total variation penalized nonparametric quantile regression

5 Conformal quantile regression

6 Discussion

Declarations

Conflict of interest

Human and animal rights

Publisher's Note

Premium Partner

	\(\beta _0 = \) 0			\(\beta _0 = \) 1			\(\beta _0 = \)2
	\(\tau = \) −0.5	\(\tau = \) 0	\(\tau = \) 0.5	\(\tau = \) −0.5	\(\tau = \) 0	\(\tau = \) 0.5	\(\tau = \) −0.5	\(\tau = \) 0	\(\tau = \) 0.5
n = 20	0.056	0.058	0.049	0.313	0.193	0.182	0.781	0.459	0.380
n = 50	0.049	0.051	0.057	0.275	0.225	0.342	0.639	0.577	0.782
n = 100	0.063	0.048	0.056	0.350	0.261	0.281	0.840	0.637	0.704
n = 500	0.048	0.052	0.055	0.298	0.243	0.288	0.747	0.612	0.735
n = 1000	0.063	0.046	0.047	0.299	0.218	0.250	0.724	0.549	0.667

Springer Professional

Abstract

Publisher's Note

1 Introduction

2 Hotelling’s regression problem

3 Uniform confidence bands for nonparametric regression

4 Additive models for total variation penalized nonparametric quantile regression

5 Conformal quantile regression

6 Discussion

Declarations

Conflict of interest

Human and animal rights

Publisher's Note

Other articles of this Issue 6/2023

Dynamic panel GMM estimators with improved finite sample properties using parametric restrictions for dimension reduction

Generalized kernel regularized least squares estimator with parametric error covariance

Refined GMM estimators for simultaneous equations models with network interactions

An alternative corrected ordinary least squares estimator for the stochastic frontier model

Multivariate models of commodity futures markets: a dynamic copula approach

Does climate change affect economic data?

Premium Partner