Abstract
Several tests of model structure developed by Kneip et al. (J Bus Econ Stat 34:435–456, 2016) and Daraio et al. (Econ J 21:170–191, 2018) rely on comparing sample means of two different efficiency estimators, one appropriate under the conditions of the null hypothesis and the other appropriate under the conditions of the alternative hypothesis. These tests rely on central limit theorems developed by Kneip et al. (Econ Theory 31:394–422, 2015) and Daraio et al. (Econ J 21:170–191, 2018), but require that the original sample be split randomly into two independent subsamples. This introduces some ambiguity surrounding the sample-split, which may be determined by choice of a seed for a random number generator. We develop a method that eliminates much of this ambiguity by repeating the random splits a large number of times. We use a bootstrap algorithm to exploit the information from the multiple sample-splits. Our simulation results show that in many cases, eliminating this ambiguity results in tests with better size and power than tests that employ a single sample-split.
Notes
For two vectors \(a\) and \(b\) of length \(n\) with \(i\)th elements \(a_i\) and \(b_i\), \(c=a\circ b\) is a vector of length \(n\) with \(i\)th element \(c_i = a_i b_i\).
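As a minimal illustration of this element-wise (Hadamard) product, in Python with NumPy (the choice of library is ours, purely for illustration):

```python
import numpy as np

# Element-wise product c = a ∘ b, with c_i = a_i * b_i, as defined in the note.
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
c = a * b  # NumPy's * is element-wise for arrays of equal length
```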
For an estimator \(\widehat{\theta }(x,y)\) of \(\theta (x,y)\) converging at rate \(n^{\kappa }\), \(\widehat{\theta }(x,y)-\theta (x,y)={O}_{p}({n}^{-\kappa })\). In other words, the estimation error of \(\widehat{\theta }(x,y)\) is of order \(n^{-\kappa }\) in probability. In such cases, estimation error shrinks in a probabilistic sense as the sample size \(n\) increases, and the magnitude of \(\kappa\) determines how quickly this happens.
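This notation can be illustrated with a familiar special case. The sketch below uses the sample mean, for which κ = 1/2, rather than the paper's frontier estimators; scaling the estimation error by n^κ yields a quantity that remains bounded in probability as n grows, even though the raw error shrinks.

```python
import numpy as np

# Illustration of the O_p(n^{-kappa}) notation using the sample mean
# (kappa = 1/2), NOT the paper's FDH/DEA estimators: the raw estimation
# error |mean(x) - mu| shrinks with n, while n^kappa times the error
# stays bounded in probability.
rng = np.random.default_rng(42)
kappa = 0.5
mu, sigma = 1.0, 2.0
scaled_errors = []
for n in (100, 10_000, 1_000_000):
    x = rng.normal(loc=mu, scale=sigma, size=n)
    scaled_errors.append(n**kappa * abs(x.mean() - mu))
```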
In addition, the results on consistency, limiting distributions and rates of convergence have been extended to hyperbolic versions of the FDH and VRS-DEA estimators by Wheelock and Wilson (2008) and Wilson (2011), and to directional-distance versions by Simar and Vanhems (2012) and Simar et al. (2012). For each type of estimator—FDH, VRS-DEA or CRS-DEA—the value of κ remains the same across the different orientations.
These results extend trivially to the output-oriented estimators. Wilson (2019) extends the results to the hyperbolic orientation.
Simar and Zelenyuk (2020) propose adding a bias correction to the sample variances in (3.3) and (3.4) to improve performance in small samples. We have not used this idea here, in order to facilitate comparison with previous simulation results appearing in Kneip et al. (2016) and Daraio et al. (2018). Our focus here is on the impact of multiple sample-splits, but in applications one can use the improved variance estimator without increasing computational burden.
In the test of convexity versus non-convexity, one applies the FDH estimator to the observations in \({{\mathcal{X}}}_{1,{n}_{1}}\) and the VRS-DEA estimator to the observations in \({{\mathcal{X}}}_{2,{n}_{2}}\). In the test of separability, one applies the conditional VRS-DEA (or conditional FDH) estimator to observations in \({{\mathcal{X}}}_{1,{n}_{1}}\) and the unconditional VRS-DEA (or unconditional FDH) estimator to the observations in \({{\mathcal{X}}}_{2,{n}_{2}}\). In the test of CRS to VRS outlined here, both estimators used to construct the test statistic have the same rate of convergence, and so κ = 2/(p + q). In the tests of convexity and separability, the two estimators used for each test have different rates of convergence under the null, and so the variances in the denominators of the statistics for these tests are divided by different powers of n, reflecting the different convergence rates of the estimators. See Kneip et al. (2016) and Daraio et al. (2018) for details.
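A minimal sketch of the random sample-split these tests require, assuming a simple permutation-based split with n1 = ⌊n/2⌋ (the function name and subsample sizes are our illustrative choices, not prescribed by the paper); the seed passed to the generator is exactly the source of ambiguity that motivates the multiple-split procedure:

```python
import numpy as np

# Split a sample of n observations (rows of X) into two independent
# subsamples X1 and X2 of sizes n1 = n // 2 and n2 = n - n1. Different
# seeds produce different splits, hence different test statistics.
def random_split(X, seed):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n1 = len(X) // 2
    return X[idx[:n1]], X[idx[n1:]]

X = np.arange(20).reshape(10, 2)  # 10 observations, p + q = 2 dimensions
X1, X2 = random_split(X, seed=123)
```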
Daraio et al. (2018) report results from experiments with n = 1000 in addition to n = 100 and 200. However, with 10 sample splits, and 1000 bootstrap replications, the computational burden for each experiment here is 10,010 times that of the experiments in Daraio et al. (2018). Moreover, with the separability test, a bandwidth parameter must be selected by cross-validation, which requires time of order O(n2), and this must be done 10,010 times. Consequently, we consider only n = 100, 200 for the separability test.
Note that while the statistic in (3.8) is a Kolmogorov–Smirnov statistic, the usual tables cannot be used to assess significance due to the dependence problem here.
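As a generic illustration (this is not equation (3.8) itself), a Kolmogorov–Smirnov-type statistic is the sup-distance between two empirical distribution functions; as the note warns, with dependent inputs its critical values must come from a procedure such as the bootstrap rather than the standard tables:

```python
import numpy as np

# Generic two-sample Kolmogorov-Smirnov-type statistic: the maximum
# absolute difference between the two empirical CDFs, evaluated over the
# pooled sample points. Significance must be assessed by bootstrap when
# the inputs are dependent.
def ks_statistic(u, v):
    grid = np.sort(np.concatenate([u, v]))
    F_u = np.searchsorted(np.sort(u), grid, side="right") / len(u)
    F_v = np.searchsorted(np.sort(v), grid, side="right") / len(v)
    return np.max(np.abs(F_u - F_v))
```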
Computation times for the convexity test are faster than those given here because the FDH estimator carries a lower computational burden than the VRS-DEA estimator. Conversely, times for the separability test are slower than those for the RTS test because cross-validation is needed to optimize the bandwidths used by the conditional efficiency estimators.
References
Chambers RG, Chung Y, Färe R (1998) Profit, directional distance functions, and Nerlovian efficiency. J Optim Theory Appl 98:351–364
Charnes A, Cooper WW, Rhodes E (1978) Measuring the efficiency of decision making units. Eur J Oper Res 2:429–444
Daouia A, Simar L, Wilson PW (2017) Measuring firm performance using nonparametric quantile-type distances. Econ Rev 36:156–181
Daraio C, Simar L (2007) Conditional nonparametric frontier models for convex and nonconvex technologies: a unifying approach. J Prod Anal 28:13–32
Daraio C, Simar L, Wilson PW (2018) Central limit theorems for conditional efficiency measures and tests of the ‘separability condition’ in non-parametric, two-stage models of production. Econ J 21:170–191
Deprins D, Simar L, Tulkens H (1984) Measuring labor inefficiency in post offices. In: Marchand M, Pestieau P, Tulkens H (eds.) The performance of public enterprises: concepts and measurement, North-Holland, Amsterdam, pp 243–267
Färe R, Grosskopf S (2004) New directions: efficiency and productivity. Springer Science & Business Media, New York
Färe R, Grosskopf S, Lovell CAK (1985) The measurement of efficiency of production. Kluwer-Nijhoff Publishing, Boston
Färe R, Grosskopf S, Margaritis D (2008) Productivity and efficiency: malmquist and more. In: Fried H, Lovell CAK, Schmidt S (eds.) The measurement of productive efficiency, chap. 5, 2nd edn. Oxford University Press, Oxford, pp 522–621
Farrell MJ (1957) The measurement of productive efficiency. J R Stat Soc A 120:253–281
Hoel PG, Port SC, Stone CJ (1971) Introduction to statistical theory. Houghton Mifflin Company, Boston
Jeong SO, Park BU, Simar L (2010) Nonparametric conditional efficiency measures: asymptotic properties. Ann Oper Res 173:105–122
Kneip A, Park B, Simar L (1998) A note on the convergence of nonparametric DEA efficiency measures. Econ Theory 14:783–793
Kneip A, Simar L, Wilson PW (2008) Asymptotics and consistent bootstraps for DEA estimators in non-parametric frontier models. Econ Theory 24:1663–1697
Kneip A, Simar L, Wilson PW (2015) When bias kills the variance: central limit theorems for DEA and FDH efficiency scores. Econ Theory 31:394–422
Kneip A, Simar L, Wilson PW (2016) Testing hypotheses in nonparametric models of production. J Bus Econ Stat 34:435–456
Kneip A, Simar L, Wilson PW (2018) Inference in dynamic, nonparametric models of production: central limit theorems for Malmquist indices. Discussion paper #2018/10. Institut de Statistique, Biostatistique et Sciences Actuarielles, Université Catholique de Louvain, Louvain-la-Neuve, Belgium
Kneip A, Simar L, Wilson PW (2020) Inference in dynamic, nonparametric models of production with constant returns to scale and non-convex production sets. In progress.
Mammen E (1992) When does bootstrap work? Asymptotic results and simulations. Springer-Verlag, Berlin
Park BU, Jeong S-O, Simar L (2010) Asymptotic distribution of conical-hull estimators of directional edges. Ann Stat 38:1320–1340
Park BU, Simar L, Weiner C (2000) FDH efficiency scores from a stochastic point of view. Econ Theory 16:855–877
Simar L, Vanhems A (2012) Probabilistic characterization of directional distances and their robust versions. J Econ 166:342–354
Simar L, Vanhems A, Wilson PW (2012) Statistical inference for DEA estimators of directional distances. Eur J Oper Res 220:853–864
Simar L, Wilson PW (1998) Sensitivity analysis of efficiency scores: How to bootstrap in nonparametric frontier models. Manag Sci 44:49–61
Simar L, Wilson PW (1999a) Some problems with the Ferrier/Hirschberg bootstrap idea. J Prod Anal 11:67–80
Simar L, Wilson PW (1999b) Of course we can bootstrap DEA scores! But does it mean anything? Logic trumps wishful thinking. J Prod Anal 11:93–97
Simar L, Wilson PW (2007) Estimation and inference in two-stage, semi-parametric models of productive efficiency. J Econ 136:31–64
Simar L, Wilson PW (2011) Two-Stage DEA: caveat emptor. J Prod Anal 36:205–218
Simar L, Wilson PW (2013) Estimation and inference in nonparametric frontier models: recent developments and perspectives. Found Trends Econ 5:183–337
Simar L, Wilson PW (2015) Statistical approaches for non-parametric frontier models: a guided tour. Int Stat Rev 83:77–110
Simar L, Zelenyuk V (2020) Improving finite sample approximation by central limit theorems for estimates from Data Envelopment Analysis. Eur J Oper Res 284:1002–1015
Thain D, Tannenbaum T, Livny M (2005) Distributed computing in practice: the Condor experience. Concurr Comput Pract Exp 17:323–356
Wheelock DC, Wilson PW (2008) Non-parametric, unconditional quantile estimation for efficiency analysis with an application to Federal Reserve check processing operations. J Econ 145:209–225
Wilson PW (2008) FEAR: a software package for frontier efficiency analysis with R. Socio-Econ Plan Sci 42:247–254
Wilson PW (2011) Asymptotic properties of some non-parametric hyperbolic efficiency estimators. In: Van Keilegom I, Wilson PW (eds.) Exploring research frontiers in contemporary statistics and econometrics. Springer-Verlag, Berlin, p 115–150
Wilson PW (2018) Dimension reduction in nonparametric models of production. Eur J Oper Res 267:349–367
Wilson PW (2020) U.S. banking in the post-crisis era: new results from new methods. In: Parmeter C, Sickles R (eds.) Methodological contributions to the advancement of productivity and efficiency analysis. Springer International Publishing AG, Cham, Switzerland (in press)
Acknowledgements
We are grateful to the Cyber Infrastructure Technology Integration group at Clemson University for operating the Palmetto Cluster used for simulations in this paper.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Simar, L., Wilson, P.W. Hypothesis testing in nonparametric models of production using multiple sample splits. J Prod Anal 53, 287–303 (2020). https://doi.org/10.1007/s11123-020-00574-w