About this book

Published in honor of the sixty-fifth birthday of Professor Ingram Olkin of Stanford University. Part I contains a brief biography of Professor Olkin and an interview with him discussing his career and his research interests. Part II contains 32 technical papers written in Professor Olkin's honor by his collaborators, colleagues, and Ph.D. students. These original papers cover a wealth of topics in mathematical and applied statistics, including probability inequalities and characterizations, multivariate analysis and association, linear and nonlinear models, ranking and selection, experimental design, and approaches to statistical inference. The volume reflects the wide range of Professor Olkin's interests in and contributions to research in statistics, and provides an overview of new developments in these areas of research.



An Appreciation


A Brief Biography and Appreciation of Ingram Olkin

Ingram Olkin, known affectionately to his friends in his youth as “Red,” was born July 23, 1924 in Waterbury, Connecticut. He was the only child of Julius and Karola (Bander) Olkin. His family moved from Waterbury to New York City in 1934. Ingram graduated from the Bronx’s DeWitt Clinton High School in 1941, and began studying statistics in the Mathematics Department at the City College of New York. After serving as a meteorologist in the Air Force during World War II (1943–1946), achieving the rank of First Lieutenant, Ingram resumed his studies at City College. He received his B.S. in mathematics in 1947.

Leon Jay Gleser, Michael D. Perlman, S. James Press, Allan R. Sampson

A Conversation with Ingram Olkin

Early in 1986, a new journal Statistical Science of the Institute of Mathematical Statistics appeared. This is a journal Ingram Olkin was intimately involved in founding. One of the most popular features of Statistical Science is its interviews with distinguished statisticians and probabilists. In the spirit of those interviews, the Editors of this volume wanted to include an interview with Ingram. However, one does not “interview” Ingram; one simply starts him talking, and sits back to listen and enjoy.

Leon Jay Gleser, Michael D. Perlman, S. James Press, Allan R. Sampson

Bibliography of Ingram Olkin

Leon Jay Gleser, Michael D. Perlman, S. James Press, Allan R. Sampson

Contributions to Probability and Statistics


Probability Inequalities and Characterizations

1. A Convolution Inequality

We establish an elementary convolution inequality which appears to be novel although it extends and complements a famous old result of W.H. Young. In the course of the proof we are led to a simple interpolation result which has applications in measure theory.

Gavin Brown, Larry Shepp

2. Peakedness of Weighted Averages of Jointly Distributed Random Variables

This note extends the Proschan (1965) result on peakedness comparison for convex combinations of i.i.d. random variables from a PF₂ density. Now the underlying random variables are jointly distributed from a Schur-concave density. The result permits a more refined description of convergence in the Law of Large Numbers.

Wai Chan, Dong Ho Park, Frank Proschan

3. Multivariate Majorization

The concept of univariate majorization plays a central role in the study of Lorenz dominance for income distribution comparisons in economics. The first part of this paper reviews different conditions which are equivalent to Lorenz dominance. The second part poses the question whether such equivalences extend to the multivariate case. Some concepts of multivariate majorization are presented along with a few new results. For economic applications, the notion of a concave utility function on vector observations appears to play a crucial role in multivariate majorization. It is shown that such concavity follows from some easily understandable axioms.

Somesh Das Gupta, Subir Kumar Bhandari
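As a small illustration of the univariate majorization order that the abstract above takes as its starting point, the following sketch checks whether one vector majorizes another by comparing partial sums of the decreasingly sorted entries. The function name `majorizes` and the tolerance `tol` are illustrative assumptions rather than notation from the paper, and the multivariate concepts discussed there are not covered.

```python
def majorizes(x, y, tol=1e-12):
    """Return True if x majorizes y (standard univariate definition).

    x majorizes y when both vectors have the same total and every partial
    sum of the k largest entries of x is at least the corresponding
    partial sum for y.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    xs = sorted(x, reverse=True)
    ys = sorted(y, reverse=True)
    # Totals must agree (up to a small numerical tolerance).
    if abs(sum(xs) - sum(ys)) > tol:
        return False
    partial_x = 0.0
    partial_y = 0.0
    for a, b in zip(xs, ys):
        partial_x += a
        partial_y += b
        if partial_x < partial_y - tol:
            return False
    return True
```

In the income-distribution reading, a vector that is majorized by another with the same total is the more equal of the two: `majorizes([3, 0, 0], [1, 1, 1])` holds, while the reverse comparison does not.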

4. Some Results on Convolutions and a Statistical Application

Classes of distributions, of both discrete and continuous type, are introduced for which the right tail of the distribution is nonincreasing. It is shown that these classes are closed under convolution, thus providing sufficient conditions for nonincreasing right tails to be preserved under convolution. A start is made on verifying a conjecture concerning the extension to the left of nondecreasing right tails under successive convolution. The results give properties of the distributions of random walks on the integers. A statistical application is the verification of a conjecture of Sobel and Huyett (1957) concerning the minimal probability of correct selection for the usual indifference zone procedure for selecting the Bernoulli population with the largest success probability.

M. L. Eaton, L. J. Gleser

5. The X + Y, X/Y Characterization of the Gamma Distribution

We prove, by elementary methods, that if X and Y are independent random variables, not constant, such that X + Y is independent of X/Y, then either X, Y or −X, −Y have gamma distributions with common scale parameter. This extends the result of Lukacs, who proved it for positive random variables, using differential equations for the characteristic functions. The aim here is to use more elementary methods for the X, Y positive case as well as elementary methods for proving that the restriction to positive X, Y may be removed.

George Marsaglia

6. A Bivariate Uniform Distribution

The univariate distribution uniform on the unit interval [0,1] is important primarily because of the following characterization: Let X be a random variable taking values in [0,1]. Then the distribution of X + U (mod 1) is the same as the distribution of X for all nonnegative random variables U independent of X if and only if X has a distribution uniform on [0,1]. A natural bivariate version of this is the following: Let (X,Y) be a random vector taking values in the unit square. Then (*) (X + U (mod 1), Y + V (mod 1)) has the same distribution as (X,Y) for every pair (U,V) of nonnegative random variables independent of (X,Y) if and only if X and Y are independent and uniformly distributed on [0,1]. But if (*) is required to hold only when U = V with probability one, then (X,Y) can have any one of a large class of bivariate uniform distributions which are given an explicit representation and studied in this paper.

Albert W. Marshall

7. Multinomial Problems in Geometric Probability with Dirichlet Analysis

A variety of new combinatorial results are obtained using the recently developed technique of Dirichlet Analysis, which utilizes the study of Dirichlet integrals. These results are stated as seventeen problems which are geometrical or combinatorial in nature.

Milton Sobel

8. Probability Inequalities for n-Dimensional Rectangles via Multivariate Majorization

Inequalities for the probability content $$P\left[ \bigcap_{j=1}^{n} \left\{ a_{1j} \leq X_j \leq a_{2j} \right\} \right]$$ are obtained via concepts of multivariate majorization (which involves the diversity of elements of the 2 × n matrix $$A = (a_{ij})$$). A special case of the general result is that $$P\left[ \bigcap_{j=1}^{n} \left\{ a_{1j} \leq X_j \leq a_{2j} \right\} \right] \leq P\left[ \bigcap_{j=1}^{n} \left\{ \bar a_1 \leq X_j \leq \bar a_2 \right\} \right]$$ for $$\bar a_i = \frac{1}{n} \sum_{j=1}^{n} a_{ij}$$ (i = 1, 2). The main theorems apply in most important cases, including the exchangeable normal, t, chi-square and gamma, F, beta, and Dirichlet distributions. The proofs of the inequalities involve a convex combination of an n-dimensional rectangle and its permutation sets.

Y. L. Tong
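The special case stated in the abstract above can be checked numerically in the simplest setting. The sketch below assumes independent standard normal coordinates (the zero-correlation member of the exchangeable normal family), for which the rectangle probability is a product of one-dimensional terms. The endpoint values and all function names are hypothetical choices made for illustration only.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def rectangle_prob(lower, upper):
    """P[lower_j <= X_j <= upper_j for all j], X_j independent N(0, 1)."""
    prob = 1.0
    for a, b in zip(lower, upper):
        prob *= max(norm_cdf(b) - norm_cdf(a), 0.0)
    return prob

def averaged_rectangle_prob(lower, upper):
    """Same probability after replacing every endpoint by its row average."""
    n = len(lower)
    a_bar = sum(lower) / n
    b_bar = sum(upper) / n
    return rectangle_prob([a_bar] * n, [b_bar] * n)

# Hypothetical endpoints: row 1 and row 2 of the 2 x n matrix A.
lower = [-1.0, -2.0, -0.5]
upper = [1.0, 0.5, 2.0]
p_original = rectangle_prob(lower, upper)
p_averaged = averaged_rectangle_prob(lower, upper)
```

For these endpoints `p_original` does not exceed `p_averaged`, in line with the stated inequality; the example says nothing about the general majorization theorems of the paper.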

9. Minimum Majorization Decomposition

A famous theorem of Birkhoff says that any doubly stochastic matrix D can be decomposed into a convex combination of permutation matrices R. The various decompositions correspond to probability distributions on the set of permutations that satisfy the linear constraints E[R] = D. This paper illustrates how to decompose D so that the resulting probability distribution is minimal in the sense that it does not majorize any other distribution satisfying these constraints. Any distribution maximizing a strictly Schur concave function g under these linear constraints will be minimal in the above sense (Joe (1987)). In particular, for D in the relative interior of the convex hull of the permutation matrices, the probability functions p that maximize $$g(p) = -\sum_{\pi} p(\pi) \log p(\pi)$$, subject to E[R] = D, form an exponential family with sufficient statistic R. This paper provides a theorem that characterizes this exponential family by a property called quasi-independence. Quasi-independence is defined in terms of the invariance of the product measure over Latin sets. The characterization suggests an algorithm for an explicit minimal decomposition of a doubly stochastic matrix.

Joseph S. Verducci
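To make the Birkhoff theorem cited in the abstract above concrete, here is a minimal sketch of a plain greedy decomposition of a small doubly stochastic matrix into weighted permutations. It is not the minimal (non-majorizing) decomposition developed in the paper; it only produces some distribution on permutations with E[R] = D. The brute-force search over all permutations is an assumption made for brevity and is practical only for small matrices.

```python
from itertools import permutations

def birkhoff_decompose(D, tol=1e-12):
    """Greedy Birkhoff decomposition of a doubly stochastic matrix D.

    Returns a list of (weight, perm) pairs, where perm[i] is the column
    assigned to row i, such that the weighted permutation matrices sum
    to D (up to rounding error).
    """
    n = len(D)
    residual = [list(row) for row in D]
    terms = []
    while True:
        best = None
        # Find the permutation whose smallest residual entry is largest.
        for perm in permutations(range(n)):
            weight = min(residual[i][perm[i]] for i in range(n))
            if weight > tol and (best is None or weight > best[0]):
                best = (weight, perm)
        if best is None:
            break
        weight, perm = best
        terms.append((weight, perm))
        # Subtract the chosen term; at least one entry drops to zero.
        for i in range(n):
            residual[i][perm[i]] -= weight
    return terms

def recombine(terms, n):
    """Rebuild the matrix E[R] from the (weight, perm) pairs."""
    M = [[0.0] * n for _ in range(n)]
    for weight, perm in terms:
        for i in range(n):
            M[i][perm[i]] += weight
    return M
```

The weights returned by `birkhoff_decompose` form one probability distribution on permutations satisfying the linear constraints; different search orders give different decompositions, which is the non-uniqueness the paper addresses.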

Multivariate Analysis and Association

10. The Asymptotic Distribution of Characteristic Roots and Vectors in Multivariate Components of Variance

The asymptotic distribution of the characteristic roots and vectors of one Wishart matrix in the metric of another as the two degrees of freedom increase in fixed proportion is obtained. In the balanced one-way multivariate analysis of variance these two matrices are the sample effect and error covariance matrices, and the numbers of degrees of freedom are (approximately) proportional to the number of classes. The maximum likelihood estimate of the effect covariance matrix of a given rank depends on the characteristic roots and vectors.

T. W. Anderson

11. Univariate and Multivariate Analyses of Variance for Incomplete Factorial Experiments Having Monotone Structure in Treatments

In this paper, we give the analysis of variance for an incomplete factorial experiment with three fixed factors with a monotone structure (in the treatments) when the design of the experiment is completely randomized. We consider both the univariate and the multivariate case. For the multivariate case, the analysis is given for the general situation when the data vectors are incomplete and have a monotone sample pattern. (See Bhargava 1962, 1975.) The monotone sample case includes as a special case the complete sample case in which no observations are missing in any vector. A univariate example is given.

R. P. Bhargava

12. The Limiting Distribution of the Rank Correlation Coefficient

A new correlation coefficient, $$R_g$$, based on ranks and greatest deviation, was defined in Gideon and Hollister (1987). There, the exact distributions were obtained by enumeration for small sample sizes, and by computer simulations for larger sample sizes. In this note, it is shown that the asymptotic distribution of $$n^{1/2} R_g$$ is N(0,1) when the variables are independent, where n is the sample size. This limit is derived by restating the definition of $$R_g$$ in terms of a rank measure and then using a limit theorem on set-indexed empirical processes which appears in Pyke (1985). The limiting distribution can be compared to the critical values for large samples given in Figure 2 of Gideon and Hollister (1987). Methods for deriving the limiting distribution under fixed and contiguous alternatives are also described.

Rudy A. Gideon, Michael J. Prentice, Ronald Pyke

13. Mean and Variance of Sample Size in Multivariate Heteroscedastic Method

For statistical inference on several mean vectors when population covariance matrices are different, the heteroscedastic method is employed to overcome difficulties under a two-stage sampling scheme. Total sample size for each sample is thus a random variable. Both exact and approximate upper and lower bounds for the mean and variance of the sample size are given. Tables are computed for some special cases of these bounds in order to have some information on their numerical behavior.

Hiroto Hyakutake, Minoru Siotani

14. A Comparative Study of Cluster Analysis and MANCOVA in the Analysis of Mathematics Achievement Data

This article is based upon the author's Ph.D. dissertation (Lockley, 1970), which was one of the first attempts to use cluster analytic techniques in the analysis of educational data. It is published here to illustrate an alternative approach to the analyses of covariance of student achievement utilizing large numbers of related social, demographic, environmental and educational covariates which are widely applied in educational research. In this alternative approach, covariates relating to school environment and teaching method are used to identify clusters of school-community-teaching environments. Such environments are then compared for their effects on student achievement using analysis of variance (with adjustments only for covariates that reflect students’ individual prior abilities and achievements). If successful, such an approach both increases statistical power and provides insight into the effects of school environments on student achievement. The data used in this article come from junior high schools that participated in a pioneering research project conducted at Stanford University in the late 1960s by the National Longitudinal Study of Mathematical Abilities (NLSMA). Evidence in the data supports the existence of 3 or 4 school-environment clusters, and suggests that such clusters (or the schools themselves) do make a difference in the mathematics achievement of junior high school students.

J. E. Lockley

15. Bayesian Inference in Factor Analysis

We propose a new method for analyzing factor analysis models using a Bayesian approach. Normal theory is used for the sampling distribution, and we adopt a model with a full disturbance covariance matrix. Using vague and natural conjugate priors for the parameters, we find that the marginal posterior distribution of the factor scores is approximately a matrix T-distribution, in large samples. This explicit result permits simple interval estimation and hypothesis testing of the factor scores. Explicit point estimators of the factor score elements, in large samples, are obtained as means of the respective marginal posterior distributions. Factor loadings are estimated as joint modes (with the factor scores), or alternatively as means or modes of the distribution of the factor loadings conditional upon the estimated factor scores. Disturbance variances and covariances are estimated conditional upon the estimated factor scores and factor loadings.

S. James Press, K. Shigemasu

16. Computational Aspects of Association for Bivariate Discrete Distributions

For bivariate discrete probability distributions, P, on the M × N lattice, various aspects for checking association (Esary, Proschan and Walkup (1967)) are considered. A new algorithm is given for verifying whether or not P is associated. The efficiency of this algorithm is obtained and compared to the efficiency of a simple algorithm based on the definition of association. When M = N = 5, for example, the new algorithm requires less than 3% of the computations required for the simple algorithm. In obtaining these results a new set function Q is constructed from P, on all upper sets in the lattice. In order to construct the algorithm to check association, we define a computationally important set of extreme points and consider related combinatorics.

Allan R. Sampson, Lyn R. Whitaker

Linear and Nonlinear Models, Ranking and Selection, Design

17. A Comparison of the Performances of Procedures for Selecting the Normal Population Having the Largest Mean when the Variances are Known and Equal

We study the performance characteristics of procedures for selecting the normal population which has the largest mean when the variances are known and equal. The procedures studied are the single-stage procedure of Bechhofer, the closed two-stage procedure of Tamhane and Bechhofer, the open sequential procedure of Bechhofer, Kiefer, and Sobel and a truncated version of that procedure by Bechhofer and Goldsman, the closed multi-stage procedure with elimination of Paulson and improved closed versions of that procedure by Fabian and by Hartmann. The performance characteristics studied are the achieved probability of a correct selection, the expected number of stages required to terminate experimentation, and the expected total number of observations required to terminate experimentation. Except for the single-stage procedure, all performance characteristics are estimated by Monte Carlo sampling. Based on these results, recommendations are made concerning which procedure to use in different circumstances.

Robert E. Bechhofer, David M. Goldsman

18. Parametric Empirical Bayes Rules for Selecting the Most Probable Multinomial Event

Consider a multinomial population with k (≥ 2) cells and the associated probability vector $$\underset{\sim}{p} = (p_1, \ldots, p_k)$$. Let $$p_{[k]} = \max_{1 \leq i \leq k} p_i$$. A cell associated with $$p_{[k]}$$ is called the most probable event. We are interested in selecting the most probable event. Let i denote the index of the selected cell. Under the loss function $$L(\underset{\sim}{p}, i) = p_{[k]} - p_i$$, this statistical selection problem is studied via a parametric empirical Bayes approach. Two empirical Bayes selection rules are proposed. They are shown to be asymptotically optimal at least of order $$O(\exp(-c_i n))$$ for some positive constants $$c_i$$, i = 1, 2, where n is the number of accumulated past experiences (observations) at hand. Finally, for the problem of selecting the least probable event associated with $$p_{[1]}$$ under the loss $$p_i - p_{[1]}$$, two empirical Bayes selection rules are also proposed. The corresponding rates of convergence are found to be at least of order $$O(\exp(-c_i n))$$ for some positive constants $$c_i$$, i = 3, 4.

Shanti S. Gupta, TaChen Liang

19. Bayesian Estimation in Two-Way Tables With Heterogeneous Variances

Consider a two-way table with one observation per cell and heterogeneous variances across the columns. We assume there is available proper prior knowledge about these variances and obtain the joint posterior distribution for the variances, where, however, the normalizing constant has to be evaluated numerically. We use this joint posterior distribution to examine the precision (inverse of the variance) in a given column as a fraction of the sum of all column precisions. An example is discussed.

Irwin Guttman, Ulrich Menzefricke

20. Calibrating For Differences

Suppose that an approximate linear model, or nonparametric regression, relates instrument readings y to standards x. A method is derived for constructing interval estimates of displacements $$x_1 - x_2$$ between standards based on corresponding instrument readings $$y_1, y_2$$, and the results of a calibration experiment.

George Knafl, Jerome Sacks, Cliff Spiegelman

21. Complete Class Results For Linear Regression Designs Over The Multi-Dimensional Cube

Complete classes of designs and of moment matrices for linear regression over the multi-dimensional unit cube are presented. An essentially complete class of designs comprises the uniform distributions on the vertices with a fixed number of entries being equal to unity, and mixtures of neighboring such designs. The corresponding class of moment matrices is minimally complete. The derivation is built on information increasing orderings, that is, a superposition of the majorization ordering generated by the permutation groups, and the Loewner ordering of symmetric matrices.

Friedrich Pukelsheim

22. A Unified Method of Estimation in Linear Models with Mixed Effects

A unified approach is developed for the estimation of unknown fixed parameters and prediction of random effects in a mixed Gauss-Markov linear model. It is shown that both the estimators and their mean square errors can be expressed in terms of the elements of a g-inverse of a partitioned matrix which can be set up in terms of the matrices used in expressing the model. No assumptions are made on the ranks of the matrices involved. The method is parallel to the one developed by the author in the case of the fixed effects Gauss-Markov model using a g-inverse of a partitioned matrix (Rao, 1971, 1972, 1973, 1985). A new concept of generalized normal equations is introduced for the simultaneous estimation of fixed parameters, random effects, and random error. All the results are deduced from a general lemma on an optimization problem. This paper is self-contained as all the algebraic results used are stated and proved. The unified theory developed in an earlier paper (Rao, 1988) is somewhat simplified.

C. Radhakrishna Rao

23. Shrinking Techniques for Robust Regression

The asymptotic normality of robust estimators suggests that shrinking techniques previously considered for least squares regression are appropriate in robust regression as well. Moreover, the noisy nature of the data frequently encountered in robust regression problems makes the use of shrinking estimators particularly advantageous. Asymptotic and finite sample results and a short simulation demonstrate that shrinking techniques can indeed improve a robust estimator’s performance.

Richard L. Schmoyer, Steven F. Arnold

24. Asymptotic Mean Squared Error of Shrinkage Estimators

When hyperparameters are estimated in a Bayesian model which produces shrinkage estimators of group means, it is well known that the mean square errors of the estimated means are underestimated if hyperparameter estimates are simply substituted for hyperparameter values in mean square formulas. In this article, a method for approximating the mean square error is described for the case where the estimators of the hyperparameters are obtained by maximum likelihood, and hence are asymptotically normal. Under this approach, the Bayesian model is interpreted as a random-effects model. The method is useful in situations such as small area estimation under stratified sampling (with a large number of strata).

T. W. F. Stroud

Approaches to Inference

25. Likelihood Analysis of a Binomial Sample Size Problem

The problem of estimating the binomial sample size N from k observed numbers of successes is examined from a likelihood point of view. The direct use of the likelihood function for inference about N is illustrated when p is known, and the problem of inference is considered when p is unknown, and has to be eliminated in some way from the likelihood. Different methods (Bayesian, integrated likelihood, conditional likelihood, profile likelihood) for eliminating the nuisance parameter are found to lead to very different likelihoods in N in an example. This occurs because of a strong ridge in the two-parameter likelihood in N and p. Integrating out the parameter p is found to be unsatisfactory, but reparameterization of the model shows that the inference about N is almost unaffected by the new nuisance parameter. The resulting likelihood in N corresponds closely to the profile likelihood in the original parameterization.

Murray Aitkin, Mikis Stasinopoulos
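The profile likelihood mentioned in the abstract above can be sketched directly for the binomial model with k independent counts. The code below assumes the standard binomial log-likelihood and eliminates p by maximizing over it for each fixed N, which gives the closed form p = (sum of counts) / (kN). The function names are illustrative assumptions, and none of the other elimination methods compared in the paper (Bayesian, integrated, conditional) is implemented.

```python
import math

def log_lik(N, p, counts):
    """Binomial log-likelihood in (N, p) for independent success counts."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if N < max(counts):
        return float("-inf")  # N cannot be smaller than an observed count
    return sum(
        math.log(math.comb(N, x)) + x * math.log(p) + (N - x) * math.log(1.0 - p)
        for x in counts
    )

def profile_log_lik(N, counts):
    """Log-likelihood in N with p replaced by its maximizer for that N."""
    if N < max(counts):
        return float("-inf")
    p_hat = sum(counts) / (len(counts) * N)
    if p_hat <= 0.0 or p_hat >= 1.0:
        # All counts equal 0 or all equal N: the likelihood equals 1.
        return 0.0
    return log_lik(N, p_hat, counts)
```

Evaluating `profile_log_lik` over a grid of N values and plotting it is one way to see how slowly it falls away as N grows, the behavior the abstract attributes to a strong ridge in the joint likelihood of N and p.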

26. Truncation, Information, and the Coefficient of Variation

The Fisher information in a random sample from the truncated version of a distribution that belongs to an exponential family is compared with the Fisher information in a random sample from the untruncated distribution. Conditions under which there is more information in the selection sample are given. Examples involving the normal and gamma distributions with various selection sets, and the zero-truncated binomial, Poisson, and negative binomial distributions are discussed. A property pertaining to the coefficient of variation of certain discrete distributions on the non-negative integers is introduced and shown to be satisfied by all binomial, Poisson, and negative binomial distributions.

M. J. Bayarri, M. H. DeGroot, P. K. Goel

27. Asymptotic Error Bounds for Power Approximations to Multinomial Tests of Fit

The Cressie-Read (1984) class of goodness-of-fit tests is considered. Asymptotic error bounds are derived for two new non-local approximations, the classical noncentral $$\chi^2$$ approximation, a moment-corrected version of it, and normal approximations to the power of these tests.

F. C. Drost, W. C. M. Kallenberg, D. S. Moore, J. Oosterhoff
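For readers unfamiliar with the Cressie-Read class named in the abstract above, the following sketch computes the power-divergence statistic in its usual form, 2 / (λ(λ + 1)) · Σ O[(O/E)^λ − 1], with the logarithmic limit at λ = 0. The function name and the choice to leave the λ = −1 limit unimplemented are assumptions of this example; the power approximations and error bounds studied in the paper are not reproduced, and positive observed counts are assumed when λ is negative.

```python
import math

def power_divergence(observed, expected, lam):
    """Power-divergence goodness-of-fit statistic with index lam.

    lam = 1 gives Pearson's chi-square statistic (when the totals of
    observed and expected agree); lam = 0 gives the likelihood ratio
    statistic as a limiting case.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if lam == -1:
        raise ValueError("the lam = -1 limit is not implemented in this sketch")
    if lam == 0:
        # Limiting case: 2 * sum O * log(O / E); empty cells contribute 0.
        return 2.0 * sum(
            o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
        )
    total = sum(o * ((o / e) ** lam - 1.0) for o, e in zip(observed, expected))
    return 2.0 * total / (lam * (lam + 1.0))
```

With observed counts `[10, 20, 30]` and expected counts `[20, 20, 20]`, the λ = 1 value agrees with Pearson's statistic computed by hand, (10 − 20)²/20 + 0 + (30 − 20)²/20 = 10.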

28. Estimating the Normal Mean and Variance Under A Publication Selection Model

Maximum likelihood estimators of the mean and variance of a normal distribution are obtained under a publication selection model in which data are reported only when the hypothesis that the mean is 0 is rejected. An approximation to the asymptotic variance-covariance matrix for these estimators is given. Also discussed are the marginal distributions of the sample mean and variance under the selection model.

Larry V. Hedges

29. Estimating Poisson Error Rates When Debugging Software

Five estimators for the vector of mistake rates of errors discovered in debugging software are proposed and compared for a model in which an unknown number of errors yield numbers of mistakes having independent Poisson distributions.

Gerald J. Lieberman, Sheldon M. Ross

30. A Comparison of Likelihood Ratio, Wald, and Rao Tests

Three commonly considered methods for forming approximate (large-sample) tests of a simple null hypothesis are: (a) the likelihood ratio test, (b) the Wald “linearization” test, and (c) a quadratic scores test due to Rao. In the context of testing that the variance of a normal distribution is equal to one, it is possible to make detailed finite-sample power comparisons, both “local” and “nonlocal,” of these tests. In contrast to at least one assertion made in the literature (Chandra and Joshi, 1983), none of the three tests dominates another, even locally.

Albert Madansky

31. On the Inadmissibility of the Modified Step-Down Test Based on Fisher’s Method For Combining Independent p-Values

Marden and Perlman (1988) have shown that the classical step-down procedure for the Hotelling $$T^2$$ testing problem is inadmissible in most cases. Mudholkar and Subbaiah (1980) proposed a modified step-down procedure wherein the p-values associated with the sequence of stepwise F tests are combined according to Fisher’s combination method. In the present paper it is shown that the modified step-down procedure is inadmissible if at least one step is of dimension one.

John I. Marden, Michael D. Perlman
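Fisher's combination method, on which the modified step-down procedure in the abstract above relies, is short enough to sketch. For k independent p-values that are uniform under the null hypothesis, the statistic −2 Σ log pᵢ has a chi-square distribution with 2k degrees of freedom, and for even degrees of freedom the upper tail has a closed form. The function name is an assumption of this example, and the stepwise F tests that would supply the p-values are not shown.

```python
import math

def fisher_combined_p_value(p_values):
    """Combine independent p-values with Fisher's method.

    Uses the closed-form upper tail of a chi-square distribution with
    2k degrees of freedom:
        P(chi2_{2k} > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # Fisher statistic / 2
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_stat / j  # (x/2)^j / j!, built up incrementally
        total += term
    return math.exp(-half_stat) * total
```

A single p-value is returned unchanged, which is a quick sanity check on the tail formula; two p-values of 0.5 combine to 0.25 · (1 + 2 log 2), roughly 0.597.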

32. On A Statistical Problem Involving the Measurement of Strength and Duration-of-Load for Construction Materials

This paper presents a new analysis, applicable to construction materials, for estimating the probabilistic behavior of the duration-of-load, say $$T_\ell$$ when the load is of magnitude $$\ell$$, based on the imposed stress ratio of load to characteristic material strength. The stochastic behavior of the logarithm of the duration-of-load, given the strength S exceeds $$\ell$$, is of axiomatic importance. We assume the conditional distribution, presuming strength S were known, to be of the form $$\left[ \ln T_\ell \mid S = s \right] - H(s/\ell) \sim \sigma_0 Z,$$ where the r.v. Z is a standard variate appropriately chosen for the material, and the regression H is a monotone increasing function of the form $$H(x) = \alpha_0 + \alpha_1 x + \alpha_{-1} x^{-1}$$ for x > 0.

Sam C. Saunders

