
About this book

This volume is a collection of papers presented at a conference held at the Shoresh Holiday Resort near Jerusalem, Israel, in December 2000, organized by the Israeli Ministry of Science, Culture and Sport. The theme of the conference was "Foundation of Statistical Inference: Applications in the Medical and Social Sciences and in Industry and the Interface of Computer Sciences". The following is a quotation from the Program and Abstract booklet of the conference. "Over the past several decades, the field of statistics has seen tremendous growth and development in theory and methodology. At the same time, the advent of computers has facilitated the use of modern statistics in all branches of science, making statistics even more interdisciplinary than in the past; statistics, thus, has become strongly rooted in all empirical research in the medical, social, and engineering sciences. The abundance of computer programs and the variety of methods available to users brought to light the critical issues of choosing models and, given a data set, the methods most suitable for its analysis. Mathematical statisticians have devoted a great deal of effort to studying the appropriateness of models for various types of data, and defining the conditions under which a particular method works." In 1985 an international conference with a similar title was held in Israel. It provided a platform for a formal debate between the two main schools of thought in Statistics, the Bayesians and the Frequentists.



Identification with Incomplete Observations, Data Mining


Bounding Entries in Multi-way Contingency Tables Given a Set of Marginal Totals

We describe new results for sharp upper and lower bounds on the entries in multi-way tables of counts based on a set of released and possibly overlapping marginal tables. In particular, we present a generalized version of the shuttle algorithm proposed by Buzzigoli and Giusti that computes sharp integer bounds for an arbitrary set of fixed marginals. We also present two examples which illustrate the practical import of the bounds for assessing disclosure risk.
Adrian Dobra, Stephen E. Fienberg
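
As a small illustration of what bounds on table entries look like, the sketch below computes the classical Fréchet bounds for a two-way table whose row and column totals are released. This is only the simplest special case, not the generalized shuttle algorithm described in the abstract; the function name `frechet_bounds` and the example totals are assumptions made for illustration.

```python
def frechet_bounds(row_totals, col_totals):
    """Return (lower, upper) bound matrices for the cells of a two-way
    table of counts, given only its row and column totals.

    Classical Frechet bounds (two-way case only, NOT the shuttle algorithm):
        max(0, r_i + c_j - n) <= n_ij <= min(r_i, c_j)
    """
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must share one grand total")
    lower = [[max(0, r + c - n) for c in col_totals] for r in row_totals]
    upper = [[min(r, c) for c in col_totals] for r in row_totals]
    return lower, upper


# Hypothetical released marginals for a 2 x 2 table with grand total 10.
lower, upper = frechet_bounds([3, 7], [4, 6])
print(lower)
print(upper)
```

A cell whose lower and upper bounds are close together is one whose value can nearly be inferred from the released marginals, which is why such bounds matter for disclosure risk.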

Identification and Estimation with Incomplete Data

This paper is concerned with identification and estimation of econometric models when the sampling process produces missing observations. Missing observations occur frequently in applications due, for example, to non-response to questions on a survey or attrition from a panel. Missing observations usually cause population parameters of interest in applications to be unidentified except under untestable and often controversial assumptions. However, it is often possible to find identified, informative, bounds on these parameters that do not rely on untestable assumptions about the process through which data become missing. The bounds contain all logically possible values of the population parameters. Moreover, every parameter value within the bounds is consistent with some model of the process that generates missing observations. The bounds can be estimated consistently from data and often enable substantively important conclusions to be drawn without making untestable assumptions about missing observations. There are also situations in which the bounds are very wide. This is an indication that the data contain little information about the population parameters of interest and that substantive conclusions rely mainly on identifying assumptions that cannot be tested.
Joel L. Horowitz, Charles F. Manski
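
The idea of bounds that make no assumption about the missingness process can be sketched in the simplest setting: the mean of an outcome known to lie in a fixed interval. The code below is a minimal illustration under that assumption, not the paper's econometric procedure; the function name, the use of `None` to mark missing values, and the default range [0, 1] are all hypothetical choices.

```python
def worst_case_mean_bounds(outcomes, y_min=0.0, y_max=1.0):
    """Bounds on the population mean when some outcomes are missing.

    Missing values (None) are replaced by the smallest and largest logically
    possible values, so no assumption is made about why data are missing.
    Assumes every outcome lies in [y_min, y_max].
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one sampled unit")
    observed = [y for y in outcomes if y is not None]
    n_missing = n - len(observed)
    observed_part = sum(observed) / n
    lower = observed_part + y_min * n_missing / n
    upper = observed_part + y_max * n_missing / n
    return lower, upper


# Hypothetical sample: three responses and two non-responses.
sample = [1.0, 0.0, 1.0, None, None]
low, high = worst_case_mean_bounds(sample)
print(low, high)
```

The width of the interval equals the missing fraction times the width of the outcome range, so heavy non-response yields wide bounds, matching the abstract's remark that the data may then carry little information.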

Computational Information Retrieval

The main goal of this note is to introduce the notion of collection dependent “same context words”. Two (or more) words are “same context words” if they occur in the same (or similar) context across a given text collection. Each word w in the collection is associated with a profile P(w). The profile P(w) is the set of words occurring in sentences that contain w. We introduce a distance function on the set of profiles, and use it to cluster words. Words contained in the same cluster are “same context words”. We select “same context words” for several text collections, and briefly discuss further possible applications of the introduced concepts to a number of information retrieval related problems.
Jacob Kogan
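
The profile construction above can be sketched in a few lines. The abstract does not say which distance function is used, so the Jaccard distance below is an assumption, as are the naive sentence splitting, the lowercase tokenization, and the toy text; the clustering step itself is omitted.

```python
import re


def build_profiles(text):
    """Map each word w to its profile P(w): the set of words occurring in
    sentences that contain w (taken literally, so w itself is included)."""
    profiles = {}
    # Naive sentence split on ., ! and ? (an assumption for this sketch).
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = set(re.findall(r"[a-z]+", sentence))
        for w in words:
            profiles.setdefault(w, set()).update(words)
    return profiles


def jaccard_distance(a, b):
    """Jaccard distance between two profiles (assumed, not from the paper)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Toy collection: "cat" and "dog" share a context, "sun" does not.
text = "The cat drinks milk. The dog drinks milk. The sun is hot."
profiles = build_profiles(text)
d_cat_dog = jaccard_distance(profiles["cat"], profiles["dog"])
d_cat_sun = jaccard_distance(profiles["cat"], profiles["sun"])
print(d_cat_dog, d_cat_sun)
```

Any standard clustering method applied to these pairwise distances would then group words with similar profiles; which method the author uses is not stated in the abstract.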

Studying Treatment Response to Inform Treatment Choice

An important practical objective of empirical studies of treatment response is to provide decision makers with information useful in choosing treatments. Often the decision maker is a planner who must choose treatments for the members of a heterogeneous population; for example, a physician may choose medical treatments for a population of patients. Studies of treatment response cannot provide all the information that planners would like to have as they choose treatments, but researchers can be of service by addressing several questions: How should studies be designed in order to be most informative? How should studies report their findings so as to be most useful in decision making? How should planners utilize the information that studies provide? This paper addresses aspects of these broad questions, focusing on pervasive problems of identification that arise when studying treatment response and making treatment choices.
Charles F. Manski

Bayesian Methods and Modelling


Some Interactive Decision Problems Emerging in Statistical Games

We consider games which arise when two statisticians must make a decision simultaneously, and the loss function depends on both decisions. We are interested, in particular, in situations when information is detrimental, in a sense to be made precise. We show that in certain problems related to Bayesian testing and prediction the phenomenon of information rejection occurs for certain values of the parameters involved.
Bruno Bassan, Marco Scarsini, Shmuel Zamir

Probabilistic Modelling: An Historical and Philosophical Digression

This paper is about the conflict between the modern formal treatment of statistical inference and the role of subjectivity, inventiveness and personal involvement which, I claim, should be allowed in any nontrivial applied probabilistic modelling. I concentrate, intentionally, on the limitations of the formal treatment and try to overemphasize the qualitative, informal judgments involved in applied inference. Overdispersion and Item Response models are used as an illustration.
Antonio Forcina

A Bayesian View on Sampling the 2 × 2 Table

We study exact and approximate inferential procedures for the 2 × 2 table in both the frequentist and Bayesian modes, mediated by Likelihood Principles. In particular, for a variety of sampling rules, inferential procedures for a Bayesian approach are the same, while differences ensue for various exact and some approximate conditional frequentist methods. In fact, for certain sensible sampling rules, no exact conditional frequentist procedure is available. For a hypothetical situation in which the sampling rule that led to the table is assumed to be unknown, suggestions are made for handling the case; these indicate the general superiority and versatility of the Bayesian approach.
Seymour Geisser

Bayesian Designs for Binomial Experiments

Calculating the size of the sample required for an experiment is of paramount importance in statistical theory. We describe a new methodology for calculating the optimal sample size when a hypothesis test between two or more binomial proportions takes place. The posterior risk is computed and should not exceed a pre-specified level. A second constraint examines the likelihood of the unknown data not satisfying the bound on the risk.
Athanassios Katsis, Blaza Toman

On the Second Order Minimax Improvement of the Sample Mean in the Estimation of a Mean Value of the Exponential Dispersion Family

We consider the problem of the second order minimax improvement of the sample mean in the estimation of the mean value of the Exponential Dispersion Family (EDF), when the space of all possible values of the mean is unrestricted. We give necessary and sufficient conditions for the possibility of such an improvement.
Zinoviy Landsman

Bayesian Analysis of Cell Migration — Linking Experimental Data and Theoretical Models

We analyze experimental time series from phase contrast microscopy of cells moving on a 2D substrate. Using Bayesian analysis, we develop a statistical model that allows cell migration to be characterized with a few parameters.
Roland Preuss, Albrecht Schwab, Hans J Schnittler, Peter Dieterich

Testing, Goodness of Fit and Randomness


Sequential Bayes Detection of Trend Changes

Let W_t (0 ≤ t < ∞) denote a Brownian motion process which has zero drift during the time interval [0, v) and drift θ during the time interval [v, ∞), where θ and v are unknown. The process W is observed sequentially. The general goal is to find a stopping time T of W that ‘detects’ the unknown time point v as soon and as reliably as possible on the basis of this information. We work in a Bayesian framework and discuss a loss structure that is closely connected to that of the Bayes tests of power one of Lerche ([4]). This work extends that of Beibel ([2]), where only normal priors on θ were studied. An important ingredient in our proof is the comparison of the process of the posterior variance under different priors, similar to the arguments in Paulsen ([6]).
Martin Beibel, Hans R. Lerche

Box-Cox Transformation for Semiparametric Comparison of Two Samples

We consider the density ratio model, which specifies a linear parametric function for the log-likelihood ratio of two densities without assuming any specific form for them and has been found useful for semiparametric comparison of two samples. We study the Box-Cox family of transformations in the context of the density ratio model to suggest a data driven method for identification of the model’s true parametric part. The methodology is illustrated by a real data example.
Konstantinos Fokianos

Minimax Nonparametric Goodness-of-Fit Testing

We discuss and study minimax nonparametric goodness-of-fit testing problems under Gaussian models in the sequence space and in the functional space. The unknown signal is assumed to vanish under the null hypothesis. We consider alternatives under two-sided constraints determined by Besov norms. We describe the types of sharp asymptotics under the sequence space model and of the rate asymptotics under the functional model. The structures of asymptotically minimax and minimax consistent test procedures are given. These results extend recent results of the paper [12]. The results for an adaptive setting are presented as well.
Yuri I. Ingster, Irina A. Suslina

Testing Randomness on the Basis of the Number of Different Patterns

The problem of randomness testing gained importance because of the need to assess the quality of different random number generators. The wide use of public key cryptography necessitated testing the binary strings produced by such generators for randomness. Evaluating the random nature of the outputs of various generators became vital for the communications industry, where digital signatures and key management are crucial for information processing and for computer security.
Andrew L. Rukhin

The π* Index as a New Alternative for Assessing Goodness of Fit of Logistic Regression

In this paper the π* index of fit introduced by Rudas et al. [9] is applied to the model of logistic regression. First, the original definition of π* is given with its interpretation; then logistic regression is reviewed, focusing on how model fit is assessed in traditional ways. Assessing fit often requires grouping of the data, and the main part of this paper is concerned with methods for grouping the data and choosing computational techniques. These are illustrated using a standard set of data.
Emese Verdes, Tamás Rudas

Statistics of Stationary Processes


Consistent Estimation of Early and Frequent Change Points

We address two types of processes with change points that often arise in practical situations. These are processes with early change points and processes with frequent change points. Early change points may occur after very few observations and may be followed by additional change points or more complicated patterns. Frequent change points separate different homogeneous phases of the observed process with the possibility of very short phases.
Michael I. Baron, Nira Granott

Asymptotic Behaviour of Estimators of the Parameters of Nearly Unstable INAR(1) Models

A sequence of first-order integer-valued autoregressive type (INAR(1)) processes is investigated, where the autoregressive type coefficients converge to 1. It is shown that the limiting distribution of the joint conditional least squares estimators for this coefficient and for the mean of the innovation is normal. Consequences for sequences of Galton-Watson branching processes with unobservable immigration, where the mean of the offspring distribution converges to 1 (which is the critical value), are discussed.
Márton Ispány, Gyula Pap, Martien C. A. van Zuijlen

Guessing the Output of a Stationary Binary Time Series

The forward prediction problem for a binary time series {X_n}_{n=0}^∞ is to estimate the probability that X_{n+1} = 1 based on the observations X_i, 0 ≤ i ≤ n, without prior knowledge of the distribution of the process {X_n}. It is known that this is not possible if one estimates at all values of n. We present a simple procedure which will attempt to make such a prediction infinitely often at carefully selected stopping times chosen by the algorithm. The growth rate of the stopping times is also studied.
Gusztáv Morvai

Asymptotic Expansions for Long-Memory Stationary Gaussian Processes

This paper surveys results recently obtained by the authors on higher-order asymptotic expansions for stationary Gaussian processes with long memory, that is, with a hyperbolically decaying autocovariance function. Such processes have been used to model time series data in various fields. Frequentist-type results presented include the following: an Edgeworth expansion for the sample autocovariance function, an Edgeworth expansion for the log-likelihood derivatives and the maximum likelihood estimator in parametric time series models, and a Bartlett corrected likelihood ratio test for the fractional integration parameter in the ARFIMA model. Bayesian-type results presented include the following: an Edgeworth expansion for the posterior density of the parameter vector in parametric models, identification of matching priors under which frequentist and Bayesian inferences approximately agree, and identification of approximate reference priors in the sense of Bernardo, which carry minimum initial information on the parameter vector in a certain Kullback-Leibler sense. The key tools are theorems concerning the limiting behavior of the trace of the product of certain Toeplitz matrices and a general theorem of Durbin on Edgeworth expansions for dependent data. The results and proofs are briefly sketched, with references to the original papers for further details.
David M. Zucker, Judith Rousseau, Anne Philippe, Offer Lieberman

