
1997 | Book

Case Studies in Bayesian Statistics

Volume III

Edited by: Constantine Gatsonis, James S. Hodges, Robert E. Kass, Robert McCulloch, Peter Rossi, Nozer D. Singpurwalla

Publisher: Springer New York

Book series: Lecture Notes in Statistics


About this book

Like the first two volumes, this third volume of case studies presents detailed applications of Bayesian statistical analysis, emphasizing the scientific context. The papers were presented and discussed at a workshop at Carnegie Mellon University, October 5-7, 1995. In this volume, which is dedicated to the memory of Morris H. DeGroot, econometric applications are highlighted. There are six invited papers, each with accompanying invited discussion, and eight contributed papers (which were selected following refereeing). In addition, we include prefatory recollections about Morrie DeGroot by James O. Berger and Richard M. Cyert.

INVITED PAPERS

In Probing Public Opinion: The State of Valencia Experience, José Bernardo, who was a scientific advisor to the President of the State of Valencia, Spain, summarizes procedures that were set up to probe public opinion and were used as an input to the government's decision-making process. At the outset, a sample survey had to be designed. The problem of finding an optimal Bayesian design, based on logarithmic divergence between probability distributions, involves minimization over 21483 points in the action space. To solve it, simulated annealing was used. The author describes the objective of obtaining the probability that an individual classified in a certain group will prefer one of several possible alternatives, and his approach using posterior distributions based on reference priors.

Table of Contents

Frontmatter

Invited Papers

Frontmatter
Probing Public Opinion: The State of Valencia Experience
Abstract
This paper summarizes the procedures which have been set up in recent years at the Government of the State of Valencia, Spain, to systematically probe its public opinion as an important input into its decision processes.
After a brief description of the electoral setup, we (i) outline the use of a simulated annealing algorithm, designed to find a good design for sample surveys, based on the identification of representative electoral sections, (ii) describe the methods used to analyze the data obtained from sample surveys on politically relevant topics, (iii) outline the proceedings of election day, detailing the special problems posed by the analysis of exit polls, representative sections, and early returns data, and (iv) describe a solution to the problem of estimating the political transition matrices which identify the reallocation of the vote of each individual party between two elections.
Throughout the paper, special attention is given to the illustration of the methods with real data. The arguments fall entirely within the Bayesian framework.
José M. Bernardo
Pressure Matching for Hydrocarbon Reservoirs: A Case Study in the Use of Bayes Linear Strategies for Large Computer Experiments
Abstract
In the oil industry, fluid flow models for reservoirs are usually too complex to be solved analytically and approximate numerical solutions must be obtained using a ‘reservoir simulator’, a complex computer program which takes as input descriptions of the reservoir geology. We describe a Bayes linear strategy for history matching; that is, seeking simulator inputs for which the outputs match closely to historical production. This approach, which only requires specification of means, variances and covariances, formally combines reservoir engineers’ beliefs with data from fast approximations to the simulator. We present an account of our experiences in applying the strategy to match the pressure history of an active reservoir. The methodology is appropriate in a wide variety of applications involving inverse problems in computer experiments.
Peter S. Craig, Michael Goldstein, Allan H. Seheult, James A. Smith
Hierarchical Bayes Models for Micro-Marketing Strategies
Abstract
Micro-marketing refers to the customization of marketing mix variables at the store level. We show how prices can be customized at the store level, rather than adopting a uniform marketing policy across all stores. These customized pricing strategies are grounded in differences in interbrand competition across stores, which are related to demographic and competitive characteristics of the store’s trading area. This study finds that profitable micro-marketing pricing strategies can be implemented. These pricing strategies can increase expected operating profits by 25%.
Alan L. Montgomery
Modeling Mortality Rates for Elderly Heart Attack Patients: Profiling Hospitals in the Cooperative Cardiovascular Project
Abstract
Public debate on costs and effectiveness of health care in the United States has generated a growing emphasis on “profiling” of medical care providers. The process of profiling involves comparing resource use and quality of care among medical providers to community or normative standards. Information from provider profiles may be used to determine whether providers deviate from acceptable standards in the type of care they deliver. In this paper we profile hospitals based on 30-day mortality rates for a cohort of 14,581 Medicare patients discharged with acute myocardial infarction (AMI) in 1993 from hospitals located in Alabama, Connecticut, Iowa, or Wisconsin. Clinical and socio-demographic information for the study cohort was collected retrospectively from individually reviewed medical charts and administrative files.
To account for within-hospital association among patient outcomes and between-hospital variability of practice patterns, we fit a Hierarchical Logistic Regression Model to the mortality data. We also employed multiple imputation methods to impute anomalous and missing values under the assumption that the complete multivariate data follow a General Location Model. Once predictor variables were selected and appropriately transformed, the Hierarchical Logistic Regression Model was then fitted to each imputed dataset using Markov Chain Monte Carlo methods, and the results of each fit were combined across imputations to produce inferences for the model. We estimated several indices of excess mortality: (1) the posterior probability that hospital mortality was one and one half times the median mortality over all hospitals for patients of average admission severity, (2) the probability that the difference between adjusted and standardized hospital mortality was large, and (3) the probability that hospital mortality was greater than a benchmark value.
We found that the overall unadjusted 30-day mortality rate was 21%, and ranged from 18% in Connecticut to 23% in Alabama; observed hospital-specific mortality ranged from 0 to 67% across the 389 hospitals. After adjusting for patient age, mean arterial pressure, creatinine and respiration rate measured at admission in each hospital, hospital mortality ranged from 10% to 53%. The probability that hospital-specific mortality for the “average” patient was one and one half times the median mortality for similar patients was greater than 14% for one quarter of the sampled hospitals. The posterior probability of a large discrepancy between risk-adjusted and standardized hospital-specific mortality was less than 6% for three quarters of the hospitals.
Sharon-Lise T. Normand, Mark E. Glickman, Thomas J. Ryan
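Index (1) in the abstract, the posterior probability that a hospital's mortality for the average patient exceeds one and one half times the median across hospitals, reduces to a simple tally over posterior draws once the hierarchical model has been fitted. A hedged sketch, using made-up draws rather than CCP output:

```python
import statistics

def excess_mortality_prob(post_draws):
    # post_draws: list of posterior samples; each sample is a list of
    # per-hospital mortality rates for the "average" patient.
    n_hosp = len(post_draws[0])
    exceed = [0] * n_hosp
    for sample in post_draws:
        threshold = 1.5 * statistics.median(sample)
        for h, rate in enumerate(sample):
            if rate > threshold:
                exceed[h] += 1
    return [count / len(post_draws) for count in exceed]
```

Each hospital's index is the fraction of MCMC draws in which its rate exceeds 1.5 times that draw's across-hospital median.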
A Bayesian Approach to the Modeling of Spatial-Temporal Precipitation Data
Abstract
Most precipitation data comes in the form of daily rainfall totals collected across a network of rain gauges. Research over the past several years on the statistical modeling of rainfall data has led to the development of models in which rain events are formed according to some stochastic process, and deposit rain over an area before they die. Fitting such models to daily data is difficult, however, because of the absence of direct observation of the rain events. In this paper, we argue that such a fitting procedure is possible within a Bayesian framework. The methodology relies heavily on Markov chain simulation algorithms to produce a reconstruction of the unseen process of rain events. As applications, we discuss the potential of such methodology in demonstrating changes in precipitation patterns as a result of actual or hypothesized changes in the global climate.
R. L. Smith, P. J. Robinson
Variable Selection Tests of Asset Pricing Models
Abstract
An asset pricing test is just variable selection confined to the intercepts. Framing the testing problem as variable selection facilitates development of a new Bayesian multivariate test that strikes a balance between the extreme of tests based purely on statistical significance (e.g., Gibbons, Ross, and Shanken (GRS) (1989)) and the extreme of tests based purely on economic significance (i.e., just look at the intercepts). Our procedure jointly tests for statistical and economic significance while explicitly accounting for the fact that, since all models are false, no model can satisfy a sharp null hypothesis. In addition, our most important prior represents the largest average pricing error considered economically insignificant. This prior’s simple interpretation is a key feature of our approach. We demonstrate our test on both simulated economies and actual data and compare it to the GRS test.
Ross L. Stevens

Contributed Papers

Frontmatter
Modeling the History of Diabetic Retinopathy
Abstract
The Wisconsin Epidemiologic Study of Diabetic Retinopathy (WESDR) is a population based study to investigate prevalence, incidence, and progression of diabetic retinopathy. To analyze the young-onset insulin-dependent subpopulation of this study, we propose and fit a hidden Markov model. A nonhomogeneous discrete-time Markov chain describes the natural course of the disease. We also account for complicating factors such as treatment intervention and death. The model is formulated on a yearly basis so as to correspond with a common interval for physician visits. Bayesian inference is used to combine the WESDR data with the model. Because the health status of each subject was observed at several separated years, there are many unobserved variables. Markov chain Monte Carlo is used to simulate the posterior. Predictive distributions are discussed as a prognostic tool to assist researchers in evaluating costs and benefits of interventions.
Bruce A. Craig, Michael A. Newton
Hierarchical Bayesian Analysis for Prevalence Estimation
Abstract
We discuss a Bayesian assessment of the prevalence of psychiatric disorder in adolescents using data from the Great Smoky Mountain Study (GSMS). The GSMS was designed to assess the prevalence of adolescent psychiatric disorders, and to assess the need for and use of mental health services by this population. The study uses a multi-site two-phase sampling design. Twelve counties of Western North Carolina were included in the sample. In the first phase, a screening instrument designed to uncover general behavioral and emotional problems was given to a random sample of 4500 subjects. The Child and Adolescent Psychiatric Assessment (CAPA) interview was then given to 100% of those screening as high risk and 10% of those screening as low risk. In this paper, we apply a hierarchical Bayesian model that accounts for the within-county and between-county variability. Along with the posterior distribution of the prevalence of disorder, we also calculate the posterior distributions of measures of screening efficiency, including the sensitivity and specificity of the screening instrument. The computations are performed using the Gibbs sampling algorithm.
Alaattin Erkanli, Refik Soyer, Dalene Stangl
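The two-phase structure described above, where every high-risk screen is interviewed but only a 10% subsample of low-risk screens, suggests a stratified prevalence estimate: weight each stratum's disorder rate by its share of the phase-1 sample. Below is a deliberately simplified sketch with independent Beta posteriors per stratum; the Beta(1,1) priors and all counts are assumptions, and the paper's county-level hierarchy is omitted:

```python
import random

def prevalence_draws(n_high, n_low, d_high, m_high, d_low, m_low,
                     draws=2000, seed=0):
    # n_high, n_low: phase-1 counts screening as high/low risk.
    # (d_high, m_high): disorders found among the m_high high-risk subjects
    # interviewed; likewise (d_low, m_low) for the low-risk subsample.
    rng = random.Random(seed)
    w_high = n_high / (n_high + n_low)      # stratum weight from phase 1
    out = []
    for _ in range(draws):
        # Conjugate Beta(1,1) posteriors for the per-stratum disorder rates.
        p_h = rng.betavariate(1 + d_high, 1 + m_high - d_high)
        p_l = rng.betavariate(1 + d_low, 1 + m_low - d_low)
        out.append(w_high * p_h + (1 - w_high) * p_l)
    return out
```

The returned draws approximate the posterior of overall prevalence; summaries (mean, intervals) follow directly from them.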
Estimating the Cells of a Contingency Table with Limited Information, for Use in Geodemographic Marketing
Abstract
We develop a Bayesian approach for inferring the joint distribution of several demographic variables when in possession of only the marginal distribution of each variable and prior information about the correlations among the variables. The approach is applied to four marketing problems, two involving direct mail advertising and two involving the location of a retail site, using public domain U.S. Census Bureau data for Sioux Falls and the state of South Dakota. The Bayesian approach has several advantages, which we discuss. We compute posterior quantities using importance sampling and compare this method to Laplace’s approximation and the usual normal approximation. The Bayesian approach does a good job of recovering the joint distribution of the demographic variables and provides a measure of uncertainty about the resulting estimates. Hypothesis testing, highest posterior density regions, and decision problems are demonstrated.
James S. Hodges, Kirthi Kalyanam, Daniel S. Putler
Multiresolution Assessment of Forest Inhomogeneity
Abstract
The spatial distribution of dominant tree species in an undisturbed mature stand tends to be regular and even, often exhibiting less variation than a simple Poisson model would suggest; in contrast, the spatial distribution of species in a recovering or transitional stand would be expected to display considerable spatial variation. This paper studies the spatial distribution of hickory trees within the Bormann research plot of Duke Forest in an attempt to assess the degree of variation, as an indicator of forest maturation, using models recently introduced in Wolpert and Ickstadt (1995). A data augmentation scheme and Markov chain Monte Carlo methods are employed to evaluate Bayesian posterior distributions.
Katja Ickstadt, Robert L. Wolpert
Assessment of Deleterious Gene Models Using Predictive p-values
Abstract
Using both hierarchical modeling and Bayesian computing techniques, Lee, Newton, Nordheim and Kang (1994) proposed a complex statistical model for studying the characteristics of deleterious genes in plant species. To assess the fit of such models, the use of posterior predictive checks has been suggested. These techniques are based on measures of the discrepancy between the observed data and potential data, evaluated with respect to the posterior predictive distribution. In our current study we use a posterior predictive p-value, evaluating the probability of observing a predictive density as small as that actually observed. To calculate these posterior predictive p-values from the sampled output of the predictive distribution, we use both a density estimation technique and a quantile method. Because we avoid evaluating this p-value directly from the posterior predictive distribution, and because our evaluation does not rely on any discrepancy measures, the calculation is straightforward in our case. Regarding the assumption about a genetic parameter representing the intensity of mortality of deleterious genes, our results indicate a better fit of a mildly-deleterious model as compared with strongly-deleterious or full lethal models.
Jae Kyun Lee
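The density-estimation route mentioned above can be illustrated as follows: estimate the predictive density from posterior predictive samples with a kernel smoother, then report the fraction of replicates whose estimated density is at most that of the observed value. The Gaussian kernel, fixed bandwidth, and scalar data are illustrative assumptions, not the author's exact implementation:

```python
import math

def gaussian_kde(samples, h):
    # Fixed-bandwidth Gaussian kernel density estimate (an illustrative choice).
    c = len(samples) * h * math.sqrt(2 * math.pi)
    return lambda x: sum(math.exp(-0.5 * ((x - s) / h) ** 2)
                         for s in samples) / c

def predictive_p_value(observed, replicates, h=0.5):
    # p = fraction of predictive replicates whose estimated density is at
    # most the estimated density at the observed value; a small p flags an
    # observation sitting in a low-density region of the predictive.
    f = gaussian_kde(replicates, h)
    f_obs = f(observed)
    return sum(1 for r in replicates if f(r) <= f_obs) / len(replicates)
```

An observed value far outside the spread of the replicates yields p near 0, signaling model misfit; a central value yields a moderate p.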
Bayesian Inference for the Best Ordinal Multinomial Population in a Taste Test
Abstract
Sensory experiments are used by industries to select the best product out of a set of similar products. In a taste-testing experiment 36 panelists were asked to rate 11 entrees of a military ration on a 9-point hedonic scale. An objective of the study was to select the best entree and to estimate the probability of at least a neutral response from the selected entree. Our probabilistic structure starts with the multinomial model. Let \(\tilde{n}_i = (n_{i1}, \ldots, n_{iJ})'\) be the cell counts and \(\tilde{p}_i = (p_{i1}, \ldots, p_{iJ})'\) be the cell probabilities, \(i = 1, \ldots, I\). Then, given \(\tilde{p}_i\), we take independent multinomial distributions for the \(\tilde{n}_i\), and nonidentical Dirichlet distributions for the \(\tilde{p}_i\). A priori each population has a possible nonzero probability of being the best. Using the simple tree order, we obtain the most probable population a posteriori under three criteria. Let \(\mu_i = \sum_{j=1}^J j\,p_{ij}\), \(\sigma_i = \{\sum_{j=1}^J p_{ij}(j - \mu_i)^2\}^{1/2}\) and \(v_i = \sigma_i/\mu_i\) be the mean, standard deviation and coefficient of variation of the \(i\)th population, \(i = 1, \ldots, I\). We select the population with the largest mean, the population with the smallest standard deviation, or the population with the smallest coefficient of variation. Selection and estimation occur simultaneously. Our method is sampling based, and uses Monte Carlo integration accommodated by rejection sampling.
Chicken was judged the best entree by the three criteria, and the posterior probability of at least a neutral response for chicken was .78.
Balgobin Nandram
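Under the conjugate structure the abstract describes, a Dirichlet prior updated by multinomial counts yields a Dirichlet posterior, so the three summaries (mean, standard deviation, coefficient of variation of each population) can be computed by plain Monte Carlo. The sketch below assumes a symmetric Dirichlet(1,...,1) prior and omits the simple-tree-order constraint and rejection step of the paper:

```python
import math
import random

def dirichlet(rng, alpha):
    # One Dirichlet draw via normalized Gamma variates.
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]

def summaries(p):
    # mu = sum_j j*p_j, sigma = {sum_j p_j (j - mu)^2}^(1/2), v = sigma/mu,
    # with hedonic categories j = 1..J as in the abstract.
    mu = sum((j + 1) * pj for j, pj in enumerate(p))
    sigma = math.sqrt(sum(pj * (j + 1 - mu) ** 2 for j, pj in enumerate(p)))
    return mu, sigma, sigma / mu

def posterior_summaries(counts, prior=1.0, draws=2000, seed=1):
    # Posterior means of (mu, sigma, v) under Dirichlet(prior + counts).
    rng = random.Random(seed)
    alpha = [prior + n for n in counts]
    sims = [summaries(dirichlet(rng, alpha)) for _ in range(draws)]
    return tuple(sum(s[k] for s in sims) / draws for k in range(3))
```

Running this for each entree and ranking by the chosen criterion mimics the selection step; with counts concentrated on the top category, the posterior mean rating approaches 9.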
A Random-Effects Multinomial Probit Model of Car Ownership Choice
Abstract
The number of cars in a household has an important effect on its travel behavior (e.g., choice of number of trips, mode to work and non-work destinations), hence car ownership modeling is an essential component of any travel demand forecasting effort. In this paper we report on a random effects multinomial probit model of car ownership level, estimated using longitudinal data collected in the Netherlands.
A Bayesian approach is taken and the model is estimated by means of a modification of the Gibbs sampling with data augmentation algorithm considered by McCulloch and Rossi (1994). The modification consists of performing, after each Gibbs sampling cycle, a Metropolis step along a direction of constant likelihood. An examination of the simulation output illustrates the improved performance of the resulting sampler.
Agostino Nobile, Chandra R. Bhat, Eric I. Pas
Changepoint Modeling of Longitudinal PSA as a Biomarker for Prostate Cancer
Abstract
Prostate-specific antigen (PSA) is an important indicator of the presence of prostate disease. When the volume of the prostate increases, as when cancer is present, the levels of PSA in the blood also increase. Our work focuses on using PSA levels as a biomarker for the recurrence of prostate cancer in patients who have previously been diagnosed and treated by radiotherapy. We fit a fully Bayesian hierarchical changepoint model to longitudinal PSA readings. Our objective is twofold: to better understand the natural history of PSA levels in patients who have completed treatment, and to use the model to identify individual changepoints that are indicative of recurrence. With the goal of accurate early detection of recurrence, we perform a prospective sequential analysis to compare several diagnostic rules, including a rule based on the posterior distribution of individual changepoints.
Elizabeth H. Slate, Kathleen A. Cronin
A Subjective Bayesian Approach to Environmental Sampling
Abstract
In determining the sample size that should be taken to assess possible environmental damage, both prior opinions on the extent of the damage and utilities for decisions that may follow from it should be used. In situations with multiple stakeholders, it is desirable for the stakeholder choosing the sample size to do so in light of which party will make the decisions about possible remedial action. A method is presented for calculating the optimal sample size for one party to choose, incorporating the possibility that either party may make the decisions about remedial action.
L. J. Wolfson, J. B. Kadane, M. J. Small
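The core idea above, choosing a sample size in light of who decides on remediation, can be caricatured as a preposterior calculation: simulate the damage from the prior, simulate data, let the decision maker act on the resulting posterior, and charge a per-sample cost. Every number and the Beta-binomial model below are invented for illustration; the paper's utilities and multi-stakeholder structure are richer:

```python
import random

def expected_net_utility(n, prior=(2, 8), cost_remediate=50.0,
                         loss_per_theta=200.0, cost_per_sample=0.5,
                         sims=4000, seed=0):
    # theta ~ Beta(a, b) is the unknown damaged fraction; x ~ Binomial(n, theta)
    # counts contaminated samples. The decision maker remediates when the
    # posterior expected loss of inaction exceeds the remediation cost; the
    # realized loss of inaction depends on the true theta.
    rng = random.Random(seed)
    a, b = prior
    total = 0.0
    for _ in range(sims):
        theta = rng.betavariate(a, b)
        x = sum(rng.random() < theta for _ in range(n))
        post_mean = (a + x) / (a + b + n)
        if loss_per_theta * post_mean > cost_remediate:
            loss = cost_remediate                # remediate
        else:
            loss = loss_per_theta * theta        # live with the damage
        total += -loss - cost_per_sample * n
    return total / sims

def best_sample_size(candidates, **kwargs):
    # Pick the candidate n with the highest Monte Carlo expected net utility.
    return max(candidates, key=lambda n: expected_net_utility(n, **kwargs))
```

Sampling pays only while the information it buys outweighs its cost; with these assumed numbers a very large sample is dominated by its cost.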
Backmatter
Metadata
Title
Case Studies in Bayesian Statistics
Edited by
Constantine Gatsonis
James S. Hodges
Robert E. Kass
Robert McCulloch
Peter Rossi
Nozer D. Singpurwalla
Copyright year
1997
Publisher
Springer New York
Electronic ISBN
978-1-4612-2290-3
Print ISBN
978-0-387-94990-1
DOI
https://doi.org/10.1007/978-1-4612-2290-3