
2017 | Book

Bayesian Statistics in Action

BAYSM 2016, Florence, Italy, June 19-21

Editors: Raffaele Argiento, Ettore Lanzarone, Isadora Antoniano Villalobos, Alessandra Mattei

Publisher: Springer International Publishing

Book Series: Springer Proceedings in Mathematics & Statistics

About this book

This book is a selection of peer-reviewed contributions presented at the third Bayesian Young Statisticians Meeting, BAYSM 2016, Florence, Italy, June 19-21. The meeting provided a unique opportunity for young researchers, M.S. students, Ph.D. students, and postdocs dealing with Bayesian statistics to connect with the Bayesian community at large, to exchange ideas, and to network with others working in the same field. The contributions develop and apply Bayesian methods in a variety of fields, ranging from the traditional (e.g., biostatistics and reliability) to the most innovative ones (e.g., big data and networks).

Table of Contents

Frontmatter

Theory and Methods

Frontmatter
Sequential Monte Carlo Methods in Random Intercept Models for Longitudinal Data
Abstract
Longitudinal modelling is common in biostatistical research. In some studies it becomes necessary to update posterior distributions as new data arrive, so that inference can be performed online. In such situations it is sensible to use the posterior distribution as the prior in a new application of Bayes’ theorem. However, the analytic form of the posterior distribution is not always available; often only an approximate sample from it exists, which makes the process far from straightforward. Equivalent inferences could be obtained through a Bayesian analysis based on the combined set of old and new data. Nevertheless, this is not always a realistic alternative, because it may be computationally very costly in terms of both time and resources. This work uses the dynamic characteristics of sequential Monte Carlo methods for “static” setups in the framework of longitudinal modelling. We apply the methodology to real data through a random intercept model.
Danilo Alvares, Carmen Armero, Anabel Forte, Nicolas Chopin
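The reuse of an existing posterior sample when new data arrive can be sketched as a reweight-and-resample step in the style of static sequential Monte Carlo samplers. The Gaussian toy model, resampling threshold, and function names below are illustrative assumptions, not the authors' algorithm:

```python
import random
import math

def smc_update(particles, weights, y_new, loglik, rng):
    """Reweight an existing posterior sample by the likelihood of newly
    arrived data; resample if the effective sample size degenerates."""
    w = [wi * math.exp(loglik(th, y_new)) for th, wi in zip(particles, weights)]
    total = sum(w)
    w = [wi / total for wi in w]
    ess = 1.0 / sum(wi ** 2 for wi in w)          # effective sample size
    if ess < 0.5 * len(particles):
        particles = rng.choices(particles, weights=w, k=len(particles))
        w = [1.0 / len(particles)] * len(particles)
    return particles, w

rng = random.Random(0)
# a N(0, 1) sample plays the role of the "old" posterior
theta = [rng.gauss(0.0, 1.0) for _ in range(5000)]
w0 = [1.0 / 5000] * 5000
# one new observation y = 1 from a N(theta, 1) likelihood
theta, w = smc_update(theta, w0, 1.0, lambda t, y: -0.5 * (y - t) ** 2, rng)
post_mean = sum(t * wi for t, wi in zip(theta, w))   # conjugate answer: 0.5
```

The weighted sample now approximates the posterior after the new observation without rerunning inference on the full dataset.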
On the Truncation Error of a Superposed Gamma Process
Abstract
Completely random measures (CRMs) form a key ingredient of a wealth of stochastic models, in particular in Bayesian Nonparametrics for defining prior distributions. CRMs can be represented as infinite series of weighted random point masses. A constructive representation due to Ferguson and Klass provides the jumps of the series in decreasing order. This feature is of primary interest when it comes to sampling since it minimizes the truncation error for a fixed truncation level of the series. In this paper we focus on a general class of CRMs, namely the superposed gamma process, which suitably transformed has already been successfully implemented in Bayesian Nonparametrics, and quantify the quality of the approximation in two ways. First, we derive a bound in probability for the truncation error. Second, following [1], we study a moment-matching criterion which consists in evaluating a measure of discrepancy between actual moments of the CRM and moments based on the simulation output. To this end, we show that the moments of this class of processes can be obtained analytically.
Julyan Arbel, Igor Prünster
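A Ferguson–Klass sketch for the plain gamma process (Lévy intensity a·e^(-s)/s, a special case of the superposed gamma family) shows why the jumps arrive in decreasing order: the decreasing tail mass N(x) = a·E1(x) is inverted at the increasing arrival times of a unit-rate Poisson process. This uses SciPy and is illustrative only, not the paper's general construction:

```python
import math
import numpy as np
from scipy.optimize import brentq
from scipy.special import exp1

def ferguson_klass_gamma(a, n_jumps, rng):
    """First n_jumps jumps of a gamma CRM with Levy intensity a*exp(-s)/s.
    The tail mass N(x) = a * E1(x) is inverted at the arrival times of a
    unit-rate Poisson process, so the jumps come out in decreasing order."""
    xi = np.cumsum(rng.exponential(size=n_jumps))
    jumps = []
    for t in xi:
        # solve a * E1(exp(u)) = t in log-space for numerical stability
        u = brentq(lambda u: a * exp1(math.exp(u)) - t, -200.0, 5.0)
        jumps.append(math.exp(u))
    return np.array(jumps)

rng = np.random.default_rng(0)
jumps = ferguson_klass_gamma(1.0, 25, rng)   # truncation level of 25 jumps
```

The truncation error studied in the paper is the (random) mass of all jumps beyond such a truncation level.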
On the Study of Two Models for Integer-Valued High-Frequency Data
Abstract
Financial prices are usually modelled as continuous, often involving geometric Brownian motion with drift, leverage, and possibly jump components. An alternative modelling approach allows financial observations to take integer values that are multiples of a fixed quantity, the tick size: the monetary value associated with a single change during the price evolution. In the case of high-frequency data, the sample may exhibit many trading operations within a few seconds. In this context, the observables are assumed to be conditionally independent and identically distributed from either of two flexible likelihoods: the Skellam distribution, defined as the distribution of the difference between two independent Poisson random variables, or a mixture of geometric distributions. Posterior inference is obtained via adaptive Gibbs sampling algorithms. A comparison of the two models applied to high-frequency financial data is provided.
Andrea Cremaschi, Jim E. Griffin
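The Skellam likelihood mentioned above is straightforward to simulate: a draw is simply the difference of two independent Poisson draws. A minimal standard-library sketch, with arbitrary rates and sample size:

```python
import random
import math

def rpois(lam, rng):
    """Knuth's Poisson sampler; adequate for the small rates used here."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def rskellam(mu1, mu2, n, rng):
    """Skellam draws as differences of two independent Poissons."""
    return [rpois(mu1, rng) - rpois(mu2, rng) for _ in range(n)]

rng = random.Random(42)
xs = rskellam(3.0, 2.0, 50_000, rng)
m = sum(xs) / len(xs)                           # theory: mu1 - mu2 = 1
v = sum((x - m) ** 2 for x in xs) / len(xs)     # theory: mu1 + mu2 = 5
```

The mean mu1 - mu2 and variance mu1 + mu2 make the distribution a natural model for signed, integer-valued tick movements.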
Identification and Estimation of Principal Causal Effects in Randomized Experiments with Treatment Switching
Abstract
In randomized clinical trials designed to evaluate the effect of a treatment on patients with advanced disease stages, treatment switching is often allowed for ethical reasons. Because switching is a prognosis-related choice, identification and estimation of the effect of the actual receipt of the treatment become problematic. Existing methods in the literature try to reconstruct the ideal situation that would be observed if the switchers had not switched. Rather than focusing on reconstructing the a priori counterfactual outcome for the switchers, had they not switched, we propose to identify and estimate effects for (latent) subgroups of units according to their switching behaviour. The reference framework of the proposed method is the potential outcome approach. In order to estimate causal effects for subgroups of units not affected by treatment, we rely on the principal stratification approach (Frangakis and Rubin, Biometrics 58(1):21–29, 2002) [1]. To illustrate the proposed method and evaluate the maintained assumptions, we analyse a dataset from a randomized clinical trial on patients with asymptomatic HIV infection assigned to immediate (the active treatment) or deferred (the control treatment) Zidovudine (ZDV). The results, obtained through a fully Bayesian estimation approach, are promising and emphasize the high heterogeneity of the effects for the different latent subgroups defined according to switching behaviour.
Emanuele Gramuglia
A Bayesian Joint Dispersion Model with Flexible Links
Abstract
The objective is to jointly model longitudinal and survival data, taking into account their interdependence, in a real HIV/AIDS dataset within a Bayesian framework. We propose a linear mixed-effects dispersion model for the CD4 longitudinal counts with between-individual heterogeneity in both the mean and the variance, relaxing the usual assumption of a common variance for the longitudinal residuals. In addition, a hazard regression model is considered for the time from HIV/AIDS diagnosis until failure, where the coefficients linking the longitudinal and survival processes are time-varying. This flexibility is specified using penalized splines and allows the relationship to vary over time. Because residual heteroscedasticity may be related to survival, the standard deviation is included as a covariate in the hazard model, enabling us to study the effect of the stability of the CD4 counts on survival. The proposed framework outperforms traditional joint models, highlighting the importance of correctly accounting for individual heterogeneity in the measurement-error variance.
Rui Martins
Local Posterior Concentration Rate for Multilevel Sparse Sequences
Abstract
We consider empirical Bayesian inference in the many normal means model in the situation when the high-dimensional mean vector is multilevel sparse, that is, most of the entries of the parameter vector take one of a few fixed values. For instance, the traditional sparse signal is a particular case (with one level) of a multilevel sparse sequence. We apply an empirical Bayesian approach: we put an appropriate prior modeling the multilevel sparsity and make data-dependent choices of certain parameters of the prior. We establish local (i.e., with a rate depending on the “true” parameter) posterior contraction and estimation results. Global adaptive minimax results (for the estimation and posterior contraction problems) over sparsity classes follow from our local results if the sparsity level is of polynomial order. The results are illustrated by simulations.
Eduard Belitser, Nurzhan Nurushev
Likelihood Tempering in Dynamic Model Averaging
Abstract
We study the problem of online prediction with a set of candidate models using dynamic model averaging procedures. The standard assumptions of model averaging are that the set of admissible models contains the true one(s) and that these models are continuously updated by valid data. However, both assumptions are often violated in practice. The models used for online tasks are often more or less misspecified and the data corrupted (which is, mathematically, a manifestation of the same problem). Both factors negatively influence Bayesian inference and the resulting predictions. In this paper, we propose to suppress these issues by extending the Bayesian update with a form of likelihood tempering that moderates the impact of the observed data on inference. The method is compared to generic dynamic model averaging and to an alternative solution via sequential quasi-Bayesian mixture modeling.
Jan Reichl, Kamil Dedecius
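In a toy conjugate example, tempering the likelihood with an exponent alpha in (0, 1] simply discounts the data precision, so corrupted or misspecified observations move the posterior less. The numbers below are illustrative, not the paper's model:

```python
def tempered_update(m, s2, y, sigma2, alpha):
    """Normal-mean conjugate update with the likelihood raised to alpha:
    the contribution of the data precision is discounted by alpha."""
    post_prec = 1.0 / s2 + alpha / sigma2
    post_var = 1.0 / post_prec
    post_mean = post_var * (m / s2 + alpha * y / sigma2)
    return post_mean, post_var

# an outlying observation y = 10 against a N(0, 1) prior, N(theta, 1) likelihood
full_mean, _ = tempered_update(0.0, 1.0, 10.0, 1.0, 1.0)   # standard update: 5.0
temp_mean, _ = tempered_update(0.0, 1.0, 10.0, 1.0, 0.2)   # tempered: 10/6
```

With alpha = 1 the usual Bayesian update is recovered; smaller alpha keeps the posterior closer to the prior, which is the moderating effect exploited above.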
Localization in High-Dimensional Monte Carlo Filtering
Abstract
The high dimensionality and computational constraints associated with filtering problems in large-scale geophysical applications are particularly challenging for the Particle Filter (PF). Approximate but efficient methods such as the Ensemble Kalman Filter (EnKF) are therefore usually preferred. A key element of these approximate methods is localization, a general technique to avoid the curse of dimensionality that consists in limiting the influence of observations to neighboring sites. However, while it works effectively with the EnKF, localization introduces harmful discontinuities in the estimated physical fields when applied blindly to the PF. In the present paper, we explore two possible local algorithms based on the Ensemble Kalman Particle Filter (EnKPF), a hybrid method combining the EnKF and the PF. A simulation study in a conjugate normal setup allows us to highlight the trade-offs involved when applying localization to PF algorithms in the high-dimensional setting. Experiments with the Lorenz 96 model demonstrate the ability of the local EnKPF algorithms to perform well even with a small number of particles compared to the problem size.
Sylvain Robert, Hans R. Künsch
Linear Inverse Problem with Range Prior on Correlations and Its Variational Bayes Inference
Abstract
The choice of regularization for an ill-conditioned linear inverse problem has a significant impact on the resulting estimates. We consider a linear inverse model with a prior on the solution in the form of a zero-mean Gaussian with covariance matrix represented in modified Cholesky form. Elements of the covariance are treated as hyper-parameters with truncated Gaussian priors. The truncation points are obtained from expert judgment as ranges on the correlations of selected elements of the solution. This model is motivated by the estimation of a mixture of radionuclides from gamma dose rate measurements under prior knowledge on the range of their ratios. Since we aim at high-dimensional problems, we use the Variational Bayes procedure to derive approximate inference for the model. The method is illustrated and compared on a simple example and on a more realistic 6-hour-long release of a mixture of three radionuclides.
Ondřej Tichý, Václav Šmídl

Applications and Case Studies

Frontmatter
Bayesian Hierarchical Model for Assessment of Climate Model Biases
Abstract
Studies of climate change rely on numerical outputs simulated from Global Climate Models (GCMs) coupling the dynamics of ocean and atmosphere. GCMs are, however, notoriously affected by substantial systematic errors (biases), whose assessment is essential to ascertain the accuracy and robustness of simulated climate features. This contribution focuses on constructing a Bayesian hierarchical model for the quantification of climate model biases in a multi-model framework. The method combines information from a multi-model ensemble of GCM simulations to provide a unified assessment of the bias. It further identifies distinct bias components, characterized as non-stationary spatial fields accounting for spatial dependence. The approach is illustrated for near-surface air temperature bias over the tropical Atlantic and bordering regions, using a multi-model ensemble of historical simulations from the fifth phase of the Coupled Model Intercomparison Project.
Maeregu Woldeyes Arisido, Carlo Gaetan, Davide Zanchettin, Angelo Rubino
An Application of Bayesian Seemingly Unrelated Regression Models with Flexible Tails
Abstract
Seemingly unrelated regression (SUR) models are useful for capturing the correlation structure between different regression equations. While the multivariate normal distribution is a common choice for the random error term in an SUR model, the multivariate \(t\)-distribution is also popular for robustness considerations. However, the multivariate \(t\)-distribution is elliptical, with the limitation that the degrees of freedom of its marginal distributions are identical. In this paper, we consider a non-elliptical multivariate Student-\(t\) error distribution which allows a flexible shape parameter for each marginal distribution. This non-elliptical distribution is constructed via a scale mixture of normals form, and Markov chain Monte Carlo (MCMC) algorithms are therefore used for Bayesian inference in SUR models. In an empirical study of the capital asset pricing model (CAPM), we show that this non-elliptical Student-\(t\) distribution outperforms the multivariate normal and multivariate Student-\(t\) distributions.
Charles Au, S. T. Boris Choy
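The scale-mixture construction can be sketched in a few lines: correlated normals are divided by margin-specific mixing variables, so each margin is Student-t with its own degrees of freedom. The particular df values and correlation below are arbitrary choices for illustration:

```python
import random
import math

def sample_bivariate_t(nu, rho, n, rng):
    """Non-elliptical bivariate Student-t via a scale mixture of normals:
    margin j divides a correlated normal by sqrt(w_j), with
    w_j ~ Gamma(nu_j/2, rate nu_j/2), so the marginal df can differ."""
    draws = []
    for _ in range(n):
        g1, g2 = rng.gauss(0, 1), rng.gauss(0, 1)
        z = (g1, rho * g1 + math.sqrt(1 - rho ** 2) * g2)     # correlated normals
        w = [rng.gammavariate(v / 2, 2 / v) for v in nu]      # mean-one mixing
        draws.append(tuple(zj / math.sqrt(wj) for zj, wj in zip(z, w)))
    return draws

rng = random.Random(3)
draws = sample_bivariate_t((5.0, 50.0), 0.6, 40_000, rng)
v1 = sum(d[0] ** 2 for d in draws) / len(draws)   # t_5 margin: variance 5/3
v2 = sum(d[1] ** 2 for d in draws) / len(draws)   # t_50 margin: variance 50/48
```

With a single shared mixing variable this would collapse to the usual elliptical multivariate t; independent per-margin mixing is what buys the extra flexibility.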
Bayesian Inference of Stochastic Pursuit Models from Basketball Tracking Data
Abstract
We develop a Metropolis algorithm to perform Bayesian inference for models given by coupled stochastic differential equations. A key challenge in developing practical algorithms is the computation of the likelihood. We address this problem through the use of a fast method to track the probability density function of the stochastic differential equation. The method applies quadrature to the Chapman–Kolmogorov equation associated with a temporal discretization of the stochastic differential equation. The inference method can be adapted to scenarios in which we have multiple observations at one time, multiple time series, or observations with large and/or irregular temporal spacing. Computational tests show that the resulting Metropolis algorithm is capable of efficient inference for an electrical oscillator model.
Harish S. Bhat, R. W. M. A. Madushani, Shagun Rawat
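The density-tracking idea, applying quadrature to the Chapman–Kolmogorov equation for a temporal discretization of the SDE, can be sketched on a one-dimensional Ornstein–Uhlenbeck test case. The grid, step size, and test model are our assumptions, not the paper's oscillator model:

```python
import math

def ck_step(p, grid, drift, diff, dt):
    """One Chapman-Kolmogorov step on a grid: propagate the density through
    the Gaussian Euler-Maruyama transition kernel via the trapezoid rule."""
    h = grid[1] - grid[0]
    n = len(grid)
    out = []
    for x in grid:
        acc = 0.0
        for i, y in enumerate(grid):
            mu = y + drift(y) * dt
            var = diff(y) ** 2 * dt
            dens = math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
            acc += (0.5 if i in (0, n - 1) else 1.0) * dens * p[i]
        out.append(acc * h)
    return out

# Ornstein-Uhlenbeck test case: dX = -X dt + dW has stationary variance 1/2
grid = [-5.0 + 0.1 * i for i in range(101)]
p = [math.exp(-x * x / 0.5) / math.sqrt(2 * math.pi * 0.25) for x in grid]  # N(0, 0.25)
for _ in range(60):
    p = ck_step(p, grid, lambda y: -y, lambda y: 1.0, 0.05)
mass = 0.1 * sum(p)                                  # should stay near 1
var = 0.1 * sum(x * x * pi for x, pi in zip(grid, p))  # relaxes toward 1/2
```

Tracking the density this way gives the likelihood needed inside the Metropolis algorithm without simulating trajectories.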
Identification of Patient-Specific Parameters in a Kinetic Model of Fluid and Mass Transfer During Dialysis
Abstract
Hemodialysis (HD) is nowadays the most common therapy to treat renal insufficiency. However, despite the improvements made in recent years, HD is still associated with a non-negligible rate of co-morbidities, which could be reduced by means of appropriate treatment customization. Many differential multi-compartment models have been developed to describe solute kinetics during HD, to optimize treatments, and to prevent intra-dialysis complications; however, they often refer to an average uremic patient. In contrast, the clinical need for customization requires patient-specific models. In this work, assuming that customization can be obtained through patient-specific model parameters, we propose a Bayesian approach to estimate the patient-specific parameters of a multi-compartment model and to predict the single patient’s response to the treatment, in order to prevent intra-dialysis complications. The likelihood function is obtained through a discretized version of the multi-compartment model, where the discretization relies on a Runge–Kutta method to guarantee convergence, and the posterior densities of the model parameters are obtained through Markov chain Monte Carlo simulation.
Camilla Bianchi, Ettore Lanzarone, Giustina Casagrande, Maria Laura Costantino
A Bayesian Nonparametric Approach to Ecological Risk Assessment
Abstract
We revisit a classical method for ecological risk assessment, the Species Sensitivity Distribution (SSD) approach, in a Bayesian nonparametric framework. SSD is a mandatory diagnostic required by environmental regulatory bodies in the European Union, the United States, Australia, China, and elsewhere. Yet it is subject to much scientific criticism, notably concerning a historically debated parametric assumption for modelling species variability. Tackling the problem with nonparametric mixture models, it is possible to shed this parametric assumption and build a statistically sounder basis for SSD. We use Normalized Random Measures with Independent Increments (NRMI) as the mixing measure because they offer greater flexibility than the Dirichlet process. Indeed, NRMI can induce a prior on the number of components in the mixture model that is less informative than that of the Dirichlet process. This feature is consistent with the fact that SSD practitioners do not usually have a strong prior belief about the number of components. In this short paper, we illustrate the advantage of the nonparametric SSD over the classical normal SSD and a kernel density estimate SSD on several real datasets. We summarise the results of the complete study in [18], where the method is generalised to censored data and a systematic comparison on simulated data is also presented, along with a study of the clustering induced by the mixture model to examine patterns in species sensitivity.
Guillaume Kon Kam King, Julyan Arbel, Igor Prünster
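For contrast, the classical normal SSD that the authors benchmark against reduces to fitting a normal distribution to log-transformed sensitivities and reading off a low quantile, conventionally the HC5. A minimal sketch; the toxicity values below are invented:

```python
from statistics import NormalDist, mean, stdev

def hc5_normal_ssd(log10_tox):
    """Classical (log-)normal SSD: fit a normal to log10 toxicity values and
    return HC5, the concentration hazardous to only 5% of species."""
    fitted = NormalDist(mean(log10_tox), stdev(log10_tox))
    return 10 ** fitted.inv_cdf(0.05)

# hypothetical log10 toxicity endpoints for eight species
hc5 = hc5_normal_ssd([0.3, 0.8, 1.1, 1.4, 1.7, 2.0, 2.4, 2.9])
```

The nonparametric mixture approach replaces the single fitted normal with a mixture whose number of components is inferred from the data.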
Approximate Bayesian Computation Methods in the Identification of Atmospheric Contamination Sources for DAPPLE Experiment
Abstract
Sudden releases of harmful material into a densely populated area pose a significant risk to human health. Determining the source of an emission in urban and industrialized areas from the limited information provided by a set of measurements of the released substance’s concentration is an ill-posed inverse problem. The Bayesian probability framework provides a connection between the model, the observations, and additional information about the contamination source. The posterior distribution of the source parameters was sampled using an Approximate Bayesian Computation (ABC) algorithm. The stochastic source determination method was validated against a real dataset acquired in a highly disturbed flow field in an urban environment. The datasets used to validate the proposed methodology cover the dispersion of contaminant plumes in a full-scale field experiment performed within the project ‘Dispersion of Air Pollutants and their Penetration into the Local Environment in London (DAPPLE)’.
Piotr Kopka, Anna Wawrzynczak, Mieczyslaw Borysiewicz
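The ABC rejection idea can be sketched with a toy one-dimensional source-term problem: the exponential-decay forward model, prior range, and tolerance below are simple stand-ins for a real atmospheric dispersion model:

```python
import random
import math

def simulate(source, sensors, rng):
    """Toy forward model: concentration decays with distance to the source,
    observed with noise. A stand-in for a real dispersion model."""
    return [math.exp(-abs(x - source)) + rng.gauss(0.0, 0.05) for x in sensors]

def abc_rejection(data, sensors, n_draws, eps, rng):
    """Rejection ABC: keep prior draws whose simulated observations lie
    within eps of the measured ones (Euclidean distance)."""
    accepted = []
    for _ in range(n_draws):
        s = rng.uniform(-3.0, 3.0)            # uniform prior on the location
        sim = simulate(s, sensors, rng)
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(sim, data)))
        if d < eps:
            accepted.append(s)
    return accepted

rng = random.Random(7)
sensors = [-2.0, -1.0, 0.0, 1.0, 2.0]
data = simulate(0.5, sensors, rng)            # "true" source at 0.5
post = abc_rejection(data, sensors, 20_000, 0.3, rng)
post_mean = sum(post) / len(post)
```

No likelihood evaluation is needed, which is exactly what makes ABC attractive when the forward model is a black-box dispersion code.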
Bayesian Survival Analysis to Model Plant Resistance and Tolerance to Virus Diseases
Abstract
Viruses constitute a major threat to large-scale production of crops worldwide, causing significant economic losses and undermining sustainability. We evaluated a new plant variety for resistance and tolerance to a specific virus through a comparison with other well-known varieties. The study is based on two independent Bayesian accelerated failure time models which assess resistance and tolerance survival times. Information concerning plant genotype and virus biotype was considered as baseline covariates, and error terms were assumed to follow a modified standard Gumbel distribution. A frequentist approach to these models was also considered in order to compare the results of the study across the two statistical methodologies.
Elena Lázaro, Carmen Armero, Luis Rubio
Randomization Inference and Bayesian Inference in Regression Discontinuity Designs: An Application to Italian University Grants
Abstract
Motivated by the evaluation of Italian University grants, we address the problem of multiplicities in (fuzzy) Regression Discontinuity (RD) settings. Following Li, Mattei and Mealli [1], we adopt a probabilistic formulation of the assignment mechanism underlying RD designs and select suitable subpopulations around the cutoff point on the basis of observed covariates, using both randomization tests and a Bayesian model-based approach, each accounting for the problem of multiple testing. We then study the effect of university grants on two binary outcomes: dropout, and a variable equal to one for students who earn at least one University Credit (CFU). We use both the Fisher exact P-value approach and a model-based Bayesian approach. In both cases we account for the multivariate nature of the outcome by (a) proposing a multiple testing approach, and (b) defining estimands on the joint outcome.
Federica Licari
Bayesian Methods for Microsimulation Models
Abstract
This article proposes Bayesian methods for microsimulation models and for policy evaluation. In particular, the Bayesian Multinomial Logit and the Bayesian Multinomial Mixed Logit models are presented. They are applied to labour-market choices by single females and single males, enriched with EUROMOD microsimulated information, to evaluate fiscal policy effects. Estimates from the two Bayesian models are reported and compared to the results stemming from a standard approach to the analysis of the same phenomenon. Improvements in model performance when Bayesian methods are introduced, and when random effects are included, are outlined. Finally, ongoing work based on nonparametric model extensions and on the analysis of work choices by couples is briefly described.
Consuelo R. Nava, Cinzia Carota, Ugo Colombino
A Bayesian Model for Describing and Predicting the Stochastic Demand of Emergency Calls
Abstract
Emergency Medical Service (EMS) systems aim at providing immediate medical care in case of emergency. Careful planning is a major prerequisite for the success of an EMS system, in particular to reduce the response time to emergency calls. Unfortunately, the demand for emergency services is highly variable, and uncertainty should not be neglected while planning the activities. Thus, it is of fundamental importance to predict the number of future emergency calls and their interarrival times to support the decision-making process. In this paper, we propose a Bayesian model to predict the number of emergency calls in future time periods. Calls are described by means of a generalized linear mixed model, whose posterior densities of parameters are obtained through Markov chain Monte Carlo simulation. Moreover, predictions are given in terms of their posterior predictive probabilities. Results from the application to a relevant real case show the applicability of the model in practice and validate the approach.
Vittorio Nicoletta, Ettore Lanzarone, Alessandra Guglielmi, Valérie Bélanger, Angel Ruiz
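As a much-simplified sketch of the predictive idea, a conjugate Gamma–Poisson model already yields a posterior predictive for the next period's call count: first draw the rate from its posterior, then draw a count. The paper's generalized linear mixed model is far richer; the counts and hyperparameters here are invented:

```python
import random
import math

def rpois(lam, rng):
    """Knuth's Poisson sampler; adequate for moderate rates."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def predict_calls(counts, a0, b0, n_draws, rng):
    """Gamma-Poisson posterior predictive: draw the call rate from its
    Gamma(a, b) posterior, then a Poisson count for the next period."""
    a, b = a0 + sum(counts), b0 + len(counts)
    return [rpois(rng.gammavariate(a, 1.0 / b), rng) for _ in range(n_draws)]

rng = random.Random(11)
calls = [12, 15, 11, 14, 13]                 # calls observed in five past shifts
preds = predict_calls(calls, 1.0, 0.1, 20_000, rng)
pred_mean = sum(preds) / len(preds)          # close to (1 + 65) / 5.1
```

Integrating over the rate rather than plugging in a point estimate is what makes the predictive distribution wider than a plain Poisson, reflecting parameter uncertainty.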
Flexible Parallel Split-Merge MCMC for the HDP
Abstract
The Hierarchical Dirichlet Process mixture model is useful for topic modeling. Several sampling schemes have been developed to implement the model. We implemented the previously introduced Sub-Cluster HDP Sampler, since it can be parallelized. Our contribution consists in making the code as flexible as possible, in order to allow for an extension to several applications. We tested our code on synthetic and real datasets for topic modeling with categorical data.
Debora Parisi, Stefania Perego
Bayesian Inference for Continuous Time Animal Movement Based on Steps and Turns
Abstract
Although animal locations obtained via GPS and similar technologies are typically observed on a discrete time scale, movement models formulated in continuous time are preferable in order to avoid the difficulties experienced in discrete time when faced with irregular observations or the prospect of comparing analyses on different time scales. A class of models able to emulate a range of movement ideas is defined by representing movement as a combination of stochastic processes describing both speed and bearing. A method for Bayesian inference for such models is described through the use of a Markov chain Monte Carlo approach. The inference relies on augmenting the animal’s locations in discrete time, which have been observed with error, with a more detailed movement path obtained via simulation techniques. Analysis of real data on an individual reindeer, Rangifer tarandus, illustrates the presented methods.
Alison Parton, Paul G. Blackwell, Anna Skarin
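The "steps and turns" representation, with movement built from stochastic processes for speed and bearing, can be sketched with a crude Euler simulation. The particular process choices (Brownian bearing, mean-reverting speed) and parameters are illustrative assumptions only:

```python
import random
import math

def simulate_track(t_end, dt, rng):
    """Movement as joint speed and bearing processes (Euler sketch):
    bearing follows Brownian motion, speed a mean-reverting OU process."""
    x, y = 0.0, 0.0
    bearing, speed = 0.0, 1.0
    track = [(x, y)]
    for _ in range(int(t_end / dt)):
        bearing += 0.3 * math.sqrt(dt) * rng.gauss(0, 1)
        speed += 2.0 * (1.0 - speed) * dt + 0.2 * math.sqrt(dt) * rng.gauss(0, 1)
        speed = max(speed, 0.0)                  # speeds cannot be negative
        x += speed * math.cos(bearing) * dt
        y += speed * math.sin(bearing) * dt
        track.append((x, y))
    return track

rng = random.Random(5)
track = simulate_track(10.0, 0.02, rng)
```

In the paper's inference scheme, finer paths of this kind are imputed between the error-prone observed locations within the MCMC.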
Explaining the Lethality of Boko Haram’s Terrorist Attacks in Nigeria, 2009–2014: A Hierarchical Bayesian Approach
Abstract
Since 2009, Nigeria has been the scene of numerous deadly terrorist attacks perpetrated by the terrorist group Boko Haram. In response to this threat, stakeholders in the fight against terrorism have deployed various counterterrorism policies, the costs of which could be reduced through efficient preventive measures. Statistical models able to integrate complex spatial dependence structures have not yet been applied, despite their potential for providing guidance on the characteristics of terrorist attacks. In an effort to address this shortcoming, we use a flexible approach that represents a Gaussian Markov random field through a stochastic partial differential equation, and we model the fine-scale spatial patterns of the lethality of terrorism perpetrated by Boko Haram in Nigeria from 2009 to 2014. Our results suggest that the lethality of terrorist attacks is contagious in space, and that attacks are more likely to be lethal at higher altitudes and far from large cities.
André Python, Janine Illian, Charlotte Jones-Todd, Marta Blangiardo
Optimizing Movement of Cooperating Pedestrians by Exploiting Floor-Field Model and Markov Decision Process
Abstract
Optimizing the movement of pedestrians is a topic of great importance, calling for models of crowds. In this contribution we address the problem of evacuation, where pedestrians choose their actions in order to leave the endangered area. To model this decision-making process we exploit the well-known floor-field model together with modeling based on Markov decision processes (MDPs). In addition, we allow the pedestrians to cooperate and exchange their information (probability distributions) about the state of the surrounding environment. This information is then combined in the Kullback–Leibler sense. We show in a simulation study how the use of MDPs and information sharing positively influences the amount of inhaled CO and the evacuation time.
Vladimíra Sečkárová, Pavel Hrabák
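The abstract does not spell out the exact combination rule, but one standard way to merge distributions "in the Kullback–Leibler sense" is weighted geometric pooling: the normalized weighted geometric mean minimizes the weighted sum of KL divergences from the pool to each source. The exit labels and numbers below are made up:

```python
import math

def kl_pool(dists, weights):
    """Weighted geometric pooling of discrete distributions: the result
    minimizes sum_i w_i * KL(p || p_i) over distributions p."""
    n = len(dists[0])
    logp = [sum(w * math.log(d[k]) for d, w in zip(dists, weights))
            for k in range(n)]
    m = max(logp)                         # subtract max for numerical stability
    unnorm = [math.exp(l - m) for l in logp]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# two pedestrians' beliefs about which of three exits is passable
pool = kl_pool([[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]], [0.5, 0.5])
```

Geometric pooling is conservative: an exit that either pedestrian considers very unlikely stays unlikely in the pooled belief.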
Metadata
Title
Bayesian Statistics in Action
Editors
Raffaele Argiento
Ettore Lanzarone
Isadora Antoniano Villalobos
Alessandra Mattei
Copyright Year
2017
Electronic ISBN
978-3-319-54084-9
Print ISBN
978-3-319-54083-2
DOI
https://doi.org/10.1007/978-3-319-54084-9
