2012 | Book

Graphical Models with R

Authors: Søren Højsgaard, David Edwards, Steffen Lauritzen

Publisher: Springer US

Book series: Use R!

About this book

Graphical models in their modern form have been around since the late 1970s and appear today in many areas of the sciences. Along with the ongoing developments of graphical models, a number of different graphical modeling software programs have been written over the years. In recent years many of these software developments have taken place within the R community, either in the form of new packages or by providing an R interface to existing software. This book attempts to give the reader a gentle introduction to graphical modeling using R and the main features of some of these packages. In addition, the book provides examples of how more advanced aspects of graphical modeling can be represented and handled within R. Topics covered in the seven chapters include graphical models for contingency tables, Gaussian and mixed graphical models, Bayesian networks, and modeling high-dimensional data.

Table of Contents

Frontmatter
Chapter 1. Graphs and Conditional Independence
Abstract
In this chapter we introduce graphs as mathematical objects, show how to work with them using R and explain how they are related to statistical models. We focus mainly on undirected graphs and directed acyclic graphs (DAGs), but also briefly treat chain graphs, which have both undirected and directed edges. Key concepts such as clique, path, separation, ancestral set, triangulated graph and perfect vertex ordering, and operations such as moralization and triangulation, are described and illustrated through examples using R. Certain models for multivariate data give rise to patterns of conditional independences that can be represented as a graph, the so-called dependence graph. For such a model, the conditional independences that hold can be directly read off the dependence graph. For undirected graphs, we show how this may be done using the graphical property of separation; for DAGs and chain graphs, analogous properties called d-separation and c-separation are used.
Søren Højsgaard, David Edwards, Steffen Lauritzen
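The separation property mentioned in the Chapter 1 abstract can be sketched as a breadth-first search. This is an illustrative Python sketch, not the book's R code: the function name `is_separated` and the small example graph are assumptions made here. A set S separates A from B in an undirected graph when no path from A to B avoids S.

```python
from collections import deque

def is_separated(adj, a, b, s):
    """Return True if every path from set a to set b passes through set s."""
    a, b, s = set(a), set(b), set(s)
    start = a - s                 # vertices in s block paths, so never start there
    visited = set(start)
    queue = deque(start)
    while queue:
        v = queue.popleft()
        if v in b:                # reached b without crossing s
            return False
        for w in adj.get(v, ()):
            if w not in visited and w not in s:
                visited.add(w)
                queue.append(w)
    return True

# Hypothetical undirected graph: a - b - c - d, plus the edge b - d.
adj = {
    "a": {"b"},
    "b": {"a", "c", "d"},
    "c": {"b", "d"},
    "d": {"b", "c"},
}

sep_by_b = is_separated(adj, {"a"}, {"d"}, {"b"})  # every a-d path goes through b
sep_by_c = is_separated(adj, {"a"}, {"d"}, {"c"})  # the path a - b - d avoids c
```

In a dependence graph, the first result would correspond to a conditional independence of a and d given b, while the second shows that conditioning on c alone gives no such statement.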
Chapter 2. Log-Linear Models
Abstract
This chapter describes graphical models for multivariate discrete (categorical) data. It starts out by describing various different ways in which such data may be represented in R—for example, as contingency tables—and how to convert between these representations. It then gives a concise exposition of the theory of hierarchical log-linear models, with illustrative examples using the gRim package. Topics covered include log-linear model formulae, dependence graphs, graphical and decomposable models, maximum likelihood estimation using the IPS algorithm, and hypothesis testing. Model selection is briefly discussed and illustrated using a stepwise algorithm. Graphical modeling is particularly useful for multi-dimensional tables, and since these are often sparse, it is necessary to adjust the degrees of freedom as normally calculated. Other topics treated in the chapter include exact conditional tests, ordinal categorical variables, and a comparison of fitting log-linear models using IPS and the glm algorithms. A final section describes some utilities for working with the models.
Søren Højsgaard, David Edwards, Steffen Lauritzen
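As a rough illustration of the IPS (iterative proportional scaling) idea named in the Chapter 2 abstract, the sketch below fits a hierarchical log-linear model to a small table by repeatedly rescaling the fitted table to match each observed generator margin. It is written in plain Python rather than with the gRim package; the helper names `marginal` and `ips` and the counts are invented for illustration.

```python
from collections import defaultdict

def marginal(table, axes):
    """Sum a table (dict: cell tuple -> value) over all axes not in `axes`."""
    m = defaultdict(float)
    for cell, value in table.items():
        m[tuple(cell[a] for a in axes)] += value
    return m

def ips(counts, generators, n_iter=50, tol=1e-10):
    """Fit a hierarchical log-linear model by iterative proportional scaling."""
    total = sum(counts.values())
    fitted = {cell: total / len(counts) for cell in counts}  # start from a uniform table
    for _ in range(n_iter):
        max_change = 0.0
        for axes in generators:
            observed = marginal(counts, axes)
            current = marginal(fitted, axes)
            for cell in fitted:
                key = tuple(cell[a] for a in axes)
                # Rescale so the fitted margin matches the observed margin.
                new = fitted[cell] * observed[key] / current[key] if current[key] > 0 else 0.0
                max_change = max(max_change, abs(new - fitted[cell]))
                fitted[cell] = new
        if max_change < tol:
            break
    return fitted

# Made-up 2 x 2 x 2 table of counts indexed by (A, B, C).
counts = {
    (0, 0, 0): 10, (0, 0, 1): 20, (0, 1, 0): 30, (0, 1, 1): 40,
    (1, 0, 0): 5,  (1, 0, 1): 15, (1, 1, 0): 25, (1, 1, 1): 35,
}

# Generators [AB][BC]: A and C are conditionally independent given B.
fitted = ips(counts, generators=[(0, 1), (1, 2)])
```

Because the model [AB][BC] is decomposable, the fitted values also have the closed form n(a,b) n(b,c) / n(b), which gives a convenient check on the iteration.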
Chapter 3. Bayesian Networks
Abstract
This chapter deals with Bayesian networks. The term usually refers to graphical models (most often with discrete variables) based on directed acyclic graphs (DAGs), applied in the expert system context. The emphasis differs somewhat from ordinary statistical modeling, since the DAG is usually taken as known and the focus is on efficient calculation of conditional probabilities of states of unobserved variables. Implemented naively, these calculations would be forbiddingly complex, but using message passing techniques they can be implemented very efficiently. (Somewhat confusingly, Bayesian networks are not in fact very Bayesian, in the sense that the Bayesian inferential methods are not normally used.) We explain the techniques and illustrate them on an example involving chest clinic data, using the package gRain. We show how appropriate R objects can be created, compiled and probability propagation performed in order to compute the required quantities. Topics covered in later sections include simulation and prediction using the network objects, use of the RHugin package, and structural learning, i.e., how the network may be learnt from the data.
Søren Højsgaard, David Edwards, Steffen Lauritzen
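To make concrete what "conditional probabilities of states of unobserved variables" means in the Chapter 3 abstract, here is a small Python sketch that computes such a probability by brute-force enumeration, i.e. the naive approach that the abstract contrasts with message passing. The three-node chain and all probabilities are made up for illustration; they are not the chest clinic data and this is not the gRain API.

```python
from itertools import product

# Hypothetical network: smoker (S) -> bronchitis (B) -> dyspnoea (D).
# All numbers below are invented for illustration.
P_S = {True: 0.5, False: 0.5}            # P(S = s)
P_B_GIVEN_S = {True: 0.6, False: 0.3}    # P(B = True | S = s)
P_D_GIVEN_B = {True: 0.8, False: 0.1}    # P(D = True | B = b)

def bernoulli(p_true, value):
    """Probability of `value` for a binary variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(s, b, d):
    """Joint probability, factorized according to the DAG."""
    return P_S[s] * bernoulli(P_B_GIVEN_S[s], b) * bernoulli(P_D_GIVEN_B[b], d)

def posterior_smoker_given_dyspnoea(d_observed=True):
    """P(S | D = d_observed), summing the joint over the unobserved B."""
    weights = {True: 0.0, False: 0.0}
    for s, b in product([True, False], repeat=2):
        weights[s] += joint(s, b, d_observed)
    total = weights[True] + weights[False]
    return {s: w / total for s, w in weights.items()}

posterior = posterior_smoker_given_dyspnoea(True)
```

Enumeration sums over every configuration of the unobserved variables, so its cost grows exponentially with the number of variables; that growth is what makes the naive calculation impractical on larger networks.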
Chapter 4. Gaussian Graphical Models
Abstract
This chapter describes graphical models for multivariate continuous data based on the Gaussian (normal) distribution. We gently introduce the undirected models by examining the partial correlation structure of two sets of data, one relating to meat composition of pig carcasses and the other to body fat measurements. We then give a concise exposition of the model theory, covering topics such as maximum likelihood estimation using the IPS algorithm, hypothesis testing, and decomposability. We also explain the close relation between the models and linear regression models. We describe various approaches to model selection, including stepwise selection, the glasso algorithm and the SIN algorithm, and apply these to the example datasets. We then turn to directed Gaussian graphical models that can be represented as DAGs. We explain a key concept, Markov equivalence, and describe how certain mixed graphs called pDAGs and essential graphs are used to represent equivalence classes of models. We describe various model selection algorithms for directed Gaussian models, including the PC algorithm, the hill-climbing algorithm, and the max-min hill-climbing algorithm, and apply them to the example datasets. Finally, we briefly describe Gaussian chain graph models and illustrate use of a model selection algorithm for these models.
Søren Højsgaard, David Edwards, Steffen Lauritzen
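The partial correlation structure mentioned in the Chapter 4 abstract can be illustrated for three variables with the standard first-order formula. The Python sketch below is not taken from the book, and the correlation values are invented. A partial correlation of zero between two variables given all the others corresponds to a missing edge in an undirected Gaussian graphical model.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y after adjusting both for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Invented correlations: x and y are marginally correlated (0.5),
# but the association is fully explained by z.
r_xz = r_yz = math.sqrt(0.5)
rho = partial_correlation(0.5, r_xz, r_yz)   # zero up to rounding error
```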
Chapter 5. Mixed Interaction Models
Abstract
This chapter describes graphical models for mixed data, that is to say, with both discrete and continuous variables. Such data are frequently met in practice. The models are based on the conditional Gaussian distribution: that is, conditional on the discrete variables, the continuous variables are Gaussian with mean depending on the discrete variables. These models combine and generalize hierarchical log-linear models and Gaussian graphical models described in Chaps. 2 and 4. We start by describing some example datasets that are used as illustration. We then give a concise exposition of the theory of homogeneous mixed interaction models, illustrating using the gRim package. This exposition includes accounts of model formulae, important model subclasses such as graphical and decomposable models, maximum likelihood estimation using the IPS algorithm, and hypothesis testing. The final sections illustrate stepwise model selection and the construction of a mixed chain graph model.
Søren Højsgaard, David Edwards, Steffen Lauritzen
Chapter 6. Graphical Models for Complex Stochastic Systems
Abstract
This chapter provides a brief introduction to the use of Bayesian graphical models in R. In these models, parameters are treated as random quantities on an equal footing with the random variables. This allows complex stochastic systems to be modeled, often using Markov chain Monte Carlo (MCMC) sampling methods. We first consider a series of examples, ranging from simple repeated sampling, linear regression models, random coefficient regression models, to the chest clinic example of Chap. 3. We formulate these as Bayesian graphical models, and represent them graphically in a compact form using plates. We then describe a special case, in which the unknown parameters are all discrete, and explain how probability propagation methods described in Chap. 3 may be used to compute the posterior distributions. We then turn to the general case, when MCMC sampling is required, and explain the computations involved in Metropolis-Hastings sampling and Gibbs sampling. Finally we illustrate a Bayesian linear regression analysis using the R2WinBUGS package.
Søren Højsgaard, David Edwards, Steffen Lauritzen
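The Metropolis-Hastings computation mentioned in the Chapter 6 abstract can be sketched in a few lines of Python. This sketch uses a symmetric random-walk proposal (the Metropolis special case) to sample the posterior of a normal mean. The model, the prior, the data and all tuning values are assumptions made for this illustration, not the book's example.

```python
import math
import random

def log_posterior(mu, data, sigma=1.0, prior_sd=10.0):
    """Unnormalized log posterior: N(0, prior_sd^2) prior, N(mu, sigma^2) likelihood."""
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_lik = sum(-0.5 * ((y - mu) / sigma) ** 2 for y in data)
    return log_prior + log_lik

def metropolis(data, n_samples=20000, step=0.5, start=0.0, seed=0):
    """Random-walk Metropolis sampler for the mean mu."""
    rng = random.Random(seed)
    mu = start
    current = log_posterior(mu, data)
    samples = []
    for _ in range(n_samples):
        proposal = mu + rng.gauss(0.0, step)       # symmetric proposal
        candidate = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        samples.append(mu)                         # a rejected move repeats the old value
    return samples

data = [4.8, 5.1, 5.3, 4.9, 5.4]                   # invented observations
samples = metropolis(data)
kept = samples[2000:]                              # discard burn-in
posterior_mean = sum(kept) / len(kept)
```

With this weak prior the posterior mean should lie close to the sample mean of the data, which gives a simple sanity check on the sampler.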
Chapter 7. High Dimensional Modelling
Abstract
This chapter describes methods suitable for high-dimensional graphical modeling. Recent years have seen intense interest in applying graphical modeling techniques to data of high dimension: by this we mean from hundreds to tens of thousands of variables. Such data arise routinely in fields such as molecular biology. We first describe two typical datasets: one from a study of gene expression in breast cancer patients, and the other from the HapMap project, in which a large number of genomic markers and gene expression measurements are recorded for 90 individuals. We compare the computational efficiency of some model selection algorithms, as applied to one of the example datasets. Of these, an extension of the Chow-Liu algorithm to find the minimal BIC forest, implemented in the gRapHD package, is found to be most efficient. Also the glasso algorithm and a stepwise decomposable search algorithm are highly efficient. We describe these algorithms in more detail and illustrate their use on the example datasets. Finally, as a more advanced example, we illustrate how a Bayesian equivalent to the minimal BIC forest algorithm for high-dimensional discrete data may be obtained. Assuming a hyper-Dirichlet prior, the maximum a posteriori forest is derived by using the extended Chow-Liu algorithm with appropriate user-defined edge weights. This is illustrated using a subset of the HapMap data.
Søren Højsgaard, David Edwards, Steffen Lauritzen
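The forest search described in the Chapter 7 abstract can be sketched as a maximum-weight spanning forest computation. The Python sketch below uses Kruskal's algorithm with a union-find structure and assumes that each edge weight is already a penalized score, so an edge with a non-positive weight is never worth adding. The function name and the weights are hypothetical, and this is not the gRapHD implementation.

```python
def max_weight_forest(nodes, weighted_edges):
    """Kruskal-style maximum-weight forest; edges with weight <= 0 are dropped."""
    parent = {v: v for v in nodes}

    def find(v):
        # Find the component representative, with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    forest = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2], reverse=True):
        if w <= 0:
            break                      # all remaining edges would lower the score
        ru, rv = find(u), find(v)
        if ru != rv:                   # adding the edge creates no cycle
            parent[ru] = rv
            forest.append((u, v, w))
    return forest

# Hypothetical penalized edge scores.
edges = [("a", "b", 3.0), ("b", "c", 2.0), ("a", "c", 1.5), ("c", "d", -0.4)]
forest = max_weight_forest(["a", "b", "c", "d"], edges)
```

Here the edge a - c is skipped because it would close a cycle, and c - d is skipped because its score is negative, so d stays isolated and the result is a forest rather than a spanning tree.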
Backmatter
Metadata
Title
Graphical Models with R
Authors
Søren Højsgaard
David Edwards
Steffen Lauritzen
Copyright year
2012
Publisher
Springer US
Electronic ISBN
978-1-4614-2299-0
Print ISBN
978-1-4614-2298-3
DOI
https://doi.org/10.1007/978-1-4614-2299-0