
2008 | Book

Model Based Inference in the Life Sciences: A Primer on Evidence


About this book

The abstract concept of “information” can be quantified and this has led to many important advances in the analysis of data in the empirical sciences. This text focuses on a science philosophy based on “multiple working hypotheses” and statistical models to represent them. The fundamental science question relates to the empirical evidence for hypotheses in this set—a formal strength of evidence. Kullback-Leibler information is the information lost when a model is used to approximate full reality. Hirotugu Akaike found a link between K-L information (a cornerstone of information theory) and the maximized log-likelihood (a cornerstone of mathematical statistics). This combination has become the basis for a new paradigm in model based inference. The text advocates formal inference from all the hypotheses/models in the a priori set—multimodel inference.

This compelling approach allows a simple ranking of the science hypotheses and their models. Simple methods are introduced for computing the likelihood of model i, given the data; the probability of model i, given the data; and evidence ratios. These quantities represent a formal strength of evidence and are easy to compute and understand, given the estimated model parameters and associated quantities (e.g., residual sum of squares, maximized log-likelihood, and covariance matrices). Additional forms of multimodel inference include model averaging, unconditional variances, and ways to rank the relative importance of predictor variables.
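The evidential quantities named above can be sketched in a few lines of code. The following is a minimal illustration, not taken from the book: the function name and the AIC values are hypothetical, and the formulas used are the standard ones (AIC differences Δi, relative likelihoods exp(−Δi/2), Akaike weights, and evidence ratios as ratios of weights).

```python
import math

def akaike_weights(aic_values):
    """Given AIC values for an a priori candidate model set, return
    (deltas, rel_lik, weights): the AIC differences from the best model,
    the likelihood of each model given the data, and the model
    probabilities (Akaike weights)."""
    aic_min = min(aic_values)
    deltas = [a - aic_min for a in aic_values]
    rel_lik = [math.exp(-d / 2.0) for d in deltas]  # L(model_i | data)
    total = sum(rel_lik)
    weights = [l / total for l in rel_lik]          # Prob(model_i | data)
    return deltas, rel_lik, weights

# Hypothetical AIC values for a three-model candidate set
deltas, rel_lik, weights = akaike_weights([100.0, 102.0, 110.0])

# Evidence ratio of the best model vs. the second model:
# w_1 / w_2 = exp(Delta_2 / 2)
evidence_ratio = weights[0] / weights[1]
```

The weights sum to 1 over the candidate set, so they can be read directly as probabilities of each model, given the data.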

This textbook is written for people new to the information-theoretic approaches to statistical inference, whether graduate students, post-docs, or professionals in various universities, agencies or institutes. Readers are expected to have a background in general statistical principles, regression analysis, and some exposure to likelihood methods. This is not an elementary text as it assumes reasonable competence in modeling and parameter estimation.

Table of contents

Frontmatter
1. Introduction: Science Hypotheses and Science Philosophy
Abstract
Science is about discovering new things, about better understanding processes and systems, and generally furthering our knowledge. Deep in science philosophy is the notion of hypotheses and mathematical models to represent these hypotheses. It is partially the quantification of hypotheses that provides the elusive concept of rigor in science. Science is partially an adversarial process; hypotheses battle for primacy aided by observations, data, and models. Science is one of the few human endeavors that is truly progressive. Progress in science is defined as approaching an increased understanding of truth – science evolves in a sense.
2. Data and Models
Abstract
Data should be taken from an appropriate probabilistic sampling protocol or from a valid experimental design, which also involves a probabilistic component. These are important steps leading to a degree of scientific rigor. Such data often arise from probabilistic sampling of some kind and are said to be "representative." Outside of this desirable framework lie populations where such ideal sampling is largely infeasible. For example, human populations are often composed of members that are heterogeneous to sampling. Thus, by definition, it is impossible to draw a random sample, and such heterogeneity can lead to negative biases in estimators of population size. Estimators that are robust to such heterogeneity have been developed, and these approaches have proven useful, but the standard error is often large. In general, care must be exercised to either achieve reasonably representative samples or derive models and estimators that can provide useful inferences from (the sometimes unavoidable) nonrandom sampling.
3. Information Theory and Entropy
Abstract
Solomon Kullback (1907–1994) was born in Brooklyn, New York, USA, graduated from the City College of New York in 1927, received an M.A. degree in mathematics in 1929, and completed a Ph.D. in mathematics from the George Washington University in 1934. Kully, as he was known to all who knew him, had two major careers: one in the Defense Department (1930–1962) and the other in the Department of Statistics at George Washington University (1962–1972). He was chairman of the Statistics Department from 1964 to 1972. Much of his professional life was spent in the National Security Agency, and most of his work during this time is still classified. Most of his studies on information theory were done during this time. Many of his results up to 1958 were published in his 1959 book, "Information Theory and Statistics." Additional details on Kullback may be found in Greenhouse (1994) and Anonymous (1997).
When we receive something that decreases our uncertainty about the state of the world, it is called information. Information is like "news": it informs. Information is not directly related to physical quantities. Information is not material and is not a form of energy, but it can be stored and communicated using material or energy means. It cannot be measured with instruments but can be defined in terms of a probability distribution. Information is a decrease in uncertainty.
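As a concrete illustration of defining information in terms of a probability distribution, here is a minimal sketch, not from the book, of the Kullback–Leibler information I(p, g): the information lost when a distribution g is used to approximate the true distribution p. The discrete distributions below are hypothetical.

```python
import math

def kl_information(p, g):
    """Kullback-Leibler information I(p, g) for discrete distributions:
    sum over states of p_i * log(p_i / g_i). It is zero only when g
    equals p, and positive otherwise -- the information lost when g
    approximates p."""
    return sum(pi * math.log(pi / gi) for pi, gi in zip(p, g) if pi > 0)

# Hypothetical true distribution p and an approximating model g
p = [0.5, 0.3, 0.2]
g = [0.4, 0.4, 0.2]

loss = kl_information(p, g)       # positive: some information is lost
self_loss = kl_information(p, p)  # zero: nothing is lost
```

Note that I(p, g) is not symmetric: the loss from using g to approximate p generally differs from the loss from using p to approximate g.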
4. Quantifying the Evidence About Science Hypotheses
Abstract
Richard Arthur Leibler (1914–2003) was born in Chicago, Illinois on March 18, 1914. He received bachelor's and master's degrees in mathematics from Northwestern University and a Ph.D. in mathematics from the University of Illinois (1939). After serving in the Navy during the war, he was a member of the Institute for Advanced Study at Princeton and a member of the von Neumann Computer Project from 1946 to 1948. From 1948 to 1980 he worked for the National Security Agency (1948–1958 and 1977–1980) and the Communications Research Division of the Institute for Defense Analysis (1958–1977). He then was the president of Data Handling Inc., a consulting firm for the Intelligence Community. He received many awards, including the Exceptional Civilian Service Award.
The ability to simply rank science hypotheses and their models is a major advance over what can be done using null hypothesis tests. However, much more can be done, all under the framework of “strength of evidence,” for hypotheses in the a priori candidate set. Such evidence is exactly what Platt (1964) wanted in his well-known paper on strong inference. I begin by describing four new evidential quantities.
5. Multimodel Inference
Abstract
Hirotugu Akaike was born in 1927 in Fujinomiya-shi, Shizuoka-ken, Japan. He received B.S. and D.S. degrees in mathematics from the University of Tokyo in 1952 and 1961, respectively. He worked at the Institute of Statistical Mathematics for over 30 years, becoming its Director General in 1982. He has received many awards, prizes, and honors for his work in theoretical and applied statistics (de Leeuw 1992; Parzen 1994). This list includes the Asahi Prize, the Japanese Medal with Purple Ribbon, the Japan Statistical Society Award, and the 2006 Kyoto Prize. The three-volume set entitled "Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach" (Bozdogan 1994) commemorated Professor Akaike's 65th birthday. He is currently a Professor Emeritus at the Institute, a position he has held since 1994, and he received the Kyoto Prize in Basic Science in March 2007.
For many years it seemed logical to somehow select the best model from an a priori set (but many people ran “all possible models”) and make inductive inferences from that best model. This approach has been the goal, for example, in regression analysis using AIC, Mallows’ (1973) Cp or step-up, step-down, or stepwise methods. Making inferences from the estimated best model seems logical and has served science for the past 50 years.
6. Advanced Topics
Abstract
Kenneth P. Burnham (1942–) Dr. Burnham received a B.S. degree in biology from Portland State University and his M.S. and Ph.D. degrees in mathematical statistics in 1969 and 1972 from Oregon State University. He has worked at the interface between the life sciences and statistics in Maryland, North Carolina, and Colorado. He has made a long string of fundamental contributions to the quantitative theory and analysis of capture–recapture and distance sampling. His contributions to the model selection arena and its practical application have been profound. He was selected as a Fellow by the American Statistical Association in 1990 and promoted to the position of Senior Scientist by the U.S. Geological Survey in 2004. He has a long list of awards and honors for his work, including the Distinguished Achievement Award from the American Statistical Association and the Distinguished Statistical Ecology Award from INTERCOL (International Congress of Ecology). He has just become an elected member of the International Statistical Institute. Ken (left) is shown with Hirotugu Akaike at the 2007 Kyoto Laureate Symposium. Photo courtesy of Paul Doherty and Kate Huyvaert.
7. Summary
Abstract
Kei Takeuchi (1933–) was born in Tokyo, Japan, and graduated in 1956 from the University of Tokyo. He received a Ph.D. in economics in 1966 (Kei-zaigaku Hakushi), and his research interests include mathematical statistics, econometrics, global environmental problems, the history of civilization, and the Japanese economy. He is the author of many books on mathematics, statistics, and the impacts of science and technology on society. His 1976 paper, although obscure and in Japanese, is important as it gives the general result from Kullback–Leibler information, now called TIC in his honor. He is currently a professor on the Faculty of International Studies at Meiji Gakuin University and Professor Emeritus, University of Tokyo.
I will provide a brief summary of some of the main issues. The remarks below are written from a science perspective because that is what model based inference is about. I wrote this text for others interested in good science strategies, effective methods, and the important concept of evidence. Applications of the information-theoretic approaches are very broad and potentially useful over a very wide range of science and nonscience settings.
Backmatter
Metadata
Title
Model Based Inference in the Life Sciences: A Primer on Evidence
Author
David R. Anderson
Copyright year
2008
Publisher
Springer New York
Electronic ISBN
978-0-387-74075-1
Print ISBN
978-0-387-74073-7
DOI
https://doi.org/10.1007/978-0-387-74075-1