
2008 | Book

Introductory Statistics with R


About this book

R is an Open Source implementation of the well-known S language. It works on multiple computing platforms and can be freely downloaded. R is thus ideally suited for teaching at many levels as well as for practical data analysis and methodological development.

This book provides an elementary-level introduction to R, targeting both non-statistician scientists in various fields and students of statistics. The main mode of presentation is via code examples with liberal commenting of the code and the output, from the computational as well as the statistical viewpoint. Brief sections introduce the statistical methods before they are used. A supplementary R package containing the data sets can be downloaded. All examples are directly runnable, and all graphics in the text are generated from the examples. The statistical methodology covered includes standard statistical distributions, one- and two-sample tests with continuous data, regression analysis, one- and two-way analysis of variance, analysis of tabular data, and sample size calculations. In addition, the last six chapters contain introductions to multiple linear regression analysis, linear models in general, logistic regression, survival analysis, rates and Poisson regression, and nonlinear curve fitting.

Table of Contents

Frontmatter
1. Basics
Abstract
The purpose of this chapter is to get you started using R. It is assumed that you have a working installation of the software and of the ISwR package that contains the data sets for this book. Instructions for obtaining and installing the software are given in Appendix A.
Peter Dalgaard
2. The R environment
Abstract
This chapter collects some practical aspects of working with R. It describes issues regarding the structure of the workspace, graphical devices and their parameters, and elementary programming, and includes a fairly extensive, although far from complete, discussion of data entry.
Peter Dalgaard
3. Probability and distributions
Abstract
The concepts of randomness and probability are central to statistics. It is an empirical fact that most experiments and investigations are not perfectly reproducible. The degree of irreproducibility may vary: some experiments in physics may yield data that are accurate to many decimal places, whereas data on biological systems are typically much less reliable. However, the view of data as something coming from a statistical distribution is vital to understanding statistical methods. In this chapter, we outline the basic ideas of probability and the functions that R has for random sampling and handling of theoretical distributions.
Peter Dalgaard
4. Descriptive statistics and graphics
Abstract
Before going into the actual statistical modelling and analysis of a data set, it is often useful to make some simple characterizations of the data in terms of summary statistics and graphics.
Peter Dalgaard
5. One- and two-sample tests
Abstract
Most of the rest of this book describes applications of R for actual statistical analysis. The focus shifts to some extent from explaining the syntax to describing the output and the specific arguments of the relevant functions.
Peter Dalgaard
6. Regression and correlation
Abstract
The main aim of this chapter is to show how to perform basic regression analyses, including plots for model checking and display of confidence and prediction intervals. Furthermore, we describe the related topic of correlation in both its parametric and nonparametric variants.
Peter Dalgaard
7. Analysis of variance and the Kruskal–Wallis test
Abstract
In this chapter, we consider comparisons among more than two groups parametrically, using analysis of variance, as well as nonparametrically, using the Kruskal–Wallis test. Furthermore, we look at two-way analysis of variance in the case of one observation per cell.
Peter Dalgaard
8. Tabular data
Abstract
This chapter describes a series of functions designed to analyze tabular data. Specifically, we look at the functions prop.test, binom.test, chisq.test, and fisher.test.
Peter Dalgaard
9. Power and the computation of sample size
Abstract
A statistical test will not be able to detect a true difference if the sample size is too small compared with the magnitude of the difference. When designing experiments, the experimenter should try to ensure that enough data are collected to be reasonably sure that a difference of a specified size will be detected. R has methods for doing these calculations in the simple cases of comparing means using one- or two-sample t tests and comparing two proportions.
Peter Dalgaard
10. Advanced data handling
Abstract
In the preceding text, we have covered a basic set of elementary statistical procedures. In the chapters that follow, we begin to discuss more elaborate statistical modelling.
This is also a natural point to discuss some data handling techniques that are useful in the practical analysis of data but were too advanced to cover in the first two chapters of the book.
Peter Dalgaard
11. Multiple regression
Abstract
This chapter discusses the case of regression analysis with multiple predictors. Not much is new here, since model specification and output differ little from what has been described for regression analysis and analysis of variance. What is new is mainly the aspect of model search: looking among a set of potential descriptive variables for a subset that describes the response sufficiently well.
Peter Dalgaard
12. Linear models
Abstract
Many data sets are inherently too complex to be handled adequately by standard procedures and thus require the formulation of ad hoc models. The class of linear models provides a flexible framework into which many — although not all — of these cases can be fitted.
Peter Dalgaard
13. Logistic regression
Abstract
Sometimes you wish to model binary outcomes, variables that can have only two possible values, such as diseased or nondiseased. For instance, you may want to describe the risk of getting a disease depending on various kinds of exposures. Chapter 8 discusses some simple techniques based on tabulation, but you might also want to model dose-response relationships (where the predictor is a continuous variable) or model the effect of multiple variables simultaneously. It would be very attractive to be able to use the same modelling techniques as for linear models.
Peter Dalgaard
14. Survival analysis
Abstract
The analysis of lifetimes is an important topic in biology and medicine in particular, but also in reliability analysis with engineering applications. Such data are often highly nonnormally distributed, so the use of standard linear models is problematic.
Peter Dalgaard
15. Rates and Poisson regression
Abstract
Epidemiological studies often involve the calculation of rates, typically rates of death or incidence rates of a chronic or acute disease. These rates are based on counts of events occurring within a certain amount of time. The Poisson regression method is often employed for the statistical analysis of such data. However, data that are not counts of events but rather measurements of time until an event (or nonevent) can be analyzed by a formally equivalent technique.
Peter Dalgaard
16. Nonlinear curve fitting
Abstract
Curve fitting problems occur in many scientific areas. The typical case is that you wish to fit the relation between some response y and a one-dimensional predictor x, by adjusting a (possibly multidimensional) parameter β.
Peter Dalgaard
Backmatter
Metadata
Title
Introductory Statistics with R
Author
Peter Dalgaard
Copyright Year
2008
Publisher
Springer New York
Electronic ISBN
978-0-387-79054-1
Print ISBN
978-0-387-79053-4
DOI
https://doi.org/10.1007/978-0-387-79054-1