1994 | Book

Modern Applied Statistics with S-Plus

Authors: W. N. Venables, B. D. Ripley

Publisher: Springer New York

Book Series: Statistics and Computing

About this book

S-Plus is a powerful environment for statistical and graphical analysis of data. It provides the tools to implement many statistical ideas which have been made possible by the widespread availability of workstations having good graphics and computational capabilities. This book is a guide to using S-Plus to perform statistical analyses and provides both an introduction to the use of S-Plus and a course in modern statistical methods. The aim of the book is to show how to use S-Plus as a powerful and graphical system. Readers are assumed to have a basic grounding in statistics, and so the book is intended for would-be users of S-Plus, and both students and researchers using statistics. Throughout, the emphasis is on presenting practical problems and full analyses of real data sets.

Table of Contents

Frontmatter
Chapter 1. Introduction
Abstract
Statistics is fundamentally concerned with the understanding of structures in data. One of the effects of the information-technology era has been to make it much easier to collect extensive datasets with minimal human intervention. Fortunately the same technological advances allow the users of statistics access to much more powerful ‘calculators’ to manipulate and display data. This book is about the modern developments in applied statistics which have been made possible by the widespread availability of workstations with high-resolution graphics and computational power equal to a mainframe of a few years ago. Workstations need software, and the S system developed at AT&T’s Bell Laboratories provides a very flexible and powerful environment in which to implement new statistical ideas. Thus this book provides both an introduction to the use of S and a course in modern statistical methods.
W. N. Venables, B. D. Ripley
Chapter 2. The S Language
Abstract
S is a language for the manipulation of objects. It aims to be both an interactive language (like, for example, a Unix shell language) as well as a complete programming language with some convenient object-oriented features. In this chapter we shall be concerned with the interactive language, and hence certain language constructs used mainly in programming will be postponed to Chapter 4.
W. N. Venables, B. D. Ripley
Chapter 3. Graphical Output
Abstract
S-PLUS provides comprehensive graphics facilities, from simple facilities for producing common diagnostic plots by plot(object) to fine control over publication-quality graphs. In consequence, the number of graphics parameters is huge. In this chapter, we build up the complexity gradually. Most readers will not need to go beyond the first 3 sections, and indeed the material later in this chapter is not used elsewhere in this book. However, we have needed to make use of it, especially in matching existing graphical styles.
W. N. Venables, B. D. Ripley
Chapter 4. Programming in S
Abstract
The S language is both an interactive language and a language for adding new functions to the S system. It is a complete programming language with control structures, recursion and a useful variety of data types. The S environment provides many functions to handle standard operations, but most users need occasionally to write new functions. This chapter is concerned with designing, writing, testing and correcting your own S functions.
W. N. Venables, B. D. Ripley
Chapter 5. Distributions and Data Summaries
Abstract
In this chapter we cover a number of topics from classical univariate statistics. Many of the functions used are S-PLUS extensions to S.
W. N. Venables, B. D. Ripley
Chapter 6. Linear Statistical Models
Abstract
Linear models form the core of classical statistics, and S provides extensive facilities to fit and manipulate them. These work with a version of the Wilkinson-Rogers syntax (Wilkinson & Rogers, 1973) for specifying models which we discuss in Section 6.2, and which is also used for generalized linear models, models for survival analysis and tree-based models in later chapters. The main function for fitting linear models is lm, which provides our first example of a style of S functions we shall see repeatedly in later chapters, producing a fitted model object which is then analysed by generic functions.
W. N. Venables, B. D. Ripley
Chapter 7. Generalized Linear Models
Abstract
Generalized linear models (GLMs) extend linear models to accommodate both non-normal response distributions and transformations to linearity. (We will assume that Chapter 6 has been read before this chapter.) The essay by Firth (1991) gives a good introduction to GLMs; the comprehensive reference is McCullagh & Nelder (1989).
W. N. Venables, B. D. Ripley
Chapter 8. Robust Statistics
Abstract
Outliers are sample values which cause surprise in relation to the majority of the sample. This is not a pejorative term; outliers may be correct, but they should always be checked for transcription errors. They can play havoc with standard statistical methods, and many robust and resistant methods have been developed since 1960 to be less sensitive to outliers.
W. N. Venables, B. D. Ripley
Chapter 9. Non-linear Regression Models
Abstract
In linear regression the mean surface in sample space is a plane. In non-linear regression the mean surface may be an arbitrary curved surface but in other respects the models are similar. In practice the mean surface in most non-linear regression models will be approximately planar in the region(s) of high likelihood allowing good approximations based on linear regression techniques to be used. Non-linear regression models can still present tricky computational and inferential problems. (Indeed, the examples here exceeded the capacity of S-PLUS for Windows 3.1.)
W. N. Venables, B. D. Ripley
Chapter 10. Modern Regression
Abstract
S-PLUS has a ‘Modern Regression Module’ which contains functions for a number of regression methods. These are not necessarily non-linear in the sense of Chapter 9, which refers to a non-linear parametrization, but they do allow nonlinear functions of the independent variables to be chosen by the procedures. The methods are all fairly computer-intensive, and so are only feasible in the era of plentiful computing power (and hence are ‘modern’). Some of these methods are part of the S modelling language, and others have been added by S-PLUS. As the latter predate the modelling language and have not been updated, the functions of this chapter do not have a consistent style and user interface.
W. N. Venables, B. D. Ripley
Chapter 11. Survival Analysis
Abstract
Survival analysis is not part of S, but has been added to S-PLUS based on functions written by Terry Therneau (Mayo Foundation) and available as survival2 code from statlib (see Appendix D for further information). The functions in survival3 were released in mid-1992. They are not part of S-PLUS 3.2, but are scheduled to be included in late 1994. As these functions are much easier to use and provide a higher capability, this chapter is based on their use. (This does mean that the methods are probably not accessible to Windows users at present, as the library uses C code. Section 11.6 sketches how to use survival2, for those who have no other choice.)
W. N. Venables, B. D. Ripley
Chapter 12. Multivariate Analysis
Abstract
Multivariate analysis is concerned with datasets which have more than one response variable for each observational or experimental unit. The datasets can be summarized by data matrices X with n rows and p columns, the rows representing the observations or cases, and the columns the variables. The matrix can be viewed either way, depending on whether the main interest is in the relationships between the cases or between the variables. Note that for consistency we represent the variables of a case by the row vector x.
W. N. Venables, B. D. Ripley
Chapter 13. Tree-based Methods
Abstract
The use of tree-based models will be relatively unfamiliar to statisticians, although researchers in other fields have found trees to be an attractive way to express knowledge and aid decision-making. Keys such as Figure 13.1 are common in botany and in medical decision-making, and provide a way to encapsulate and structure the knowledge of experts to be used by less-experienced users. Notice how this tree uses both categorical variables and splits on continuous variables.
W. N. Venables, B. D. Ripley
Chapter 14. Time Series
Abstract
There are now a large number of books on time series. Our philosophy and notation are close to those of the applied book by Diggle (1990) (from which some of our examples are taken). Brockwell and Davis (1991) and Priestley (1981) provide more theoretical treatments, and Bloomfield (1976) and Priestley are particularly thorough on spectral analysis.
W. N. Venables, B. D. Ripley
Chapter 15. Spatial Statistics
Abstract
Spatial statistics is a recent and graphical subject which is ideally suited to implementation in S; S itself includes one spatial interpolation method, akima, and loess which can be used for two-dimensional smoothing, but the specialist methods of spatial statistics have been added and are given in our library spatial. The main references for spatial statistics are Ripley (1981, 1988), Diggle (1983), Upton & Fingleton (1985) and Cressie (1991). Not surprisingly, our notation is closest to Ripley (1981).
W. N. Venables, B. D. Ripley
Backmatter
Metadata
Title: Modern Applied Statistics with S-Plus
Authors: W. N. Venables, B. D. Ripley
Copyright Year: 1994
Publisher: Springer New York
Electronic ISBN: 978-1-4899-2819-1
Print ISBN: 978-1-4899-2821-4
DOI: https://doi.org/10.1007/978-1-4899-2819-1