About this book

S-PLUS is a powerful environment for the statistical and graphical analysis of data. It provides the tools to implement many statistical ideas that have been made possible by the widespread availability of workstations having good graphics and computational capabilities. This book is a guide to using S-PLUS to perform statistical analyses and provides both an introduction to the use of S-PLUS and a course in modern statistical methods. S-PLUS is available commercially for both Windows and UNIX workstations, and both versions are covered in depth. The aim of the book is to show how to use S-PLUS as a powerful and graphical data analysis system. Readers are assumed to have a basic grounding in statistics, and so the book is intended for would-be users of S-PLUS, and both students and researchers using statistics. Throughout, the emphasis is on presenting practical problems and full analyses of real data sets. Many of the methods discussed are state-of-the-art approaches to topics such as linear, non-linear, and smooth regression models, tree-based methods, multivariate analysis and pattern recognition, survival analysis, time series and spatial statistics. Throughout, modern techniques such as robust methods, non-parametric smoothing and bootstrapping are used where appropriate. This third edition is intended for users of S-PLUS 4.5, 5.0 or later, although S-PLUS 3.3/4 are also considered. The major change from the second edition is coverage of the current versions of S-PLUS. The material has been extensively rewritten using new examples and the latest computationally-intensive methods. Volume 2: S programming, which is in preparation, will provide an in-depth guide for those writing software in the S language.

Table of Contents

Frontmatter

Chapter 1. Introduction

Abstract
Statistics is fundamentally concerned with the understanding of structure in data. One of the effects of the information-technology era has been to make it much easier to collect extensive datasets with minimal human intervention. Fortunately the same technological advances allow the users of statistics access to much more powerful ‘calculators’ to manipulate and display data. This book is about the modern developments in applied statistics which have been made possible by the widespread availability of workstations with high-resolution graphics and computational power equal to a mainframe of a few years ago. Workstations need software, and the S system developed at AT&T’s Bell Laboratories and now at Lucent Technologies provides a very flexible and powerful environment in which to implement new statistical ideas. S is exclusively licensed to the Data Analysis Products Division of MathSoft Inc. who distribute an enhanced system called S-PLUS; we refer to the language as S and the environment as S-PLUS.
W. N. Venables, B. D. Ripley

Chapter 2. The S Language

Abstract
S is a language for the manipulation of objects. It aims to be both an interactive language (like, for example, a UNIX shell language) as well as a complete programming language with some convenient object-oriented features. In this chapter we are concerned with the interactive language, and hence certain language constructs used mainly in programming are postponed to Chapter 4.
W. N. Venables, B. D. Ripley

Chapter 3. Graphics

Abstract
S-PLUS provides comprehensive graphics facilities, from simple facilities for producing common diagnostic plots by plot(object) to fine control over publication-quality graphs. In consequence, the number of graphics parameters is huge. In this chapter, we build up the complexity gradually. Most readers will not need the material in Section 3.4, and indeed the material there is not used elsewhere in this book. However, we have needed to make use of it, especially in matching existing graphical styles.
W. N. Venables, B. D. Ripley

Chapter 4. Programming in S

Abstract
The S language is both an interactive language and a language for adding new functions to the S-PLUS system. It is a complete programming language with control structures, recursion and a useful variety of data types. The S-PLUS environment provides many functions to handle standard operations, but most users need occasionally to write new functions, a topic we discuss in detail in the companion volume. In this chapter we discuss language ideas that will be seen in system functions.
W. N. Venables, B. D. Ripley

Chapter 5. Univariate Statistics

Abstract
In this chapter we cover a number of topics from classical univariate statistics plus some modern versions.
W. N. Venables, B. D. Ripley

Chapter 6. Linear Statistical Models

Abstract
Linear models form the core of classical statistics and are still the basis of much of statistical practice; many modern modelling and analytical techniques build on the methodology developed for linear models.
W. N. Venables, B. D. Ripley

Chapter 7. Generalized Linear Models

Abstract
Generalized linear models (GLMs) extend linear models to accommodate both non-normal response distributions and transformations to linearity. (We assume that Chapter 6 has been read before this chapter.) The essay by Firth (1991) gives a good introduction to GLMs; the comprehensive reference is McCullagh & Nelder (1989).
W. N. Venables, B. D. Ripley

Chapter 8. Non-linear Models

Abstract
In linear regression the mean surface is a plane in sample space; in non-linear regression it may be an arbitrary curved surface but in all other respects the models are the same. Fortunately the mean surface in most non-linear regression models met in practice will be approximately planar in the region of highest likelihood, allowing some good approximations based on linear regression to be used, but non-linear regression models can still present tricky computational and inferential problems.
W. N. Venables, B. D. Ripley

Chapter 9. Smooth Regression

Abstract
S-PLUS has a ‘Modern Regression Module’ which contains functions for a number of regression methods. These are not necessarily non-linear in the sense of Chapter 8, which refers to a non-linear parametrization, but they do allow nonlinear functions of the independent variables to be chosen by the procedures. The methods are all fairly computer-intensive, and so are only feasible in the era of plentiful computing power (and hence are ‘modern’). There are few texts covering this material. Although they do not cover all our topics in equal detail, for what they do cover Hastie & Tibshirani (1990), Simonoff (1996) and Bowman & Azzalini (1997) are good references.
W. N. Venables, B. D. Ripley

Chapter 10. Tree-based Methods

Abstract
The use of tree-based models will be relatively unfamiliar to statisticians, although researchers in other fields have found trees to be an attractive way to express knowledge and aid decision-making. Keys such as Figure 10.1 are common in botany and in medical decision-making, and provide a way to encapsulate and structure the knowledge of experts to be used by less-experienced users. Notice how this tree uses both categorical variables and splits on continuous variables. (It is a tree, and readers are encouraged to draw it.)
W. N. Venables, B. D. Ripley

Chapter 11. Multivariate Analysis and Pattern Recognition

Abstract
Multivariate analysis is concerned with datasets that have more than one response variable for each observational or experimental unit. The datasets can be summarized by data matrices X with n rows and p columns, the rows representing the observations or cases, and the columns the variables. The matrix can be viewed either way, depending on whether the main interest is in the relationships between the cases or between the variables. Note that for consistency we represent the variables of a case by the row vector x.
W. N. Venables, B. D. Ripley

Chapter 12. Survival Analysis

Abstract
S-PLUS contains extensive survival analysis facilities written by Terry Therneau (Mayo Foundation). There are several versions of his code in different versions of S-PLUS; the examples here were computed with survival5 which is available from http://www.mayo.edu/hsr/biostat.html and is similar to the code included in S-PLUS 2000.
W. N. Venables, B. D. Ripley

Chapter 13. Time Series Analysis

Abstract
There are now a large number of books on time series. Our philosophy and notation are close to those of the applied book by Diggle (1990) (from which some of our examples are taken). Brockwell & Davis (1991) and Priestley (1981) provide more theoretical treatments, and Bloomfield (1976) and Priestley are particularly thorough on spectral analysis. Brockwell & Davis (1996) is an excellent low-level introduction to the theory.
W. N. Venables, B. D. Ripley

Chapter 14. Spatial Statistics

Abstract
Spatial statistics is a recent and graphical subject that is ideally suited to implementation in S; S itself includes one spatial interpolation method, akima, and loess which can be used for two-dimensional smoothing, but the specialist methods of spatial statistics have been added and are given in our library spatial. The main references for spatial statistics are Ripley (1981, 1988), Diggle (1983), Upton & Fingleton (1985) and Cressie (1991). Not surprisingly, our notation is closest to that of Ripley (1981).
W. N. Venables, B. D. Ripley

Backmatter
