
About this book

S-PLUS is a powerful tool for interactive data analysis, the creation of graphs, and the implementation of customized routines. S-PLUS originated as the S language of AT&T Bell Laboratories, and its modern language and flexibility make it appealing to data analysts from many scientific fields.
This book explains the basics of S-PLUS in a clear style at a level suitable for people with little computing or statistical knowledge. Unlike the S-PLUS manuals, it is not comprehensive, but instead introduces the most important ideas of S-PLUS through the use of many examples. Each chapter also includes a collection of exercises that are accompanied by fully worked-out solutions and detailed comments. The volume is rounded off with practical hints on how to work efficiently in S-PLUS. The book is well suited both for self-study and as a textbook.
For the second edition, the text has been updated to incorporate the completely revised S Language and its implementation in S-PLUS. New chapters have been added to explain how to work with the graphical user interface of the Windows(R) version, how to explore relationships in data using the powerful Trellis graphics system, and how to understand and use object-oriented programming. In addition, the programming chapter has been extended to cover some of the more technical but important aspects of S-PLUS.

Table of Contents


1. Introduction

Over the years, the S language and S-Plus have undergone many changes. Since S was first developed in the mid-1970s, its three main authors, Rick Becker, John Chambers, and Allan Wilks, have enhanced the entire language considerably. All their work was done at Bell Labs with the original goal of defining a language that makes repetitive tasks in data analysis, such as fitting a linear model, easier to do.

2. Windows User Interface

It has been fashionable lately for statistical software packages to become interactive and/or “point-and-click.” S-Plus has always been interactive but has only recently added a Windows-based point-and-click interface. The Windows user now has the best of both worlds: the ease of the graphical user interface (GUI) combined with the detailed commands and control offered by the command line environment.

3. A First Session

The remainder of the book is devoted to the use of S-Plus via the command line. We will look at how to use S-Plus as a calculator for simple and complex calculations. We will look at different data structures and how to handle data flexibly. We will generate graphs displaying data and mathematical functions, explore data sets in detail, and use statistical models. Finally, we will import data in several ways, automate tasks by writing functions, and investigate many details that are helpful in practice.

4. A Second Session

Some of the most basic functions available in S-Plus were presented in the previous chapter. It is time to move on to more advanced data structures that make many common tasks easier. We begin with matrices and then branch out into more specialized structures like data frames and lists. We introduce subsetting by index value and the handling of missing values, and we close with a few new applications and a review of the material covered in the chapter.

5. Graphics

Graphs are one of S-Plus’s strongest capabilities and most attractive features. This chapter gives an overview of how to create graphs and how to use the different parameters to modify the elements and layout of graphs.

6. Trellis Graphics

Cleveland (1993) introduced the idea of Trellis displays as a graphical way of examining high-dimensional structure in data by means of conditional one-, two-, and three-dimensional graphs. The problem addressed is how observations of one or more variables depend on the observation of other variables. To answer this question, the data are subset (split up into groups) according to the observations of the conditioning variables — the ones that are supposed to have an influence on the others.

7. Exploring Data

In the preceding chapters, we laid the foundation for understanding the concepts and ideas of the S-Plus system. We explored basic ideas and how to use S-Plus for performing calculations, and we saw how data can be generated, stored, and accessed. We also looked at how data can be displayed graphically. All this will be useful in this chapter as we explore two real data sets that come with S-Plus: the Barley and Geyser data sets.

8. Statistical Modeling

We have now learned some elementary statistical techniques in S-Plus and the basics of graphical data analysis. The next step is to see what S-Plus has to offer in terms of modeling. Statistical modeling is one of the strongest S-Plus features because of its unified approach, wide variety of model types, and excellent diagnostic capabilities. We start with an example of how to fit a simple linear regression model and corresponding diagnostics. The example is presented with a minimum of technical explanation, designed as a quick introduction. We then formally explain the unified approach to model syntax and structure, along with comments on several of the more popular types of statistical models.

9. Programming

Now that you have learned the elementary commands in S-Plus and many ways of applying them, it is time to discover its advanced programming capabilities. This chapter introduces list structures, loops, and writing functions, covers debugging, and gives more detail on how to use S-Plus as a programming language.

10. Object-Oriented Programming

We will now cover the S Language in some more depth. It is described in full detail in the standard reference (Chambers, 1998).

11. Input and Output

It is every user’s intention to analyze his or her own data. To do this, the data have to be read into the system before they can be analyzed. This chapter discusses in detail the different ways of reading and writing data. It also covers two other important input/output tasks: writing S-Plus output and transferring data files.

12. S-Plus Internals

This chapter will go into detail about how S-Plus works and how you can take advantage of it. We will look at how S-Plus starts up, how it finds and stores data, how the environment can be customized, and how data are passed between functions.

13. Tips and Tricks

This chapter deals with tips and tricks for developing code and using S-Plus more efficiently. We look at how to use a development environment and how to structure work. We will investigate Greek letters in graphs, batch jobs, and the integration of C and Fortran routines, and finally we look at how to use libraries and how to include S-Plus graphs in text processors.

14. Information Sources on and Around S-Plus

As this book approaches its end, it is time to look at further sources of information. There are several very useful ones, and we now look at the most popular: the S-News e-mail discussion list, the StatLib electronic archive, the “R” program and its related sources, and some books that serve as good further reading.

15. Bibliography

