
1986 | Book

Graphical Exploratory Data Analysis

Authors: S. H. C. du Toit, A. G. W. Steyn, R. H. Stumpf

Publisher: Springer New York

Book Series: Springer Texts in Statistics


About this book

Portraying data graphically certainly contributes toward a clearer and more penetrative understanding of data and also makes sophisticated statistical data analyses more marketable. This realization has emerged from many years of experience in teaching students, in research, and especially from engaging in statistical consulting work in a variety of subject fields. Consequently, we were somewhat surprised to discover that a comprehensive, yet simple presentation of graphical exploratory techniques for the data analyst was not available. Generally, books on the subject were either too incomplete, stopping at a histogram or pie chart, or were too technical and specialized and not linked to readily available computer programs. Many of these graphical techniques have furthermore only recently appeared in statistical journals and are thus not easily accessible to the statistically unsophisticated data analyst.

This book, therefore, attempts to give a sound overview of most of the well-known and widely used methods of analyzing and portraying data graphically. Throughout the book the emphasis is on exploratory techniques. Realizing the futility of presenting these methods without the necessary computer programs to actually perform them, we endeavored to provide working computer programs in almost every case. Graphic representations are illustrated throughout by making use of real-life data. Two such data sets are frequently used throughout the text. In realizing the aims set out above we avoided intricate theoretical derivations and explanations, but we nevertheless are convinced that this book will be of inestimable value even to a trained statistician.

Table of Contents

Frontmatter
Chapter 1. The Role of Graphics in Data Exploration
Abstract
One of the most difficult tasks of a researcher is to convey findings based on statistical analyses to interested persons. Failure to communicate these findings successfully puts paid to all his data-analytical work, irrespective of its quality.
Chapter 2. Graphics for Univariate and Bivariate Data
Abstract
This chapter deals with a number of diverse representations which are all aimed at providing meaningful, simple and interesting presentations of univariate and bivariate data. All these representations may be regarded as types of exploratory techniques that are used without any preconceived ideas of possible further analyses.
Chapter 3. Graphics for Selecting a Probability Model
Abstract
Statistical inferential techniques can be divided into two groups, namely parametric and non-parametric (distribution-free) techniques. The difference between the two types of procedures concerns the fact that in dealing with non-parametric techniques few assumptions are made regarding the nature of the distribution of the population from which the sample is drawn. Parametric procedures, on the other hand, are based on specific knowledge of the underlying probability distribution from which the sample derives. The level of significance in parametric testing and the confidence coefficient in parametric interval estimation are therefore valid only if the assumptions made in respect of the underlying probability distribution are actually correct.
Chapter 4. Visual Representation of Multivariate Data
Abstract
During the past two or three decades a considerable number of highly expressive visual techniques have been developed for representing multivariate data in two dimensions. Some of these techniques, such as Andrews’ curves and Chernoff faces, have captured the imagination of the research community (particularly the non-statisticians) and are currently enjoying widespread interest. Some ten representations will be discussed in this chapter and illustrated on the basis of simple examples.
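As an illustration of how one of the named techniques maps a multivariate observation into two dimensions, here is a minimal sketch of an Andrews curve. It assumes the usual textbook definition, f_x(t) = x1/√2 + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ..., evaluated for t between -π and π. The function name is hypothetical and is not taken from the book's programs.

```python
import math

def andrews_curve(x, t):
    """Evaluate the Andrews curve of one observation x at the point t.

    Assumed definition:
    f_x(t) = x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ...
    """
    total = x[0] / math.sqrt(2)
    for k, value in enumerate(x[1:], start=1):
        harmonic = (k + 1) // 2          # 1, 1, 2, 2, 3, 3, ...
        if k % 2 == 1:                   # x2, x4, ... multiply sine terms
            total += value * math.sin(harmonic * t)
        else:                            # x3, x5, ... multiply cosine terms
            total += value * math.cos(harmonic * t)
    return total

def andrews_points(x, steps=100):
    """Sample the curve on an evenly spaced grid over [-pi, pi] for plotting."""
    grid = [-math.pi + 2 * math.pi * i / steps for i in range(steps + 1)]
    return [(t, andrews_curve(x, t)) for t in grid]
```

Each case in a data set yields one curve, so cases with similar values on all variables produce curves that stay close together over the whole range of t.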
Chapter 5. Cluster Analysis
Abstract
As in the previous chapter, this chapter deals mainly with multivariate data. A multivariate data set can conveniently be expressed in the form of a matrix as follows:
$$X = \begin{bmatrix} X_{11} & X_{12} & \ldots & X_{1p} \\ X_{21} & X_{22} & \ldots & X_{2p} \\ \vdots & \vdots & & \vdots \\ X_{n1} & X_{n2} & \ldots & X_{np} \end{bmatrix} = \begin{bmatrix} X'_1 \\ X'_2 \\ \vdots \\ X'_n \end{bmatrix}.$$
The rows of X therefore represent the n different cases, while the columns represent the p variables.
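The row and column convention can be sketched in a few lines of Python. The data values and the helper names `case` and `variable` are invented for illustration and do not come from the book.

```python
# Hypothetical data matrix with n = 4 cases (rows) and p = 3 variables (columns).
X = [
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.2, 2.9, 4.3],
    [5.9, 3.0, 5.1],
]

n = len(X)       # number of cases
p = len(X[0])    # number of variables

def case(X, i):
    """Return the i-th row X'_i (1-based, matching the matrix notation)."""
    return X[i - 1]

def variable(X, j):
    """Return the j-th column (1-based): all n observations of variable j."""
    return [row[j - 1] for row in X]
```

Here `case(X, 2)` gives every measurement taken on the second case, whereas `variable(X, 3)` gives the third variable as observed on all n cases.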
Chapter 6. Multidimensional Scaling
Abstract
A data matrix usually contains too much information to absorb at once. The differences between the various rows and columns, as well as the interactions between them, are difficult to determine merely by looking at the matrix. However, if this information can be simplified to representations in one, two or at most three dimensions, the human eye is usually capable of observing differences and interactions between rows and columns with the aid of geometrical distance comparisons. This simplifying process is commonly known as multidimensional scaling.
Chapter 7. Graphical Representations in Regression Analysis
Abstract
In a regression analysis the following graphical techniques are particularly important:
(i) the scatterplot
(ii) plots of different types of residuals
(iii) the representation of Mallows’ $C_k$-statistic
(iv) the construction of confidence and forecast bands
(v) the ridge trace.
Chapter 8. CHAID and XAID: Exploratory Techniques for Analyzing Extensive Data Sets
Abstract
In many situations techniques are called for which will enable the researcher to identify particular patterns in the data and which can be used for formulating any structural relationships between the variables. The computer programs CHAID and XAID, which are to be discussed in this chapter, are both examples of so-called AID procedures (Automatic Interaction Detection), according to which the outcome of a dependent variable Y can be predicted on the basis of those predictors (independent variables) which contribute the most to the variation in Y.
Chapter 9. Control Charts
Abstract
An important aim of modern manufacturing processes is to produce articles that differ as little as possible with regard to certain important characteristics. It is interesting, however, that despite the greatest care and precision, items produced still tend to vary owing to the combined effect of a variety of factors inherent in a manufacturing process.
Chapter 10. Time Series Representations
Abstract
Time series data are usually collected on a monthly, quarterly or annual basis. Such time series provide important economic and demographic information and are published by various institutions on a regular basis.
Chapter 11. Further Useful Graphics
Abstract
It is often important to compare two univariate samples in respect of location and variation. The P-P (Probability-Probability) plot is a simple and informative method for drawing such a comparison. By using a univariate or bivariate scaling of multivariate data, however, a P-P plot can also be used for comparing two multivariate samples with each other. The univariate and multivariate cases will now be discussed.
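For the univariate case, one common way to build a two-sample P-P plot is to evaluate both empirical distribution functions on the pooled sample values and plot one against the other. The sketch below assumes that construction; the abstract does not spell out the book's own procedure, and the function names are hypothetical.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return the empirical distribution function of a sample."""
    xs = sorted(sample)
    n = len(xs)

    def F(x):
        # Proportion of observations less than or equal to x.
        return bisect_right(xs, x) / n

    return F

def pp_points(a, b):
    """Return the P-P plot coordinates (F_a(x), F_b(x)) over the pooled values."""
    Fa, Fb = ecdf(a), ecdf(b)
    grid = sorted(set(a) | set(b))
    return [(Fa(x), Fb(x)) for x in grid]
```

If the two samples come from the same distribution, the points scatter around the diagonal from (0, 0) to (1, 1). A shift in location or a difference in spread shows up as a systematic departure from that line.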
Backmatter
Metadata
Title: Graphical Exploratory Data Analysis
Authors: S. H. C. du Toit, A. G. W. Steyn, R. H. Stumpf
Copyright Year: 1986
Publisher: Springer New York
Electronic ISBN: 978-1-4612-4950-4
Print ISBN: 978-1-4612-9371-2
DOI: https://doi.org/10.1007/978-1-4612-4950-4