
About this book

Portraying data graphically certainly contributes toward a clearer and deeper understanding of data and also makes sophisticated statistical data analyses more marketable. This realization has emerged from many years of experience in teaching students, in research, and especially in statistical consulting work in a variety of subject fields. Consequently, we were somewhat surprised to discover that a comprehensive, yet simple presentation of graphical exploratory techniques for the data analyst was not available. Books on the subject were generally either too incomplete, stopping at a histogram or pie chart, or too technical and specialized and not linked to readily available computer programs. Furthermore, many of these graphical techniques have only recently appeared in statistical journals and are thus not easily accessible to the statistically unsophisticated data analyst.

This book, therefore, attempts to give a sound overview of most of the well-known and widely used methods of analyzing and portraying data graphically. Throughout the book the emphasis is on exploratory techniques. Realizing the futility of presenting these methods without the computer programs needed to actually perform them, we endeavored to provide working computer programs in almost every case. Graphic representations are illustrated throughout with real-life data; two such data sets are used frequently in the text. In pursuing these aims we avoided intricate theoretical derivations and explanations, but we are nevertheless convinced that this book will be of inestimable value even to the trained statistician.

Table of Contents

Frontmatter

Chapter 1. The Role of Graphics in Data Exploration

Abstract
One of the most difficult tasks of a researcher is to convey findings based on statistical analyses to interested persons. Failure to communicate these findings successfully renders all of the data-analytical work futile, irrespective of its quality.
S. H. C. du Toit, A. G. W. Steyn, R. H. Stumpf

Chapter 2. Graphics for Univariate and Bivariate Data

Abstract
This chapter deals with a number of diverse representations which are all aimed at providing meaningful, simple and interesting presentations of univariate and bivariate data. All these representations may be regarded as types of exploratory techniques that are used without any preconceived ideas of possible further analyses.
S. H. C. du Toit, A. G. W. Steyn, R. H. Stumpf

Chapter 3. Graphics for Selecting a Probability Model

Abstract
Statistical inferential techniques can be divided into two groups, namely parametric and non-parametric (distribution-free) techniques. The two differ in that non-parametric techniques make few assumptions about the distribution of the population from which the sample is drawn, whereas parametric procedures are based on specific knowledge of the underlying probability distribution from which the sample derives. The level of significance in parametric testing and the confidence coefficient in parametric interval estimation are therefore valid only if the assumptions made about the underlying probability distribution are actually correct.
S. H. C. du Toit, A. G. W. Steyn, R. H. Stumpf

Chapter 4. Visual Representation of Multivariate Data

Abstract
During the past two or three decades a considerable number of highly expressive visual techniques have been developed for representing multivariate data in two dimensions. Some of these techniques, such as Andrews’ curves and Chernoff faces, have captured the imagination of the research community (particularly the non-statisticians) and are currently enjoying widespread interest. About ten representations will be discussed in this chapter and illustrated with simple examples.
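Andrews’ curves map each p-dimensional case to a function on the interval [-π, π], so that similar cases produce similar curves. A minimal Python sketch follows, assuming the usual definition f(t) = x1/√2 + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + …; the function names `andrews_curve` and `andrews_points` are hypothetical, and plotting is left to whatever graphics library is at hand.

```python
import math


def andrews_curve(x, t):
    """Evaluate Andrews' function for one case x (a sequence of p values) at angle t."""
    total = x[0] / math.sqrt(2)
    for j, value in enumerate(x[1:], start=1):
        k = (j + 1) // 2  # harmonic number: 1, 1, 2, 2, 3, 3, ...
        if j % 2 == 1:
            total += value * math.sin(k * t)  # odd positions use sines
        else:
            total += value * math.cos(k * t)  # even positions use cosines
    return total


def andrews_points(x, n_points=100):
    """Sample the curve of case x at n_points equally spaced angles on [-pi, pi]."""
    step = 2 * math.pi / (n_points - 1)
    ts = [-math.pi + i * step for i in range(n_points)]
    return ts, [andrews_curve(x, t) for t in ts]
```

Drawing one such curve per row of the data matrix on common axes gives the Andrews plot; because the ordering of the variables changes the picture, it is common to try more than one ordering.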
S. H. C. du Toit, A. G. W. Steyn, R. H. Stumpf

Chapter 5. Cluster Analysis

Abstract
As in the previous chapter, this chapter mainly deals with multivariate data. A multivariate data set can conveniently be expressed in the form of a matrix as follows:
$$X = \begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1p} \\ X_{21} & X_{22} & \cdots & X_{2p} \\ \vdots & \vdots & & \vdots \\ X_{n1} & X_{n2} & \cdots & X_{np} \end{bmatrix} = \begin{bmatrix} X'_1 \\ X'_2 \\ \vdots \\ X'_n \end{bmatrix}.$$
The rows of X therefore represent the n different cases, while the columns represent the p variables.
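Most cluster-analysis methods start from the rows X'_1, …, X'_n of this matrix and a measure of how far apart two cases are. The Python sketch below stores a small, made-up data matrix as a list of rows and computes the n × n matrix of pairwise Euclidean distances; the data and the helper names are hypothetical, and Euclidean distance is only one of several common choices of dissimilarity.

```python
import math

# Hypothetical data: n = 4 cases (rows) measured on p = 3 variables (columns).
X = [
    [1.0, 2.0, 3.0],
    [1.5, 1.8, 2.9],
    [8.0, 7.5, 9.0],
    [7.8, 8.1, 8.7],
]


def column(X, j):
    """Return variable j (a column of X) as a list."""
    return [row[j] for row in X]


def euclidean(a, b):
    """Euclidean distance between two cases (rows)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def distance_matrix(X):
    """Symmetric n x n matrix of distances between the cases of X."""
    n = len(X)
    return [[euclidean(X[i], X[k]) for k in range(n)] for i in range(n)]


D = distance_matrix(X)
```

Here the first two cases are close to each other and far from the last two, which is the kind of structure a clustering procedure would then try to recover.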
S. H. C. du Toit, A. G. W. Steyn, R. H. Stumpf

Chapter 6. Multidimensional Scaling

Abstract
A data matrix usually contains too much information to absorb at once. The differences between the various rows and columns, as well as the interactions between them, are difficult to determine merely by inspecting the matrix. However, if this information can be reduced to a representation in at most one, two or three dimensions, the human eye is usually capable of detecting differences and interactions between rows and columns by comparing geometrical distances. This simplifying process is commonly known as multidimensional scaling.
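One well-known variant, classical (Torgerson) metric scaling, turns a matrix D of inter-point distances into low-dimensional coordinates by double-centring the squared distances and keeping the leading eigenvectors. The NumPy sketch below shows that variant only; it is not necessarily the procedure used in the chapter, and the function name `classical_mds` is hypothetical.

```python
import numpy as np


def classical_mds(D, k=2):
    """Classical metric MDS: n x n distance matrix D -> n x k coordinates."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centring matrix I - 11'/n
    B = -0.5 * J @ (D ** 2) @ J              # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(B)     # B is symmetric; eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest eigenvalues
    lam = np.clip(eigvals[order], 0.0, None) # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(lam)  # scale eigenvectors to coordinates
```

When D consists of Euclidean distances between points that really lie in k dimensions, the returned configuration reproduces D exactly (up to rotation and reflection); otherwise it is a least-squares style approximation in the chosen number of dimensions.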
S. H. C. du Toit, A. G. W. Steyn, R. H. Stumpf

Chapter 7. Graphical Representations in Regression Analysis

Abstract
In a regression analysis the following graphical techniques are particularly important:
(i) the scatterplot
(ii) plots of different types of residuals
(iii) the representation of Mallows’ $C_k$-statistic
(iv) the construction of confidence and forecast bands
(v) the ridge trace.
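Item (iii) can be made concrete with a short computation. The sketch below assumes the common definition C_k = SSE_k / s² − (n − 2k), where k is the number of parameters (intercept included) in a subset model, SSE_k is that model's residual sum of squares and s² is the residual mean square of the full model; the book's exact notation may differ, and the function names are hypothetical. Plotting C_k against k and looking for subsets near the line C_k = k is the usual graphical use.

```python
import itertools

import numpy as np


def sse(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)


def mallows_ck(X, y):
    """Return {subset: (k, C_k)} for every subset of the predictor columns of X.

    X holds the predictors only; an intercept is always included in each model.
    """
    n, m = X.shape
    ones = np.ones((n, 1))
    s2 = sse(np.hstack([ones, X]), y) / (n - (m + 1))  # full-model residual mean square
    results = {}
    for r in range(m + 1):
        for subset in itertools.combinations(range(m), r):
            Xs = np.hstack([ones, X[:, list(subset)]])
            k = r + 1  # parameters in this subset model, intercept included
            results[subset] = (k, sse(Xs, y) / s2 - (n - 2 * k))
    return results
```

By construction the full model always has C_k = k, so it lies exactly on the reference line; subsets that omit important predictors lie well above it.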
S. H. C. du Toit, A. G. W. Steyn, R. H. Stumpf

Chapter 8. CHAID and XAID: Exploratory Techniques for Analyzing Extensive Data Sets

Abstract
Many situations call for techniques that enable the researcher to identify particular patterns in the data and to formulate structural relationships between the variables. The computer programs CHAID and XAID, which are discussed in this chapter, are both examples of so-called AID (Automatic Interaction Detection) procedures, in which the outcome of a dependent variable Y is predicted from those predictors (independent variables) that contribute most to the variation in Y.
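The core step of an AID-type procedure is choosing the predictor and split that explain the most variation in Y. The Python sketch below shows one simplified version of that step: for each ordered predictor it tries every binary cut and keeps the one with the largest between-group sum of squares. It is an illustration under these assumptions only (one split, ordered categories, numeric Y); it is not the CHAID or XAID program, and CHAID in particular uses chi-square tests rather than sums of squares. All names are hypothetical.

```python
def between_ss(groups):
    """Between-group sum of squares for a list of groups of y-values."""
    all_y = [v for g in groups for v in g]
    grand = sum(all_y) / len(all_y)
    return sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups if g)


def best_binary_split(rows, y, predictors):
    """Return (BSS, predictor, cut) for the best split 'x <= cut' versus 'x > cut'.

    rows: list of dicts mapping predictor name -> ordered category value.
    """
    best = None
    for name in predictors:
        values = sorted({r[name] for r in rows})
        for cut in values[:-1]:  # every cut leaves both groups non-empty
            left = [yi for r, yi in zip(rows, y) if r[name] <= cut]
            right = [yi for r, yi in zip(rows, y) if r[name] > cut]
            bss = between_ss([left, right])
            if best is None or bss > best[0]:
                best = (bss, name, cut)
    return best
```

Applying the same search recursively within each resulting group produces the tree of splits that AID-type procedures report.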
S. H. C. du Toit, A. G. W. Steyn, R. H. Stumpf

Chapter 9. Control Charts

Abstract
An important aim of modern manufacturing processes is to produce articles that differ as little as possible with regard to certain important characteristics. It is interesting, however, that despite the greatest care and precision, the items produced still tend to vary owing to the combined effect of a variety of factors inherent in the manufacturing process.
S. H. C. du Toit, A. G. W. Steyn, R. H. Stumpf

Chapter 10. Time Series Representations

Abstract
Time series data are usually collected on a monthly, quarterly or annual basis. Such time series provide important economic and demographic information and are published by various institutions on a regular basis.
S. H. C. du Toit, A. G. W. Steyn, R. H. Stumpf

Chapter 11. Further Useful Graphics

Abstract
It is often important to compare two univariate samples with respect to location and variation. The P-P (Probability-Probability) plot is a simple and informative method for making such a comparison. However, by first applying a univariate or bivariate scaling to multivariate data, a P-P plot can also be used to compare two multivariate samples with each other. The univariate and multivariate cases will now be discussed.
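For two univariate samples, one common construction of the P-P plot evaluates both empirical distribution functions at every value in the pooled sample and plots the resulting pairs; points close to the diagonal suggest similar distributions. The Python sketch below computes those pairs under that assumption; the function names are hypothetical and the plotting itself is omitted.

```python
from bisect import bisect_right


def ecdf(sample):
    """Empirical distribution function F(z) = (number of values <= z) / n."""
    s = sorted(sample)
    n = len(s)
    return lambda z: bisect_right(s, z) / n


def pp_points(a, b):
    """Pairs (F_a(z), F_b(z)) for each distinct z in the pooled samples."""
    Fa, Fb = ecdf(a), ecdf(b)
    grid = sorted(set(a) | set(b))
    return [(Fa(z), Fb(z)) for z in grid]
```

A shift in location pushes the points to one side of the diagonal, while a difference in spread makes them cross it, which is what makes the plot useful for comparing location and variation.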
S. H. C. du Toit, A. G. W. Steyn, R. H. Stumpf

Backmatter
