
About this book

This book describes an interactive statistical computing environment called XploRe. As the name suggests, support for exploratory statistical analysis is given by a variety of computational tools. XploRe is a matrix-oriented statistical language with a comprehensive set of basic statistical operations that provides highly interactive graphics, as well as a programming environment for user-written macros; it offers hard-wired smoothing procedures for effective high-dimensional data analysis. Its highly dynamic graphic capabilities make it possible to construct student-level front ends for teaching basic elements of statistics. Hot keys make it an easy-to-use computing environment for statistical analysis. The primary objective of this book is to show how the XploRe system can be used as an effective computing environment for a large number of statistical tasks. The computing tasks we consider range from basic data matrix manipulations to interactive customizing of graphs and dynamic fitting of high-dimensional statistical models. The XploRe language is similar to other statistical languages and offers an interactive help system that can be extended to user-written algorithms. The language is intuitive and readers with access to other systems can, without major difficulty, reproduce the examples presented here and use them as a basis for further investigation.

Table of Contents


A Beginner’s Course


1. Un Amuse-Gueule

An amuse-gueule or a canapé is a small appetizer that comes before a menu with several courses. The amuse-gueule is light and should prepare your taste for the later meal. Here, the courses on the menu are different XploRe applications. Some of the applications require one to be a connoisseur with experienced taste in computer-aided statistical modeling. The amuse-gueule given here is therefore designed to develop a good taste.
Wolfgang Härdle

2. An XploRe Tutorial

XploRe is started by entering the command xplore (usually from the directory C:\XPLORE3). A graphic appears with a copyright screen. Hitting any key displays the standard environment of XploRe. The screen is divided into
  • a command line,
  • an icon list, and
  • an action window.
Wolfgang Härdle

3. The Integrated Working Environment

The aim of this chapter is to take a deeper look into the working environment of XploRe. This working environment consists of several tools, for example, an editor, a help system, a command line interpreter, and an interactive graphic. We will show how these tools work together.
Claudia Gajewski

XploRe in Use


4. Graphical Aids for Statistical Data Analysis

Graphical methods are best explained by working through examples. In this chapter we shall conduct a preliminary analysis of a data set on neonatal mortality. Data on the birth of 3331 children were registered from 1990 to 1992 at Clinique Saint Luc, Brussels, Belgium (Ritter and Bouckaert, 1993). Of these newborns, 56 died in utero, at birth, or within the first seven days after birth. Table 4.1 shows an excerpt of the data matrix.
Sigbert Klinke, Christian Ritter

5. Density and Regression Smoothing

A useful tool for examining the overall structure of data is kernel density estimation. It provides a graphical device for understanding the overall pattern of the data structure. This includes symmetry and the number and locations of modes and valleys. The basic idea is to redistribute the point mass at each datum point by a smoothed density centered at the datum point. An important question is how much the point mass should be smoothed out. This will be discussed in the next section. More detailed discussions on this subject can be found in Chapter 6.
Jianqing Fan, Marlene Müller

6. Bandwidth Selection in Density Estimation

The motivation for density estimation in statistics and data analysis is to identify where observations occur more frequently in a sample. The aim of density estimation is to approximate a “true” probability density function \(f(x)\) from the sample information \(\{X_i\}_{i=1}^{n}\) of independent and identically distributed observations. The estimated density is constructed by centering around each observation \(X_i\) a kernel function \(K_h(u) = K(u/h)/h\) with \(u = x - X_i\), and averaging the values of this function at any given \(x\).
Marco Bianchi
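The construction described in the Chapter 6 abstract can be sketched in a few lines. This is only an illustrative sketch in Python, not XploRe code from the book: the Gaussian kernel, the function names, the sample values, and the bandwidths are all assumptions chosen for the example.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumed choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def scaled_kernel(u, h):
    """K_h(u) = K(u / h) / h for a bandwidth h > 0."""
    return gaussian_kernel(u / h) / h


def kde(x, sample, h):
    """Average of K_h(x - X_i) over all observations X_i in the sample."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    if not sample:
        raise ValueError("sample must not be empty")
    return sum(scaled_kernel(x - xi, h) for xi in sample) / len(sample)


# Hypothetical data, used only to show how the bandwidth changes the estimate.
sample = [-1.2, -0.4, 0.0, 0.3, 1.1]
for h in (0.2, 0.5, 1.0):
    print(h, round(kde(0.0, sample, h), 4))
```

Evaluating the estimate at the same point for several values of h shows why bandwidth selection matters: a small h reacts strongly to individual observations, while a large h smooths them out.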

7. Interactive Graphics for Teaching Simple Statistics

The progress of computer science, in association with the progress of computer hardware, is making computers a more accessible and important tool in assisting individuals in processing and computing information. This fact enhances the use of computers for educational and teaching purposes. That is, the current sophisticated potential of computers can be used to provide a means to acquire concepts and to develop reasoning and problem-solving skills. Computer-aided learning (CAL) and computer-aided instruction (CAI) have become established fields in computer science; they utilize the computer as a tool for learning and presenting instruction that is individualized, interactive, and guided.
Isabel Proença

8. XClust: Clustering in an Interactive Way

Cluster analysis attempts to detect structures in the data. Some of the most important and widely applicable clustering techniques are partitioning methods and hierarchical clustering algorithms. Well-known methods from both these families are available as simple commands in the interactive statistical computing environment XploRe. Moreover, new adaptive clustering methods (which are often much more stable against random selection or small random disturbance of the data, and which seem to be a little bit intelligent because of their ability to learn the appropriate distance measures) can be carried out by macros. Additionally, the importance of each variable involved in clustering can be evaluated by taking into account its adaptive weight. The adaptive techniques are based on adaptive distances, which should also be used in order to obtain multivariate plots (Mucha, 1992).
Hans-Joachim Mucha

9. Exploratory Projection Pursuit

“Projection Pursuit” (PP) stands for a class of exploratory projection techniques. This class contains methods designed for analyzing high dimensional data using low-dimensional projections. The main idea is to describe “interesting” projections by maximizing an objective function or projection pursuit index.
Sigbert Klinke, Jörg Polzehl

10. Generalized Linear Models

In this chapter we shall discuss a class of statistical models that generalize the well-understood normal linear model. A normal or Gaussian model assumes that the response \(Y\) is equal to the sum of a linear combination \(X^T \beta\) of the \(d\)-dimensional predictor \(X\) and a Gaussian distributed error term. It is well known that the least-squares estimator \(\hat \beta \) of \(\beta\) performs well under these assumptions. Moreover, extensive diagnostic tools have been developed for models of this type.
Joseph Hilbe, Berwin A. Turlach
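As a compact restatement of the model in the Chapter 10 abstract, the normal linear model and its least-squares estimator can be written as below. The notation is introduced here for illustration and is not taken from the chapter: \(\varepsilon\) is the error term with variance \(\sigma^2\), \(\mathbf{y}\) is the vector of \(n\) observed responses, and \(\mathbf{X}\) is the \(n \times d\) design matrix, assumed to have full column rank.

```latex
Y = X^T \beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2),
\qquad
\hat\beta = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}.
```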

11. Additive Modeling

In Chapter 10, on Generalized Linear Models, we saw how the standard linear model with normal assumptions can be generalized to incorporate a broader range of models. The essential idea was to connect the conditional mean of the response variable via a link function to a one-dimensional projection of the explanatory variables (a single index).
Thomas Kötter, Berwin A. Turlach
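In symbols, the single-index structure mentioned in the Chapter 11 abstract can be sketched as follows. The names \(\mu\) and \(G\) are chosen here for illustration and may differ from the chapter's own notation; \(G\) denotes the link function.

```latex
\mu(X) = E(Y \mid X), \qquad G\{\mu(X)\} = X^T \beta .
```

For a binary response, for example, the logit link \(G(\mu) = \log\{\mu / (1 - \mu)\}\) gives the Logit model.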

12. Comparing Parametric and Semiparametric Binary Response Models

Binary response models are frequently applied in economics and other social sciences. Whereas standard parametric models such as Probit and Logit models still dominate the applied literature, there have been important theoretical advances in semi- and nonparametric approaches to binary response analysis (see Horowitz, 1993a, for an excellent and up-to-date survey). From the perspective of the applied researcher, the development of new techniques that go beyond Logit and Probit is important for several reasons:
  • Economic theory usually does not provide clear guidelines on how a parametric model should be specified. Hence, the assumptions underlying Probit and Logit models are rarely justified on theoretical grounds. Rather, they are motivated by convenience and by reference to “standard practice.”
  • Misspecification of parametric models can cause parameter estimates and inferences based on these parameters to be inconsistent. Moreover, predictions made from misspecified parametric models can be inaccurate and misleading.
Isabel Proença, Axel Werwatz

13. Approximative Methods for Regression Models with Errors in the Covariates

We examine general regression models where some of the covariates are measured with error. The response is denoted by Y and we distinguish between two types of covariates. X are those covariates that cannot be observed exactly. Instead of X, a surrogate variable W is observed. Z are further covariates measured without error. The example considered in this article is an occupational study on the relationship between dust concentration and chronic bronchitis. In the study, N = 499 workers of a cement plant in Heidelberg were observed from 1960 to 1977 (for details, see DFG-Forschungsbericht, 1981). The response Y is the appearance of chronic bronchitis, and the correctly measured covariates Z are smoking and duration of exposure. The effect of the dust concentration in the individual working area X is of primary interest in the study. This concentration was measured several times in a certain time period and averaged, leading to the surrogate W for the concentration. There are two problems, which are typical for this kind of measurement. The first problem concerns the correctness of the single values, that is, the measurement error due to the instrument. The second problem here is the question, “what is the true covariate X?” Here, we have to use an operational definition like the “long-term average of the dust concentration.”
Raymond J. Carroll, Helmut Küchenhoff

14. Nonlinear Time Series Analysis

The recent development of nonlinear time series analysis is primarily due to the efforts to overcome the limitations of linear models such as the autoregressive moving-average models of Box and Jenkins (1976) in real applications. It is also attributed to the development of nonlinear/nonparametric regression techniques, which provide many useful tools. Advanced computational power and easy-to-use advanced software and graphics such as S-Plus and XploRe make all of these possible.
Rong Chen, Christian Hafner

15. Un Digestif

Un digestif après un bon repas! As a digestif after the XploRe chapters we offer the creation of a dynamic XploRe display. We shall mix six different types of windows into one display and dynamically rotate the d3d pictures. We shall do this exercise with columns 4 and 5 of the bank dataset.
Wolfgang Härdle, Sigbert Klinke

