
1997 | Book


Data Structures for Computational Statistics

Author: Dr. Sigbert Klinke

Publisher: Physica-Verlag HD

Book series: Contributions to Statistics

Included in: Professional Book Archive


About this book

Since the beginning of the seventies, computer hardware has been available for using programmable computers for various tasks. During the nineties the hardware developed from big mainframes to personal workstations. Not only is the hardware much more powerful today; a workstation can also do much more work than a mainframe of the seventies could. In parallel we find a specialization in software. Languages like COBOL for business-oriented programming or Fortran for scientific computing only marked the beginning. The introduction of personal computers in the eighties gave new impulses for even further development. Already at the beginning of the seventies, some special languages like SAS or SPSS were available for statisticians. Now that personal computers have become very popular, the number of programs has started to explode. Today we find a wide variety of programs for almost any statistical purpose (Koch & Haag 1995).

Table of Contents

Frontmatter

1. Introduction

Summary

This chapter first explains what data structures are and why they are important for statistical software. Then we take a look at why we need interactive environments for our work and what the appropriate tools should be. We do not discuss the requirements for the graphical user interface (GUI) in detail. The last section presents the current state of software and hardware and the future developments we expect.

Sigbert Klinke

2. Exploratory Statistical Techniques

Summary

The first step is to analyze the statistical graphics in use, beginning with descriptive statistics. We then examine the boxplot, the Q-Q plot, the histogram, the regressogram, the barchart, the dotchart, the piechart, the 2D-scatterplot, the 2D-contour plot, the scatterplot matrix, the 3D-scatterplot, the 3D-contour plot, the Chernoff faces and the parallel coordinate plot. We use some of the tools of XploRe to examine the Berlin housing dataset with the plots mentioned above. Finally we state that two kinds of windows are necessary: one which can draw points, lines and areas, and another which can draw glyphs, i.e. the Chernoff faces, star diagrams and so on.

Sigbert Klinke

3. Some Statistical Applications

Summary

Here some applications are discussed (cluster analysis, teachware, regression methods). The cluster analysis serves as an example for the use of graphics. Teachware needs to be highly interactive, and we briefly discuss the approach of Proenca (1995). The section about regression methods shows how detailed a programming language should be. The trade-off between speed and understanding in a statistical routine still plays an important role.

Sigbert Klinke

4. Exploratory Projection Pursuit

Summary

In this chapter we discuss one statistical technique, exploratory projection pursuit (EPP), in detail. We cover possible extensions (multivariate projections, inclusion of discrete variables) and show which graphics are used for representing results. The danger in EPP is that we interpret a random structure as a real structure in the data. We describe tests for detecting a structure which is not a multivariate Gaussian distribution. Pictures are presented from tools (macros in XploRe and XGobi) which are used to execute exploratory projection pursuit. At the end we specify the requirements necessary for a tool to do EPP, but neither my implementation nor XGobi can satisfy all of them.
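To make the idea concrete, here is a minimal sketch of a projection pursuit search; it is not the XploRe or XGobi implementation. It scores one-dimensional projections with an absolute excess-kurtosis index, which is close to zero for Gaussian-like projections, and keeps the best of a set of random directions. The choice of index, the random search, and all function names are assumptions made for brevity.

```python
import math
import random

def project(data, direction):
    """Project each row of data onto the unit vector along direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    return [sum(x * u for x, u in zip(row, unit)) for row in data]

def kurtosis_index(values):
    """Absolute excess kurtosis: near 0 for Gaussian-like projections."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0.0:
        return 0.0
    fourth = sum((v - mean) ** 4 for v in values) / n
    return abs(fourth / var ** 2 - 3.0)

def random_search(data, n_directions=200, seed=0):
    """Return (best_score, best_direction) over random candidate directions."""
    rng = random.Random(seed)
    dim = len(data[0])
    best_score, best_direction = -1.0, None
    for _ in range(n_directions):
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        if all(d == 0.0 for d in direction):
            continue  # skip the (practically impossible) zero vector
        score = kurtosis_index(project(data, direction))
        if score > best_score:
            best_score, best_direction = score, direction
    return best_score, best_direction
```

A high score on a small sample can arise by chance, which is the danger the summary mentions; a real tool would compare the score against its distribution under Gaussian data.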

Sigbert Klinke

5. Data Structures

Summary

In the first section we show that graphical objects can be generated in three steps. Then we develop a hierarchy for the graphical data structures (datapart, windows, displays). In the next section we give reasons why matrices are not a sufficient structure to store statistical data, so that we need multidimensional arrays. Then we discuss their impact on mathematical and statistical operations. The second section closes with an explanation of why we need hierarchical objects. In the third section several forms of linking are discussed. First we give examples of linking plots in the thesis, then we show further examples of linking, i.e. asking data themselves or linked data, the link of events with subroutines and finally the linking between different datasets. The fourth section briefly describes some statistical packages and indicates which features concerning data structures are available in these programs.
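The step from matrices to multidimensional arrays can be illustrated with a small sketch. The class below is not XploRe's data structure; it is a hypothetical dense array that stores its cells row-major in a flat list and supports summing over one axis, the kind of operation needed for a three-way contingency table that a single matrix cannot hold directly. The class name and its methods are assumptions.

```python
from itertools import product

class NDArray:
    """Minimal dense multidimensional array, stored row-major in a flat list."""

    def __init__(self, shape, fill=0.0):
        self.shape = tuple(shape)
        size = 1
        for extent in self.shape:
            size *= extent
        self.data = [fill] * size
        # strides[k] = number of flat positions skipped per step along axis k
        self.strides = []
        step = 1
        for extent in reversed(self.shape):
            self.strides.insert(0, step)
            step *= extent

    def _offset(self, index):
        if len(index) != len(self.shape):
            raise IndexError("wrong number of indices")
        for i, extent in zip(index, self.shape):
            if not 0 <= i < extent:
                raise IndexError("index out of range")
        return sum(i * s for i, s in zip(index, self.strides))

    def __getitem__(self, index):
        return self.data[self._offset(index)]

    def __setitem__(self, index, value):
        self.data[self._offset(index)] = value

    def sum_over(self, axis):
        """Sum along one axis, returning an array with that axis removed."""
        out = NDArray(self.shape[:axis] + self.shape[axis + 1:])
        for index in product(*(range(e) for e in self.shape)):
            reduced = index[:axis] + index[axis + 1:]
            out[reduced] += self[index]
        return out
```

For example, `table = NDArray((2, 3, 2))` holds a 2 x 3 x 2 table, cells are addressed as `table[i, j, k]`, and `table.sum_over(0)` yields the 3 x 2 marginal table.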

Sigbert Klinke

6. Implementation in XploRe

Summary

Here we show how the data structures developed in the previous chapter are implemented in XploRe. However, not everything explained there is part of XploRe 3.2. First we describe how graphical data structures are implemented in XploRe 3.2 and how they can be used interactively. Then we show the data structures for statistical data implemented in XploRe 3.2. The basic data structure is a matrix, and no hierarchical lists are possible. Next we describe which possibilities of linking are offered in XploRe 3.2. We then describe some selected commands in XploRe 3.2 for producing interactivity, for reading and writing data, for reading and storing macros and libraries, and for binned kernel estimators. These commands show solutions to some problems we mentioned before, or are used to show extensions based on the step from matrices to multivariate arrays. The third section describes how some selected tools work (random number generator, PCA, grand tour, multidimensional scaling, clustering, multivariate kernel regression, PPR, wavelet regression, interactive contouring). It shows that we are able to implement a variety of (interactive) tools efficiently with the proposed data structures. The fourth section describes the implementation of arrays in XploRe 4.0, and the fifth shows how the commands BINDATA and CONV are extended for use with arrays.
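The general idea of a binned kernel estimator can be sketched in two steps: count the observations on an equidistant grid, then convolve the counts with kernel weights. The code below is a one-dimensional illustration only. The functions `bin_data` and `convolve` are hypothetical and do not reproduce the syntax or semantics of the XploRe commands BINDATA and CONV, and the quartic kernel is an assumed choice.

```python
import math

def bin_data(values, origin, width, n_bins):
    """Count observations per bin; values outside the grid are ignored."""
    counts = [0] * n_bins
    for v in values:
        k = math.floor((v - origin) / width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

def quartic_weights(bandwidth_bins):
    """Quartic (biweight) kernel evaluated at integer bin offsets."""
    h = bandwidth_bins
    return {j: (15.0 / 16.0) * (1.0 - (j / h) ** 2) ** 2
            for j in range(-h + 1, h)}

def convolve(counts, weights):
    """Discrete convolution of bin counts with kernel weights."""
    out = [0.0] * len(counts)
    for k, c in enumerate(counts):
        if c == 0:
            continue  # empty bins contribute nothing
        for j, w in weights.items():
            if 0 <= k + j < len(out):
                out[k + j] += c * w
    return out

def binned_density(values, origin, width, n_bins, bandwidth_bins):
    """Approximate kernel density estimate at the bin centres."""
    counts = bin_data(values, origin, width, n_bins)
    smoothed = convolve(counts, quartic_weights(bandwidth_bins))
    h = bandwidth_bins * width  # bandwidth in data units
    return [s / (len(values) * h) for s in smoothed]
```

The cost of the convolution depends on the number of non-empty bins rather than on the number of observations, which is why binning pays off for large datasets.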

Sigbert Klinke

7. Conclusion

Abstract

We have presented two versions of the software XploRe which implement our ideas about data structures in statistical software. The data structures for graphics and linking are implemented in XploRe 3.2, while the data structures for statistical data are implemented in XploRe 4.0. However, we have not implemented all of our ideas.

Sigbert Klinke

Backmatter

Title: Data Structures for Computational Statistics
Author: Dr. Sigbert Klinke
Publisher: Physica-Verlag HD
Electronic ISBN: 978-3-642-59242-3
Print ISBN: 978-3-7908-0982-4
DOI: https://doi.org/10.1007/978-3-642-59242-3

Springer Professional
