2006 | Book

Graphics of Large Datasets

Visualizing a Million

Written by: Antony Unwin, Martin Theus, Heike Hofmann

Publisher: Springer New York

Book series: Statistics and Computing

Included in: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"

About this book

Graphics are great for exploring data, but how can they be used for looking at the large datasets that are commonplace today? This book presents ways of visualizing large datasets, whether large in numbers of cases, large in numbers of variables, or large in both. Data visualization is useful for data cleaning, exploring data, identifying trends and clusters, spotting local patterns, evaluating modeling output, and presenting results. It is essential for exploratory data analysis and data mining. Data analysts, statisticians, computer scientists, indeed anyone who has to explore a large dataset of their own, should benefit from reading this book.

New approaches to graphics are needed to visualize the information in large datasets and most of the innovations described in this book are developments of standard graphics. There are considerable advantages in extending displays which are well-known and well-tried, both in understanding how best to make use of them in your work and in presenting results to others. Building on familiar displays should also make the book readily accessible for readers who already have a little experience of drawing statistical graphics. All ideas are illustrated with displays from analyses of real datasets and the authors emphasize the importance of interpreting displays effectively. Graphics should be drawn to convey information and the book includes many insightful examples.

From the reviews:

"Anyone interested in modern techniques for visualizing data will be well rewarded by reading this book. There is a wealth of important plotting types and techniques." Paul Murrell for the Journal of Statistical Software, December 2006

"This fascinating book looks at the question of visualizing large datasets from many different perspectives. Different authors are responsible for different chapters and this approach works well in giving the reader alternative viewpoints of the same problem. Interestingly the authors have cleverly chosen a definition of 'large dataset'. Essentially they focus on datasets with the order of a million cases. As the authors point out there are now many examples of much larger datasets but by limiting to ones that can be loaded in their entirety in standard statistical software they end up with a book that has great utility to the practitioner rather than just the theorist. Another very attractive feature of the book is the many colour plates, showing clearly what can now routinely be seen on the computer screen. The interactive nature of data analysis with large datasets is hard to reproduce in a book but the authors make an excellent attempt to do just this." P. Marriott for the Short Book Reviews of the ISI

Table of Contents

Frontmatter

Introduction

1. Introduction

Antony Unwin

Basics

Frontmatter

2. Statistical Graphics

Martin Theus

3. Scaling Up Graphics

3.7 Summary

The design and implementation of statistical graphics should take into account the challenges posed by large datasets. For many users this has not been an issue until now, so some statistical and graphics packages can have problems with graphics of more than 10,000 cases.

However, most of the plots used in statistical graphics can be scaled up to be usable with large datasets. Areal plots for categorical data are quite robust against large datasets, whereas glyph-based plots have more serious problems. Modifications like α-blending or binning, interactions like (logical) zooming and panning, and interactive reordering and grouping are of great assistance when dealing with large datasets.
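As a rough illustration of why α-blending helps with overplotting, the sketch below assumes that marks of a single color are composited with the standard "over" rule, so that a pixel covered by n marks of opacity α ends up with opacity 1 - (1 - α)^n. The function names are hypothetical and are not taken from the book.

```python
def blended_opacity(alpha: float, n_overlaps: int) -> float:
    """Opacity of a pixel after n_overlaps marks of opacity alpha are drawn over it.

    Assumes same-colored marks and standard "over" compositing.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if n_overlaps < 0:
        raise ValueError("n_overlaps must be non-negative")
    return 1.0 - (1.0 - alpha) ** n_overlaps


def alpha_for_saturation(n_overlaps: int, target: float = 0.95) -> float:
    """Alpha at which n_overlaps stacked marks reach the target opacity."""
    if n_overlaps < 1:
        raise ValueError("n_overlaps must be at least 1")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must lie in [0, 1)")
    return 1.0 - (1.0 - target) ** (1.0 / n_overlaps)


# Two half-transparent marks on the same pixel give 75% opacity.
print(blended_opacity(0.5, 2))
# Alpha needed so that 100 overlapping points are almost, but not fully, opaque.
print(alpha_for_saturation(100))
```

Choosing α from the densest region in this way keeps differences in density visible there, at the price of making isolated points very faint.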

In general, all statistical graphics that summarize the data, and plot some version of these summaries, will scale up to large datasets. Barcharts, for instance, plot the breakdown of a categorical variable, which is a sufficient summary to fully describe the data. Binned scatterplots show an approximation of the underlying scatterplot and have a complexity that depends on the (constant) size of the binning grid rather than on the size of the dataset.
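The point about binned scatterplots can be sketched as follows. This is a minimal example, assuming a rectangular grid and points that lie inside the stated ranges; the function name and parameters are hypothetical. However many points are supplied, the number of cells to draw never exceeds nx * ny.

```python
import random
from collections import Counter


def bin_points(points, x_range, y_range, nx=50, ny=50):
    """Count points per cell of a fixed nx-by-ny grid.

    Assumes every point lies inside x_range and y_range (inclusive).
    """
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = Counter()
    for x, y in points:
        # Map each coordinate to a cell index; the upper edge goes into the last cell.
        i = min(int((x - x_min) / (x_max - x_min) * nx), nx - 1)
        j = min(int((y - y_min) / (y_max - y_min) * ny), ny - 1)
        counts[(i, j)] += 1
    return counts


random.seed(0)
points = [(random.random(), random.random()) for _ in range(100_000)]
cells = bin_points(points, (0.0, 1.0), (0.0, 1.0), nx=20, ny=20)

# 100,000 points are reduced to at most 20 * 20 = 400 cell counts.
print(len(points), len(cells))
```

Binning still reads every point once, but the drawing step afterwards only has to handle the cell counts.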

Martin Theus

4. Interacting with Graphics

Antony Unwin

Applications

Frontmatter

5. Multivariate Categorical Data — Mosaic Plots

Heike Hofmann

6. Rotating Plots

Dianne Cook, Leslie Miller

7. Multivariate Continuous Data — Parallel Coordinates

7.7 Summary

This chapter has introduced a smooth modified version of the parallel coordinate plot. The modifications are based on a parameter transformation process and its geometric structure. The mathematics behind the new plot has been explained with views that show how patterns may be detected in a dataset. The smooth curves have several significant features, including a norm-reducing property and orthogonal crossings of the axes.
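The summary does not give the construction of the smooth curves, so the sketch below is not the authors' method. It swaps in plain cosine interpolation between neighboring axes, purely to illustrate what orthogonal crossings of the axes mean: with vertical axes at x = 0, 1, 2, ..., the curve has zero slope at every axis. It does not address the norm-reducing property. All names are hypothetical.

```python
import math


def smooth_profile(values, samples_per_segment=20):
    """Return (x, y) points of a smooth curve through one record's axis values.

    Axis k is assumed to be the vertical line x = k. Between axes, cosine
    interpolation is used, so the slope is zero at each axis and the curve
    crosses every axis at a right angle.
    """
    if not values:
        return []
    curve = []
    for k in range(len(values) - 1):
        a, b = values[k], values[k + 1]
        for s in range(samples_per_segment):
            t = s / samples_per_segment
            # Weight rises smoothly from 0 to 1 with zero derivative at both ends.
            w = (1.0 - math.cos(math.pi * t)) / 2.0
            curve.append((k + t, a + (b - a) * w))
    # Close the curve on the last axis.
    curve.append((float(len(values) - 1), values[-1]))
    return curve


curve = smooth_profile([0.0, 1.0, 0.5], samples_per_segment=10)
print(curve[0], curve[10], curve[-1])
```

A standard parallel coordinate plot would connect the same axis values with straight line segments instead.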

Although not explicitly mentioned, the analysis of the datasets in the two examples needed a lot of interactions with the software. Actions like reordering and rescaling of axes (cf. Section 4.4.3) or the application of density estimation procedures are necessary steps towards a meaningful and presentable visualization.
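Two of the interactions mentioned, rescaling and reordering of axes, can be sketched on plain lists of rows. This is a minimal illustration assuming min-max scaling to [0, 1] for each variable; interactive software may offer other scalings, and the function names are hypothetical.

```python
def rescale_columns(rows):
    """Min-max scale every column of a list of equal-length rows to [0, 1]."""
    columns = list(zip(*rows))
    scaled = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column has no spread; place it at the middle of the axis.
        scaled.append([0.5 if span == 0 else (v - lo) / span for v in col])
    return [list(row) for row in zip(*scaled)]


def reorder_axes(rows, order):
    """Return the rows with their columns arranged in the given axis order."""
    return [[row[i] for i in order] for row in rows]


data = [[1, 10, 200], [3, 10, 100], [2, 10, 150]]
scaled = rescale_columns(data)
# Put the third variable first, for example to place it next to the first one.
print(reorder_axes(scaled, [2, 0, 1]))
```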

Rida Moustafa, Ed Wegman

8. Networks

9. Trees

10. Transactions

11. Graphics of a Large Dataset

Backmatter

Title: Graphics of Large Datasets
Written by: Antony Unwin, Martin Theus, Heike Hofmann
Publisher: Springer New York
Electronic ISBN: 978-0-387-37977-7
Print ISBN: 978-0-387-32906-2
DOI: https://doi.org/10.1007/0-387-37977-0