
2020 | OriginalPaper | Book chapter

4. Visualization

Written by: Alfonso Zamora Saiz, Carlos Quesada González, Lluís Hurtado Gil, Diego Mondéjar Ruiz

Published in: An Introduction to Data Analysis in R

Publisher: Springer International Publishing


Abstract

Presenting conclusions with the help of a graph can make them clearer and more convincing. R is a capable tool for data visualization, and in this chapter we explore some of the best-known plotting packages. First, the R base graphics package can produce most of the fundamental graph styles with a high level of customization. This package is commonly used to produce explanatory graphs, which are a valuable help in visualizing the properties of a dataset. Second, the widely used ggplot2 package produces highly aesthetic graphs with ease. This tool processes input data into a final plot that displays new conclusions in an understandable fashion. Finally, for further command of data visualization, we introduce the packages plotly and leaflet, which specialize in the construction of interactive plots and maps, respectively.


Footnotes
1
A univariate dataset contains data on a single variable, whereas a multivariate dataset contains several variables. More on this will be seen in Chap. 5.
 
2
Throughout Chap. 4, whenever this happens, we omit repeated arguments and focus only on those specific to each function. Arguments that appear in several plotting functions should be understood to work in the same way in each of them.
 
3
RGB stands for Red Green Blue and is a way of defining almost every color based on the proportion of each primary color.
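
As a minimal sketch of this idea, base R's rgb() builds a color from the three proportions and col2rgb() recovers them. The variable names are illustrative only.

```r
# Build colors from red, green, and blue proportions (default scale is 0 to 1).
pure_red <- rgb(1, 0, 0)

# The same function accepts the common 0-255 scale via maxColorValue.
orange <- rgb(255, 165, 0, maxColorValue = 255)

# col2rgb() goes the other way: from a color name to its RGB components.
red_components <- col2rgb("red")

print(pure_red)        # "#FF0000"
print(orange)          # "#FFA500"
print(red_components)  # 3 x 1 matrix with rows red, green, blue
```

The resulting hexadecimal strings can be passed to the col argument of base plotting functions.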
 
4
For example, plotting a matrix with u and v as columns yields the right-hand picture in Fig. 4.1.
 
5
This can also be obtained with the function pairs() from the package graphics, applied to numerical matrices.
 
6
The dataset iris is contained in the package datasets included in the R core.
 
7
When the argument col is filled with the variable Species, which is a factor vector with three levels, the first three different colors in the R palette are assigned to corresponding observations from each level.
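
The mechanism can be sketched as follows: the factor is read as its integer codes 1, 2, 3, which index the current palette. The choice of Sepal.Length and Petal.Length as axes is an assumption made for illustration.

```r
# Species is a factor with three levels; its integer codes are 1, 2, 3.
species_codes <- as.integer(iris$Species)

# Each code selects the color at that position in the current palette.
point_colors <- palette()[species_codes]

# Passing the factor directly to col has the same effect.
plot(iris$Sepal.Length, iris$Petal.Length,
     col = iris$Species,
     xlab = "Sepal length", ylab = "Petal length")

print(levels(iris$Species))
print(unique(point_colors))
```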
 
8
The USPersonalExpenditure dataset is contained in the datasets package.
 
9
Legend arguments are passed with args.legend and will be explored in detail in Sect. 4.1.3.
 
10
This will be explained in detail in Sect. 5.1.1.
 
11
The notches extend to a distance of ± 1.58 times the interquartile range (a dispersion measure of the data explained in Sect. 5.1.2) divided by the square root of the sample size, measured from the median. According to [3], this gives roughly a 95% confidence interval for comparing two medians: if the notches of two boxes do not overlap, there is strong evidence that the medians differ.
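
A minimal sketch of this calculation, assuming the iris variable Sepal.Length as example data. Note that boxplot.stats() measures the spread between the hinges returned by fivenum(), which can differ slightly from the quantile-based IQR().

```r
x <- iris$Sepal.Length          # example data, no missing values
n <- length(x)

# Five-number summary: min, lower hinge, median, upper hinge, max.
five <- fivenum(x)
hinge_spread <- five[4] - five[2]

# Notch limits: median -/+ 1.58 * spread / sqrt(n).
manual_notch <- five[3] + c(-1.58, 1.58) * hinge_spread / sqrt(n)

# The same limits as computed internally for boxplot(..., notch = TRUE).
builtin_notch <- boxplot.stats(x)$conf

print(manual_notch)
print(builtin_notch)
```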
 
12
A continuous variable X is a function taking values on the real numbers. See Sect. 5.2.1.
 
13
It is important to note that the number of breaks is only interpreted by R as a suggestion, so you might ask for breaks=5 and get a plot with 7 breaks, for example.
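
One way to check this is to compute the histogram without drawing it and inspect the break points R actually chose. The simulated data below are an assumption for illustration.

```r
set.seed(1)
x <- rnorm(200)                 # illustrative data

# Ask for 5 breaks; R treats the number as a suggestion only.
h <- hist(x, breaks = 5, plot = FALSE)

print(h$breaks)                 # break points actually used
print(length(h$counts))         # number of bins obtained
```

Comparing length(h$counts) with the requested value shows whether R followed the suggestion for a given dataset.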
 
14
Recall that seq(start, end, by) creates a sequence vector from the starting point to the end point, with gap by between consecutive entries.
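
For instance, a sequence built this way can be passed as an explicit vector of breaks, which, unlike a single number, is used exactly as given. The range 4 to 8 is an assumption chosen to cover the iris Sepal.Length values.

```r
# A sequence from 0 to 10 in steps of 2.5.
grid <- seq(0, 10, by = 2.5)
print(grid)                     # 0.0 2.5 5.0 7.5 10.0

# Explicit break points for a histogram; they must span the data range.
my_breaks <- seq(4, 8, by = 0.5)
h <- hist(iris$Sepal.Length, breaks = my_breaks, plot = FALSE)
print(h$breaks)
```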
 
16
The ggplot2 motto is Create Elegant Data Visualizations Using the Grammar of Graphics.
 
17
For more examples and a full description of all functions, visit https://ggplot2.tidyverse.org/index.html.
 
18
It allows more than the two required arguments, but their purpose can be achieved in a more natural way with other layers.
 
19
The main specific arguments are listed in the table and exploring them is left to the reader since, by now, it should be straightforward.
 
20
The name alpha is a standard way to refer to transparency, not only in programming but also in image and video editing.
 
21
The line type, width, and other components that relate to the particular aspects of a line can be modified by using several secondary arguments that the reader can check in the documentation.
 
22
Except for the main title and axes labels.
 
23
The calculations for the slope and intercept will be studied in Sect. 5.3.2.
 
24
By means of outlier.color, outlier.fill, outlier.shape (which hides the outliers if set to NA), outlier.size, outlier.stroke, and outlier.alpha.
 
25
By default, a method is chosen based on the sample size. For fewer than 1000 observations, the method loess (locally estimated scatterplot smoothing, [5]) is used; otherwise, gam (generalized additive models, [8, 10]) is used. Both methods are beyond the scope of this book, so we stick to the linear model by setting method="lm".
 
26
To have the plot by rows we will use facet_grid(cut ~ .).
 
27
If an implemented function is used, its name should be written between quotation marks.
 
28
The code above is equivalent to setting theta="x".
 
29
plotly is also available in other languages such as Python and in standalone online versions.
 
30
Leaflet is an open-source JavaScript library.
 
32
In geographic coordinates, latitude is the angular distance measured along a meridian, with value 0 at the equator and 90 degrees at the north and south poles. Longitude is the angular distance from the Greenwich meridian measured along the equator, ranging from 0 to 180 degrees East and West, respectively. South latitudes and West longitudes are entered in R as negative values.
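
The sign convention can be sketched with a small helper. The function to_signed_degrees() is hypothetical (it is not part of leaflet or any other package), and the coordinates used for the Staples Center are approximate values assumed for illustration.

```r
# Hypothetical helper: convert a magnitude in degrees plus a hemisphere
# letter into the signed value expected by mapping functions.
to_signed_degrees <- function(value, hemisphere) {
  stopifnot(all(hemisphere %in% c("N", "S", "E", "W")))
  # South latitudes and West longitudes become negative.
  direction <- ifelse(hemisphere %in% c("S", "W"), -1, 1)
  direction * abs(value)
}

# Approximate location of the Staples Center: about 34.043 N, 118.267 W.
staples_lat <- to_signed_degrees(34.043, "N")
staples_lng <- to_signed_degrees(118.267, "W")

print(c(lat = staples_lat, lng = staples_lng))
```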
 
33
The Staples Center is a multi-purpose arena in the city of Los Angeles that hosts international sports and arts events.
 
35
Using this platform requires registering, accepting Google's terms, and entering billing data, although no charge is made without explicit user approval.
 
36
This is a fake key used as an example, which should be substituted by the reader’s personal one.
 
37
In fact it is a tibble: a data format used by R packages from the tidyverse universe like leaflet. In most situations it can be used just as a data frame.
 
38
Try flights[!(country=="Russian Federation" | country=="United Kingdom")].
 
References
2.
Belsley, D.A., Kuh, E. and Welsch, R.E. Regression diagnostics: Identifying influential data and sources of collinearity. Wiley Series in Probability and Mathematical Statistics, 571(4), 1980.
3.
Chambers, J. M., Cleveland, W. S., Kleiner, B. and Tukey, P. A. Graphical Methods for Data Analysis. Wadsworth & Brooks/Cole Mathematics Series, Springer, Heidelberg, Germany, 1983.
4.
Cheng, J., Xie, Y., Wickham, H. and Agafonkin, V. leaflet: Create interactive web maps with the JavaScript ‘leaflet’ library. R package version, 1(0):423, 2017.
5.
Cleveland, W.S. LOWESS: A program for smoothing scatterplots by robust locally weighted regression. American Statistician, 35(1):54, 1981.
6.
Cleveland, W.S. The Elements of Graphing Data. Wadsworth Publ. Co., California, USA, 1985.
7.
Fisher, R.A. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179–188, 1936.
8.
Hastie, T., Tibshirani, R. and Friedman, J.H. The elements of statistical learning: data mining, inference, and prediction. Springer, Berlin, Germany, 2009.
10.
James, G., Witten, D., Hastie, T. and Tibshirani, R. An Introduction to Statistical Learning: With Applications in R. Springer Publishing Company, Incorporated, New York, USA, 2014.
13.
Sterling, A. Unpublished BS Thesis, 1977.
14.
Wickham, H. et al. Welcome to the tidyverse. Journal of Open Source Software, 4(43):1686, 2019.
16.
Wickham, H. and Grolemund, G. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc., California, USA, 2017.
17.
Wilkinson, L. The grammar of graphics. Springer Science & Business Media, Berlin, Germany, 2006.
Metadata
Title
Visualization
Written by
Alfonso Zamora Saiz
Carlos Quesada González
Lluís Hurtado Gil
Diego Mondéjar Ruiz
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-48997-7_4
