
2020 | OriginalPaper | Book chapter

2. Introduction to R

Authors: Alfonso Zamora Saiz, Carlos Quesada González, Lluís Hurtado Gil, Diego Mondéjar Ruiz

Published in: An Introduction to Data Analysis in R

Publisher: Springer International Publishing


Abstract

From Business Intelligence to advanced statistics applications, professionals are expected to access and manipulate large datasets, and R is the perfect tool for it. In this introductory chapter, we explain the principles of programming and the position of R in data science today. Then, a beginner-level course on R begins by introducing the main data types of this programming language. Examples and exercises are included to provide hands-on training, giving users control over, and understanding of, R's capabilities. Next, two main generic programming tools are introduced: control structures and functions. These allow us to manipulate our datasets and generate all sorts of values and conclusions. In addition, this chapter presents specific R operators that greatly simplify the use of R and enhance its capabilities.


Footnotes
1
This algorithm is usually called the long division method in US schools and in many other places.
 
2
Assembly languages are often abbreviated asm.
 
3
Here, by machine we refer both to the hardware, that is, the architecture of the computer, and to the software, that is, the operating system.
 
4
FORTRAN is an acronym for FORmula TRANslation.
 
5
The latest FORTRAN version, known as FORTRAN 2018, was released on November 28, 2018; see https://wg5-fortran.org/f2018.html.
 
6
Message by Peter Dalgaard announcing the release of the first beta version: https://stat.ethz.ch/pipermail/r-announce/2000/000127.html.
 
7
GNU is a recursive acronym for “GNU’s Not Unix.”
 
9
At the time of completion of this book, the latest stable version is R 3.6.2, called Dark and Stormy Night, released on December 12, 2019.
 
11
Visit https://stat.ethz.ch/pipermail/r-announce/1997/000001.html for the announcement by Kurt Hornik of the opening of the CRAN site.
 
13
To find the exact number of packages available at a given moment, one can type nrow(available.packages()) in the R console.
 
18
Computational costs are defined as the amount of time and memory needed to run an algorithm.
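As a rough illustration of the two components of computational cost, base R's system.time() reports how long an expression takes to evaluate and object.size() estimates the memory an object occupies. This is a minimal sketch that assumes sorting one million random numbers as an arbitrary workload; the actual timings vary from machine to machine.

```r
set.seed(1)                 # make the random workload reproducible
x <- runif(1e6)             # one million random numbers as a sample workload

# Time cost: elapsed (wall-clock) seconds needed to sort the vector
timing <- system.time(sorted_x <- sort(x))
timing[["elapsed"]]

# Memory cost: approximate size of the vector in memory
format(object.size(x), units = "Mb")
```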
 
19
At the time of completion of this book, the CRAN package repository features 15,368 available packages, comprising many possible extensions of the R core library.
 
25
Announcement of the RStudio release on February 28, 2011: https://blog.rstudio.com/2011/02/28/rstudio-new-open-source-ide-for-r/. At the time of completion of this book, the latest stable version is RStudio 1.2.5033, released on December 3, 2019.
 
26
For example, Oracle, ODBC, Spark, and many others.
 
27
A very useful shortcut is Control+Enter on a PC, or Command+Enter on a Mac, to run the code line by line.
 
28
Have a look at the Code menu in RStudio for different run options.
 
29
The package ggplot2 will be used in Sect. 4.2 to create elaborate plots in R.
 
30
When calling library(), quotation marks around the package name are not needed.
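A small sketch of the two equivalent forms, using the base package stats so that it runs without installing anything (ggplot2 would behave the same way once installed). The character.only argument of library() covers the case where the package name is held in a variable; the variable name pkg is only illustrative.

```r
# Both calls attach the same package: library() accepts the name
# either as a bare name or as a quoted string
library(stats)
library("stats")

# When the name is stored in a variable, character.only = TRUE makes
# library() use the variable's value instead of the word "pkg"
pkg <- "stats"
library(pkg, character.only = TRUE)

"package:stats" %in% search()   # TRUE once the package is attached
```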
 
36
The acronym RAM stands for random-access memory.
 
37
The same result is achieved by typing assign("x", 4).
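A short comparison of the assignment operator and assign(). The loop at the end is an illustrative extra, not taken from the chapter: it shows where assign() is useful, namely when the variable name is built at run time (the names var_1 to var_3 are hypothetical).

```r
x <- 4            # usual assignment operator
assign("y", 4)    # same effect, with the variable name given as a string

identical(x, y)   # TRUE

# assign() is handy when the name itself is computed
for (i in 1:3) {
  assign(paste0("var_", i), i^2)
}
var_3             # 9
```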
 
38
A Boolean value is a data type whose possible values are either TRUE or FALSE. The name comes from the mathematician George Boole.
 
39
For simplicity, logical values can also be written as T or F. We will use the full word or the initial letter interchangeably throughout the book.
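A minimal sketch of the two spellings. One detail worth knowing: TRUE and FALSE are reserved words, whereas T and F are ordinary variables that base R initialises to TRUE and FALSE, so they can be masked. The local() block below shows this without affecting the global T.

```r
flag_full  <- TRUE
flag_short <- T
identical(flag_full, flag_short)   # TRUE
class(flag_full)                   # "logical"

# T is just a variable, so it can be shadowed
masked <- local({
  T <- 0        # shadows base::T inside this local scope only
  isTRUE(T)     # FALSE, because T is now the number 0
})
masked          # FALSE
```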
 
40
In Sect. 5.1 we will see how to remove these NAs when performing calculations over vectors containing them, using the argument na.rm.
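A brief preview of the na.rm argument on a hypothetical vector of temperature readings with one missing value: without it the NA propagates to the result, and with na.rm = TRUE the calculation uses only the observed values.

```r
temperatures <- c(21.5, NA, 23.0, 19.5)   # one missing reading

mean(temperatures)                 # NA: the missing value propagates
mean(temperatures, na.rm = TRUE)   # mean of the three observed values
sum(temperatures, na.rm = TRUE)    # 64
```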
 
41
The conversion from degrees Celsius C to degrees Fahrenheit F is F = 1.8 × C + 32. To go from Celsius to Kelvin we just shift the zero of the scale by 273.15, that is, K = C + 273.15.
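Both conversions are vectorised in R, so each formula can be applied to a whole vector of readings at once. A minimal sketch with a few arbitrary example temperatures:

```r
celsius <- c(-40, 0, 37, 100)

# Element-by-element conversions
fahrenheit <- 1.8 * celsius + 32   # -40, 32, 98.6, 212
kelvin     <- celsius + 273.15     # 233.15, 273.15, 310.15, 373.15

data.frame(celsius, fahrenheit, kelvin)
```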
 
42
See Sect. 5.1.1 for an explanation of the arithmetic mean and other statistical measures.
 
43
summary() is one of the most versatile and powerful commands in R. Almost any kind of structure can be passed as an argument to this command, and it will usually provide plenty of information.
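To illustrate how summary() adapts to its input, the sketch below applies it to a numeric vector, a factor and a data frame built from made-up values. The exact printed layout depends on the R version, so no full output is shown.

```r
scores <- c(2, 4, 4, 5, 7, 9)
summary(scores)      # minimum, quartiles, median, mean and maximum

grades <- factor(c("pass", "fail", "pass", "pass"))
summary(grades)      # number of observations per level

# For a data frame, one summary is produced per column
summary(data.frame(scores, group = rep(c("a", "b"), 3)))
```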
 
44
Everything can be ordered, alphabetically for example, but nominal scales have no meaningful order related to anything intrinsic to the nature of the variable.
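In R, the distinction between nominal and ordinal scales is reflected in unordered and ordered factors. A minimal sketch with hypothetical colour and size variables: the alphabetical order of the nominal levels is just a storage convention, while the ordinal levels are given an explicit, meaningful order.

```r
# Nominal variable: levels are sorted alphabetically by default,
# but that order carries no meaning
colours <- factor(c("red", "blue", "red", "green"))
levels(colours)        # "blue" "green" "red"
is.ordered(colours)    # FALSE

# Ordinal variable: the order of the levels is meaningful and explicit
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)
sizes[1] < sizes[2]    # TRUE: "small" comes before "large"
```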
 
45
When the combine function c() receives data of different types, all of them are coerced to the most general type that can represent every element appearing in the structure.
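A quick sketch of this coercion for atomic vectors, which follows the hierarchy logical < integer < double < character (complex sits between double and character but is not shown here):

```r
mixed_num <- c(1L, 2.5)        # integer + double
mixed_chr <- c(1, "a", TRUE)   # number + string + logical
mixed_int <- c(TRUE, 2L)       # logical + integer

class(mixed_num)   # "numeric": the integer is stored as a double
class(mixed_chr)   # "character": every element becomes a string
mixed_chr          # "1" "a" "TRUE"
mixed_int          # 1 2: TRUE is coerced to 1
```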
 
46
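Unlike matrices, which fill the gaps by recycling the shorter columns, if the columns to be included in a data frame have incompatible lengths (that is, the longest length is not a multiple of each shorter one), the function returns an error and no data frame filling the gaps is created.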
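A short sketch of the contrast using base R only. Note that data.frame() does recycle a column whose length divides the number of rows exactly; the error appears when the lengths are incompatible.

```r
# Matrices: cbind() recycles the shorter vector to fill the gap
# (with a warning when the lengths are not multiples of each other)
m <- suppressWarnings(cbind(1:3, 1:2))
m[, 2]                   # 1 2 1

# Data frames: recycling only when the longer length is a multiple
df_ok <- data.frame(a = 1:4, b = 1:2)
nrow(df_ok)              # 4

# Incompatible lengths produce an error, not a filled-in data frame
res <- tryCatch(data.frame(a = 1:3, b = 1:2),
                error = function(e) e)
inherits(res, "error")   # TRUE
conditionMessage(res)
```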
 
47
In Spain and other countries, two family names are used, preserving the last names of both the father and the mother.
 
48
When applying as.data.frame(), unless otherwise specified, the default names of the variables are V1, V2, etc., meaning variable 1, variable 2, etc.
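A minimal sketch of the default naming when converting a matrix without column names, and of how naming the columns beforehand changes it. The column names height and weight are hypothetical.

```r
m <- matrix(1:6, nrow = 3)          # a 3 x 2 matrix without column names
df_default <- as.data.frame(m)
names(df_default)                   # "V1" "V2"

# Naming the columns beforehand replaces the default V1, V2, ...
colnames(m) <- c("height", "weight")
df_named <- as.data.frame(m)
names(df_named)                     # "height" "weight"
```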
 
49
Some R packages, such as tibble and data.table, are specially designed for dealing with datasets; we will explore the latter in Chap. 3.
 
50
Technically speaking, when evaluating 1:10, R internally runs a loop, so the previous code could be simplified to print(1:10); nonetheless, it is a valid first, easy example.
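A sketch contrasting an explicit loop with the vectorised form. The chapter's own loop is not reproduced here, so the first loop is an assumed equivalent; the second part applies the same idea to a computation.

```r
# Explicit loop: one print() call per element, ten lines of output
for (i in 1:10) {
  print(i)
}

# Vectorised form: a single call prints the whole sequence
print(1:10)

# Same idea with a computation: loop and vectorised expression agree
squares_loop <- numeric(10)
for (i in 1:10) {
  squares_loop[i] <- i^2
}
squares_vec <- (1:10)^2
identical(squares_loop, squares_vec)   # TRUE
```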
 
51
Note that, in the example, the expression 3 != 3 evaluates to FALSE, whereas checking whether it is a logical expression returns TRUE. Try the command is.logical("Hello") to see the difference.
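The distinction between the value of a comparison and its type can be checked directly. The last line is an extra case, not from the chapter: NA on its own is also a logical constant.

```r
3 != 3                # FALSE: the comparison itself is false
is.logical(3 != 3)    # TRUE: but its result is a logical value
is.logical("Hello")   # FALSE: a character string is not logical
is.logical(NA)        # TRUE: NA on its own is a logical constant
```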
 
52
Observe that f(1) is undefined, because it involves a division by zero. Despite this, R outputs Inf, recovering the limit of f as x approaches that point.
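R follows IEEE 754 floating-point arithmetic, so dividing a nonzero number by zero gives Inf or -Inf rather than an error, while 0/0 gives NaN. The function f below is hypothetical (the chapter's own f is not shown here); it has a division by zero at x = 1 and shows that Inf corresponds to the limit from the right only.

```r
1 / 0     # Inf
-1 / 0    # -Inf
0 / 0     # NaN: the result is undefined

# Hypothetical function with a division by zero at x = 1
f <- function(x) 1 / (x - 1)
f(1)          # Inf
f(1 + 1e-8)   # a large positive number
f(1 - 1e-8)   # a large negative number: the left-hand limit is -Inf
```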
 
53
A richer function is already implemented in the R base library under the name mat.or.vec().
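A quick look at mat.or.vec(nr, nc), which returns a zero vector of length nr when nc is 1 and an nr by nc zero matrix otherwise:

```r
v <- mat.or.vec(3, 1)   # nc = 1: a numeric vector of three zeros
m <- mat.or.vec(2, 3)   # nc > 1: a 2 x 3 matrix of zeros

v
dim(m)                  # 2 3
```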
 
54
The computational advantages and disadvantages of using return() or not are beyond the scope of this book.
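Without going into performance, the sketch below shows the behavioural difference: a function returns the value of its last evaluated expression anyway, so return() matters mainly for leaving a function early. The helper safe_sqrt is hypothetical.

```r
# Explicit return()
square_explicit <- function(x) {
  return(x^2)
}

# Implicit return: the last evaluated expression is the result
square_implicit <- function(x) {
  x^2
}

# Hypothetical helper: return() is most useful for exiting early
safe_sqrt <- function(x) {
  if (x < 0) {
    return(NA)     # stop here for negative input
  }
  sqrt(x)          # otherwise the last expression is the result
}

square_explicit(3)   # 9
square_implicit(3)   # 9
safe_sqrt(-4)        # NA
safe_sqrt(16)        # 4
```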
 
Metadata
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-48997-7_2
