2020 | Book

An Introduction to Data Analysis in R

Hands-on Coding, Data Mining, Visualization and Statistics from Scratch

Written by: Alfonso Zamora Saiz, Carlos Quesada González, Lluís Hurtado Gil, Diego Mondéjar Ruiz

Publisher: Springer International Publishing

Book series: Use R!

About this book

This textbook offers an easy-to-follow, practical guide to modern data analysis using the programming language R. The chapters cover the fundamentals of programming in R; data collection and preprocessing, including web scraping; data visualization; and statistical methods, including multivariate analysis. Exercises are featured at the end of each section. The text requires only basic statistics skills, as it strikes a balance between statistical and mathematical understanding and implementation in R, with a special emphasis on reproducible examples and real-world applications. This textbook is primarily intended for undergraduate students of mathematics, statistics, physics, economics, finance and business who are pursuing a career in data analytics. It will be equally valuable for master's students of data science and industry professionals who want to conduct data analyses.

Table of contents

Frontmatter
Chapter 1. Introduction
Abstract
Since the beginning of the twenty-first century, humankind has witnessed the emergence of a new generation of mathematical and statistical tools that are reshaping the way business is done and the future of society. Everything is data nowadays: company clients are tabulated pieces of data, the output of laboratory experiments is expressed as data, and the records of our own activity on the internet are also made of data. These data need to be processed, taken into account, and mined for their important information in order to serve business, society, or ourselves. That is the task of a data analyst.
Alfonso Zamora Saiz, Carlos Quesada González, Lluís Hurtado Gil, Diego Mondéjar Ruiz
Chapter 2. Introduction to R
Abstract
From Business Intelligence to advanced statistics applications, professionals are expected to access and manipulate large datasets, and R is the perfect tool for it. In this introductory chapter, we explain the principles of programming and the position of R in data science today. Then, a beginner-level course on R starts by introducing the main data types of this superior programming language. Examples and exercises are included to provide hands-on training, giving users control over and understanding of R's capabilities. Next, two main generic programming tools are introduced: control structures and functions. These allow us to manipulate our datasets and generate all sorts of values and conclusions. In addition, this chapter covers specific R operators that greatly simplify the use of R and enhance its capabilities.
Alfonso Zamora Saiz, Carlos Quesada González, Lluís Hurtado Gil, Diego Mondéjar Ruiz
Chapter 3. Databases in R
Abstract
Prior to any data analysis, it is fundamental to be able to handle different sources and formats of information, such as files or websites, and it is equally important to understand how to transform and manipulate all kinds of data so as to prepare everything properly for a statistical analysis. This chapter is divided into two parts. The first deals with the diversity of environments for data sources, ranging from importing structured data or using APIs to the more advanced use of scraping tools for cases where data is not available for direct download. The second discusses advanced features for transforming raw data into ready-to-analyze tables, with a special focus on the exceptionally fast data.table.
Alfonso Zamora Saiz, Carlos Quesada González, Lluís Hurtado Gil, Diego Mondéjar Ruiz
Chapter 4. Visualization
Abstract
Presenting conclusions with the help of a graph can greatly improve your communication and persuasion skills. R is a proficient tool for data visualization, and in this chapter we explore some of the most well-known plotting packages. First, with R base graphics one can produce most of the fundamental graph styles with a great level of customization. This package is commonly used to produce explanatory graphs, which are a valuable help in visualizing the properties of a dataset. Second, the widely used ggplot2 package can be used to produce highly aesthetic graphs with ease. This exceptional tool processes input data into a final plot that displays new conclusions in an understandable fashion. Finally, to extend the visualization toolkit further, the packages plotly and leaflet, specialized in the construction of interactive plots and maps respectively, are introduced.
Alfonso Zamora Saiz, Carlos Quesada González, Lluís Hurtado Gil, Diego Mondéjar Ruiz
Chapter 5. Data Analysis with R
Abstract
The goal of data analysis is to use statistical tools to describe, infer, and predict values. The number of different concepts and techniques is vast, and it is essential to structure the way we learn them. Some descriptive statistics concepts are needed to understand the data we are working with. Then we can apply inference techniques to generalize conclusions that we have found in our data. Finally, multivariate statistics allows us to construct models used to predict values of the variables we are studying. R is focused on these techniques, and some advanced statistics can be performed easily with it.
Alfonso Zamora Saiz, Carlos Quesada González, Lluís Hurtado Gil, Diego Mondéjar Ruiz
Backmatter
Metadata
Title
An Introduction to Data Analysis in R
Written by
Alfonso Zamora Saiz
Carlos Quesada González
Lluís Hurtado Gil
Diego Mondéjar Ruiz
Copyright year
2020
Electronic ISBN
978-3-030-48997-7
Print ISBN
978-3-030-48996-0
DOI
https://doi.org/10.1007/978-3-030-48997-7
