
2024 | Book

Introduction to Probability, Statistics & R

Foundations for Data-Based Sciences


About this book

A strong grasp of elementary statistics and probability, along with basic skills in using R, is essential for various scientific disciplines reliant on data analysis. This book serves as a gateway to learning statistical methods from scratch, assuming a solid background in high school mathematics. Readers gradually progress from basic concepts to advanced statistical modelling, with examples from actuarial, biological, ecological, engineering, environmental, medical, and social sciences highlighting the real-world relevance of the subject. An accompanying R package enables seamless practice and immediate application, making it ideal for beginners. The book comprises 19 chapters divided into five parts. Part I introduces basic statistics and the R software package, teaching readers to calculate simple statistics and create basic data graphs. Part II delves into probability concepts, including rules and conditional probability, and introduces widely used discrete and continuous probability distributions (e.g., binomial, Poisson, normal, log-normal). It concludes with the central limit theorem and joint distributions for multiple random variables. Part III explores statistical inference, covering point and interval estimation, hypothesis testing, and Bayesian inference. This part is intentionally less technical, making it accessible to readers without an extensive mathematical background. Part IV addresses advanced probability and statistical distribution theory, assuming some familiarity with (or concurrent study of) mathematical methods like advanced calculus and linear algebra. Finally, Part V focuses on advanced statistical modelling using simple and multiple regression and analysis of variance, laying the foundation for further studies in machine learning and data science applicable to various data and decision analytics contexts.
Based on years of teaching experience, this textbook includes numerous exercises and makes extensive use of R, making it ideal for year-long data science modules and courses. In addition to university courses, the book amply covers the syllabus for the Actuarial Statistics 1 examination of the Institute and Faculty of Actuaries in London. It also provides a solid foundation for postgraduate studies in statistics and probability, and serves as a reliable reference for statistics.

Table of Contents

Frontmatter

Introduction to Basic Statistics and R

Frontmatter
1. Introduction to Basic Statistics
Abstract
Chapter 1: This chapter introduces basic statistics such as the mean, median, mode and standard deviation. It also provides an introduction to many motivating data sets which are used as running examples throughout the book. An accessible discussion is also provided to debate issues like: “Lies, damned lies and statistics” and “Figures don’t lie but liars can figure.”
Sujit K. Sahu
2. Getting Started with R
Abstract
Chapter 2: This chapter introduces the R software package and discusses how to get started with many examples. It revisits some of the data sets already mentioned in Chap. 1 by drawing simple graphs and obtaining summary statistics.
Sujit K. Sahu

Introduction to Probability

Frontmatter
3. Introduction to Probability
Abstract
Chapter 3: The basic concepts of probability are introduced in this chapter. Elementary methods of counting, the number of permutations and the number of combinations are introduced and illustrated. Elementary methods for calculating probabilities are discussed and the general urn problem in probability is defined.
Sujit K. Sahu
4. Conditional Probability and Independence
Abstract
Chapter 4: This chapter introduces many advanced laws of probability such as the total probability theorem, conditional probability and the Bayes theorem. The famous Monty Hall problem is discussed and illustrated using a simulation tool in R. The concept of independence is discussed and illustrated with many examples such as system reliability and randomised response methods.
Sujit K. Sahu
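The simulation idea mentioned in the Chapter 4 abstract can be sketched as follows. This is a minimal illustration, not the book's own R tool: it is written in Python, and the function names `monty_hall_trial` and `estimate_win_probability`, the trial count and the seed are all assumptions made for this example.

```python
import random

def monty_hall_trial(switch, rng):
    """Play one round; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the single remaining unopened door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def estimate_win_probability(switch, n_trials=100_000, seed=1):
    """Estimate the winning probability by repeated simulation."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    wins = sum(monty_hall_trial(switch, rng) for _ in range(n_trials))
    return wins / n_trials

stay = estimate_win_probability(switch=False)
swap = estimate_win_probability(switch=True)
print(f"stay: {stay:.3f}, switch: {swap:.3f}")
```

With enough trials the two estimates should settle near the theoretical values of 1/3 for staying and 2/3 for switching.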
5. Random Variables and Their Probability Distributions
Abstract
Chapter 5 defines random variables and their probability distributions. Many properties of random variables, such as the mean, variance and quantiles, are also defined here. Laws for expectations and variances of linear functions of random variables are also discussed.
Sujit K. Sahu
6. Standard Discrete Distributions
Abstract
Chapter 6: This chapter introduces the standard discrete distributions: Bernoulli, binomial, Poisson, geometric, hypergeometric and negative binomial. In each case the basic properties, such as the mean and variance, are obtained and the R commands to obtain probabilities and cumulative probabilities are illustrated.
Sujit K. Sahu
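As a rough illustration of the probabilities and cumulative probabilities mentioned in the Chapter 6 abstract, the sketch below computes them directly from the binomial and Poisson formulas. It is written in Python rather than the book's R, and the helper names `binom_pmf`, `binom_cdf` and `poisson_pmf` are hypothetical; in R the corresponding built-in functions are `dbinom`, `pbinom` and `dpois`.

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summing the pmf from 0 to k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

# Check the binomial mean n * p numerically for n = 10, p = 0.3.
n, p = 10, 0.3
binom_mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))

print(binom_pmf(3, 10, 0.5))   # probability of exactly 3 successes in 10 fair trials
print(binom_cdf(3, 10, 0.5))   # probability of at most 3 successes
print(poisson_pmf(0, 2.0))     # probability of no events when the rate is 2
print(binom_mean)
```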
7. Standard Continuous Distributions
Abstract
Chapter 7: This chapter introduces standard continuous distributions: exponential, normal, gamma and beta. As in Chap. 6, here we find the means and variances and also discuss the R commands for finding various quantities for each distribution.
Sujit K. Sahu
8. Joint Distributions and the CLT
Abstract
Chapter 8: This chapter introduces the joint probability distribution for multiple random variables. It also discusses conditional and marginal distributions, conditional expectations, covariance and correlation. Finally it introduces the central limit theorem for the sum of independent random variables.
Sujit K. Sahu

Introduction to Statistical Inference

Frontmatter
9. Introduction to Statistical Inference
Abstract
Chapter 9: This chapter introduces the basic concepts of statistical inference and statistical modelling. It distinguishes between population distributions and sample statistics (quantities). The concepts of estimators and their sampling (probability) distributions are also introduced. The properties of bias and mean square error of estimators are defined.
Sujit K. Sahu
10. Methods of Point Estimation
Abstract
Chapter 10: This chapter discusses three important methods for point estimation: method of moments, maximum likelihood and Bayesian methods.
Sujit K. Sahu
11. Interval Estimation
Abstract
Chapter 11 discusses techniques such as the pivoting method for interval estimation. The central limit theorem is used to derive confidence intervals for the mean parameters of binomial, Poisson and normal distributions. For the normal distribution we also discuss the exact confidence interval using the t-distribution without actually deriving the sampling distribution of the t-statistic.
Sujit K. Sahu
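The central-limit-theorem approach described in the Chapter 11 abstract can be sketched for a binomial proportion as below. This is an illustrative Python example rather than the book's R code; the function name `clt_proportion_interval` and the sample figures (40 successes in 100 trials) are assumptions chosen for the example.

```python
from math import sqrt
from statistics import NormalDist

def clt_proportion_interval(successes, n, level=0.95):
    """Approximate confidence interval for a binomial proportion.

    Uses the normal approximation from the central limit theorem:
    p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).
    """
    p_hat = successes / n
    # Upper quantile of the standard normal for the requested level.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Hypothetical data: 40 successes observed in 100 trials.
lower, upper = clt_proportion_interval(40, 100)
print(f"approximate 95% interval: ({lower:.3f}, {upper:.3f})")
```

The approximation relies on a reasonably large sample; for small samples or proportions near 0 or 1 it can be poor.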
12. Hypothesis Testing
Abstract
Chapter 12 discusses the testing of statistical hypotheses, called the null and alternative hypotheses. Definitions of many related keywords, e.g. critical region, types of errors while testing statistical hypotheses, power function, sensitivity and specificity, are provided. These are illustrated with the t-test for testing hypotheses regarding the mean of one or two normal distributions. This chapter ends with a discussion on designs of experiments for estimation and testing purposes.
Sujit K. Sahu

Advanced Distribution Theory and Probability

Frontmatter
13. Generating Functions
Abstract
Chapter 13 starts Part IV of this book on advanced distribution theory and probability. It discusses the moment generating function, cumulant generating function and probability generating function for discrete random variables. The uniqueness theorem for the moment generating function is also stated here to facilitate many proofs in statistical distribution theory.
Sujit K. Sahu
14. Transformation and Transformed Distributions
Abstract
Chapter 14 is devoted to deriving distributions of transformed random variables in one and multiple dimensions. These techniques are used to derive sampling distributions of quantities of statistical interest while sampling from the normal distribution. Three new distributions (chi-squared, t and F) are derived and their properties are discussed.
Sujit K. Sahu
15. Multivariate Distributions
Abstract
Chapter 15 discusses bivariate and multivariate probability distributions. In particular, it discusses the marginal and conditional distributions associated with bivariate and multivariate normal distributions. It also discusses the joint moment generating function for the multivariate normal distribution. In the discrete case it introduces the multinomial distribution as a generalisation of the binomial distribution.
Sujit K. Sahu
16. Convergence of Estimators
Abstract
Chapter 16 discusses asymptotic theories which are often required to guarantee good properties of statistical inference techniques. Three modes of convergence in statistics are discussed and illustrated with the help of simulation using R routines. Large sample properties of the maximum likelihood estimators are stated and so are the laws of large numbers.
Sujit K. Sahu

Introduction to Statistical Modelling

Frontmatter
17. Simple Linear Regression Model
Abstract
Chapter 17 kicks off Part V of the book, an introduction to statistical modelling. It discusses the concepts related to simple regression modelling with many practical examples. The concepts of estimation, inference and prediction are discussed along with the required theoretical derivations. The illustrations are accompanied by R code so that the reader can immediately transfer their skills into the practical domain.
Sujit K. Sahu
18. Multiple Linear Regression Model
Abstract
This chapter generalises the simple regression techniques of the previous chapter to the case where there are multiple possible explanatory variables. This chapter describes the foundational basics for machine learning where the simple and multiple regression techniques are exploited heavily for practical problems. Again, the techniques are described both theoretically and using practical modelling examples in R so that the reader can easily form their own transferable skills.
Sujit K. Sahu
19. Analysis of Variance
Abstract
Finally, this chapter introduces the concepts of analysis of variance, which is seen as a general model comparison technique where there are categorical explanatory variables. Theoretical generalisations of the techniques from the two preceding chapters are included and so are illustrations using R. In particular, the one-way analysis of variance technique is illustrated by using an ecological example on modelling body weights of brushtail possums, a nocturnal animal only native to Australia.
Sujit K. Sahu
20. Solutions to Selected Exercises
Abstract
This chapter provides solutions to selected exercises.
Sujit K. Sahu
Backmatter
Metadata
Title
Introduction to Probability, Statistics & R
Author
Sujit K. Sahu
Copyright Year
2024
Electronic ISBN
978-3-031-37865-2
Print ISBN
978-3-031-37864-5
DOI
https://doi.org/10.1007/978-3-031-37865-2
