2022 | Book

Applied Multivariate Statistics with R

About this book

Now in its second edition, this book brings multivariate statistics to graduate-level practitioners, making these analytical methods accessible without lengthy mathematical derivations. Using the free, open source program R, Dr. Zelterman demonstrates the process and outcomes for a wide array of multivariate statistical applications. Chapters cover graphical displays; linear algebra; univariate, bivariate and multivariate normal distributions; factor methods; linear regression; discrimination and classification; clustering; time series models; and additional methods. He uses practical examples from diverse disciplines to welcome readers from a variety of academic specialties. Each chapter includes exercises, real data sets, and R implementations. The book avoids theoretical derivations beyond those needed to fully appreciate the methods. Prior experience with R is not necessary.

New to this edition are chapters devoted to longitudinal studies and the clustering of large data. It is an excellent resource for students of multivariate statistics, as well as practitioners in the health and life sciences who are looking to integrate statistics into their work.

Table of Contents

Frontmatter
Chapter 1. Introduction
Abstract
WE ARE SURROUNDED by data. How is multivariate data analysis different from more familiar univariate methods? This chapter provides a summary of most of the major topics covered in this book. We also want to advocate for the multivariate methods developed.
Daniel Zelterman
Chapter 2. Elements of R
Abstract
THE SOFTWARE PACKAGE R has become very popular over the past decade for good reason. It is open source, meaning you can examine exactly what steps the program is performing. Compare this to the “black box” approach adopted by so many other software packages whose authors hope you will just push the Enter button and accept the results. Another feature of open software is that if you identify a problem, you can fix it, or at least publicize the error until it gets fixed. Finally, once you become proficient at R, you can contribute to it. One of the great features of R is the availability of packages of programs written by R users for other R users. Best of all, R is free for the asking and easy to install on your computer.
Daniel Zelterman
Chapter 3. Graphical Displays
Abstract
THERE ARE MANY GRAPHICAL METHODS to be demonstrated with little or no explanation. You are likely familiar with histograms and scatterplots. Many options in R have improved on these in interesting and useful ways. The ability to produce statistical graphics is a clear strength of R.
Daniel Zelterman
Chapter 4. Some Linear Algebra
Abstract
MANY OPERATIONS performed on multivariate data are facilitated using vector and matrix notation. In this chapter, we introduce the basic operations and properties of these and then show how to perform them in R.
Daniel Zelterman
Chapter 5. The Univariate Normal Distribution
Abstract
THE NORMAL DISTRIBUTION is central to much of statistics. In this chapter and the two following, we develop the normal model from the univariate, bivariate, and then, finally, the more general distribution with an arbitrary number of dimensions.
Daniel Zelterman
Chapter 6. Bivariate Normal Distribution
Abstract
THE BIVARIATE NORMAL DISTRIBUTION helps us make the important leap from the univariate normal to the more general multivariate normal distribution. To accomplish this, we need to make the transition from the scalar univariate notation of the previous chapter to the matrix notation of the following chapter.
Daniel Zelterman
Chapter 7. Multivariate Normal Distribution
Abstract
IN THIS CHAPTER, we generalize the bivariate normal distribution from the previous chapter to an arbitrary number of dimensions. We also make use of the matrix notation. The mathematics is generally more dense and relies on the linear algebra notation covered in Chapter 4. In Sect. 4.5, we pointed out there is a limit on what computations we can reasonably perform by hand. For this reason, we illustrate these various operations with the help of R.
Daniel Zelterman
Chapter 8. Factor Methods
Abstract
THE PREVIOUS CHAPTER described inference on the multivariate normal distribution. Sometimes this is more than we actually need. The multivariate distribution is used as a basis of modeling means and covariances. The covariances describe the multivariate relationship between pairs of individual attributes. In this chapter, we go further and describe methods for identifying relationships between several variables concurrently. In the following chapter, we will use regression methods to model the means.
Daniel Zelterman
Chapter 9. Multivariable Linear Regression
Abstract
LINEAR REGRESSION is probably one of the most powerful and useful tools available to the applied statistician. This method uses one or more variables to explain the values of another. Statistics alone cannot prove a cause-and-effect relationship, but we can show how changes in one set of measurements are associated with changes of the average values in another.
Daniel Zelterman
Chapter 10. Discrimination and Classification
Abstract
IF WE HAVE multivariate observations from two or more identified populations, how can we characterize them? Is there a combination of measurements to clearly distinguish between these groups? It is not good enough to simply say the mean of one variable is statistically higher in one group in order to solve this problem, because the histograms of the groups may have considerable overlap, making the discriminatory process only a little better than guesswork.
Daniel Zelterman
Chapter 11. Clustering Methods
Abstract
CLUSTERING is a nonparametric method of arranging similar observations together, often in a graphical display used to detect patterns of grouping and outliers. The approach is usually considered nonparametric because there is no specified underlying distribution or model we need to assume. R offers great flexibility in graphical capability, making these methods possible. The largest difference between these methods and those considered in the previous chapter is that in this chapter we do not know group membership a priori, or whether in fact there are different groups at all. Similarly, part of the methods discussed here includes estimates of the number of dissimilar groups present in the data.
Daniel Zelterman
Chapter 12. Basic Models for Longitudinal Data
Abstract
Longitudinal studies are a common form of study, including clinical trials in which the treatment effect is visible only after several measurements are made on the same individual over a period of time. This chapter begins with an example of a randomized trial of an experimental medication.
Daniel Zelterman
Chapter 13. Time Series Models
Abstract
THE MODELS for data described so far have been concerned with independent observations on multivariate values. The data examined in this chapter are for settings where successive observations are also correlated. The subject matter is not usually associated with multivariate methods but our choice of applications makes these methods more relevant.
Daniel Zelterman
Chapter 14. Other Useful Methods
Abstract
THIS FINAL CHAPTER provides a collection of useful multivariate methods not fitting into any of the previous chapters. The Bradley–Terry model gives us a way to rank a set of objects examined by pairwise comparisons. Such examples include sports teams playing against each other. Canonical correlations generalize the definition of correlation of a pair of scalar-valued variates to two groups of several variables considered jointly. The study of extremes allows us to examine several of the largest values in a collection of data.
Daniel Zelterman
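
As a rough illustration of the Bradley–Terry idea mentioned in the Chapter 14 abstract, the sketch below estimates team strengths from pairwise win counts. It is written in Python rather than the book's R, uses a standard iterative (minorization-maximization) update that is not necessarily the approach taken in the book, and the function name `bradley_terry` and the win counts are made up for illustration. Under the model, the probability that team i beats team j is strength_i / (strength_i + strength_j).

```python
def bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths from wins[i][j] = times i beat j.

    Assumes every team has played at least one game and that the
    comparison graph is connected (hypothetical helper, not from the book).
    """
    n = len(wins)
    strengths = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Games between i and j, weighted by the current strengths.
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        norm = sum(updated)
        strengths = [s / norm for s in updated]  # rescale to sum to 1
    return strengths


# Made-up results for three teams (indices 0, 1, 2); each pair plays 10 games.
wins = [
    [0, 7, 8],
    [3, 0, 6],
    [2, 4, 0],
]
strengths = bradley_terry(wins)
ranking = sorted(range(len(wins)), key=lambda i: strengths[i], reverse=True)
print(ranking, [round(s, 3) for s in strengths])
```

The fitted strengths are identified only up to a common scale, which is why the sketch normalizes them to sum to one; the ranking is what is usually reported.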
Backmatter
Metadata
Title
Applied Multivariate Statistics with R
Author
Daniel Zelterman
Copyright Year
2022
Electronic ISBN
978-3-031-13005-2
Print ISBN
978-3-031-13004-5
DOI
https://doi.org/10.1007/978-3-031-13005-2
