2003 | Book

Applied Multivariate Statistical Analysis

Authors: Wolfgang Härdle, Léopold Simar

Publisher: Springer Berlin Heidelberg

About this Book

Most of the observable phenomena in the empirical sciences are of a multivariate nature. This book presents the tools and concepts of multivariate data analysis with a strong focus on applications. The text is divided into three parts. The first part is devoted to graphical techniques describing the distributions of the variables involved. The second part deals with multivariate random variables and presents, from a theoretical point of view, distributions, estimators and tests for various practical situations. The last part covers multivariate techniques and introduces the reader to the broad set of tools available for multivariate data analysis. The text presents a wide range of examples and 228 exercises.

Table of Contents

Frontmatter

Descriptive Techniques

Frontmatter
1. Comparison of Batches
Abstract
Multivariate statistical analysis is concerned with analyzing and understanding data in high dimensions. We suppose that we are given a set
$$\{x_i\}_{i=1}^{n}$$
of n observations of a variable vector X in ℝ^p.
Wolfgang Härdle, Léopold Simar

Multivariate Random Variables

Frontmatter
2. A Short Excursion into Matrix Algebra
Abstract
This chapter is a reminder of basic concepts of matrix algebra, which are particularly useful in multivariate analysis. It also introduces the notations used in this book for vectors and matrices. Eigenvalues and eigenvectors play an important role in multivariate techniques. In Sections 2.2 and 2.3, we present the spectral decomposition of matrices and consider the maximization (minimization) of quadratic forms given some constraints.
Wolfgang Härdle, Léopold Simar
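The spectral decomposition mentioned in this abstract can be illustrated in a few lines of NumPy; the symmetric matrix below is an arbitrary example, not one from the book:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # a symmetric example matrix
eigvals, eigvecs = np.linalg.eigh(A)    # eigh handles symmetric matrices

# spectral decomposition: A = Gamma Lambda Gamma^T,
# with orthonormal eigenvectors in the columns of Gamma
A_rec = eigvecs @ np.diag(eigvals) @ eigvecs.T
print(eigvals)  # eigenvalues in ascending order: [1. 3.]
```

The reconstruction `A_rec` matches `A` to machine precision, which is exactly the identity the quadratic-form maximization results in Sections 2.2 and 2.3 build on.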
3. Moving to Higher Dimensions
Abstract
We have seen in the previous chapters how very simple graphical devices can help in understanding the structure and dependency of data. The graphical tools were based on either univariate (bivariate) data representations or on “slick” transformations of multivariate information perceivable by the human eye. Most of the tools are extremely useful in a modelling step, but unfortunately, do not give the full picture of the data set. One reason for this is that the graphical tools presented capture only certain dimensions of the data and do not necessarily concentrate on those dimensions or subparts of the data under analysis that carry the maximum structural information. In Part III of this book, powerful tools for reducing the dimension of a data set will be presented. In this chapter, as a starting point, simple and basic tools are used to describe dependency. They are constructed from elementary facts of probability theory and introductory statistics (for example, the covariance and correlation between two variables).
Wolfgang Härdle, Léopold Simar
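The elementary tools this chapter builds on, covariance and correlation between two variables, can be sketched on simulated data; the data and the linear dependence y ≈ 2x below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)  # y depends linearly on x

S = np.cov(x, y)        # 2 x 2 empirical covariance matrix
R = np.corrcoef(x, y)   # 2 x 2 empirical correlation matrix
print(R[0, 1])          # close to 1: strong linear dependence
```

The off-diagonal entry of `R` is the empirical correlation; values near ±1 signal a nearly linear relationship, which is the kind of dependency structure this chapter quantifies.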
4. Multivariate Distributions
Abstract
The preceding chapter showed that by using the first two moments of a multivariate distribution (the mean and the covariance matrix), a lot of information on the relationship between the variables can be made available. Only basic statistical theory was used to derive tests of independence or of linear relationships. In this chapter we give an introduction to the basic probability tools useful in statistical multivariate analysis.
Wolfgang Härdle, Léopold Simar
5. Theory of the Multinormal
Abstract
In the preceding chapter we saw how the multivariate normal distribution comes into play in many applications. It is useful to know more about this distribution, since it is often a good approximate distribution in many situations. Another reason for considering the multinormal distribution relies on the fact that it has many appealing properties: it is stable under linear transforms, zero correlation corresponds to independence, the marginals and all the conditionals are also multivariate normal variates, etc. The mathematical properties of the multinormal make analyses much simpler.
Wolfgang Härdle, Léopold Simar
6. Theory of Estimation
Abstract
We know from our basic knowledge of statistics that one of the objectives in statistics is to better understand and model the underlying process which generates the data. This is known as statistical inference: we infer from information contained in a sample properties of the population from which the observations are taken. In multivariate statistical inference, we do exactly the same. The basic ideas were introduced in Section 4.5 on sampling theory: we observed the values of a multivariate random variable X and obtained a sample
$$\{x_i\}_{i=1}^{n}.$$
Under random sampling, these observations are considered to be realizations of a sequence of i.i.d. random variables X_1, ..., X_n, where each X_i is a p-variate random variable which replicates the parent or population random variable X. In this chapter, for notational convenience, we will no longer differentiate between a random variable X_i and an observation of it, x_i, in our notation. We will simply write x_i and it should be clear from the context whether a random variable or an observed value is meant.
Wolfgang Härdle, Léopold Simar
7. Hypothesis Testing
Abstract
In the preceding chapter, the theoretical basis of estimation theory was presented. Now we turn our interest towards testing issues: we want to test the hypothesis H_0 that the unknown parameter θ belongs to some subspace of ℝ^q. This subspace is called the null set and will be denoted by Ω_0 ⊂ ℝ^q.
Wolfgang Härdle, Léopold Simar

Multivariate Techniques

Frontmatter
8. Decomposition of Data Matrices by Factors
Abstract
In Chapter 1 we developed basic descriptive techniques which provided tools for “looking” at multivariate data. They were based on adaptations of bivariate or univariate devices used to reduce the dimensions of the observations. In the following three chapters, issues of reducing the dimension of a multivariate data set will be discussed. The perspectives will be different but the tools will be related.
Wolfgang Härdle, Léopold Simar
9. Principal Components Analysis
Abstract
Chapter 8 presented the basic geometric tools needed to produce a lower dimensional description of the rows and columns of a multivariate data matrix. Principal components analysis has the same objective with the exception that the rows of the data matrix x will now be considered as observations from a p-variate random variable X. The principal idea of reducing the dimension of X is achieved through linear combinations. Low dimensional linear combinations are often easier to interpret and serve as an intermediate step in a more complex data analysis. More precisely, one looks for linear combinations which create the largest spread among the values of X. In other words, one is searching for linear combinations with the largest variances.
Wolfgang Härdle, Léopold Simar
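The search for the linear combinations with the largest variances reduces to an eigendecomposition of the empirical covariance matrix. A minimal sketch, on simulated data that is an assumption for illustration rather than an example from the book:

```python
import numpy as np

rng = np.random.default_rng(1)
# 100 simulated observations of a 3-variate X with correlated components
X = rng.normal(size=(100, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.5, 0.2, 0.3]])
Xc = X - X.mean(axis=0)        # center the data
S = np.cov(Xc, rowvar=False)   # empirical covariance matrix

# principal components: eigenvectors of S, ordered by decreasing eigenvalue
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

Y = Xc @ eigvecs               # principal component scores
# the variance of the first score equals the largest eigenvalue of S
print(np.allclose(np.var(Y[:, 0], ddof=1), eigvals[0]))  # True
```

The first column of `Y` is the linear combination with the largest spread; its variance is exactly the top eigenvalue, which is the defining property of the first principal component.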
10. Factor Analysis
Abstract
A frequently applied paradigm in analyzing data from multivariate observations is to model the relevant information (represented in a multivariate variable X) as coming from a limited number of latent factors. In a survey on household consumption, for example, the consumption levels, X, of p different goods during one month could be observed. The variations and covariations of the p components of X throughout the survey might in fact be explained by two or three main social behavior factors of the household. For instance, a basic desire for comfort or the willingness to achieve a certain social level or other latent social concepts might explain most of the consumption behavior. These unobserved factors are much more interesting to the social scientist than the observed quantitative measures (X) themselves, because they give a better understanding of the behavior of households. As shown in the examples below, the same kind of factor analysis is of interest in many fields such as psychology, marketing, economics, political science, etc.
Wolfgang Härdle, Léopold Simar
11. Cluster Analysis
Abstract
The next two chapters address classification issues from two varying perspectives. When considering groups of objects in a multivariate data set, two situations can arise. Given a data set containing measurements on individuals, in some cases we want to see if some natural groups or classes of individuals exist, and in other cases, we want to classify the individuals according to a set of existing groups. Cluster analysis develops tools and methods concerning the former case, that is, given a data matrix containing multivariate measurements on a large number of individuals (or objects), the objective is to build some natural subgroups or clusters of individuals. This is done by grouping individuals that are “similar” according to some appropriate criterion. Once the clusters are obtained, it is generally useful to describe each group using some descriptive tool from Chapters 1, 8 or 9 to create a better understanding of the differences that exist among the formulated groups.
Wolfgang Härdle, Léopold Simar
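The idea of grouping “similar” individuals can be made concrete with a minimal k-means sketch. The book covers a range of clustering criteria; k-means with farthest-point initialization is just one concrete choice here, and the two well-separated groups are simulated for illustration:

```python
import numpy as np

def kmeans(X, k, n_iter=20):
    """Minimal k-means with farthest-point initialization."""
    centers = [X[0]]
    for _ in range(k - 1):
        # next center: the observation farthest from all chosen centers
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # assign each observation to its nearest center, then update centers
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(axis=-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)),   # group around (0, 0)
               rng.normal(3.0, 0.3, (30, 2))])  # group around (3, 3)
labels, centers = kmeans(X, 2)
```

With clearly separated groups, the procedure recovers the two natural clusters; each cluster could then be described with the descriptive tools from Chapters 1, 8 or 9 as suggested above.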
12. Discriminant Analysis
Abstract
Discriminant analysis is used in situations where the clusters are known a priori. The aim of discriminant analysis is to classify an observation, or several observations, into these known groups. For instance, in credit scoring, a bank knows from past experience that there are good customers (who repay their loan without any problems) and bad customers (who showed difficulties in repaying their loan). When a new customer asks for a loan, the bank has to decide whether or not to give the loan. The past records of the bank provide two data sets: multivariate observations x_i on the two categories of customers (including for example age, salary, marital status, the amount of the loan, etc.). The new customer is a new observation x with the same variables. The discrimination rule has to classify the customer into one of the two existing groups and the discriminant analysis should evaluate the risk of a possible “bad decision”.
Wolfgang Härdle, Léopold Simar
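Under an equal-covariance assumption, a common discrimination rule allocates a new observation to the group with the smaller Mahalanobis distance to the group mean. A sketch of the credit-scoring example, with simulated customer data and two placeholder variables standing in for the bank's records:

```python
import numpy as np

rng = np.random.default_rng(3)
# simulated training data: "good" and "bad" customers, two measured variables
good = rng.normal([0.0, 0.0], 1.0, (100, 2))
bad = rng.normal([3.0, 3.0], 1.0, (100, 2))

mu_g, mu_b = good.mean(axis=0), bad.mean(axis=0)
# pooled covariance estimate across the two groups
pooled = ((len(good) - 1) * np.cov(good, rowvar=False) +
          (len(bad) - 1) * np.cov(bad, rowvar=False)) / (len(good) + len(bad) - 2)
Sinv = np.linalg.inv(pooled)

def classify(x):
    """Allocate x to the group with the smaller Mahalanobis distance."""
    d_g = (x - mu_g) @ Sinv @ (x - mu_g)
    d_b = (x - mu_b) @ Sinv @ (x - mu_b)
    return "good" if d_g < d_b else "bad"

print(classify(np.array([0.5, 0.2])))  # a customer resembling the "good" group
```

With equal covariance matrices this rule is linear in x, which is the classical linear discriminant setting; the misclassification rate quantifies the risk of a “bad decision”.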
13. Correspondence Analysis
Abstract
Correspondence analysis provides tools for analyzing the associations between rows and columns of contingency tables. A contingency table is a two-entry frequency table where the joint frequencies of two qualitative variables are reported. For instance, a (2 × 2) table could be formed by observing, for a sample of n individuals, two qualitative variables: the individual’s sex and whether the individual smokes. The table reports the observed joint frequencies. In general, (n × p) tables may be considered.
Wolfgang Härdle, Léopold Simar
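The sex-by-smoking example can be made concrete with a small table; the counts below are invented for illustration, and the Pearson χ² statistic measures the departure of the observed frequencies from what independence would predict:

```python
import numpy as np

# assumed counts: sex (rows: male, female) by smoking (cols: smoker, non-smoker)
table = np.array([[30, 20],
                  [15, 35]])
n = table.sum()
row, col = table.sum(axis=1), table.sum(axis=0)
expected = np.outer(row, col) / n                  # counts expected under independence
chi2 = ((table - expected) ** 2 / expected).sum()  # Pearson chi-squared statistic
print(chi2)  # large values indicate association between the two variables
```

Correspondence analysis goes further than this single statistic: it decomposes the departure from independence into interpretable row and column factors.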
14. Canonical Correlation Analysis
Abstract
Complex multivariate data structures are better understood by studying low-dimensional projections. For a joint study of two data sets, we may ask what type of low-dimensional projection helps in finding possible joint structures for the two samples. The canonical correlation analysis is a standard tool of multivariate statistical analysis for discovery and quantification of associations between two sets of variables.
Wolfgang Härdle, Léopold Simar
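The canonical correlations can be computed as the singular values of S_XX^(-1/2) S_XY S_YY^(-1/2). A minimal sketch on two simulated data sets that share a latent variable (the construction below is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
z = rng.normal(size=n)  # shared latent variable linking the two data sets
X = np.column_stack([z + 0.5 * rng.normal(size=n), rng.normal(size=n)])
Y = np.column_stack([z + 0.5 * rng.normal(size=n), rng.normal(size=n)])

Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
Sxx = Xc.T @ Xc / (n - 1)
Syy = Yc.T @ Yc / (n - 1)
Sxy = Xc.T @ Yc / (n - 1)

def inv_sqrt(S):
    # S^(-1/2) via the spectral decomposition of the symmetric matrix S
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

K = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
canon_corrs = np.linalg.svd(K, compute_uv=False)  # canonical correlations
print(canon_corrs)  # first is large (shared z), second is near zero
```

Only the first canonical correlation is substantial here, reflecting the single shared latent variable; the corresponding singular vectors give the low-dimensional projections of the two data sets.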
15. Multidimensional Scaling
Abstract
One major aim of multivariate data analysis is dimension reduction. For data measured in Euclidean coordinates, Factor Analysis and Principal Component Analysis are the dominant tools. In many applied sciences, however, data is recorded as ranked information. For example, in marketing, one may record “product A is better than product B”. High-dimensional observations therefore often have mixed data characteristics and contain relative information (w.r.t. a defined standard) rather than absolute coordinates that would enable us to employ one of the multivariate techniques presented so far.
Wolfgang Härdle, Léopold Simar
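Although the chapter also treats ranked (non-metric) information, the classical metric variant of multidimensional scaling can be sketched briefly: given only pairwise distances, a low-dimensional configuration is recovered from the double-centered squared distance matrix. The four 2-D points below are an invented example:

```python
import numpy as np

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
D = np.linalg.norm(pts[:, None] - pts[None], axis=-1)  # pairwise distances

n = len(D)
J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
B = -0.5 * J @ (D ** 2) @ J           # double-centered squared distances
w, V = np.linalg.eigh(B)
idx = np.argsort(w)[::-1][:2]         # two largest eigenvalues
coords = V[:, idx] * np.sqrt(w[idx])  # recovered 2-D configuration

# the recovered configuration reproduces the original distances
D_rec = np.linalg.norm(coords[:, None] - coords[None], axis=-1)
print(np.allclose(D, D_rec))  # True
```

The recovered coordinates match the original points only up to rotation and reflection, since distances carry no information about orientation.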
16. Conjoint Measurement Analysis
Abstract
Conjoint Measurement Analysis plays an important role in marketing. In the design of new products it is valuable to know which components carry what kind of utility for the customer. Marketing and advertisement strategies are based on the perception of the new product’s overall utility. It can be valuable information for a car producer to know whether a change in sportiness or a change in safety equipment is perceived as a higher increase in overall utility. Conjoint Measurement Analysis is a method for attributing utilities to the components (part worths) on the basis of ranks given to different outcomes (stimuli) of the product. An important assumption is that the overall utility is decomposed as a sum of the utilities of the components.
Wolfgang Härdle, Léopold Simar
17. Applications in Finance
Abstract
A portfolio is a linear combination of assets. Each asset contributes with a weight c_j to the portfolio. The performance of such a portfolio is a function of the various returns of the assets and of the weights c = (c_1, ..., c_p)^T. In this chapter we investigate the “optimal choice” of the portfolio weights c. The optimality criterion is the mean-variance efficiency of the portfolio. Usually investors are risk-averse, therefore, we can define a mean-variance efficient portfolio to be a portfolio that has a minimal variance for a given desired mean return. Equivalently, we could try to optimize the weights for the portfolios with maximal mean return for a given variance (risk structure). We develop this methodology in the situations of (non)existence of riskless assets and discuss relations with the Capital Assets Pricing Model (CAPM).
Wolfgang Härdle, Léopold Simar
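For the global minimum-variance portfolio (no riskless asset, no target mean return), the optimal weights have the closed form c = Σ⁻¹1 / (1ᵀΣ⁻¹1). A minimal sketch, where the covariance matrix of the asset returns is an invented example:

```python
import numpy as np

# assumed covariance matrix of the returns of 3 assets
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
ones = np.ones(3)
w = np.linalg.solve(Sigma, ones)  # Sigma^-1 * 1
c = w / (ones @ w)                # normalize so the weights sum to one
variance = c @ Sigma @ c          # variance of the minimum-variance portfolio
print(c, variance)
```

By construction this variance is no larger than that of any single asset, illustrating the diversification effect the mean-variance criterion formalizes.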
18. Highly Interactive, Computationally Intensive Techniques
Abstract
It is generally accepted that training in statistics must include some exposure to the mechanics of computational statistics. This exposure to computational methods is of an essential nature when we consider extremely high dimensional data. Computer aided techniques can help us discover dependencies in high dimensions without complicated mathematical tools. A draftsman’s plot (i.e., a matrix of pairwise scatterplots as in Figure 1.14) may lead us immediately to a theoretical hypothesis (on a lower dimensional space) about the relationship of the variables. Computer aided techniques are therefore at the heart of multivariate statistical analysis.
Wolfgang Härdle, Léopold Simar
Backmatter
Metadata
Title
Applied Multivariate Statistical Analysis
Authors
Wolfgang Härdle
Léopold Simar
Copyright Year
2003
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-662-05802-2
Print ISBN
978-3-540-03079-9
DOI
https://doi.org/10.1007/978-3-662-05802-2