
## About This Book

This textbook will familiarize students in economics and business, as well as practitioners, with the basic principles, techniques, and applications of applied statistics, statistical testing, and multivariate data analysis. Drawing on practical examples from the business world, it demonstrates the methods of univariate, bivariate, and multivariate statistical analysis. The textbook covers a range of topics, from data collection and scaling to the presentation and simple univariate analysis of quantitative data, while also providing advanced analytical procedures for assessing multivariate relationships. Accordingly, it addresses all topics typically covered in university courses on statistics and advanced applied data analysis. In addition, it does not limit itself to presenting applied methods, but also discusses the related use of Excel, SPSS, and Stata.

## Table of Contents

### 1. Statistics and Empirical Research

Abstract
“I don’t trust any statistics I haven’t falsified myself”.
Thomas Cleff

### 2. From Disarray to Dataset

Abstract
Let us begin with the first step of the intelligence cycle: data collection. Many businesses gather crucial information—on expenditures and sales, say—but few enter it into a central database for systematic evaluation. The first task of the statistician is to mine this valuable information. Often, this requires skills of persuasion: employees may be hesitant to give up data for the purpose of systematic analysis, for this may reveal past failures.
Thomas Cleff

### 3. Univariate Data Analysis

Abstract
Let us return to our students from the previous chapter. After completing their survey of bread spreads, they have now coded the data from the 850 respondents and entered them into a computer. In the first step of data assessment, they investigate each variable (for example, average respondent age) separately. This is called univariate analysis (see Fig. 3.1). By contrast, when researchers analyse the relationship between two variables (for example, between gender and choice of spread), this is called bivariate analysis (see Chap. 4). With relationships between more than two variables, one speaks of multivariate analysis (see Sect. 9.5 and Chaps. 10, 12, and 13).
Thomas Cleff
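The following minimal Python sketch makes the univariate step concrete. It summarizes one variable, respondent age, on its own. The ages are hypothetical values chosen for illustration, not the students' 850 survey responses, and the helper name `describe` is made up.

```python
from statistics import mean, median, stdev

# Hypothetical ages of a few respondents (illustrative only,
# not the 850-respondent survey described in the text).
ages = [23, 35, 41, 29, 52, 35, 47, 31]

def describe(values):
    """Univariate summary: every statistic looks at one variable alone."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
        "min": min(values),
        "max": max(values),
    }

summary = describe(ages)
print(summary["mean"], summary["median"])  # 36.625 35.0
```

Relating age to a second variable, such as gender, would already be a bivariate analysis.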

### 4. Bivariate Association

Abstract
In the first stage of data analysis, we learned how to examine variables and survey traits individually, or univariately. In this chapter, we’ll learn how to assess the association between two variables using methods known as bivariate analyses. This is where statistics starts getting interesting—practically as well as theoretically.
Thomas Cleff
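A common way to examine the association between two nominal variables, such as gender and choice of spread, is to build a contingency table and compute Pearson's chi-square statistic. The sketch below assumes that approach and uses a handful of hypothetical responses. The function names are illustrative and are not taken from the book.

```python
from collections import Counter

# Hypothetical (gender, preferred spread) pairs, for illustration only.
responses = [
    ("female", "butter"), ("female", "margarine"), ("male", "butter"),
    ("male", "butter"), ("female", "butter"), ("male", "margarine"),
    ("female", "margarine"), ("male", "butter"),
]

def crosstab(pairs):
    """Count the joint frequencies of two nominal variables."""
    counts = Counter(pairs)
    rows = sorted({a for a, _ in pairs})
    cols = sorted({b for _, b in pairs})
    return rows, cols, [[counts[(r, c)] for c in cols] for r in rows]

def chi_square(table):
    """Pearson chi-square: sum of (observed - expected)^2 / expected."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

rows, cols, table = crosstab(responses)
print(rows, cols, table)  # ['female', 'male'] ['butter', 'margarine'] [[2, 2], [3, 1]]
print(round(chi_square(table), 3))  # 0.533
```

With only eight responses, the expected cell counts are far too small for a reliable chi-square test. The numbers only show the mechanics.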

### 5. Classical Measurement Theory

Abstract
The term descriptive statistics refers to all techniques used to obtain information based on the description of data from a population. The calculation of figures and parameters and the generation of graphics and tables are just some of the methods and techniques used in descriptive statistics. Inferential statistics, sometimes referred to as inductive statistics, did not develop until much later. It uses samples to draw conclusions, or inferences, about a population. Many of the methods of inferential statistics go back to discoveries made by such thinkers as Jacob Bernoulli (1654–1705), Abraham de Moivre (1667–1754), Thomas Bayes (1702–1761), Pierre-Simon Laplace (1749–1827), Carl Friedrich Gauß (1777–1855), Pafnuty Lvovich Chebyshev (1821–1894), Francis Galton (1822–1911), Ronald A. Fisher (1890–1962), and William Sealy Gosset (1876–1937). Thanks to their work, we no longer have to count and measure each individual within a population, but can instead conduct a smaller, more manageable survey. This comes in handy when a full survey would be too expensive or take too long, or when collecting the data damages the elements under investigation (as in material testing or wine tasting). But inferential statistics has a price: because the data are collected from a sample rather than from the entire population, our conclusions about the population carry a certain degree of uncertainty. However, inferential statistics can also quantify the “price” of this uncertainty using margins of error. Classical measurement theory gives us the tools to calculate statistical error.
Thomas Cleff
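A short simulation can make the uncertainty of sampling tangible. This sketch assumes a hypothetical, normally distributed population of ages and draws many random samples from it. It then compares how much the sample means scatter with the theoretical standard error sigma/sqrt(n). The seed, population parameters and sample size are arbitrary choices for illustration.

```python
import random
from statistics import mean, pstdev, stdev

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 ages drawn from a normal distribution.
population = [rng.gauss(40, 12) for _ in range(10_000)]
mu = mean(population)
sigma = pstdev(population)

def sample_means(pop, n, repetitions, rng):
    """Draw repeated simple random samples and record each sample mean."""
    return [mean(rng.sample(pop, n)) for _ in range(repetitions)]

means = sample_means(population, n=100, repetitions=1_000, rng=rng)

print(f"population mean {mu:.2f}, population sd {sigma:.2f}")
print(f"mean of sample means {mean(means):.2f}")
print(f"observed sd of sample means {stdev(means):.2f}")
print(f"theoretical standard error sigma/sqrt(n) {sigma / 100 ** 0.5:.2f}")
```

Each sample mean misses the population mean by a little. The typical size of those misses is what a margin of error describes, and it shrinks as the sample size grows.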

### 6. Calculating Probability

Abstract
Let me summarize what I have said about samples so far. Attempts to collect data about all elements in a given population should be avoided when a full survey would be too expensive, take too long, or damage the elements under investigation.
Thomas Cleff

### 7. Random Variables and Probability Distributions

Abstract
In the previous chapter, we learned about various principles and calculation techniques dealing with probability. Building on these lessons, we will now examine some theoretical probability distributions that allow us to make inferences about populations from sample data. These probability distributions are based on the idea of random variables. A random variable is a variable whose numerical values represent the outcomes of random experiments. Random variables are symbolized with capital letters such as “X”. The individual values of random variables are represented either with a capital Roman letter followed by a subscript “i” (e.g. “Xᵢ”) or with the lower case of the random variable letter (e.g. “x”). Generally, there are two types of random variables: discrete and continuous.
Thomas Cleff
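A discrete random variable can be written down directly as a table of outcomes and their probabilities. The sketch below uses the classic example of a single fair die roll, which is not necessarily the book's own example. It computes the expected value and variance with exact fractions.

```python
from fractions import Fraction

# A discrete random variable X: the number shown by one roll of a fair die.
# Each outcome x_i is paired with its probability P(X = x_i).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expected_value(pmf):
    """E(X) = sum of x_i * P(X = x_i)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum of (x_i - E(X))^2 * P(X = x_i)."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

assert sum(pmf.values()) == 1  # probabilities must sum to one
print(expected_value(pmf), variance(pmf))  # 7/2 35/12
```

A continuous random variable would instead need a density function and is not covered by this sketch.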

### 8. Parameter Estimation

Abstract
Now that we have laid the theoretical foundations of probability calculus in the past chapters, let us recall what all the effort was about. The main purpose of inductive statistics is to develop methods for making generalizations about a population from sample data. This chapter will present these methods, known as statistical estimation of parameters. Statistical estimation is a procedure for estimating the value of an unknown population parameter—for example, average age. There are two types of estimation procedures: point estimation and interval estimation.
Thomas Cleff
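The two estimation procedures can be contrasted in a few lines. The sample mean serves as a point estimate, and a margin around it forms an interval estimate. This is a sketch under stated assumptions:

- The sample is hypothetical.
- The interval uses the normal approximation, with z taken from `statistics.NormalDist`.

A small sample like this one would usually call for the t distribution instead. The helper name `estimate_mean` is made up.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample of respondent ages (illustrative values only).
sample = [23, 35, 41, 29, 52, 35, 47, 31, 38, 44, 27, 33]

def estimate_mean(data, confidence=0.95):
    """Point estimate plus a normal-approximation confidence interval.

    Assumes the normal approximation is adequate; for small samples a
    t-based interval would be wider.
    """
    n = len(data)
    point = mean(data)                        # point estimate
    standard_error = stdev(data) / n ** 0.5   # estimated standard error
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return point, (point - margin, point + margin)

point, (low, high) = estimate_mean(sample)
print(f"point estimate {point:.2f}, 95% interval [{low:.2f}, {high:.2f}]")
```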

### 9. Hypothesis Testing

Abstract
Among the most important techniques in statistics is hypothesis testing. A hypothesis is a supposition about a certain state of affairs. It does not spring from a sudden epiphany or a long-standing conviction; rather, it offers a testable explanation of a specific phenomenon. A hypothesis is something that we can accept (verify) or reject (falsify) based on empirical data.
Thomas Cleff
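As one concrete instance of a hypothesis test, here is a two-sided one-sample z-test of H0: μ = μ0. It assumes the population standard deviation is known, which is an illustrative simplification. The data, μ0 and σ are hypothetical.

```python
from statistics import NormalDist, mean

def z_test(data, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test of H0: mu = mu0.

    Assumes the population standard deviation sigma is known.
    Returns the test statistic, the p-value and whether H0 is rejected.
    """
    n = len(data)
    z = (mean(data) - mu0) / (sigma / n ** 0.5)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha

sample = [40, 44] * 18  # hypothetical data: n = 36, sample mean 42
z, p, reject = z_test(sample, mu0=40, sigma=6)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")  # z = 2.00, p = 0.0455, reject H0: True
```

A p-value below α leads to rejecting H0; otherwise H0 is retained on the basis of these data.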

### 10. Regression Analysis

Abstract
Regression analysis, often referred to simply as regression, is an important tool in statistical analysis. The concept first appeared in an 1877 study on sweet-pea seeds by Sir Francis Galton (1822–1911). He used the idea of regression again in a later study on the heights of fathers and sons. He discovered that sons of tall fathers are tall but somewhat shorter than their fathers, while sons of short fathers are short but somewhat taller than their fathers. In other words, body height tends toward the mean. Galton called this process a regression, literally a step back or decline. We can compute a correlation to measure the association between the heights of sons and fathers. We can also infer the causal direction of the association: the height of sons depends on the height of fathers, not the other way around. Galton indicated causal direction by referring to the height of sons as the dependent variable and the height of fathers as the independent variable. But take heed: regression does not necessarily prove the causality of the association. The direction of effect must be derived theoretically before it can be empirically tested with regression. Sometimes the direction of causality cannot be determined, as with the ages of couples getting married. Does the age of the groom determine the age of the bride or vice versa? Or do the groom’s age and the bride’s age determine each other mutually? Sometimes the causality is obvious. For instance, blood pressure has no influence on age, but age has an influence on blood pressure. Body height has an influence on weight, but the reverse association is unlikely (Swoboda 1971, p. 308).
Thomas Cleff

### 11. Time Series and Indices

Abstract
In the preceding chapter, we used a variety of independent variables to predict dress sales. All the trait values for sales (dependent variable) and for catalogue image size (independent variable) were recorded over the same period of time. Studies like these are called cross-sectional analyses. When the data is measured at successive time intervals, it is called a time series analysis or a longitudinal study. This type of study requires a time series in which data for independent and dependent variables are observed for specific points of time (t = 1,…, n). In its simplest version, time is the only independent variable and is plotted on the x-axis. This kind of time series does nothing more than link variable data over different periods. Figure 11.1 shows an example with a graph of diesel fuel prices by year.
Thomas Cleff
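An index expresses each value of a time series relative to a base period set to 100. The sketch below assumes a simple (unweighted) index for a single series and uses hypothetical annual prices, not the data in Fig. 11.1. Composite price indices that weight several goods are not covered. The function names are made up.

```python
# Hypothetical annual prices (illustrative only, not the data in Fig. 11.1).
prices = {2018: 1.25, 2019: 1.30, 2020: 1.10, 2021: 1.40, 2022: 1.95}

def simple_index(series, base):
    """Express each value relative to a base period, with base = 100."""
    base_value = series[base]
    return {t: 100 * value / base_value for t, value in series.items()}

def growth_rates(series):
    """Period-on-period percentage change."""
    periods = sorted(series)
    return {
        t: 100 * (series[t] - series[prev]) / series[prev]
        for prev, t in zip(periods, periods[1:])
    }

index = simple_index(prices, base=2018)
for year, value in index.items():
    print(year, round(value, 1))
print(growth_rates(prices))
```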

### 12. Cluster Analysis

Abstract
Before we turn to the subject of cluster analysis, think for a moment about the meaning of the word cluster. The term refers to a group of individuals or objects that converge around a certain point and are thus closely related in their position. In astronomy there are clusters of stars; in chemistry, clusters of atoms. Economic research often relies on techniques that consider groups within a total population. For instance, firms that engage in target group marketing must first divide consumers into segments, or clusters of potential customers. Indeed, in many contexts researchers and economists need accurate methods for delineating homogeneous groups within a set of observations. Groups may contain individuals (such as people or their behaviours) or objects (such as firms, products, or patents). This chapter thus takes a cue from Goethe’s Faust (1987, lines 1943–45): “You soon will [understand]; just carry on as planned/You’ll learn reductive demonstrations/And all the proper classifications”.
Thomas Cleff
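k-means is one widely used partitioning method for finding homogeneous groups. The abstract does not say which clustering methods the chapter uses, so treat this as an independent illustration. The sketch uses hypothetical customer data (age, monthly spend) and starts from the first k points so that it stays deterministic. The function name `k_means` is made up.

```python
from math import dist

def k_means(points, k, iterations=100):
    """Plain k-means: assign points to the nearest centre, then move centres.

    Starts from the first k points to stay deterministic; practical
    implementations usually use smarter or repeated starts.
    """
    centres = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centres[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its cluster.
        new_centres = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centres[i]
            for i, c in enumerate(clusters)
        ]
        if new_centres == centres:  # converged: centres no longer move
            break
        centres = new_centres
    return centres, clusters

# Hypothetical customers: (age, monthly spend).
customers = [(25, 30), (60, 200), (27, 35), (23, 28), (58, 190), (62, 210)]
centres, clusters = k_means(customers, k=2)
print(centres)
```

Because age and spend are on different scales, real analyses typically standardize variables first. The toy groups here are separated clearly enough that scaling does not change the result.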

### 13. Factor Analysis

Abstract
Frequently, empirical studies rely on a wide variety of variables (so-called item batteries) to describe a certain state of affairs. An example of such a collection of variables is the study of preferred toothpaste attributes by Malhotra (2010, p. 639). Thirty people were asked the questions in Fig. 13.1.
Thomas Cleff

### Backmatter
