2013 | OriginalPaper | Book chapter

3. Multiple Regression Analysis

Written by: Stephanie R. Thomas

Published in: Compensating Your Employees Fairly

Publisher: Apress

Abstract

How do we examine compensation data for the presence or absence of discrimination?

($15.00 + $15.00 + $15.00 + $14.00 + $13.00) / 5 = $14.40.

($13.00 + $12.00 + $11.00 + $11.00 + $10.00) / 5 = $11.40.

$14.40 – $11.40 = $3.00.

(5 + 5 + 5 + 4 + 3) / 5 = 4.4 years.

(3 + 2 + 1 + 1 + 0) / 5 = 1.4 years.

4.4 – 1.4 = 3 years.
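The averages and differences in the calculations above can be reproduced in a few lines of Python. This is only a sketch: the variable names are illustrative, and the split into "group A" and "group B" simply follows the order of the calculations, since these notes do not label the groups.

```python
from statistics import mean

# Hourly pay rates and years of seniority, copied from the calculations above.
# The group labels are illustrative assumptions, not taken from the text.
group_a_pay = [15.00, 15.00, 15.00, 14.00, 13.00]
group_b_pay = [13.00, 12.00, 11.00, 11.00, 10.00]
group_a_seniority = [5, 5, 5, 4, 3]
group_b_seniority = [3, 2, 1, 1, 0]

# Difference in average hourly pay between the two groups.
pay_gap = mean(group_a_pay) - mean(group_b_pay)

# Difference in average seniority between the two groups.
seniority_gap = mean(group_a_seniority) - mean(group_b_seniority)

print(f"Average pay gap: ${pay_gap:.2f}")
print(f"Average seniority gap: {seniority_gap:.1f} years")
```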

Note that based on this analysis, there is no evidence of pay discrimination. The difference in average seniority by gender may be based on legitimate, nondiscriminatory factors, or it may be based on discrimination. For example, it may be the case that no qualified female applicants applied for employment with this employer until three years ago. This would be a legitimate, nondiscriminatory reason for the difference in average seniority by gender. It may also be the case that the employer refused to hire qualified female applicants before three years ago based on the gender of the applicant. This would be a discriminatory reason for differences in seniority. Although this may support a claim of hiring discrimination, this is a separate issue unrelated to the claim of compensation discrimination.

For simplicity in explanation, I focus on a two-variable model. The discussion will be expanded to a multivariable model later in this chapter.

Ramona Paetzold and Steven Willborn, The Statistics of Discrimination: Using Statistical Evidence in Discrimination Cases (Thomson/West, 2006), 263.

Robert Matthews, “Storks Deliver Babies (p = 0.008)?,” Teaching Statistics 22, no. 2 (Summer 2000), 36–38.

One out of 125 is equivalent to a probability value of 0.008, or approximately 2.4 units of standard deviation. As discussed in a subsequent section, this result is statistically significant.
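The conversion from a probability value to units of standard deviation can be checked with the standard normal distribution. The sketch below assumes a normal approximation and shows both a one-tailed and a two-tailed reading; the figure of roughly 2.4 quoted above appears consistent with the one-tailed reading, which is an inference rather than something the note states.

```python
from statistics import NormalDist

p_value = 1 / 125  # one chance in 125, i.e. 0.008

standard_normal = NormalDist()  # mean 0, standard deviation 1

# One-tailed: the z-score that leaves p_value in the upper tail.
z_one_tailed = standard_normal.inv_cdf(1 - p_value)

# Two-tailed: the z-score that leaves p_value / 2 in each tail.
z_two_tailed = standard_normal.inv_cdf(1 - p_value / 2)

print(f"p = {p_value:.3f}")
print(f"one-tailed z = {z_one_tailed:.2f}")
print(f"two-tailed z = {z_two_tailed:.2f}")
```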

Matthews, “Storks Deliver Babies,” 38; emphasis in original.

Paetzold and Willborn argue that the phrase “independent variable” is misleading, because the explanatory variables are not independent of the dependent variable, and it is the relationship between the dependent and explanatory variables that is being examined (The Statistics of Discrimination, p. 267). However, the dependent variable/independent variable nomenclature is commonly used in statistics and social sciences and is therefore presented herein.

It is possible, and under some circumstances likely, that a nonlinear relationship exists between variables. Nonlinear relationships between variables are discussed in a subsequent section of this chapter. The estimation of nonlinear models is beyond the scope of the current discussion.

Readers familiar with the mathematical formula for a straight line ($ y=mX+\beta $) will recognize the similarity between this equation and the one describing the relationship between hourly rates of pay and seniority.
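To make the straight-line analogy concrete, the following sketch fits a line to the pay and seniority figures from the earlier calculations using the closed-form least-squares formulas. Pairing each pay rate with the seniority value in the same position is an assumption made here for illustration; the notes do not state the pairing.

```python
from statistics import mean

# Assumption: the i-th pay rate belongs to the employee with the i-th
# seniority value, following the order of the earlier calculations.
seniority = [5, 5, 5, 4, 3, 3, 2, 1, 1, 0]
hourly_pay = [15, 15, 15, 14, 13, 13, 12, 11, 11, 10]

x_bar = mean(seniority)
y_bar = mean(hourly_pay)

# Least-squares slope: covariance of x and y divided by variance of x.
slope = (
    sum((x - x_bar) * (y - y_bar) for x, y in zip(seniority, hourly_pay))
    / sum((x - x_bar) ** 2 for x in seniority)
)

# The fitted line passes through the point of means.
intercept = y_bar - slope * x_bar

print(f"hourly pay = {intercept:.2f} + {slope:.2f} * seniority")
```

Under that pairing, every observation lies exactly on the fitted line, so the slope plays the role of m and the intercept the role of the constant term in the straight-line formula.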

A linear relationship between two or more variables will be expressed as a straight line. Nonlinear relationships can take various curvilinear forms and are beyond the scope of the current discussion.

Robert Pindyck and Daniel Rubinfeld, Econometric Models and Economic Forecasts, 4th ed. (Irwin/McGraw-Hill, 1998), 5.

This point is fully discussed in Chapter 4.

In this equation, gender would be represented by a dummy variable that would take on a value of 0 for men and a value of 1 for women. Dummy variables are discussed in detail in Chapter 4.

In statistical language, these coefficients are referred to as “beta one hat,” “beta two hat,” and “alpha hat.” The “hat”—or accent mark over the coefficient—is used to indicate that the coefficient is estimated, and not necessarily the true parameter of the study population. The true parameters are expressed without the hat.
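As a rough illustration of how the estimated coefficients arise, the sketch below fits hourly pay on seniority and a gender dummy by ordinary least squares, solving the normal equations in plain Python. The data reuse the figures from the earlier calculations; treating the second group of five as women (dummy = 1) and pairing pay with seniority by position are both assumptions made for illustration, and the helper names are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - known) / aug[i][i]
    return solution


def ols(regressor_rows, y):
    """Return [intercept, slopes...] from the normal equations (X'X) b = X'y."""
    design = [[1.0] + [float(v) for v in row] for row in regressor_rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Assumed pairing of the figures from the earlier calculations.
hourly_pay = [15, 15, 15, 14, 13, 13, 12, 11, 11, 10]
seniority = [5, 5, 5, 4, 3, 3, 2, 1, 1, 0]
female = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # dummy: 0 for men, 1 for women

alpha_hat, beta1_hat, beta2_hat = ols(list(zip(seniority, female)), hourly_pay)

print(f"alpha hat (intercept):    {alpha_hat:.2f}")
print(f"beta one hat (seniority): {beta1_hat:.2f}")
print(f"beta two hat (gender):    {beta2_hat:.2f}")
```

With these assumed data the gender coefficient is zero once seniority is held constant, which matches the earlier observation that the raw pay gap alone is not evidence of pay discrimination.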

“All other things held constant” is frequently expressed in its Latin equivalent, ceteris paribus.

Note that this difference refers only to the size of the estimated coefficient and does not consider whether the difference is statistically significant or “meaningful” from a statistical perspective. Interpretation of regression results is discussed in detail in a subsequent section.

Note that matrix form is printed in bold to distinguish it from non-matrix form.

It should be noted that the law does not require all relevant explanatory variables to be included in the model. Rather, as stated in Bazemore v. Friday: “failure to include variables will affect the analysis’ probativeness, not its admissibility” (Bazemore v. Friday, 478 U.S. 385, 106 S. Ct. 3000, 92 L. Ed. 2d 315, 32 Ed. Law Rep. 1223, 41 Fair Empl. Prac. Cas. [BNA] 92, 40 Empl. Prac. Dec. [CCH] ¶ 36199, 4 Fed. R. Serv. 3d 1259 [1986]).

Omitted variable bias has been described by some as “the biggest problem in econometrics.” See, for example, Charlie Gibbons, Lecture on “Omitted Variable Bias,” University of California, Berkeley, October 18, 2009.

There are different kinds of bias. Omitted variable bias is just one kind.

This is sometimes referred to as the “kitchen sink” approach, because everything possible—including the kitchen sink—is included in the model.

This problem becomes even more pronounced when regression analysis is used to develop “predicted” pay rates and internal equity is assessed on the basis of the difference between “predicted” and actual rates of pay. This is discussed in detail in Chapter 5.

A variety of transformations exist; common ones include logarithmic and power transformations.

It should be noted that measurement error differs from data error. Measurement error is systematic in that all observations of a given variable are affected. A simple example of measurement error is weighing a group of individuals using a scale that is incorrectly calibrated. The measurement of each person’s weight is affected by the incorrect calibration of the scale. Data errors, on the other hand, are not systematic; they are random in the sense that they affect some observations but do not affect others.

Potential sources of measurement error in independent variables are discussed in detail in Chapter 4.

A data set is said to be homoscedastic if it has constant variance across all observations.

This example was studied in detail by S. Prais and H. Houthakker, The Analysis of Family Budgets (Cambridge: Cambridge University Press, 1955).

The Goldfeld-Quandt test (S. Goldfeld and R. Quandt, “Some Tests for Homoscedasticity,” Journal of the American Statistical Association 60 [1965], 539–47) and the Breusch-Pagan test (T. Breusch and A. Pagan, “A Simple Test for Heteroscedasticity and Random Coefficient Variation,” Econometrica 47 [1979], 1287–94) are two of the commonly used tests.

A cross-sectional data set contains observations at a given point in time across a sample of things. An example of a cross-sectional data set would be hourly pay rates as of December 31, 2013, for all employees in a specific job title. A time-series data set is one that contains observations across time for a given thing. An example of a time-series data set would be John Smith’s annual earnings for the years 1999 to 2013.

Similarly situated employee groupings are discussed in detail in Chapter 4.

Autocorrelation is sometimes referred to as serial correlation.

As noted by Pindyck and Rubinfeld, “positive serial correlation frequently occurs in time series studies either because of correlation in the measurement error component of the error term or, more likely, because of the high degree of correlation over time that is present in the cumulative effects of omitted variables.” Pindyck and Rubinfeld, Econometric Models and Economic Forecasts, 159.

J. Durbin and G. S. Watson, “Testing for Serial Correlation in Least-Squares Regression,” Biometrika 38 (1951), 159–77.
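For orientation, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below computes it for two sets of residuals that are entirely hypothetical and chosen only to show the two extremes; the function name is likewise illustrative.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical regression residuals, ordered in time.
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]    # long runs of the same sign
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # sign flips every period

dw_persistent = durbin_watson(persistent)
dw_alternating = durbin_watson(alternating)

print(f"persistent residuals:  d = {dw_persistent:.2f}")
print(f"alternating residuals: d = {dw_alternating:.2f}")
```

The statistic ranges from 0 to 4. Values near 2 suggest no first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation.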

A perfectly linear relationship between $X_1$ and $X_2$ would exist if, for example, $X_2$ could be calculated according to the following: $X_2 = aX_1$.
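One way to see why such a relationship is a problem is that it makes the cross-product matrix of the two variables singular, so the least-squares equations have no unique solution. The sketch below uses hypothetical values for $X_1$ and for the constant $a$.

```python
# Hypothetical data: x2 is an exact multiple of x1.
a = 2.0
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [a * value for value in x1]

# Entries of the 2x2 cross-product matrix for the columns x1 and x2.
s11 = sum(u * u for u in x1)
s12 = sum(u * v for u, v in zip(x1, x2))
s22 = sum(v * v for v in x2)

# A zero determinant means the matrix cannot be inverted, so separate
# coefficients for x1 and x2 cannot be estimated.
determinant = s11 * s22 - s12 ** 2

print(f"determinant = {determinant}")
```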

Peter Kennedy, A Guide to Econometrics, 3rd ed. (MIT Press, 1992), 176.

We know, for example, that John Smith and Bob Jones received bonus payments in 2012 of $105,000 and $112,000, respectively. It does not matter whether we measure these bonus payments today, tomorrow, or one month from now; the amounts will not change. They are fixed because they have already occurred.

Readers who are interested in the estimation of simultaneous equations are referred to standard econometrics textbooks.

Available statistical software packages are briefly discussed at the end of this chapter.

Note that in this example, our dummy variable for gender takes on a value of 1 for female and 0 for male. If this were reversed, and the dummy variable took on a value of 1 for male and 0 for female, the sign of the estimated coefficient would be positive, indicating the amount of additional annual salary an employee is expected to receive for being male, ceteris paribus. I return to this point in Chapter 4.
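The effect of reversing the dummy coding can be seen in a deliberately simplified sketch with the dummy as the only regressor, in which case the estimated coefficient equals the difference in group means. The figures are the hourly pay rates from the earlier calculations rather than annual salaries, and treating the second group of five as women is an assumption made for illustration.

```python
from statistics import mean


def dummy_coefficient(dummy, y):
    """OLS slope from regressing y on an intercept and a single 0/1 dummy."""
    d_bar, y_bar = mean(dummy), mean(y)
    covariance = sum((d - d_bar) * (v - y_bar) for d, v in zip(dummy, y))
    variance = sum((d - d_bar) ** 2 for d in dummy)
    return covariance / variance


# Assumed grouping: first five observations are men, last five are women.
hourly_pay = [15, 15, 15, 14, 13, 13, 12, 11, 11, 10]
female = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
male = [1 - value for value in female]  # reversed coding

coefficient_female = dummy_coefficient(female, hourly_pay)
coefficient_male = dummy_coefficient(male, hourly_pay)

print(f"dummy = 1 for female: {coefficient_female:+.2f}")
print(f"dummy = 1 for male:   {coefficient_male:+.2f}")
```

The two coefficients have the same magnitude and opposite signs. No other variables are held constant in this sketch, so it illustrates only the sign reversal, not the ceteris paribus interpretation.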

Readers who are unfamiliar with the basic concept of statistical significance are referred to the Appendix.

The t-distribution is relevant because a sample estimate of the error variance, rather than the true error variance, is used for this statistical test. Pindyck and Rubinfeld, Econometric Models and Economic Forecasts, 67.

Ibid., 68.

Hazelwood School District v. United States, 433 U.S. 299 (1977).

Paetzold and Willborn, The Statistics of Discrimination, 280.

Ibid., 281.

Hazelwood School District v. United States, 433 U.S. 299 (1977).

The issue of practical significance is discussed in more detail in Chapter 10.

The power of a hypothesis test is calculated as $1 - P(\text{Type II error})$, that is, one minus the probability that the null hypothesis will be accepted as true when it is in fact false.
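As a worked illustration of this definition, the sketch below computes the power of a one-sided z-test. The test type, the effect size of half a standard deviation, the sample size of 25, and the 5 percent significance level are all hypothetical choices made for the example, as is the function name.

```python
from statistics import NormalDist


def one_sided_z_test_power(effect_in_sd, n, alpha=0.05):
    """Power of a one-sided z-test for a mean with known standard deviation."""
    standard_normal = NormalDist()
    # Rejection threshold under the null hypothesis.
    critical_z = standard_normal.inv_cdf(1 - alpha)
    # Under the alternative, the test statistic is centered at this value.
    shift = effect_in_sd * n ** 0.5
    # Type II error: the statistic falls below the threshold although
    # the null hypothesis is false.
    type_ii_error = standard_normal.cdf(critical_z - shift)
    return 1 - type_ii_error


power = one_sided_z_test_power(effect_in_sd=0.5, n=25)
print(f"power = {power:.2f}")
```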

Pindyck and Rubinfeld, Econometric Models and Economic Forecasts, 44.

R squared never decreases, and almost always increases, when an additional independent variable is added to the model. Unlike R squared, adjusted R squared only increases if the additional independent variable improves the model more than would be expected by chance.
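The usual formula for adjusted R squared is $1 - (1 - R^2)(n - 1)/(n - k - 1)$, where $n$ is the number of observations and $k$ the number of independent variables. The sketch below applies it to hypothetical values to show how a small gain in R squared can still lower the adjusted figure.

```python
def adjusted_r_squared(r_squared, n_obs, n_regressors):
    """Adjusted R squared: penalizes R squared for each added regressor."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_regressors - 1)


# Hypothetical model with 30 observations and 3 independent variables.
base = adjusted_r_squared(0.500, n_obs=30, n_regressors=3)

# Adding a fourth variable nudges R squared up only slightly.
extended = adjusted_r_squared(0.505, n_obs=30, n_regressors=4)

print(f"3 variables: adjusted R squared = {base:.4f}")
print(f"4 variables: adjusted R squared = {extended:.4f}")
```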

Paetzold and Willborn, The Statistics of Discrimination, 279.

Residual analysis is discussed in detail in Chapter 5.

Paetzold and Willborn, The Statistics of Discrimination, 281.

This listing of software packages should not be construed or interpreted as an endorsement by the author or publisher.

Title: Multiple Regression Analysis
Written by: Stephanie R. Thomas
Publisher: Apress
Book: Compensating Your Employees Fairly
Print ISBN: 978-1-4302-5040-1
Electronic ISBN: 978-1-4302-5042-5
Copyright year: 2013
DOI: https://doi.org/10.1007/978-1-4302-5042-5_3
