
2018 | OriginalPaper | Chapter

2. One-Factor Designs and the Analysis of Variance

Authors: Paul D. Berger, Robert E. Maurer, Giovana B. Celli

Published in: Experimental Design

Publisher: Springer International Publishing


Abstract

We begin this and subsequent chapters by presenting a real-world problem in the design and analysis of experiments on which at least one of the authors consulted. At the end of the chapter, we revisit the example and present analysis and results. The appendices will cover the analysis using statistical packages not covered in the main text, where appropriate. As you read the chapter, think about how the principles discussed here can be applied to this problem.

Footnotes
1
The word level is traditionally used to denote the value, amount, or category of the independent variable or factor under study, to emphasize two issues: (1) the factor can be quantitative/numerical, in which case the word value would likely be appropriate, or it can be nominal (e.g., male/female, or supplier A/supplier B/supplier C), in which case the word category would likely be appropriate; (2) the analysis we perform, at least at the initial stage, always treats the variable as if it is in categories. Of course, any numerical variable can be represented as categorical: income, for example, can be represented as high, medium, or low.
 
2
Many different notational schemes are available, and notation is not completely consistent from one text/field/topic to another. We believe that using i for the row and j for the column is a natural, reader-friendly choice, and likewise for the choice of C for the number of columns and R for the number of rows. Naturally, when we go beyond just rows and columns, we will need to expand the notation (for example, if we wanted to investigate the impact of two factors, say region of residence and year of first purchase, with replication at each combination of levels of the two factors, we would need three indices). However, we believe that this choice of notation offers the wisest trade-off between being user-friendly (especially here at the initial stage of the text) and allowing an extrapolation of notation that remains consistent with the principles of the current notation.
 
3
Unequal sample sizes can arise for various reasons and are a common issue, even in well-planned experiments. They can affect the design by, for example, compromising the random assignment of experimental units to the treatments. However, in certain cases where one believes that the samples reflect the composition of a certain population, it might be possible to calculate an unweighted grand mean, as the error variance is assumed to be constant across populations.
 
4
It would certainly seem that a notation of SSW_c would be more consistent with SSB_c, at least in this chapter. However, the former notation is virtually never used in English-language texts. The authors suspect that this is because of the British ancestry of the field of experimental design and the sensitivity to WC as “water closet.” In subsequent chapters, the “within” sum of squares will not always be “in columns,” and the possible inconsistency becomes moot.
 
5
Of course, along with the sample sizes and difference in sample means, the standard deviation estimates for each town’s data need to be considered, along with a significance level, and so on; this description simply attempts to appeal to intuition.
 
6
Virtually all theoretical results in the field of statistics have some intuitive reasoning behind them; it remains for the instructor to convey it to the students.
 
7
As we did for the (n − 1) rule, we can think of this one degree of freedom as having been used in the calculation of the grand mean to estimate μ.
 
8
One could argue that this study really has two factors – one being the actual test device and the other the brand of the device. However, from another view, one can validly say that there are eight treatments of one factor. What is sacrificed in this latter view is the ability to separate the variability associated with the actual device from the one associated with the brand of the device. We view the study as a one-factor study so that it is appropriate for this chapter. The two-factor viewpoint is illustrated in later chapters.
 
9
Remember that we are assuming a constant variance. More details are discussed in Chap. 3.
 
10
One may say, “Why not examine (MSB_c − MSW) and compare it to the value 0? Isn’t this conceptually just as good as comparing the ratio to 1?” The answer is a qualified yes. To have the ratio be 1 or different from 1 is equivalent to having the difference be 0 or different from 0. However, since MSB_c and MSW are random variables and do not exactly equal their respective parameter counterparts, (σ² + V_col) and σ², as we shall discuss, we will need to know the probability distribution of the quantity examined. The distribution of the difference between MSB_c and MSW depends critically on scale (in essence, the value of σ², something we don’t know). Examining the ratio avoids this problem: the ratio is a dimensionless quantity! Its probability distribution is complex but can be determined with known information (R, C, and so on). Hence, we always study the ratio.
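The scale argument can be checked numerically. The sketch below is only an illustration: the data values and the helper name `one_way_mean_squares` are hypothetical, and equal column sizes are assumed. It computes MSB_c and MSW for a small data set, then again after multiplying every observation by 10.

```python
def one_way_mean_squares(columns):
    """Return (MSB_c, MSW) for a one-factor layout with equal-sized columns."""
    C = len(columns)          # number of columns (factor levels)
    R = len(columns[0])       # number of rows (replicates per column)
    col_means = [sum(col) / R for col in columns]
    grand_mean = sum(col_means) / C
    # Between-columns and within-columns sums of squares
    ssb = R * sum((m - grand_mean) ** 2 for m in col_means)
    ssw = sum((y - m) ** 2 for col, m in zip(columns, col_means) for y in col)
    return ssb / (C - 1), ssw / (C * (R - 1))

# Hypothetical observations, not taken from the chapter
data = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [5.0, 6.0, 7.0]]
scaled = [[10 * y for y in col] for col in data]   # same data in different units

msb, msw = one_way_mean_squares(data)
msb_s, msw_s = one_way_mean_squares(scaled)

print("difference:", msb - msw, "->", msb_s - msw_s)   # changes with the units
print("ratio:     ", msb / msw, "->", msb_s / msw_s)   # unchanged by the units
```

Rescaling the data by a factor k multiplies the difference by k² but leaves the ratio untouched, which is why the distribution of the ratio does not depend on the unknown σ².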
 
11
It is not exactly half the time for each, because, although the numerator and denominator of F_calc both have the same expected value, their ratio does not have an expected value of 1; the expected value of F_calc in our current discussion is (RC − C)/(RC − C − 2) > 1, although the result is only slightly more than 1 in most real-world cases. Also, as we shall soon see, the probability distribution of F_calc is not symmetric.
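A small simulation can make this concrete. The sketch below assumes normally distributed data with no column effect; the layout (R = 6, C = 4), the seed, and the number of runs are arbitrary choices, not values from the chapter. It compares the average simulated F_calc with (RC − C)/(RC − C − 2) and reports how often F_calc exceeds 1.

```python
import random

def simulate_f_calc(R, C, rng):
    """Draw one data set under the null hypothesis and return MSB_c / MSW."""
    columns = [[rng.gauss(0.0, 1.0) for _ in range(R)] for _ in range(C)]
    col_means = [sum(col) / R for col in columns]
    grand_mean = sum(col_means) / C
    ssb = R * sum((m - grand_mean) ** 2 for m in col_means)
    ssw = sum((y - m) ** 2 for col, m in zip(columns, col_means) for y in col)
    return (ssb / (C - 1)) / (ssw / (C * (R - 1)))

rng = random.Random(0)        # fixed seed so the run is repeatable
R, C = 6, 4                   # hypothetical layout
n_sims = 20000

f_values = [simulate_f_calc(R, C, rng) for _ in range(n_sims)]
mean_f = sum(f_values) / n_sims
theory = (R * C - C) / (R * C - C - 2)          # expected value quoted in the footnote
share_above_one = sum(f > 1 for f in f_values) / n_sims

print("simulated mean of F_calc:", round(mean_f, 3))
print("(RC - C)/(RC - C - 2):   ", round(theory, 3))
print("share of runs with F_calc > 1:", round(share_above_one, 3))
```

The simulated mean sits slightly above 1 while fewer than half of the runs exceed 1, which reflects the right skew of the F distribution.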
 
12
It often happens, in an exposition such as this, that the best order of presentation is a function of the level of knowledge of the reader; background material required by one may be unnecessary for another. The flow of presentation is, of course, influenced by how these disparate needs are addressed. At this point, we present just enough of the hypothesis-testing background to allow us to continue with the analysis. Some readers may find it advantageous to first read Sect. 3.3 and then return to the current section.
 
13
We assume that the values of τ_j sum to zero. Hence, if one τ_j ≠ 0, at least one other τ_j must also be nonzero.
 
14
We are consistent in our notation; for example, when we encounter a quantity whose probability distribution is a t curve, we call the test statistic t_calc.
 
15
You may recall from a basic statistics course that accepting or rejecting the null hypothesis is often based on the p-value, which in our study is the area under the F curve to the right of F_calc. The significance level (α) is a threshold value for p. We will see this in more detail in Chap. 3.
 
16
The vast majority of texts that include F tables have adopted the convention that the table has numerator df indexed across the top, and denominator df indexed going down the left-hand column (or, on occasion, the right-hand column for a right-side page).
 
17
An interval scale is one formed by equal intervals in a certain order. For instance, the distance between the ratings 1 and 2 is the same as that between 4 and 5 in our scale.
 
18
The reader should note that JMP and other statistical packages organize the data differently; i.e., each column is considered a new factor or response. In order to run this analysis, you will have to stack the columns (an option is available under Tables so you don’t have to do it manually). In this example, you will end up with 120 rows and 2 columns.
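The stacking step can be sketched outside JMP as well. The snippet below is a minimal illustration with a hypothetical three-column table (labels `A`, `B`, `C` and all values are made up); it is not the chapter's data set, which stacks to 120 rows.

```python
# Hypothetical wide layout: one column of responses per factor level
wide = {
    "A": [4.1, 3.9],
    "B": [5.2, 5.0],
    "C": [6.3, 6.1],
}

# Stacked (long) layout: one row per observation, two columns (level, response)
stacked = [(level, y) for level, column in wide.items() for y in column]

for row in stacked:
    print(row)

print("rows:", len(stacked), "columns:", len(stacked[0]))
```

The number of stacked rows equals the total number of observations across the original columns, and the two resulting columns hold the factor level and the response.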
 
19
R² also represents the estimated proportion of variability in Y accounted for by variation in the level of the factor. We discuss this further in Chap. 14.
 
20
The adjusted R² adjusts the R² statistic downward based on the number of factors under study; we also discuss this further in Chap. 14.
 
21
Excel output also contains a P-value column, which we will cover in the next chapter. It tells us that the p-value (the area to the right of F_calc) is less than .05, indicating that the value 3.38 is in the critical region (rejection region) for α = .05.
 
Metadata
Title
One-Factor Designs and the Analysis of Variance
Authors
Paul D. Berger
Robert E. Maurer
Giovana B. Celli
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-64583-4_2
