1. Introduction

Measures of resemblance play an important role in many domains of data analysis. A similarity coefficient is a measure of association or agreement of two entities or variables. A well-known coefficient for two continuous variables is Pearson’s product-moment correlation, but various other similarity coefficients may be used (see, e.g., Goodman & Kruskal, 1954; Zegers & Ten Berge, 1985; Gower & Legendre, 1986). In this paper we focus on similarity coefficients that can be defined using the four dependent proportions, a, b, c, and d, presented in Table 1. Instead of probabilities, Table 1 may also be defined on counts or frequencies; probabilities are used here for notational convenience.

Table 1 Bivariate proportions table for binary variables.

The data in Table 1 may be obtained from a 2×2 reliability study: a, b, c, and d are observed proportions resulting from classifying m persons using a dichotomous response (Fleiss, 1975; Bloch & Kraemer, 1989; Blackman & Koval, 1993). In cluster analysis, Table 1 may be the result of comparing partitions from two clustering methods: a is the proportion of object pairs that were placed in the same cluster according to both clustering methods, b (c) is the proportion of pairs that were placed in the same cluster according to one method but not according to the other, and d is the proportion of pairs that were not in the same cluster according to either of the methods (Albatineh, Niewiadomska-Bugaj & Mihalko, 2006; Steinley, 2004).
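
To make the setup concrete, the following sketch computes the proportions of Table 1 from two binary classifications; it is our own illustration (the function and variable names are not from the paper), assuming the ratings are coded 0/1.

# Illustrative sketch (ours): the proportions a, b, c, d of Table 1
# computed from two binary (0/1) classifications of the same m persons.
def table_proportions(x, y):
    """Return (a, b, c, d) for two equal-length 0/1 sequences."""
    m = len(x)
    a = sum(xi == 1 and yi == 1 for xi, yi in zip(x, y)) / m
    b = sum(xi == 1 and yi == 0 for xi, yi in zip(x, y)) / m
    c = sum(xi == 0 and yi == 1 for xi, yi in zip(x, y)) / m
    d = sum(xi == 0 and yi == 0 for xi, yi in zip(x, y)) / m
    return a, b, c, d

x = [1, 1, 0, 1, 0, 0, 1, 0]   # classifications by the first judge
y = [1, 0, 0, 1, 0, 1, 1, 0]   # classifications by the second judge
a, b, c, d = table_proportions(x, y)
p1, q1 = a + b, c + d          # marginal proportions of the first variable
p2, q2 = a + c, b + d          # marginal proportions of the second variable
assert abs(a + b + c + d - 1) < 1e-12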

Numerous 2 × 2 resemblance measures have been proposed in the literature (Gower & Legendre, 1986; Krippendorff, 1987; Hubálek, 1982; Baulieu, 1989; Albatineh et al., 2006). Let a similarity coefficient be denoted by S. Table 2 presents ten similarity coefficients that will be used to illustrate the results in this paper. Following Sokal and Sneath (1963, p. 128) and Albatineh et al. (2006), the convention is adopted of calling a coefficient after its originator or the first author known to propose it. The coefficients in Table 2 may be considered either as population parameters or as sample statistics; in this paper we use the latter. Some of these coefficients have been proposed in different domains of data analysis, but turn out to be equivalent after recoding.

Table 2 Ten 2 × 2 similarity coefficients.

If the two variables are statistically independent, we may desire that the theoretical value of a similarity coefficient be zero. Coefficient S Cohen satisfies this requirement; coefficients S SM and S Cze do not. If a coefficient does not have zero value under statistical independence, it may be corrected for agreement due to chance (Fleiss, 1975; Zegers, 1986; Krippendorff, 1987; Albatineh et al., 2006). After correction for chance, a similarity coefficient S has a form

$$CS = {{S - E(S)} \over {1 - E(S)}},\eqno(1)$$

where expectation E(S) is conditional upon fixed marginal proportions in Table 1. Various authors have noted that some coefficients become equivalent after correction (1). For example, Fleiss (1975) and Zegers (1986) showed that S SM and S Cze become S Cohen after correction (1). In addition, Zegers (1986) showed that S Ham becomes S Cohen after correction for chance, and Fleiss (1975) showed the same for S GK1 and S RG.
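
As a minimal computational companion, correction (1) is a one-line transformation of S and E(S); the function name corrected is our own.

# Minimal sketch of correction (1); the function name is ours.
def corrected(S, ES):
    """Return CS = (S - E(S)) / (1 - E(S))."""
    return (S - ES) / (1 - ES)

print(corrected(0.85, 0.50))   # 0.7: agreement beyond chance, rescaled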

Albatineh et al. (2006) studied correction (1) for a specific family of coefficients. They showed that coefficients may coincide after correction for chance, irrespective of what expectation is used. The main result of their paper is presented as Proposition 1 in Section 3. In this paper, we continue the general approach of Albatineh et al. (2006) and present several new results with respect to correction (1).

The paper is organized as follows. Similar to Albatineh et al. (2006), correction (1) is studied for a general family of coefficients. This family, of a form S = λ + μ(a + d), is introduced in the next section. Section 3 presents the main results. In addition to a powerful result by Albatineh et al. (2006), Section 3 considers two additional functions. If coefficients are related by one of these functions, they become equivalent after correction (1), irrespective of what expectation E(S) is used.

Additional results may be obtained by considering different expectations E(S). The specific results in Section 4 unify and extend the findings for individual coefficients in Fleiss (1975) and Zegers (1986). Section 5 discusses corrected coefficients and some of their properties. Also in Section 5, we discuss a generalization of an inequality in Blackman and Koval (1993) for Cohen’s kappa and Scott’s pi. Sections 6 and 7 discuss two natural generalizations of the results in Sections 3 to 5. Section 6 presents a multicategorical extension; Section 7 describes a family of multivariate coefficients. Section 8 contains the discussion.

2. A Family of Coefficients

Consider a family L of coefficients of a form S = λ + μ(a + d), where proportions a and d are defined in Table 1, and where λ and μ, different for each coefficient, depend on the marginal probabilities of Table 1. Since S SM = a + d, all members in L family are linear transformations of S SM, the observed proportion of agreement, given the marginal probabilities. Clearly, S SM is in L family. Furthermore, all ten coefficients in Table 2 are in L family.

Example 1. Coefficient S Cze was independently proposed by Czekanowski (1932), Dice (1945), and Sørensen (1948). The coefficient is often attributed to Dice (1945), and it was also derived by Nei and Li (1979). Bray (1956) noted that coefficient S Cze could already be found in Gleason (1920). Coefficient S Cze is given by

$${S_{{\rm{Cze}}}} = {{2a} \over {{p_1} + {p_2}}} = {{(a + d) - 1} \over {{p_1} + {p_2}}} + 1.$$

Thus, coefficient S Cze can be written in a form S Cze = λ + μ(a + d), where

$$\lambda = {{ - 1} \over {{p_1} + {p_2}}} + 1\;{\rm{and}}\;\mu = {1 \over {{p_1} + {p_2}}}.$$

Example 2. Scott (1955) proposed a measure of interrater reliability denoted by the symbol pi. For two dichotomized variables, Scott's pi is given by

$${S_{{\rm{Scott}}}} = {{4ad - {{(b + c)}^2}} \over {({p_1} + {p_2})({q_1} + {q_2})}}.$$

With respect to the numerator of S Scott, we have

$$a(1 - a - b - c) - {{{{(b + c)}^2}} \over 4} = a - {{{{(a + b)}^2}} \over 4} - {{{{(a + c)}^2}} \over 4} - {{(a + b)(a + c)} \over 2} = a - {\left( {{{{p_1} + {p_2}} \over 2}} \right)^2}.$$

Similarly, we have

$$d(1 - b - c - d) - {{{{(b + c)}^2}} \over 4} = d - {\left( {{{{q_1} + {q_2}} \over 2}} \right)^2}.$$

Thus, coefficient S Scott

$${S_{{\rm{Scott}}}} = {{4(a + d) - {{({p_1} + {p_2})}^2} - {{({q_1} + {q_2})}^2}} \over {2({p_1} + {p_2})({q_1} + {q_2})}}$$

can be written in a form S Scott = λ + μ(a + d), where

$$\lambda = {{ - {{({p_1} + {p_2})}^2} - {{({q_1} + {q_2})}^2}} \over {2({p_1} + {p_2})({q_1} + {q_2})}}\;{\rm{and}}\;\mu = {2 \over {({p_1} + {p_2})({q_1} + {q_2})}}.$$

Example 3. The best-known index for interrater reliability is the kappa statistic proposed by Cohen (1960). Since

$$\eqalign{ & ad - bc = a(1 - a - b - c) - bc = a - (a + b)(a + c) = a - {p_1}{p_2}\;{\rm{and}} \cr & ad - bc = d(1 - b - c - d) - bc = d - (b + d)(c + d) = d - {q_1}{q_2}, \cr} $$

Cohen’s kappa for two dichotomized variables is given by

$${S_{{\rm{Cohen}}}} = {{2(ad - bc)} \over {{p_1}{q_2} + {p_2}{q_1}}} = {{(a + d) - {p_1}{p_2} - {q_1}{q_2}} \over {{p_1}{q_2} + {p_2}{q_1}}}.$$

Coefficient S Cohen can be written in a form S Cohen = λ + μ(a + d), where

$$\lambda = {{ - {p_1}{p_2} - {q_1}{q_2}} \over {{p_1}{q_2} + {p_2}{q_1}}}\;{\rm{and}}\;\mu = {1 \over {{p_1}{q_2} + {p_2}{q_1}}}.$$
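
The linear forms derived in Examples 1 to 3 can be checked numerically. The sketch below is our own; the table values a = 0.4, b = 0.1, c = 0.2, d = 0.3 are arbitrary.

# Numerical check (ours) that the coefficients of Examples 1-3 have the
# form S = lam + mu * (a + d) with the stated lam and mu.
a, b, c, d = 0.4, 0.1, 0.2, 0.3          # an arbitrary Table 1
p1, p2, q1, q2 = a + b, a + c, c + d, b + d

S_Cze = 2 * a / (p1 + p2)                                       # Example 1
lam, mu = 1 - 1 / (p1 + p2), 1 / (p1 + p2)
assert abs(S_Cze - (lam + mu * (a + d))) < 1e-12

S_Scott = (4 * a * d - (b + c) ** 2) / ((p1 + p2) * (q1 + q2))  # Example 2
lam = -((p1 + p2) ** 2 + (q1 + q2) ** 2) / (2 * (p1 + p2) * (q1 + q2))
mu = 2 / ((p1 + p2) * (q1 + q2))
assert abs(S_Scott - (lam + mu * (a + d))) < 1e-12

S_Cohen = 2 * (a * d - b * c) / (p1 * q2 + p2 * q1)             # Example 3
lam = -(p1 * p2 + q1 * q2) / (p1 * q2 + p2 * q1)
mu = 1 / (p1 * q2 + p2 * q1)
assert abs(S_Cohen - (lam + mu * (a + d))) < 1e-12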

Since \(a = {p_2} - {q_1} + d\), proportions a and d are themselves linear in (a + d), given the marginal probabilities. Linear in (a + d) is therefore equivalent to linear in a and linear in d. Furthermore, Albatineh et al. (2006) studied coefficients that are linear in \(\sum {\sum {n_{ij}^2} } \), where n ij is the number of data points placed in cluster i according to the first clustering method and in cluster j according to the second clustering method. Because \(ma = (\sum {\sum {n_{ij}^2} } - m)/2\), linear in \(\sum {\sum {n_{ij}^2} } \) is equivalent to linear in a, and hence equivalent to linear in (a + d).

A well-known similarity measure that cannot be written in a form \(S = \lambda + \mu (a + d)\) is coefficient

$${S_{{\rm{Jac}}}} = {a \over {a + b + c}} = {a \over {{p_1} + {p_2} - a}}$$

by Jaccard (1912). Other examples of coefficients that do not belong to L family can be found in Albatineh et al. (2006) and Baulieu (1989).

3. Main Results

Albatineh et al. (2006) showed that correction (1) is relatively simple for coefficients that belong to L family. Two members in L family become equivalent after correction for chance agreement if they have the same ratio (2).

Proposition 1 (Albatineh et al., 2006, p. 309). Two members in L family become identical after correction (1) if they have the same ratio

$${{1 - \lambda } \over \mu }. \eqno(2)$$

Proof: E(S) = E[λ + μ(a + d)] = λ + μE(a + d), and consequently the corrected coefficient CS becomes

$$\eqalign{ & CS = {{S - E(S)} \over {1 - E(S)}} = {{\lambda + \mu (a + d) - \lambda - \mu E(a + d)} \over {1 - \lambda - \mu E(a + d)}} \cr & = {{a + d - E(a + d)} \over {{\mu^{ - 1}}(1 - \lambda ) - E(a + d)}}. \cr} \eqno(3)$$

Thus, the value of a similarity coefficient after correction for chance depends on ratio (2), where λ and μ characterize the particular measure within L family.

Corollary 1 below extends Corollary 4.2(i) in Albatineh et al. (2006) from three measures (S SM, S Ham, and S Cze) to the ten coefficients in Table 2. The coefficients in Table 2 coincide after correction (1), irrespective of what expectation E(S) is used.

Corollary 1. Coefficients S SM, S Ham, S Cze, S GK1, S GK2, S GK3, S NS, S RG, S Scott, and S Cohen become equivalent after correction (1).

Proof: By Proposition 1 it suffices to inspect ratio (2). Using the formulas of λ and μ corresponding to each coefficient, we obtain

$${{1 - \lambda } \over \mu } = 1 \eqno(4)$$

for all ten coefficients. Only the proofs for coefficients S Scott and S Cohen are presented here. Using the formulas for λ and μ from Example 2, we obtain

$${{1 - \lambda } \over \mu } = {{2({p_1} + {p_2})({q_1} + {q_2}) + {{({p_1} + {p_2})}^2} + {{({q_1} + {q_2})}^2}} \over 4} = {{{{({p_1} + {p_2} + {q_1} + {q_2})}^2}} \over 4} = 1.$$

Using the formulas for λ and μ from Example 3, we obtain

$${{1 - \lambda } \over \mu } = {p_1}{q_2} + {p_2}{q_1} + {p_1}{p_2} + {q_1}{q_2} = ({p_1} + {q_1})({p_2} + {q_2}) = 1.$$

Note that (1 − λ)/μ = 1 for all coefficients in Table 2; that is, ratio (4) holds. Moreover, 1 is the maximum value of these similarity coefficients, regardless of the marginal probabilities.
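
Corollary 1 can be illustrated numerically: because ratio (2) equals 1 for each coefficient, any expectation E(a + d) in (3) produces the same corrected value. The sketch below is our own and checks this for three of the ten coefficients with an arbitrary stand-in expectation.

# Illustration (ours) of Corollary 1: with (1 - lam) / mu = 1, the
# corrected coefficient (3) reduces to (a + d - E) / (1 - E) for any E.
a, b, c, d = 0.4, 0.1, 0.2, 0.3
p1, p2, q1, q2 = a + b, a + c, c + d, b + d
E = 0.55                                 # arbitrary stand-in for E(a + d)

def cs(lam, mu):                         # corrected coefficient (3)
    S, ES = lam + mu * (a + d), lam + mu * E
    return (S - ES) / (1 - ES)

cs_sm = cs(0.0, 1.0)                                        # S_SM
cs_cze = cs(1 - 1 / (p1 + p2), 1 / (p1 + p2))               # S_Cze
cs_cohen = cs(-(p1 * p2 + q1 * q2) / (p1 * q2 + p2 * q1),
              1 / (p1 * q2 + p2 * q1))                      # S_Cohen
assert abs(cs_sm - cs_cze) < 1e-12 and abs(cs_sm - cs_cohen) < 1e-12
assert abs(cs_sm - (a + d - E) / (1 - E)) < 1e-12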

Due to Proposition 1, ratio (2) may be used to inspect whether coefficients become equivalent after correction for chance. Alternatively, it can be shown that coefficients that have a specific relationship coincide after correction. In the remainder of this section we consider two functions that may relate similarity coefficients:

$${S_2} = 2{S_1} - 1\;{\rm{and}}\;{S_3} = {{{S_1} + {S_2}} \over 2}.$$

Both functions may be used to construct new resemblance measures from existing similarity coefficients. It is not difficult to show that S 2 = 2S 1 − 1 is in L family if and only if S 1 is in L family, and that if S 1 and S 2 are in L family, then S 3 = (S 1 + S 2)/2 is in L family. Two coefficients S 1 and S 2 that are related by S 2 = 2S 1 − 1 become equivalent after correction for chance.

Proposition 2. Let S 1 be a member in L family. S 1 and S 2 = 2S 1 − 1 become identical after correction (1).

Proof: S 2 = 2λ + 2μ(a + d) − 1 and E(S 2) = 2λ − 1 + 2μE(a + d). Consequently, the corrected coefficient CS 2 becomes

$$\eqalign{ & C{S_2} = {{2\lambda + 2\mu (a + d) - 1 - 2\lambda - 2\mu E(a + d) + 1} \over {1 - 2\lambda - 2\mu E(a + d) + 1}} = {{\lambda + \mu (a + d) - \lambda - \mu E(a + d)} \over {1 - \lambda - \mu E(a + d)}} \cr & = {{{S_1} - E({S_1})} \over {1 - E({S_1})}} = C{S_1} \cr} $$

Example 4. Various similarity coefficients have a relationship S 2 = 2S 1 − 1. Examples from Table 2 are S Ham = 2S SM−1, S GK1 = 2S Cze − 1, and S GK2 = 2S NS − 1. Due to either Proposition 1 with Corollary 1 or Proposition 2, these coefficients coincide after correction (1).

Theorem 1. Let S i for i = 1, 2, …, n be members in L family that become identical after correction (1). Then S i for i = 1, 2, …, n and the arithmetic mean

$$AM = {1 \over n}\sum\limits_{i = 1}^n {{S_i}} \eqno(5)$$

become equivalent after correction (1).

Remark. The original proof has been simplified with the help of an anonymous referee.

Proof:

$$E(AM) = {1 \over n}\left( {\sum\limits_{i = 1}^n {{\lambda _i}} + \sum\limits_{i = 1}^n {{\mu _i}E(a + d)} } \right). \eqno(6)$$

Using (5) and (6) in (1) we obtain

$$CS = {{a + d - E(a + d)} \over {y - E(a + d)}}\;{\rm{where}}\;y = {{n - \sum\nolimits_{i = 1}^n {{\lambda _i}} } \over {\sum\nolimits_{i = 1}^n {{\mu _i}} }}.$$

Let

$$x = {{1 - {\lambda _1}} \over {{\mu _1}}} = {{1 - {\lambda _2}} \over {{\mu _2}}} = \ldots = {{1 - {\lambda _n}} \over {{\mu _n}}}.$$

Due to Proposition 1, it must be shown that ratio y equals ratio x. We have

$$y = {{\sum\nolimits_{i = 1}^n {(1 - {\lambda _i})} } \over {\sum\nolimits_{i = 1}^n {{\mu _i}} }} = {{\sum\nolimits_{i = 1}^n {x{\mu _i}} } \over {\sum\nolimits_{i = 1}^n {{\mu _i}} }} = {{x\sum\nolimits_{i = 1}^n {{\mu _i}} } \over {\sum\nolimits_{i = 1}^n {{\mu _i}} }} = x.$$

This completes the proof.

Example 5. Coefficient

$${S_{{\rm{RG}}}} = {a \over {2a + b + c}} + {d \over {b + c + 2d}} = {{{S_{{\rm{Cze}}}} + {S_{{\rm{NS}}}}} \over 2}$$

is the arithmetic mean of S Cze and S NS. Due to either Proposition 1 with Corollary 1 or Theorem 1, these three coefficients become equivalent after correction (1).

4. Specific Results

Recall that (4) holds for all coefficients in Table 2. Due to Corollary 1 these coefficients coincide after correction (1). The corrected coefficient corresponding to the resemblance measures in Corollary 1 has the form

$${{a + d - E(a + d)} \over {1 - E(a + d)}}. \eqno(7)$$

Coefficient (7) may be obtained by using (4) in (3). Since expectation E(a + d) is unspecified, coefficient (7) is a general corrected coefficient. Specific cases of (7) may be obtained by specifying E(a + d).

Different opinions have been stated on what the appropriate expectations are for the 2 × 2 contingency table. Detailed discussions of the various ways of regarding data as the product of chance can be found in Krippendorff (1987), Mak (1988), Bloch and Kraemer (1989), and Pearson (1947). In cluster analysis there is general consensus that the popular coefficient S SM, called the Rand index, should be corrected for agreement due to chance (Morey & Agresti, 1984; Hubert & Arabie, 1985), although there is some debate on what expectation is appropriate (Hubert & Arabie, 1985; Steinley, 2004; Albatineh et al., 2006). We consider five examples of E(a + d).

Example 6a. Suppose it is assumed that the frequency distribution underlying the two variables in Table 1 is the same for both variables (Scott, 1955; Krippendorff, 1987, p. 113). Coefficients used in this case are sometimes referred to as agreement indices. The common parameter p must either be known or be estimated from p 1 and p 2; different functions may be used. For example, Scott (1955) and Krippendorff (1987) used the arithmetic mean

$$p = {{{p_1} + {p_2}} \over 2}.$$

Following Scott (1955) and Krippendorff (1987, p. 113) we have

$$E{(a + d)_{{\rm{Scott}}}} = {\left( {{{{p_1} + {p_2}} \over 2}} \right)^2} + {\left( {{{{q_1} + {q_2}} \over 2}} \right)^2}.$$

Let m denote the number of elements of the binary variables. Mak (1988) proposed the expectation

$$E{(a + d)_{{\rm{Mak}}}} = 1 - {{m({p_1} + {p_2})({q_1} + {q_2}) - (b + c)} \over {2(m - 1)}}$$

(see also Blackman & Koval, 1993).

Example 6b. Instead of a single distribution function, it may be assumed that the data are a product of chance concerning two different frequency distributions, each with its own parameter (Cohen, 1960; Krippendorff, 1987). Coefficients used in this case are sometimes referred to as association indices. The expectation of an entry in Table 1 under statistical independence is defined by the product of the marginal probabilities. We have

$$E{(a + d)_{{\rm{Cohen}}}} = {p_1}{p_2} + {q_1}{q_2}.$$

The expectation E(a + d)Cohen can be obtained by considering all permutations of the observations of one of the two variables, while preserving the order of the observations of the other variable. For each permutation the value of (a + d) can be determined. The arithmetic mean of these values is p 1 p 2 + q 1 q 2.
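
The permutation interpretation can be verified by brute force; the example below is our own and uses five observations so that all 5! = 120 permutations can be enumerated.

# Brute-force check (ours) of the permutation interpretation of
# E(a + d)_Cohen: the mean of (a + d) over all permutations of y
# equals p1 * p2 + q1 * q2.
from itertools import permutations

x = [1, 1, 1, 0, 0]
y = [1, 0, 1, 0, 1]
m = len(x)

perms = list(permutations(y))
mean_ad = sum(sum(xi == yi for xi, yi in zip(x, p)) / m
              for p in perms) / len(perms)

p1, p2 = sum(x) / m, sum(y) / m          # marginal proportions
q1, q2 = 1 - p1, 1 - p2
assert abs(mean_ad - (p1 * p2 + q1 * q2)) < 1e-12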

Example 6c. A third possibility is that there are no relevant underlying continua. For this case two forms of E(a +d) may be found in the literature. Note that a and d in Table 1 may be interpreted as the proportions of positive and negative matches, whereas b and c are the proportions of nonmatching observations. Goodman and Kruskal (1954, p. 757) used expectation

$$E{(a + d)_{{\rm{GK}}}} = {{\max ({p_1} + {p_2},{q_1} + {q_2})} \over 2} = {{2\max (a,d) + b + c} \over 2}.$$

Expectation E(a + d)GK focuses on the largest group of matching observations. According to Krippendorff (1987, p. 114) an equity coefficient is characterized by expectation

$$E{(a + d)_{{\rm{Kripp}}}} = {1 \over 2}.$$

In the case of association (Example 6b) the observations are regarded as ordered pairs. In the case of agreement (Example 6a) the observations are considered as pairs without regard for their order; a mismatch is a mismatch regardless of the kind. In the case of equity one only distinguishes between matching and nonmatching observations (cf. Krippendorff, 1987).

Theorem 2 below unifies and extends findings in Fleiss (1975) and Zegers (1986) on what coefficients become Cohen’s kappa after correction for chance. Depending on what expectation E(a +d) from Examples 6a to 6c is used, the coefficients in Table 2 become, after correction for chance, either Scott’s (1955) pi (S Scott), Cohen’s (1960) kappa (S Cohen), Goodman and Kruskal’s (1954) lambda (S GK3), Hamann’s (1961) eta (S Ham), or Mak’s (1988) rho. The latter coefficient can be written as

$${S_{{\rm{Mak}}}} = {{4mad - m{{(b + c)}^2} + (b + c)} \over {m({p_1} + {p_2})({q_1} + {q_2}) - (b + c)}}$$

where m is the length of the binary variables.

Theorem 2. Let S be a member in L family for which ratio (4) holds. If the appropriate expectation is

(i) E(a + d)Scott, then S becomes S Scott,

(ii) E(a + d)Mak, then S becomes S Mak,

(iii) E(a + d)Cohen, then S becomes S Cohen,

(iv) E(a + d)GK, then S becomes S GK3,

(v) E(a + d)Kripp, then S becomes S Ham,

after correction (1).

Proof: (i): Using E(a + d)Scott in (7) we obtain an index whose numerator equals

$$a + d - {\left( {{{{p_1} + {p_2}} \over 2}} \right)^2} - {\left( {{{{q_1} + {q_2}} \over 2}} \right)^2} = 2ad - {{{{(b + c)}^2}} \over 2} \eqno(8)$$

(see Example 2) and the denominator equals

$${{{{({p_1} + {p_2} + {q_1} + {q_2})}^2} - {{({p_1} + {p_2})}^2} - {{({q_1} + {q_2})}^2}} \over 4} = {{({p_1} + {p_2})({q_1} + {q_2})} \over 2}. \eqno(9)$$

Dividing the right-hand part of (8) by the right-hand part of (9) we obtain

$${{4ad - {{(b + c)}^2}} \over {({p_1} + {p_2})({q_1} + {q_2})}} = {S_{{\rm{Scott}}}}.$$

(ii): Using E(a + d)Mak in (7) and multiplying both the numerator and the denominator by 2(m − 1), we obtain an index whose numerator equals

$$\eqalign{ & 2(a + d - 1)(m - 1) + m({p_1} + {p_2})({q_1} + {q_2}) - (b + c) \cr & = m(2a + b + c)(b + c + 2d) - 2m(b + c) + (b + c), \cr} \eqno(10)$$

and the denominator equals

$$m({p_1} + {p_2})({q_1} + {q_2}) - (b + c). \eqno(11)$$

We have

$$\eqalign{ & (2a + b + c)(b + c + 2d) - 2(b + c) \cr & = 4ad + (2a + 2d)(b + c) + {(b + c)^2} - 2(b + c) \cr & = 4ad + (2a + 2d - 2)(b + c) + {(b + c)^2} \cr & = 4ad - 2{(b + c)^2} + {(b + c)^2} = 4ad - {(b + c)^2}. \cr} \eqno(12)$$

Using the right-hand part of (12), numerator (10) can be written as

$$m\left[ {4ad - {{(b + c)}^2}} \right] + (b + c). \eqno(13)$$

Dividing (13) by (11) we obtain coefficient S Mak.

(iii): Using E(a +d)Cohen in (7) we obtain

$${{a + d - {p_1}{p_2} - {q_1}{q_2}} \over {({p_1} + {q_1})({p_2} + {q_2}) - {p_1}{p_2} - {q_1}{q_2}}} = {{2(ad - bc)} \over {{p_1}{q_2} + {p_2}{q_1}}} = {S_{{\rm{Cohen}}}}.$$

(iv): Using E(a +d)GK in (7) we obtain

$${{2[a + d - \max (a,d)] - b - c} \over {2 - 2\max (a,d) - b - c}} = {{2\min (a,d) - b - c} \over {2\min (a,d) + b + c}} = {S_{{\rm{GK3}}}}.$$

(v): Using E(a +d)Kripp in (7) we obtain

$$2(a + d) - 1 = a - b - c + d = {S_{{\rm{Ham}}}}.$$
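
The five cases of Theorem 2 can be verified numerically. The sketch below is our own; the table values and m = 100 are arbitrary, and each expectation is inserted into the general corrected coefficient (7).

# Numerical illustration (ours) of Theorem 2: each expectation E(a + d),
# used in coefficient (7), recovers the named coefficient.
a, b, c, d, m = 0.4, 0.1, 0.2, 0.3, 100
p1, p2, q1, q2 = a + b, a + c, c + d, b + d

def cs7(E):                              # general corrected coefficient (7)
    return (a + d - E) / (1 - E)

E_Scott = ((p1 + p2) / 2) ** 2 + ((q1 + q2) / 2) ** 2
S_Scott = (4 * a * d - (b + c) ** 2) / ((p1 + p2) * (q1 + q2))
assert abs(cs7(E_Scott) - S_Scott) < 1e-12                  # case (i)

E_Mak = 1 - (m * (p1 + p2) * (q1 + q2) - (b + c)) / (2 * (m - 1))
S_Mak = (4 * m * a * d - m * (b + c) ** 2 + (b + c)) / \
        (m * (p1 + p2) * (q1 + q2) - (b + c))
assert abs(cs7(E_Mak) - S_Mak) < 1e-12                      # case (ii)

E_Cohen = p1 * p2 + q1 * q2
S_Cohen = 2 * (a * d - b * c) / (p1 * q2 + p2 * q1)
assert abs(cs7(E_Cohen) - S_Cohen) < 1e-12                  # case (iii)

E_GK = max(p1 + p2, q1 + q2) / 2
S_GK3 = (2 * min(a, d) - b - c) / (2 * min(a, d) + b + c)
assert abs(cs7(E_GK) - S_GK3) < 1e-12                       # case (iv)

assert abs(cs7(0.5) - (a - b - c + d)) < 1e-12              # case (v)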

5. Corrected Coefficients

The coefficients in Table 2 become either S Scott, S Mak, S Cohen, S GK3, or S Ham, depending on what expectation E(a+d) is used. Note that corrected coefficients S Scott, S Cohen, S GK3, and S Ham belong to the class of resemblance measures that is considered in Corollary 1 and Theorem 2. This suggests that corrected coefficients may have some interesting properties, which are the topic of this section. If E(S) in (1) depends on the marginal probabilities in Table 1, then CS in (1) belongs to L family.

Proposition 3. Let E(S) in (1) depend on the marginal probabilities. If S is in L family, then CS in (1) is in L family.

Proof: Expectation E(S) = E[λ 1 + μ 1(a + d)] is a function of the marginal probabilities. Thus E(a + d), λ, and μ in (3) are functions of the marginal proportions. Equation (3) can therefore be written in a form λ 2 + μ 2(a + d), where

$${\lambda _2} = {{ - E(a + d)} \over {\mu _1^{ - 1}(1 - {\lambda _1}) - E(a + d)}}\;{\rm{and}}\;{\mu _2} = {1 \over {\mu _1^{ - 1}(1 - {\lambda _1}) - E(a + d)}}.$$

Examples of corrected coefficients that are in L family are S Scott, S Cohen, S GK3, and S Ham. These coefficients may be considered as corrected coefficients as well as ordinary coefficients that may themselves be corrected for agreement due to chance. For example, S Scott, S GK3, and S Ham (and S Cohen) become S Cohen after correction (1) if expectation E(a + d)Cohen is used. Coefficient S Mak cannot be written in a form λ + μ(a + d), and therefore does not belong to L family.

At the end of this section we consider the following problem. Suppose a coefficient S in L family is corrected twice, using two different expectations, E(a + d) and E(a + d)*. Let the corrected coefficients be given by

$$CS = {{a + d - E(a + d)} \over {{\mu^{ - 1}}(1 - \lambda ) - E(a + d)}}\;{\rm{and}}\;C{S^*} = {{a + d - E{{(a + d)}^*}} \over {{\mu^{ - 1}}(1 - \lambda ) - E{{(a + d)}^*}}}.$$

Note that μ −1(1 − λ), which corresponds to coefficient S, is the same in both CS and CS*. The problem is then as follows: if E(a + d) ≥ E(a + d)*, how are CS and CS* related? It turns out that CS is a decreasing function of E(a + d). Proposition 4 is limited to coefficients in L family whose maximum value is 1, that is,

$$\lambda + \mu (a + d) \le 1\quad {\rm{if\;and\;only\;if}}\quad {{1 - \lambda } \over \mu } \ge (a + d).$$

It can be verified that the similarity coefficients in Table 2 and S Mak satisfy this condition.

Proposition 4. CS is a decreasing function of E(a + d).

Proof: CS ≤ CS* if and only if

$$E(a + d)\left[ {{{1 - \lambda } \over \mu } - (a + d)} \right] \ge E{(a + d)^*}\left[ {{{1 - \lambda } \over \mu } - (a + d)} \right].$$

The requirement λ + μ(a + d) ≤ 1 completes the proof.

In the following, let S = λ +μ(a + d) be in L family and let

$$C{S_{{\rm{Name}}}} = {{a + d - E{{(a + d)}_{{\rm{Name}}}}} \over {{\mu^{ - 1}}(1 - \lambda ) - E{{(a + d)}_{{\rm{Name}}}}}}$$

be a corrected coefficient using expectation E(a +d)Name. Using specific expectations E(a +d) in combination with Proposition 4, we obtain the following result.

Theorem 3. It holds that \(C{S_{{\rm{GK}}}}\mathop \le \limits^{({\rm{i}})} C{S_{{\rm{Scott}}}}\mathop \le \limits^{({\rm{ii}})} C{S_{{\rm{Cohen}}}}\).

Proof: (i): Due to Proposition 4, it must be shown that E(a + d)GK ≥ E(a + d)Scott. Suppose (p 1 + p 2) ≥ (q 1 + q 2); the case (p 1 + p 2) ≤ (q 1 + q 2) is analogous. We have the chain of equivalent inequalities

$$\eqalign{ & E{(a + d)_{{\rm{GK}}}} \ge E{(a + d)_{{\rm{Scott}}}}, \cr & {{{p_1} + {p_2}} \over 2} \ge {\left( {{{{p_1} + {p_2}} \over 2}} \right)^2} + {\left( {{{{q_1} + {q_2}} \over 2}} \right)^2}, \cr & {{{p_1} + {p_2}} \over 2}\left( {1 - {{{p_1} + {p_2}} \over 2}} \right) \ge {\left( {{{{q_1} + {q_2}} \over 2}} \right)^2}, \cr & {{{p_1} + {p_2}} \over 2}\left( {{{{q_1} + {q_2}} \over 2}} \right) \ge {\left( {{{{q_1} + {q_2}} \over 2}} \right)^2}, \cr & ({p_1} + {p_2}) \ge ({q_1} + {q_2}), \cr} $$

where the last inequality holds by supposition.

(ii): It must be shown that E(a + d)Scott ≥ E(a + d)Cohen. We have

$${\left( {{{{p_1} + {p_2}} \over 2}} \right)^2} \ge {p_1}{p_2} \eqno(14)$$

if and only if

$${\left( {{{{p_1} - {p_2}} \over 2}} \right)^2} \ge 0. \eqno(15)$$

Furthermore, we have

$${\left( {{{{q_1} + {q_2}} \over 2}} \right)^2} \ge {q_1}{q_2} \eqno(16)$$

if and only if

$${\left( {{{{q_1} - {q_2}} \over 2}} \right)^2} \ge 0. \eqno(17)$$

Inequalities (14) and (16) are true because (15) and (17) are true. Adding (14) and (16) we obtain the desired inequality.

Blackman and Koval (1993, p. 216) derived the inequality S Scott ≤ S Cohen. Note that this inequality follows from the more general Theorem 3 by using a coefficient S for which (4) holds (all coefficients in Table 2).
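
A numerical spot-check of Theorem 3 (our own, on an arbitrary table):

# Spot-check (ours) of the ordering CS_GK <= CS_Scott <= CS_Cohen.
a, b, c, d = 0.4, 0.1, 0.2, 0.3
p1, p2, q1, q2 = a + b, a + c, c + d, b + d

def cs7(E):                              # coefficient (7)
    return (a + d - E) / (1 - E)

E_GK = max(p1 + p2, q1 + q2) / 2
E_Scott = ((p1 + p2) / 2) ** 2 + ((q1 + q2) / 2) ** 2
E_Cohen = p1 * p2 + q1 * q2
assert cs7(E_GK) <= cs7(E_Scott) <= cs7(E_Cohen)   # 1/3 <= 0.394 <= 0.4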

6. Multicategorical Generalization

Suppose the data consist of two nominal variables with identical categories, e.g., two psychologists each distribute m people among a set of k mutually exclusive categories. Let N be a contingency table with entries n ij , where n ij indicates the number of persons placed in category i by the first psychologist and in category j by the second psychologist. Furthermore, let n i+ and n +j denote the marginal counts (row and column totals) of N. Moreover, suppose that the categories of both variables are in the same order, so that the diagonal elements n ii reflect the number of people put in the same category by both psychologists. If the variables are dichotomous, \({m^{ - 1}}N\) equals Table 1. A straightforward measure of similarity is the observed proportion of agreement, given by

$$P = {1 \over m}\sum\limits_{i = 1}^k {{n_{ii}}} = {{{\rm{tr}}(N)} \over m}.$$

Using S = P in (1) we obtain

$${{P - E(P)} \over {1 - E(P)}}. \eqno(18)$$

Goodman and Kruskal (1954), Scott (1955), and Cohen (1960) proposed measures that incorporate correction for chance agreement of a form (18). The different expectations E(P) are defined as follows; a computational sketch follows the definitions.

No underlying continua: \(E{(P)_{{\rm{GK}}}} = \max _i^k\left( {{{{n_{i + }} + {n_{ + i}}} \over {2m}}} \right)\).

One frequency distribution: \(E{(P)_{{\rm{Scott}}}} = \sum\limits_{i = 1}^k {{{\left( {{{{n_{i + }} + {n_{ + i}}} \over {2m}}} \right)}^2}} \).

Two frequency distributions: \(E{(P)_{{\rm{Cohen}}}} = {1 \over {{m^2}}}\sum\limits_{i = 1}^k {{n_{i + }}{n_{ + i}}} \).
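
For a concrete illustration, the sketch below (our own; the 3 × 3 counts are invented) computes P, the three expectations, and the corresponding corrected coefficients of a form (18).

# Sketch (ours) of P and the expectations E(P) for a k x k table of
# counts N; the 3 x 3 counts are invented for illustration.
N = [[20, 5, 1],
     [3, 15, 2],
     [2, 4, 8]]
k = len(N)
m = sum(sum(row) for row in N)
P = sum(N[i][i] for i in range(k)) / m                      # tr(N) / m

row = [sum(N[i][j] for j in range(k)) for i in range(k)]    # n_{i+}
col = [sum(N[i][j] for i in range(k)) for j in range(k)]    # n_{+j}

E_GK = max((row[i] + col[i]) / (2 * m) for i in range(k))
E_Scott = sum(((row[i] + col[i]) / (2 * m)) ** 2 for i in range(k))
E_Cohen = sum(row[i] * col[i] for i in range(k)) / m ** 2

kappa = (P - E_Cohen) / (1 - E_Cohen)    # Cohen's (1960) kappa, form (18)
pi = (P - E_Scott) / (1 - E_Scott)       # Scott's (1955) pi
lam = (P - E_GK) / (1 - E_GK)            # Goodman and Kruskal's lambda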

Note that P is a natural extension of S SM = a+d to nominal variables. Family L can be extended to coefficients of a form S = λ+μP, where λ and μ, unique for each coefficient, depend on the marginal probabilities of contingency table N. All results for the 2 × 2 case naturally generalize to coefficients of a form S = λ + μP. Coefficient P and the multicategorical versions of S GK3, S Scott, and S Cohen that are obtained by using expectations E(P)GK, E(P)Scott, and E(P)Cohen in (18), belong to L family (have a form S = λ+μP; note Proposition 3). Furthermore, it is not difficult to show that ratio (4) holds for multicategorical coefficients P, S GK3, S Scott, and S Cohen. In this section only the generalization of Proposition 1, the powerful result by Albatineh et al. (2006), is presented.

Proposition 1b. Two members in L family become identical after correction (1) if they have the same ratio μ −1(1 −λ).

Proof: E(S) = E(λ+ μP) = λ +μE(P) and consequently the corrected coefficient becomes

$$CS = {{P - E(P)} \over {{\mu^{ - 1}}(1 - \lambda ) - E(P)}}$$

.

7. Multivariate Generalization

Multivariate coefficients may be used to determine the degree of agreement among three or more raters in psychological assessment, or to compare partitions from three different cluster algorithms. Multivariate versions of Cohen’s kappa (S Cohen) can for instance be found in Fleiss (1971), Light (1971), Popping (1983), and Heuvelmans and Sanders (1993).

Suppose we want to determine the agreement among k raters. Similar to Table 1, we may construct k(k−1)/2 bivariate 2 × 2 tables: each proportion table compares two variables i and j. Let a ij denote the proportion of people that possess a characteristic according to both psychologists i and j, let d ij denote the proportion of people that lack the characteristic according to both psychologists, and let p i denote the proportion of people that possess the characteristic according to psychologist i. Family L may be extended to a multivariate family L (k) of coefficients of a form

$${\lambda^{(k)}} + {{2{\mu^{(k)}}} \over {k(k - 1)}}\sum\limits_{i = 1}^{k - 1} {\sum\limits_{j = i + 1}^k {({a_{ij}} + {d_{ij}}),} } $$

where λ (k) and μ (k) depend on the marginal probabilities of the 2 × 2 tables only. Note that

$$S_{{\rm{SM}}}^{(k)} = {2 \over {k(k - 1)}}\sum\limits_{i = 1}^{k - 1} {\sum\limits_{j = i + 1}^k {({a_{ij}} + {d_{ij}})} } $$

is a straightforward multivariate generalization of S SM. Quantity 2/k(k − 1) is used to ensure that the value of coefficient S (k)SM lies between 0 and 1. Let us present some other examples of coefficients that belong to L (k) family.

Example 1b. A three-way formulation of S Cze = 2 a 12/(p 1 + p 2) (Example 1), such that the coefficient is a linear transformation of S (3)SM , is given by

$$S_{{\rm{Cze}}}^{(3)} = {{{a_{12}} + {a_{13}} + {a_{23}}} \over {{p_1} + {p_2} + {p_3}}} = {{3S_{{\rm{SM}}}^{(3)} - 3} \over {2({p_1} + {p_2} + {p_3})}} + 1.$$

A general multivariate version of S Cze is given by

$$S_{{\rm{Cze}}}^{(k)} = {{2\sum\nolimits_{i = 1}^{k - 1} {\sum\nolimits_{j = i + 1}^k {{a_{ij}}} } } \over {(k - 1)\sum\nolimits_{i = 1}^k {{p_i}} }} = {{k[S_{{\rm{SM}}}^{(k)} - 1]} \over {2\sum\nolimits_{i = 1}^k {{p_i}} }} + 1.$$

Coefficient S (k)Cze can be written in a form S (k)Cze = λ (k) + μ (k) S (k)SM , where

$${\lambda^{(k)}} = {{ - k} \over {2\sum\nolimits_{i = 1}^k {{p_i}} }} + 1 = 1 - {\mu^{(k)}}\;{\rm{and}}\;{\mu^{(k)}} = {k \over {2\sum\nolimits_{i = 1}^k {{p_i}} }}.$$

Quantities λ (k) and μ (k) naturally extend λ and μ corresponding to S Cze in Example 1.

Example 3b. Popping (1983) and Heuvelmans and Sanders (1993) describe the same multivariate extension of Cohen’s (1960) kappa. For k dichotomized variables, the multivariate kappa is given by

$$S_{{\rm{Cohen}}}^{(k)} = {{\sum\nolimits_{i = 1}^{k - 1} {\sum\nolimits_{j = i + 1}^k {({a_{ij}} + {d_{ij}} - {p_i}{p_j} - {q_i}{q_j})} } } \over {\sum\nolimits_{i = 1}^{k - 1} {\sum\nolimits_{j = i + 1}^k {({p_i}{q_j} + {p_j}{q_i})} } }}.$$

Coefficient S (k)Cohen can be written in a form S (k)Cohen = λ (k) +μ (k) S (k)SM , where

$${\lambda^{(k)}} = {{ - \sum\nolimits_{i = 1}^{k - 1} {\sum\nolimits_{j = i + 1}^k {({p_i}{p_j} + {q_i}{q_j})} } } \over {\sum\nolimits_{i = 1}^{k - 1} {\sum\nolimits_{j = i + 1}^k {({p_i}{q_j} + {p_j}{q_i})} } }}\;{\rm{and}}\;{\mu^{(k)}} = {{k(k - 1)} \over {2\sum\nolimits_{i = 1}^{k - 1} {\sum\nolimits_{j = i + 1}^k {({p_i}{q_j} + {p_j}{q_i})} } }}.$$
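
The multivariate quantities of this section can be computed directly from the k binary variables. The sketch below is our own illustration with k = 3 variables; the data and names are invented.

# Sketch (ours) of S_SM^(k) and the multivariate kappa S_Cohen^(k) for
# k binary variables; the ratings are invented for illustration.
from itertools import combinations

ratings = [[1, 1, 0, 1, 0, 1],           # variable (rater) 1
           [1, 0, 0, 1, 0, 1],           # variable (rater) 2
           [1, 1, 0, 0, 0, 1]]           # variable (rater) 3
k, m = len(ratings), len(ratings[0])
pairs = list(combinations(range(k), 2))  # the k(k - 1)/2 pairs (i, j)

def ad(i, j):
    """Proportion of matching observations a_ij + d_ij."""
    return sum(u == v for u, v in zip(ratings[i], ratings[j])) / m

p = [sum(r) / m for r in ratings]        # p_i
q = [1 - v for v in p]                   # q_i = 1 - p_i

S_SM_k = sum(ad(i, j) for i, j in pairs) / len(pairs)

num = sum(ad(i, j) - p[i] * p[j] - q[i] * q[j] for i, j in pairs)
den = sum(p[i] * q[j] + p[j] * q[i] for i, j in pairs)
S_Cohen_k = num / den                    # multivariate kappa (Example 3b)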

Using the heuristics in Examples 1b and 3b one may obtain multivariate formulations of other coefficients in Table 2. The remainder of the section is used to present generalizations of Proposition 1, the main result in Albatineh et al. (2006), and Corollary 1. Both extensions show that family L (k) naturally generalizes family L, with respect to correction (1), to multivariate coefficients.

Proposition 1c. Two members in L (k) family become identical after correction (1) if they have the same ratio

$${{1 - {\lambda^{(k)}}} \over {{\mu^{(k)}}}}. \eqno(19)$$

Proof:

$$E\left[ {{S^{(k)}}} \right] = {\lambda^{(k)}} + {\mu^{(k)}}E\left[ {{2 \over {k(k - 1)}}\sum\limits_{i = 1}^{k - 1} {\sum\limits_{j = i + 1}^k {({a_{ij}} + {d_{ij}})} } } \right] = {\lambda^{(k)}} + {\mu^{(k)}}E\left[ {S_{{\rm{SM}}}^{(k)}} \right].$$

Consequently, the corrected coefficient becomes

$$C{S^{(k)}} = {{S_{{\rm{SM}}}^{(k)} - E\left[ {S_{{\rm{SM}}}^{(k)}} \right]} \over {\left( {1 - {\lambda^{(k)}}} \right)/{\mu^{(k)}} - E\left[ {S_{{\rm{SM}}}^{(k)}} \right]}}.$$

Corollary 1b. Coefficients S (k)SM , S (k)Cze , and S (k)Cohen become equivalent after correction (1).

Proof: Using the formulas of λ (k) and μ (k) corresponding to each coefficient, we obtain the ratio (19)

$${{1 - {\lambda^{(k)}}} \over {{\mu^{(k)}}}} = 1$$

for all three coefficients. Obtaining ratio (19) for coefficients S (k)SM and S (k)Cze is straightforward.

Using the formulas for λ (k) and μ (k) from Example 3b, we obtain the ratio (19)

$${2 \over {k(k - 1)}}\sum\limits_{i = 1}^{k - 1} {\sum\limits_{j = i + 1}^k {({p_i}{q_j} + {p_j}{q_i} + {p_i}{p_j} + {q_i}{q_j})} } = {2 \over {k(k - 1)}}\sum\limits_{i = 1}^{k - 1} {\sum\limits_{j = i + 1}^k {({p_i} + {q_i})({p_j} + {q_j})} } = 1.$$

8. Discussion

The inspiration for this work came from the paper by Albatineh et al. (2006), who studied correction for chance for similarity coefficients from a general perspective. For a specific family of coefficients they showed that coefficients may coincide after correction for chance, irrespective of what expectation is used.

The study of correction for chance presented in this paper focused on resemblance measures for 2 × 2 tables. It is surprising how much output has been generated for this simple case (Pearson, 1947; Fleiss, 1975; Gower and Legendre, 1986; Krippendorff, 1987; Mak, 1988; Blackman and Koval, 1993; Albatineh et al., 2006; Warrens, 2008, in press). Furthermore, for the 2 × 2 case we have many similarity coefficients at our disposal, and some of these were used to illustrate the results in this paper. As suggested by the multicategorical and multivariate generalizations in Sections 6 and 7, the properties derived in this paper apply to coefficients of a form S = λ+μx, for which we have

$$E(S) = E[\lambda + \mu x] = \lambda + \mu E(x), \eqno(20)$$

where λ and μ depend on the marginals of the table corresponding to the data type. Property (20) is central in Proposition 1, the main result in Albatineh et al. (2006), and in several other results in this paper. The general coefficients for metric scales in Zegers and Ten Berge (1985), for instance, satisfy condition (20).