Power comparisons for disease clustering tests
Introduction
A large number of different tests for spatial randomness that adjust for an uneven background population have been proposed. Such test statistics are used to test, among other things, whether the geographical distribution of disease is random or not, adjusting for the geographical distribution of the population at large. They are also used in areas such as archaeology, botany, criminology, demography, ecology, economics, engineering, forestry, genetics, geography, history, neurology, sociology and zoology. Several review articles have been written (Biggeri and Marchi, 1993; Elliott et al., 1995; Heywood, 1991; Kulldorff, 1998; Lawson et al., 1999; Marshall, 1991; Moore and Carpenter, 1999; Orton, 1982; Sokal and Oden, 1978; Waller and Jacquez, 1995); the most extensive identified over 100 different test statistics (Kulldorff, 1998).
There have been few systematic comparative evaluations of tests for spatial randomness. Different tests have sometimes been applied to the same data sets (Alexander and Boyle, 1996; Draper, 1991; Glaser, 1990; Shaw et al., 1988; Turnbull et al., 1990; Zoellner and Schmidtmann, 1999), but for a formal comparison of test statistics it is important to evaluate their power (Wartenberg, 1990), and only a small fraction of the proposed tests has undergone such evaluations. Three major considerations when designing a power comparison study are (i) the reproducibility of the clustering process, (ii) the clustering models considered as the alternative hypotheses, and (iii) minimization of the bias and variance when estimating the difference in power for different tests.
While very important, simulated power comparisons are tedious, time consuming and unglamorous to perform. Each of the methods to be evaluated must be programmed, the simulated data must be generated, and each test statistic must be calculated for each simulated data set. If there are previously published power evaluations, it is sometimes possible to avoid redoing the calculations for already evaluated test statistics, but that requires that the earlier simulation models are described in complete detail, which is seldom the case. Ideally, though, one would go one step further and build on previous power evaluations using not only the same alternative models but also the exact same simulated data sets. This minimizes the random variation when the methods are compared.
In this paper, we present and provide access to a set of benchmark simulated data sets. Using this benchmark, we evaluate the power of three test statistics for disease clustering. Other researchers can then easily compare tests of their interest to previously evaluated test statistics by simply reanalyzing the benchmark data sets. This is the most economical way to conduct power comparisons of many different test statistics. For natural reasons, past evaluations of tests for spatial randomness have mostly been pairwise power comparisons, or more rarely comparisons of three or four tests (Kulldorff and Nagarwalla, 1995; Oden, 1995; Rogerson, 1999; Swartz, 1998; Tango, 1995; Tango, 1999a; Tango, 2000; Vach, 1994). With the benchmark data sets in place, any newly evaluated test will automatically be compared with all previously evaluated test statistics.
With one exception, earlier power comparisons all considered first-order clustering models, where cases are located independently of each other but the relative risk differs between geographical areas. Most of these evaluated the power for a clustering model with one (Kulldorff and Nagarwalla, 1995; Rogerson, 1999; Swartz, 1998; Tango, 1995; Tango, 1999a; Tango, 2000), two (Swartz, 1998; Tango, 1995; Tango, 2000), three (Tango, 2000) or four (Swartz, 1998) hot-spot clusters. As an important alternative, Oden (1995) used a clustering model with a different relative risk in each census area. Vach (1994) is the only author to have considered a second-order clustering model. In his model, the location of one case depends on the locations of other previously generated cases, while the risk also varies geographically. There has not been any power comparison using a pure second-order model, where each particular case is randomly located, so that the relative risk is constant throughout the map, but where the locations of cases are dependent on each other. It is important to realize that while first- and second-order models are very different in how the points are generated, the resulting point patterns may be exactly the same, and hence indistinguishable. Bailey and Gatrell (1995, Chapter 3) provide an excellent discussion of this.
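As an illustration of the first-order idea, the sketch below allocates a fixed total number of cases to areas independently of one another, with each case landing in area i with probability proportional to population times relative risk. The function name `simulate_first_order` and the encoding of a hot spot as a per-area relative-risk multiplier are assumptions made for illustration, not the generator used for the benchmark data; second-order models, where case locations depend on each other, are not covered by this sketch.

```python
import random
from collections import Counter

def simulate_first_order(populations, relative_risk, n_cases, rng):
    """Place exactly n_cases cases; each case independently lands in area i with
    probability proportional to populations[i] * relative_risk[i]."""
    weights = [p * r for p, r in zip(populations, relative_risk)]
    # Independent draws: the location of one case does not depend on the others,
    # and fixing n_cases makes the area counts multinomial.
    draws = rng.choices(range(len(populations)), weights=weights, k=n_cases)
    counts = Counter(draws)
    return [counts.get(i, 0) for i in range(len(populations))]

# Hypothetical example: a hot spot with relative risk 3 in area 2.
pops = [1000, 5000, 2000, 2000]
hot_spot = simulate_first_order(pops, [1, 1, 3, 1], 600, random.Random(1))
# Null hypothesis: relative risk 1 everywhere.
null = simulate_first_order(pops, [1, 1, 1, 1], 600, random.Random(2))
```

Setting every relative risk to 1 gives data under the null hypothesis, so the same hypothetical helper serves both purposes.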
In this study we use 61 different clustering models: 15 with a single hot-spot cluster, 20 with multiple hot-spot clusters, and 26 purely second-order clustering models, in which the risk is constant throughout so that any one particular case is spatially randomly located, but the locations of different cases are dependent on each other. For each model, the power is calculated conditioned on two different levels of the total number of cases. In past studies, the number of alternative clustering models considered has been in the range of 3–8, with the exception of Vach (1994) and Rogerson (1999), who considered 12 and 20 different clustering models, respectively. Tango (1995) is the only one who has evaluated the power for the same models with different numbers of cases.
Another important aspect of a clustering model is the background population used. We use a real data set consisting of all women in the 245 counties of the Northeastern United States during 1988–1992. This is a fairly typical epidemiological data set, with data aggregated into a mix of rural and urban census areas.
For some tests it is possible to evaluate the power using an asymptotic approximation of the test statistic distribution (Oden, 1995; Rogerson, 1999; Tango, 1995). Unfortunately, asymptotic approximations do not exist for most test statistics. When they do exist, the asymptotics may be defined in terms of the geographical area, the population size or the number of cases going to infinity, with the other two parameters held at a specific constant or rate, and the approximations must be interpreted considering these asymptotic concepts. Unless the approximations for all test statistics are very good, it is necessary for comparison purposes to obtain the critical values through a large number of simulated data sets randomized under the null hypothesis. In this paper we present two groups of 100,000 simulated data sets to estimate the critical values, with 600 and 6000 cases, respectively.
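A minimal sketch of how a critical value can be read off simulated null data sets, assuming that larger values of the statistic indicate clustering (negate a statistic for which small values do). The helper names are hypothetical, and the standard normal draws merely stand in for the null distribution of a real test statistic.

```python
import math
import random

def monte_carlo_critical_value(null_stats, alpha=0.05):
    """Smallest simulated value c such that at most a fraction alpha of the
    null statistics lie strictly above c (reject when observed > c)."""
    ordered = sorted(null_stats, reverse=True)
    # Number of null values allowed to lie strictly above the critical value;
    # the tiny epsilon guards against floating-point error in alpha * n.
    k = math.floor(alpha * len(ordered) + 1e-9)
    return ordered[k]

def rejects(observed, critical_value):
    return observed > critical_value

# Usage with a stand-in null distribution (standard normal, purely illustrative).
rng = random.Random(2024)
null_draws = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
c = monte_carlo_critical_value(null_draws)  # close to 1.645 for this stand-in
```

With 100,000 null data sets, the 5 percent critical value is the 5,001st largest simulated value, so its Monte Carlo error is small compared with the differences in power being studied.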
In order to minimize the variability of the estimated power difference between tests, it is important to condition the analysis on a particular population distribution, and on the total number of cases. Moreover, different tests should be evaluated using the same random data sets.
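Evaluating different tests on the same random data sets pays off because their rejection decisions are strongly correlated. The sketch below, with hypothetical helper names, estimates the power difference from paired 0/1 rejection indicators and compares its standard error with the one that independent data sets would give. The variance formula is the standard one for a difference of paired binary outcomes, not a formula quoted from this paper.

```python
import math

def paired_power_difference(reject_a, reject_b):
    """Estimate power(A) - power(B) and its standard error when both tests
    were applied to the same simulated data sets (paired 0/1 indicators)."""
    n = len(reject_a)
    if n == 0 or n != len(reject_b):
        raise ValueError("both tests need results for the same non-empty set of data sets")
    # Fractions of data sets rejected by exactly one of the two tests.
    only_a = sum(1 for a, b in zip(reject_a, reject_b) if a and not b) / n
    only_b = sum(1 for a, b in zip(reject_a, reject_b) if b and not a) / n
    diff = only_a - only_b
    # D = 1{A rejects} - 1{B rejects} takes values -1, 0, 1, so
    # Var(D) = P(D != 0) - E[D]^2.
    var_d = only_a + only_b - diff ** 2
    return diff, math.sqrt(var_d / n)

def unpaired_se(power_a, power_b, n):
    """Standard error of the difference if each test had its own independent data sets."""
    return math.sqrt((power_a * (1 - power_a) + power_b * (1 - power_b)) / n)

# Toy indicators for 10 shared data sets (illustrative only).
a = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
b = [1, 1, 0, 0, 0, 1, 0, 1, 1, 0]
diff, se_paired = paired_power_difference(a, b)  # diff = 0.1
se_independent = unpaired_se(0.6, 0.5, 10)
```

Only the data sets on which the two tests disagree contribute to the variance, which is why pairing shrinks the standard error from about 0.22 to about 0.09 in this toy example.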
Another factor determining the variance of an estimated power difference is the number of random data sets generated under each alternative hypothesis. As part of the benchmark data, we present 10,000 random data sets for each alternative.
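As a rough guide to what 10,000 data sets buy, the binomial standard error of a single estimated power is largest when the power is 0.5. The sketch below (hypothetical function name) evaluates it.

```python
import math

def power_standard_error(power, n_datasets):
    """Binomial standard error of a power estimated as the fraction of
    n_datasets independent simulated data sets in which the test rejects."""
    return math.sqrt(power * (1 - power) / n_datasets)

worst_10k = power_standard_error(0.5, 10_000)  # 0.005
worst_1k = power_standard_error(0.5, 1_000)    # about 0.0158
```

At 10,000 data sets the worst-case standard error is 0.005, roughly a third of what 1,000 data sets would give.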
Tests for spatial randomness can be classified based on their purpose. Focused tests are designed to test whether a local cluster exists around a predetermined point source, while general tests look for clusters without any preconceived assumptions about their location (Besag and Newell, 1991). Among general tests, cluster detection tests are used both to detect local clusters, without any preconceived idea of their location, and to determine their statistical significance. Global clustering tests, on the other hand, are used to determine whether there is clustering present throughout the study area, without determining the statistical significance of individual clusters (Kulldorff, 1998; Tango, 1999a).
Discussions regarding the differences between the latter two types of general tests have been provided elsewhere (Kulldorff, 1998; Tango, 1999a), but this important difference is not always taken into account, and there has never been a formal study showing how the two types differ in their power to detect different types of clustering. In fact, the power of global clustering tests has typically been evaluated using hot-spot cluster models. In this paper we evaluate the power of the spatial scan statistic (Kulldorff, 1997), the maximized excess events test (Tango, 2000) and Bonetti–Pagano's nonparametric M statistic (Bonetti and Pagano, 2001a; Bonetti and Pagano, 2001b). These were chosen not only to compare three different tests but, equally important, to illustrate the differences between the two types of general tests. We show that the spatial scan statistic, a cluster detection test, has good power for hot-spot cluster alternatives, while the maximized excess events test, a global clustering test, has good power when clustering occurs throughout the geographical region of study. The M statistic, also a global clustering test, performs well for multiple hot-spot clusters.
Most tests for spatial clustering depend on a parameter that determines the scale of clustering. These include the λ in Tango's excess events test (Tango, 1995), the k in Cuzick–Edwards' k-nearest neighbor test (Cuzick and Edwards, 1990), and the radius of the circle in Turnbull's CEPP (Turnbull et al., 1990). The three tests compared in this paper do not depend on such a pre-specified parameter, which was a main reason for evaluating these particular test statistics. We expect that more such tests will be proposed as extensions of earlier methods, and it will then be of special interest to compare them with the tests evaluated here.
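To make the role of a scale parameter concrete, the sketch below computes an excess-events-type quadratic form, assuming exponential weights a_ij = exp(-d_ij/λ) between areas i and j; the function name and inputs are hypothetical. A very small λ reduces the statistic to the sum of squared differences between case and population proportions (no spatial smoothing), while a very large λ drives it towards zero because both sets of proportions sum to one, so the value of λ matters. Roughly speaking, the maximized excess events test (Tango, 2000) avoids fixing λ by taking the most significant result over a range of λ values and accounting for that search by simulation; that step is not shown here.

```python
import math

def excess_events_statistic(cases, populations, dist, lam):
    """Quadratic form sum_ij (r_i - p_i) a_ij (r_j - p_j) with
    r_i = c_i / sum(c), p_i = n_i / sum(n), a_ij = exp(-d_ij / lam)."""
    total_c, total_n = sum(cases), sum(populations)
    diff = [c / total_c - n / total_n for c, n in zip(cases, populations)]
    m = len(diff)
    return sum(
        diff[i] * math.exp(-dist[i][j] / lam) * diff[j]
        for i in range(m)
        for j in range(m)
    )

# Three hypothetical areas on a line at positions 0, 1 and 5.
dist = [[0, 1, 5], [1, 0, 4], [5, 4, 0]]
cases, pops = [30, 10, 0], [100, 100, 100]
for lam in (0.5, 2.0, 10.0):
    print(lam, excess_events_statistic(cases, pops, dist, lam))
```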
Benchmark data sets
For the benchmark data sets we use the female population in the 245 counties and county equivalents in the Northeastern United States, consisting of the states of Maine, New Hampshire, Vermont, Massachusetts, Rhode Island, Connecticut, New York, New Jersey, Pennsylvania, Delaware and Maryland, as well as the District of Columbia. Each county is geographically represented by a centroid coordinate. As the population for each county we used the number of women living there according to the 1990
A cluster detection test and two global clustering tests
The clustering models described above can be used for power analysis of any number of disease clustering tests. In this paper we estimate the power of one cluster detection test, the spatial scan statistic (Kulldorff, 1997), and two global clustering tests, the maximized excess events test (Tango, 2000) and Bonetti–Pagano's M statistic (Bonetti and Pagano, 2001a; Bonetti and Pagano, 2001b). As others use the same data sets to evaluate other disease clustering tests, they only need to do the power
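A simplified sketch of the circular Poisson spatial scan statistic as it is usually described: circles are centred on each area centroid and grown by adding areas in order of distance until a population cap is reached, and the window with the largest log-likelihood ratio is reported. The 50 percent population cap, the function names and the handling of tied distances (areas are added one at a time) are assumptions of this sketch. It also omits the Monte Carlo step in which the observed maximum is compared with the maxima from null data sets to obtain a p-value.

```python
import math

def window_llr(c, e, total):
    """Poisson log-likelihood ratio for a window with c observed and e expected
    cases out of `total` cases; windows without an excess (c <= e) score 0."""
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    # 0 * log(0) is taken as 0 when every case falls inside the window.
    outside = (total - c) * math.log((total - c) / (total - e)) if total > c else 0.0
    return inside + outside

def circular_scan(cases, populations, dist, max_pop_fraction=0.5):
    """Return (max log-likelihood ratio, member areas) over circular windows.
    Assumes positive populations and adds tied-distance areas one at a time."""
    total_c, total_n = sum(cases), sum(populations)
    best_llr, best_members = 0.0, None
    for centre in range(len(cases)):
        # Grow the circle around `centre` by adding areas in order of distance.
        order = sorted(range(len(cases)), key=lambda j: dist[centre][j])
        c = n = 0
        members = []
        for j in order:
            if n + populations[j] > max_pop_fraction * total_n:
                break
            c += cases[j]
            n += populations[j]
            members.append(j)
            expected = total_c * n / total_n
            llr = window_llr(c, expected, total_c)
            if llr > best_llr:
                best_llr, best_members = llr, list(members)
    return best_llr, best_members

# Four hypothetical areas on a line; all 40 cases fall in the first two.
xs = (0, 1, 2, 10)
d = [[abs(x - y) for y in xs] for x in xs]
llr, members = circular_scan([20, 20, 0, 0], [100, 100, 100, 100], d)
```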
Hot-spot clusters
The results of the power analyses for the hot-spot clusters are shown in Table 2. For the rural clusters, the spatial scan statistic has very high power while the power of the other two tests is low. For the mixed clusters, all tests have high power with a slight advantage for the spatial scan statistic. For the urban clusters, it is instead the maximized excess events test that has a slight advantage.
All tests have higher power when there are two or three different hot-spot clusters in the
Discussion
The results of this study clearly show that different disease clustering tests are suited to different types of alternative hypotheses. If one is interested in detecting and evaluating localized clusters, it is better to use the spatial scan statistic, while the maximized excess events test is better at detecting global clustering that is present throughout the study region. This is to some extent intuitive.
The maximized excess events test is based on the evidence of clustering found
Appendix
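Suppose we have a circle with radius one centered at (0,0). The distance from (1,0) to the point on the circle corresponding to x degrees is
$$d(x) = \sqrt{(1-\cos x)^2 + \sin^2 x} = \sqrt{2 - 2\cos x} = 2\sin(x/2).$$
The distance to a point 22 percent along the circle, that is, at $x = 0.22 \times 360^\circ = 79.2^\circ$, is $d(79.2^\circ) = 2\sin(39.6^\circ) \approx 1.275$. The expected distance from (1,0) to a random point on the circle is
$$\frac{1}{360}\int_0^{360} 2\sin(x/2)\,dx = \frac{4}{\pi} \approx 1.273.$$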
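The appendix distances can be checked numerically with the standard chord-length identity for a unit circle, 2 sin(x/2) for an angle x. The sketch below (hypothetical function names) evaluates the distance at 22 percent of the way around and averages it over the circle by the midpoint rule.

```python
import math

def chord_length(x_degrees):
    """Distance from (1, 0) to the point x degrees along the unit circle:
    sqrt((1 - cos x)^2 + sin^2 x), which simplifies to 2 sin(x / 2)."""
    return 2 * math.sin(math.radians(x_degrees) / 2)

def expected_chord_length(steps=100_000):
    """Average chord length over a uniformly random point on the circle,
    approximated with the midpoint rule over 0 to 360 degrees."""
    width = 360 / steps
    return sum(chord_length((k + 0.5) * width) for k in range(steps)) / steps

d22 = chord_length(0.22 * 360)    # about 1.275
mean_d = expected_chord_length()  # about 1.273, i.e. 4 / pi
```

The two values differ by less than 0.002, so a point 22 percent of the way around the circle lies almost exactly at the average distance.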
Acknowledgements
The authors thank Marco Bonetti for advice concerning the implementation of the M test, and two anonymous reviewers for valuable comments that improved the quality of the paper.
References (55)
- et al., 2001. Geographical distribution of variant Creutzfeldt–Jakob disease in Great Britain, 1994–2000. The Lancet.
- et al., 2001. The origins of a new Trypanosoma brucei rhodesiense sleeping sickness outbreak in eastern Uganda. The Lancet.
- Stochastic process and archaeological mechanism in spatial analysis. J. Archaeol. Sci. (1982).
- Alexander, F.E., Boyle, P. (Eds.), 1996. Methods for investigating localized clustering of disease. IARC scientific...
- et al., 1995. Interactive Spatial Data Analysis.
- et al., 1991. The detection of clusters in rare diseases. J. Roy. Statist. Soc.
- Biggeri, A., Marchi, M., 1993. Metodi di analisi spazio-temporale in campo epidemiologico: una rassegna. In: Zani...
- Bithell, J.F., 1999. Disease mapping using the relative risk function estimated from areal data. In: Lawson et al....
- Bonetti, M., Pagano, M., 2001a. On detecting clustering. Proceedings of the Biometrics Section, American Statistical...
- Bonetti, M., Pagano, M., 2001b. The interpoint distance distribution as a descriptor of point patterns: an application...
- Spatial Autocorrelation.
- Statistics for Spatial Data.
- Spatial clustering for inhomogeneous populations. J. Roy. Statist. Soc.
- Second-order analysis of spatial clustering for inhomogeneous populations. Biometrics.
- Modified randomization tests for nonparametric hypotheses. Ann. Math. Statist.
- Spatial statistical methods in environmental epidemiology: a critique. Statist. Meth. Med. Res.
- Spatial clustering of Hodgkin's disease in San Francisco Bay area. Amer. J. Epidemiol.
- A versatile test for clustering and a proximity analysis of neurons. Meth. Inform. Med.
- Spatial analysis of genetic variation in plant populations. Annu. Rev. Ecol. Systematics.
- A spatial scan statistic. Commun. Statist. Theory and Methods.
- An isotonic spatial scan statistic for geographical disease surveillance. J. Natl. Inst. Public Health.
- Spatial disease clusters: detection and inference. Statist. Med.
1 Current address: Children's Hospital Informatics Program, 320 Longwood Ave, Boston, MA 02115, USA.