Abstract
Given a spatial dataset placed on an n × n grid, our goal is to find the rectangular regions within which subsets of the dataset exhibit anomalous behavior. We develop algorithms that, given any user-supplied likelihood function, conduct a likelihood ratio hypothesis test (LRT) over each rectangular region in the grid, rank all of the rectangles by their computed LRT statistics, and return the top few most interesting rectangles. To speed up this process, we develop methods to prune rectangles without computing their associated LRT statistics.
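The scan described above can be illustrated with a minimal brute-force sketch. It is not the paper's algorithm: it evaluates every rectangle and does no pruning, so it is only practical for very small grids. The function names (`gaussian_max_loglik`, `top_rectangles`), the grid layout (each cell holds a list of observations), and the unit-variance Gaussian model are all assumptions made for illustration. The statistic used is the standard LRT form, twice the gain in maximized log-likelihood from fitting the inside and outside of a rectangle separately rather than jointly.

```python
import heapq


def gaussian_max_loglik(values):
    """Maximized log-likelihood of a unit-variance Gaussian (assumed model).

    The additive per-point constant is dropped; it cancels in the LRT
    because inside + outside contain the same points as the whole grid.
    """
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return -0.5 * sum((v - mean) ** 2 for v in values)


def top_rectangles(grid, max_loglik, k=3):
    """Score every rectangle of an n x n grid and return the k best.

    grid[r][c] is a list of observations in cell (r, c).
    max_loglik is the user-supplied function: data subset -> maximized
    log-likelihood. Returns (statistic, (r0, c0, r1, c1)) pairs with
    inclusive corner coordinates, highest statistic first.
    """
    n = len(grid)
    all_values = [v for row in grid for cell in row for v in cell]
    null_ll = max_loglik(all_values)  # one model for the whole grid
    scored = []
    for r0 in range(n):
        for r1 in range(r0, n):
            for c0 in range(n):
                for c1 in range(c0, n):
                    inside, outside = [], []
                    for r in range(n):
                        for c in range(n):
                            in_rect = r0 <= r <= r1 and c0 <= c <= c1
                            (inside if in_rect else outside).extend(grid[r][c])
                    if not inside or not outside:
                        continue  # no contrast to test
                    # Alternative: separate models inside and outside.
                    alt_ll = max_loglik(inside) + max_loglik(outside)
                    scored.append((2.0 * (alt_ll - null_ll), (r0, c0, r1, c1)))
    return heapq.nlargest(k, scored)
```

For example, on a 4 × 4 grid whose cells each hold one observation equal to 0.0, except for a 2 × 2 block of 5.0 values, `top_rectangles(grid, gaussian_max_loglik, k=1)` ranks that block first. Any other likelihood can be swapped in by passing a different `max_loglik`. The exhaustive loop visits O(n^4) rectangles and rescans the grid for each one, which is the cost that pruning is meant to avoid.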
Index Terms
- A Model-Agnostic Framework for Fast Spatial Anomaly Detection