skip to main content
research-article

A Model-Agnostic Framework for Fast Spatial Anomaly Detection

Published:01 October 2010Publication History
Skip Abstract Section

Abstract

Given a spatial dataset placed on an n ×n grid, our goal is to find the rectangular regions within which subsets of the dataset exhibit anomalous behavior. We develop algorithms that, given any user-supplied arbitrary likelihood function, conduct a likelihood ratio hypothesis test (LRT) over each rectangular region in the grid, rank all of the rectangles based on the computed LRT statistics, and return the top few most interesting rectangles. To speed this process, we develop methods to prune rectangles without computing their associated LRT statistics.

References

  1. Agarwal, D., McGregor, A., Phillips, J. M., Venkatasubramanian, S., and Zhu, Z. 2006. Spatial scan statistics: Approximations and performance study. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 24--33. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Agarwal, D., Phillips, J. M., and Venkatasubramanian, S. 2006. The hunting of the bump: On maximizing statistical discrepancy. In Proceedings of the SIAM Symposium on Discrete Algorithms (SODA’06). 1137. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Bradley, E. 2004. Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J. Am. Statist. Assn. 99, 96--104.Google ScholarGoogle ScholarCross RefCross Ref
  4. Bradley, E. 2007. Correlation and large-scale simultaneous significance testing. J. Am. Statist. Assn. 102, 93--103.Google ScholarGoogle ScholarCross RefCross Ref
  5. Dempster, A., Laird, N., and Rubin, D. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc. Series B 39, 1, 1--38.Google ScholarGoogle ScholarCross RefCross Ref
  6. Dudoit, S., Shaffer, J. P., and Boldrick, J. C. 2003. Multiple hypothesis testing in microarray experiments. Statist. Sci. 18, 71--103.Google ScholarGoogle ScholarCross RefCross Ref
  7. Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining. AAAI Press, 226--231.Google ScholarGoogle Scholar
  8. Kulldorff, M. 1997. A spatial scan statistic. Comm. Statis.: Theory Methods 26, 6, 1481--1496.Google ScholarGoogle ScholarCross RefCross Ref
  9. Kulldorff, M. 1999. Spatial scan statistics: Model, calculations, and applications. In Scan Statistics and Applications, J. Glaz and M. Balakrishnan, Eds., Birkhauses, 303--322.Google ScholarGoogle Scholar
  10. Loh, J. M. and Zhu, Z. 2007. Accounting for spatial correlation in the scan statistic. Annals Appl. Statist. 1, 560--584.Google ScholarGoogle ScholarCross RefCross Ref
  11. Neill, D. B. and Moore, A. W. 2003. A fast multi-resolution method for detection of significant spatial disease clusters. In Proceedings of the Conference on Neural Information Processing Systems (NIPS’03). MIT Press, 256--265. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Neill, D. B. and Moore, A. W. 2004. Rapid detection of significant spatial clusters. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 256--265. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Ng, R. and Han, J. 2002. Clarans: A method for clustering objects for spatial data mining. IEEE Trans. Knowl. Data Engin. 14, 5, 1003--1016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Wang, W., Yang, J., and Muntz, R. R. 1997. Sting: A statistical information grid approach to spatial data mining. In Proceedings of the International Conference on Very Large Databases (VLDB’97). 186--195. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Wilks, S. S. 1938. The large sample distribution of the likelihood ratio for testing composite hypotheses. Annals Math. Statist. 9, 1, 60--62.Google ScholarGoogle ScholarCross RefCross Ref
  16. Wu, M., Song, X., Jermaine, C., Ranka, S., and Gums, J. 2009. A LRT framework for fast spatial anomaly detection. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD). 887--896. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. A Model-Agnostic Framework for Fast Spatial Anomaly Detection

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Knowledge Discovery from Data
      ACM Transactions on Knowledge Discovery from Data  Volume 4, Issue 4
      October 2010
      121 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/1857947
      Issue’s Table of Contents

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 October 2010
      • Accepted: 1 June 2010
      • Received: 1 January 2010
      Published in tkdd Volume 4, Issue 4

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Research
      • Refereed

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader