
A Model-Agnostic Framework for Fast Spatial Anomaly Detection

Authors:
Mingxi Wu, Oracle Corporation
Chris Jermaine, Rice University
Sanjay Ranka, University of Florida
Xiuyao Song, Yahoo! Inc
John Gums, University of Florida

ACM Transactions on Knowledge Discovery from Data, Volume 4, Issue 4, Article No. 20, pp. 1–30. https://doi.org/10.1145/1857947.1857952

Published: 01 October 2010


Abstract

Given a spatial dataset placed on an n × n grid, our goal is to find the rectangular regions within which subsets of the dataset exhibit anomalous behavior. We develop algorithms that, given any user-supplied arbitrary likelihood function, conduct a likelihood ratio hypothesis test (LRT) over each rectangular region in the grid, rank all of the rectangles based on the computed LRT statistics, and return the top few most interesting rectangles. To speed up this process, we develop methods to prune rectangles without computing their associated LRT statistics.
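The pipeline described above (score every rectangle with an LRT, rank, keep the top few) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the user-supplied likelihood is exposed as a hypothetical function `max_loglik` mapping a region's additive sufficient statistics to its maximized log-likelihood, and it plugs in a Bernoulli model (positives out of trials per cell) as one example. For a rectangle R, the statistic is Λ(R) = 2[ℓ(R) + ℓ(outside R) − ℓ(whole grid)], where ℓ is the maximized log-likelihood. 2-D prefix sums make each rectangle's statistics O(1) to obtain, but the scan still visits all n²(n+1)²/4 rectangles of an n × n grid, i.e. O(n⁴) of them. The pruning methods the paper develops to avoid this are not reproduced here, and all function names are illustrative.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

Stats = Tuple[float, ...]          # additive sufficient statistics of a region (assumption)
Rect = Tuple[int, int, int, int]   # (row0, col0, row1, col1), inclusive bounds


def bernoulli_max_loglik(stats: Stats) -> float:
    """Example model: maximized Bernoulli log-likelihood of k positives in m trials (MLE p = k/m)."""
    k, m = stats
    if m == 0:
        return 0.0
    ll = 0.0
    if k > 0:
        ll += k * math.log(k / m)
    if k < m:
        ll += (m - k) * math.log((m - k) / m)
    return ll


def prefix_sums(grid: Sequence[Sequence[Stats]]) -> List[List[List[float]]]:
    """2-D prefix sums for each statistic, so any rectangle's totals cost O(1)."""
    n_rows, n_cols, dims = len(grid), len(grid[0]), len(grid[0][0])
    P = [[[0.0] * dims for _ in range(n_cols + 1)] for _ in range(n_rows + 1)]
    for r in range(n_rows):
        for c in range(n_cols):
            for d in range(dims):
                P[r + 1][c + 1][d] = (grid[r][c][d] + P[r][c + 1][d]
                                      + P[r + 1][c][d] - P[r][c][d])
    return P


def rect_stats(P: List[List[List[float]]], rect: Rect) -> Stats:
    """Totals of every statistic inside an inclusive rectangle, by inclusion-exclusion."""
    r0, c0, r1, c1 = rect
    return tuple(P[r1 + 1][c1 + 1][d] - P[r0][c1 + 1][d]
                 - P[r1 + 1][c0][d] + P[r0][c0][d]
                 for d in range(len(P[0][0])))


def top_k_rectangles(grid: Sequence[Sequence[Stats]],
                     max_loglik: Callable[[Stats], float],
                     k: int = 3) -> List[Tuple[float, Rect]]:
    """Exhaustive (unpruned) LRT scan over all axis-aligned rectangles; returns the k highest."""
    P = prefix_sums(grid)
    n_rows, n_cols = len(grid), len(grid[0])
    total = rect_stats(P, (0, 0, n_rows - 1, n_cols - 1))
    null_ll = max_loglik(total)  # H0: one parameter shared by the whole grid
    heap: List[Tuple[float, Rect]] = []  # min-heap holding the current top k
    for r0 in range(n_rows):
        for r1 in range(r0, n_rows):
            for c0 in range(n_cols):
                for c1 in range(c0, n_cols):
                    rect = (r0, c0, r1, c1)
                    inside = rect_stats(P, rect)
                    outside = tuple(t - i for t, i in zip(total, inside))
                    # H1: separate parameters inside and outside the rectangle
                    stat = 2.0 * (max_loglik(inside) + max_loglik(outside) - null_ll)
                    if len(heap) < k:
                        heapq.heappush(heap, (stat, rect))
                    elif stat > heap[0][0]:
                        heapq.heapreplace(heap, (stat, rect))
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    # 4 x 4 grid of (positives, trials) per cell; the central 2 x 2 block is "hot".
    grid = [[(1, 10) for _ in range(4)] for _ in range(4)]
    for r, c in [(1, 1), (1, 2), (2, 1), (2, 2)]:
        grid[r][c] = (8, 10)
    # The top entry should be the hot block (1, 1, 2, 2).
    for stat, rect in top_k_rectangles(grid, bernoulli_max_loglik, k=3):
        print(f"{stat:.2f}  {rect}")
```

Swapping in a different model only requires a different `max_loglik` and matching per-cell statistics, which is the sense in which such a scan is model-agnostic; the sketch assumes those statistics add across cells so that prefix sums apply.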

Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining

Recommendations

A LRT framework for fast spatial anomaly detection
KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

Given a spatial data set placed on an n x n grid, our goal is to find the rectangular regions within which subsets of the data set exhibit anomalous behavior. We develop algorithms that, given any user-supplied arbitrary likelihood function, conduct a ...
A variance shift model for detection of outliers in the linear mixed model

A variance shift outlier model (VSOM), previously used for detecting outliers in the linear model, is extended to the variance components model. This VSOM accommodates outliers as observations with inflated variance, with the status of the ith ...
Testing hypotheses in the Birnbaum-Saunders distribution under type-II censored samples

The two-parameter Birnbaum-Saunders distribution has been used successfully to model fatigue failure times. Although censoring is typical in reliability and survival studies, little work has been published on the analysis of censored data for this ...

Published in

ACM Transactions on Knowledge Discovery from Data, Volume 4, Issue 4
October 2010
121 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/1857947

Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 October 2010
- Accepted: 1 June 2010
- Received: 1 January 2010

Author Tags
Anomaly detection
antibiotic resistance
likelihood ratio test
spatial anomaly
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 5
- Total Downloads: 432
- Downloads (Last 12 months): 2
- Downloads (Last 6 weeks): 0