Group Deviation Detection Methods: A Survey

Authors:
Edward Toth, School of Information Technologies, The University of Sydney, NSW, Australia (ORCID: 0000-0002-5966-5248)

Sanjay Chawla, Qatar Computing Research Institute, Hamad bin Khalifa University, Doha, Qatar

ACM Computing Surveys, Volume 51, Issue 4, Article No. 77, pp. 1–38. https://doi.org/10.1145/3203246

Published: 25 July 2018

Abstract

Pointwise anomaly detection and change detection focus on the study of individual data instances; however, an emerging area of research involves groups or collections of observations. From applications of high-energy particle physics to health care collusion, group deviation detection techniques result in novel research discoveries, mitigation of risks, prevention of malicious collaborative activities, and other interesting explanatory insights. In particular, static group anomaly detection is the process of identifying groups that are not consistent with regular group patterns, while dynamic group change detection assesses significant differences in the state of a group over a period of time. Since both group anomaly detection and group change detection share fundamental ideas, this survey article provides a clearer and deeper understanding of group deviation detection research in static and dynamic situations.
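
The two notions defined in the abstract can be illustrated with a minimal sketch. This is not a method taken from the survey: it assumes each group is summarized only by its mean, scores static group anomalies with a robust z-score (median and median absolute deviation of the group means), and flags a dynamic group change when the mean shifts by more than a chosen number of pooled standard deviations between two time windows. The function names, thresholds, and data are hypothetical and chosen purely for illustration.

```python
from statistics import mean, median, stdev


def group_anomaly_scores(groups):
    """Static case: score each group against the typical group pattern.

    Assumption: a group is summarized by its mean. The score is
    |group mean - median of group means| / MAD of group means.
    """
    means = {name: mean(values) for name, values in groups.items()}
    center = median(means.values())
    mad = median(abs(m - center) for m in means.values())
    scale = mad if mad > 0 else 1.0  # fall back to raw distance if MAD is zero
    return {name: abs(m - center) / scale for name, m in means.items()}


def group_changed(before, after, threshold=3.0):
    """Dynamic case: compare one group's state in two time windows.

    Returns True if the mean shifts by more than `threshold` pooled
    standard deviations (hypothetical default of 3.0).
    """
    pooled = ((stdev(before) ** 2 + stdev(after) ** 2) / 2) ** 0.5
    shift = abs(mean(after) - mean(before))
    if pooled == 0:
        return shift > 0
    return shift / pooled > threshold


# Illustrative data: five groups of observations, one clearly unusual.
groups = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [2.0, 3.0, 4.0],
    "g3": [3.0, 4.0, 5.0],
    "g4": [2.0, 4.0, 3.0],
    "g5": [20.0, 21.0, 22.0],
}
scores = group_anomaly_scores(groups)
anomalous = [name for name, score in scores.items() if score > 3.5]

# Illustrative data: the same group observed in two time windows.
changed = group_changed([1.0, 2.0, 3.0], [11.0, 12.0, 13.0])
unchanged = group_changed([1.0, 2.0, 3.0], [2.0, 1.0, 3.0])
```

Here `anomalous` contains only `g5`, whose mean lies far from the other group means, and `changed` is true because the group's mean moves by ten pooled standard deviations. Summarizing a group by its mean alone ignores shape and spread; the sketch is meant only to separate the static question (is this group unlike the others?) from the dynamic one (is this group unlike its own past?).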

Published in
ACM Computing Surveys Volume 51, Issue 4
July 2019
765 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3236632
Editor:
Sartaj Sahni
Department of Computer and Information Science and Engineering / University of Florida / Gainesville, FL 32611
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 25 July 2018
- Accepted: 1 March 2018
- Revised: 1 February 2018
- Received: 1 November 2017
Author Tags
Group deviation detection
discriminative methods
generative models
group anomaly detection
group change detection
hypothesis testing
machine learning
Qualifiers
- survey
- Research
- Refereed