Group Deviation Detection Methods: A Survey

Published: 25 July 2018

Abstract

Pointwise anomaly detection and change detection focus on the study of individual data instances; however, an emerging area of research involves groups or collections of observations. In applications ranging from high-energy particle physics to health care collusion, group deviation detection techniques lead to novel research discoveries, mitigation of risks, prevention of malicious collaborative activities, and other explanatory insights. In particular, static group anomaly detection is the process of identifying groups that are not consistent with regular group patterns, while dynamic group change detection assesses significant differences in the state of a group over a period of time. Because group anomaly detection and group change detection share fundamental ideas, this survey article provides a clearer and deeper understanding of group deviation detection research in both static and dynamic situations.
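
The distinction between the static and dynamic settings can be illustrated with a minimal sketch. It is not a method taken from the survey: the function names (`robust_z`, `static_group_scores`, `dynamic_change_score`), the synthetic Gaussian data, and the choice of summary statistics (group mean and spread, scored with median/MAD z-scores, plus a Welch-style statistic for change) are all assumptions made for illustration.

```python
import random
import statistics


def robust_z(values):
    """Absolute robust z-scores based on the median and the MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    # 1.4826 makes the MAD comparable to a standard deviation under normality.
    scale = 1.4826 * mad if mad > 0 else 1.0
    return [abs(v - med) / scale for v in values]


def static_group_scores(groups):
    """Static setting: score each group by how unusual its mean or spread is
    relative to the other groups (larger means more deviant)."""
    means = [statistics.fmean(g) for g in groups]
    spreads = [statistics.stdev(g) for g in groups]
    return [max(m, s) for m, s in zip(robust_z(means), robust_z(spreads))]


def dynamic_change_score(before, after):
    """Dynamic setting: Welch-style statistic for a shift in one group's mean
    between two time windows."""
    se = (statistics.variance(before) / len(before)
          + statistics.variance(after) / len(after)) ** 0.5
    return abs(statistics.fmean(after) - statistics.fmean(before)) / se


rng = random.Random(0)
# 30 regular groups of 200 observations each, drawn from N(0, 1).
groups = [[rng.gauss(0.0, 1.0) for _ in range(200)] for _ in range(30)]
# One hypothetical anomalous group: every value is near 0, so no single point
# looks unusual, but the group's spread is far smaller than usual.
groups.append([rng.gauss(0.0, 0.2) for _ in range(200)])

scores = static_group_scores(groups)
most_deviant = max(range(len(scores)), key=scores.__getitem__)
print("most deviant group index:", most_deviant)

# The same group observed in two time windows, with a mean shift in the second.
before = [rng.gauss(0.0, 1.0) for _ in range(200)]
after = [rng.gauss(1.0, 1.0) for _ in range(200)]
print("change score:", round(dynamic_change_score(before, after), 2))
```

In this toy setup a pointwise detector would have little to flag in the anomalous group, since each of its values is typical on its own; only the group-level summary reveals the deviation.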


Published in

ACM Computing Surveys, Volume 51, Issue 4
July 2019
765 pages
ISSN: 0360-0300
EISSN: 1557-7341
DOI: 10.1145/3236632
Editor: Sartaj Sahni

Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 25 July 2018
• Accepted: 1 March 2018
• Revised: 1 February 2018
• Received: 1 November 2017

Qualifiers

• survey
• Research
• Refereed
