skip to main content
research-article

A Survey on Anomaly detection in Evolving Data: [with Application to Forest Fire Risk Prediction]

Published:29 May 2018Publication History
Skip Abstract Section

Abstract

Traditionally most of the anomaly detection algorithms have been designed for 'static' datasets, in which all the observations are available at one time. In non-stationary environments on the other hand, the same algorithms cannot be applied as the underlying data distributions change constantly and the same models are not valid. Hence, we need to devise adaptive models that take into account the dynamically changing characteristics of environments and detect anomalies in 'evolving' data. Over the last two decades, many algorithms have been proposed to detect anomalies in evolving data. Some of them consider scenarios where a sequence of objects (called data streams) with one or multiple features evolves over time. Whereas the others concentrate on more complex scenarios, where streaming objects with one or multiple features have causal/non-causal relationships with each other. The latter can be represented as evolving graphs. In this paper, we categorize existing strategies for detecting anomalies in both scenarios including the state-of-the-art techniques. Since label information is mostly unavailable in real-world applications when data evolves, we review the unsupervised approaches in this paper. We then present an interesting application example, i.e., forest re risk prediction, and conclude the paper with future research directions in this eld for researchers and industry.

References

  1. Australian bureau of meteorology weather stations. http://www.bom.gov.au/vic/forecasts/ re-map.shtml.Google ScholarGoogle Scholar
  2. Black saturday bush res. https://en.wikipedia.org/wiki/black saturday bush- res.Google ScholarGoogle Scholar
  3. Emergency management victoria strategic action plan. https://www.emv.vic.gov.au/plans/strategic-actionplan/.Google ScholarGoogle Scholar
  4. The human cost of natural disasters 2015: a global perspective, http://reliefweb.int/report/world/humancost- natural-disasters-2015-global-perspective.Google ScholarGoogle Scholar
  5. The united nations office for disaster risk reduction, http://www.unisdr.org/archive/42814.Google ScholarGoogle Scholar
  6. C. C. Aggarwal. Outlier Analysis. Springer, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. C. C. Aggarwal, J. Han, J. Wang, and P. S. Yu. A framework for clustering evolving data streams. In International Conference on Very Large Data Bases (VLDB), pages 81-92, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. M. Agyemang, K. Barker, and R. Alhajj. A comprehensive survey of numeric and symbolic outlier mining techniques. Intelligent Data Analysis, 10(6):521-538, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. L. Akoglu, H. Tong, and D. Koutra. Graph based anomaly detection and description: a survey. Data Min- ing and Knowledge Discovery, 29(3):626-688, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. F. Angiulli and F. Fassetti. Detecting distance-based outliers in streams of data. In Internaitonal Conference on Information and Knowledge Management (CIKM), pages 811-820, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. F. Angiulli and C. Pizzuti. Fast outlier detection in high dimensional spaces. In European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), volume 2, pages 15-26, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. Araujo, S. Papadimitriou, S. Günnemann, C. Faloutsos, P. Basu, A. Swami, E. E. Papalexakis, and D. Koutra. Com2: fast automatic discovery of temporal (comet) communities. In Proceedings of the 18th Paci c-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 271-283. Springer, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  13. I. Assent, P. Kranen, C. Baldauf, and T. Seidl. Anyout: Anytime outlier detection on streaming data. In Database Systems for Advanced Applications (DAS- FAA), pages 228-242, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. V. Barnett and T. Lewis. Outliers in statistical data, volume 3. Wiley New York, 1994.Google ScholarGoogle Scholar
  15. L. Becchetti, C. Castillo, D. Donato, S. Leonardi, and R. A. Baeza-Yates. Link-based characterization and detection of web spam. In Proceedings of the 2nd Interna- tional Workshop on Adversarial Information Retrieval on the Web (AIRWeb), pages 1-8, 2006.Google ScholarGoogle Scholar
  16. R. J. Beckman and R. D. Cook. Outliers. Technomet- rics, 25(2):119-149, 1983.Google ScholarGoogle Scholar
  17. M. Berlingerio, D. Koutra, T. Eliassi-Rad, and C. Faloutsos. Netsimile: a scalable approach to size-independent network similarity. arXiv preprint arXiv:1209.2684, 2012.Google ScholarGoogle Scholar
  18. A. Bifet and R. Gavalda. Learning from time-changing data with adaptive windowing. In SIAM International Conference on Data Mining, pages 443-448, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  19. M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. LOF: identifying density-based local outliers. In ACM SIGMOD, volume 29, pages 93-104, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. F. Cao, M. Ester, W. Qian, and A. Zhou. Densitybased clustering over an evolving data stream with noise. In SIAM International Conference on Data Min- ing (SDM), pages 328-339, 2006.Google ScholarGoogle Scholar
  21. L. Cao, D. Yang, Q. Wang, Y. Yu, J. Wang, and E. A. Rundensteiner. Scalable distance-based outlier detection over high-volume data streams. In International Conference on Data Engineering (ICDE), pages 76-87, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  22. V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Computing Surveys, 41(3):1- 58, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection for discrete sequences: A survey. IEEE Trans- actions on Knowledge and Data Engineering (TKDE), 24(5):823-839, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Y. Chen and L. Tu. Density-based clustering for realtime stream data. In International Conference on Knowledge Discovery and Data Mining (KDD), pages 133-142. ACM, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. M. Chenaghlou, M. Moshtaghi, C. Leckie, and M. Salehi. An efficient method for anomaly detection in non-stationary data streams. In IEEE Global Commu- nications Conference (GLOBECOM), pages 1-6. IEEE, 2017.Google ScholarGoogle Scholar
  26. M. Chenaghlou, M. Moshtaghi, C. Leckie, and M. Salehi. Online clustering for evolving data streams with online anomaly detection. In Paci c-Asia Con- ference on Knowledge Discovery and Data Mining. Springer, 2018.Google ScholarGoogle Scholar
  27. Q. Ding, N. Katenka, P. Barford, E. Kolaczyk, and M. Crovella. Intrusion as (anti) social communication: characterization and detection. In Proceedings of the 18th ACM International Conference on Knowledge Dis- covery and Data Mining (SIGKDD), pages 886-894. ACM, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. M. Elahi, K. Li, W. Nisar, X. Lv, and H. Wang. Ef- cient clustering-based outlier detection algorithm for dynamic data stream. In International Conference on Fuzzy Systems and Knowledge Discovery, volume 5, pages 298-304, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In International Confer- ence on Knowledge Discovery and Data Mining (KDD), volume 96, pages 226-231, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. C. A. Farris, C. Pezeshki, and L. F. Neuenschwander. A comparison of re probability maps derived from gis modeling and direct simulation techniques. In Joint Fire Science Conference and Workshop, pages 131-138, 1999.Google ScholarGoogle Scholar
  31. M. A. Finney. The challenge of quantitative risk analysis for wildland re. Forest Ecology and Management, 211(1):97-108, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  32. S. Guha, N. Mishra, G. Roy, and O. Schrijvers. Robust random cut forest based anomaly detection on streams. In International Conference on Machine Learning, pages 2712-2721, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. M. Gupta, J. Gao, Y. Sun, and J. Han. Integrating community matching and outlier detection for mining evolutionary community outliers. In Proceedings of the 18th ACM International Conference on Knowledge Discov- ery and Data Mining (SIGKDD), pages 859-867. ACM, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. D. M. Hawkins. Identi cation of outliers, volume 11. Springer, 1980.Google ScholarGoogle ScholarCross RefCross Ref
  35. K. Henderson, T. Eliassi-Rad, C. Faloutsos, L. Akoglu, L. Li, K. Maruhashi, B. A. Prakash, and H. Tong. Metric forensics: a multi-level approach for mining volatile graphs. In Proceedings of the 16th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), pages 163-172. ACM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. E. M. Knorr and R. T. Ng. A Uni ed Notion of Outliers: Properties and Computation. In ACM Interna- tional Conference on Knowledge Discovery and Data Minning (KDD), pages 219-222, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. E. M. Knorr, R. T. Ng, and V. Tucakov. Distancebased outliers: algorithms and applications. Interna- tional Journal on Very Large Data Bases (VLDB), 8(3- 4):237-253, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. E. M. Knox and R. T. Ng. Algorithms for mining distancebased outliers in large datasets. In International Conference on Very Large Data Bases (VLDB), pages 392-403, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. M. Kontaki, A. Gounaris, A. N. Papadopoulos, K. Tsichlas, and Y. Manolopoulos. Continuous monitoring of distance-based outliers over data streams. In International Conference on Data Engineering (ICDE), pages 135-146, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. D. Koutra, N. Shah, J. T. Vogelstein, B. Gallagher, and C. Faloutsos. Delta con: Principled massive-graph similarity function with attribution. ACM Transactions on Knowledge Discovery from Data (TKDD), 10(3):28, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. P. Kranen, I. Assent, C. Baldauf, and T. Seidl. The clustree: indexing micro-clusters for anytime stream mining. Knowledge and Information Systems, 29(2):249- 272, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. H.-P. Kriegel, P. Kröger, and A. Zimek. Outlier detection techniques. In Tutorial at the International Confer- ence on Knowledge Discovery and Data Mining (KDD), 2010.Google ScholarGoogle Scholar
  43. A. G. McArthur. Fire behaviour in eucalypt forests. 1967.Google ScholarGoogle Scholar
  44. C. Miller and A. A. Ager. A review of recent advances in risk analysis for wild re management. International journal of wildland re, 22(1):1-14, 2013.Google ScholarGoogle Scholar
  45. M. Mongiovi, P. Bogdanov, R. Ranca, E. E. Papalexakis, C. Faloutsos, and A. K. Singh. Netspot: Spotting signi cant anomalous regions on dynamic networks. In Proceedings of the 13th SIAM International Conference on Data Mining (SDM), pages 28-36. SIAM, 2013.Google ScholarGoogle Scholar
  46. M. Moshtaghi, J. C. Bezdek, T. C. Havens, C. Leckie, S. Karunasekera, S. Rajasegarar, and M. Palaniswami. Streaming analysis in wireless sensor networks. Wireless Communications and Mobile Computing, 14(9):905- 921, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. M. Moshtaghi, J. C. Bezdek, C. Leckie, S. Karunasekera, and M. Palaniswami. Evolving Fuzzy Rules for Anomaly Detection in Data Streams. IEEE Transac- tions on Fuzzy Systems, 2014.Google ScholarGoogle Scholar
  48. I. Noble, A. Gill, and G. Bary. Mcarthur's re-danger meters expressed as equations. Australian Journal of Ecology, 5(2):201-203, 1980.Google ScholarGoogle ScholarCross RefCross Ref
  49. E. E. Papalexakis, C. Faloutsos, and N. D. Sidiropoulos. Parcube: Sparse parallelizable tensor decompositions. In Machine Learning and Knowledge Discovery in Databases, pages 521-536. Springer, 2012.Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. D. Pokrajac, A. Lazarevic, and L. J. Latecki. Incremental local outlier detection for data streams. In Sympo- sium on Computational Intelligence and Data Mining (CIDM), pages 504-515. IEEE, 2007.Google ScholarGoogle Scholar
  51. S. Ramaswamy, R. Rastogi, and K. Shim. Efficient algorithms for mining outliers from large data sets. In ACM International Conference in Management of Data (SIGMOD), volume 29, pages 427-438, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. L. Rashidi, A. Kan, J. Bailey, J. Chan, C. Leckie, W. Liu, S. Rajasegarar, and K. Ramamohanarao. Node re-ordering as a means of anomaly detection in timeevolving graphs. In Joint European Conference on Ma- chine Learning and Knowledge Discovery in Databases, pages 162-178. Springer, 2016.Google ScholarGoogle ScholarCross RefCross Ref
  53. L. Rashidi, S. Rajasegarar, and C. Leckie. An embedding scheme for detecting anomalous block structured graphs. In Paci c-Asia Conference on Knowledge Dis- covery and Data Mining, pages 215-227. Springer, 2015.Google ScholarGoogle Scholar
  54. J. Ren and R. Ma. Density-based data streams clustering over sliding windows. In International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), volume 5, pages 248-252. IEEE, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. R. A. Rossi, B. Gallagher, J. Neville, and K. Henderson. Modeling dynamic behavior in large evolving graphs. In Proceedings of the 6th ACM International Conference on Web Search and Data Mining, pages 667-676. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. P. J. Rousseeuw and A. M. Leroy. Robust regression and outlier detection, volume 589. John Wiley & Sons, 2005.Google ScholarGoogle Scholar
  57. M. Salehi, C. Leckie, J. C. Bezdek, T. Vaithianathan, and X. Zhang. Fast memory efficient local outlier detection in data streams. IEEE Transactions on Knowledge and Data Engineering, 28(12):3246-3260, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. M. Salehi, C. A. Leckie, M. Moshtaghi, and T. Vaithianathan. A relevance weighted ensemble model for anomaly detection in switching data streams. In Paci c-Asia Conference on Knowledge Discovery and Data Mining, pages 461-473. Springer, 2014.Google ScholarGoogle Scholar
  59. M. Salehi, L. I. Rusu, T. Lynar, and A. Phan. Dynamic and robust wild re risk prediction system: an unsupervised approach. In SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 245- 254. ACM, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. S. Sathe and C. C. Aggarwal. Subspace histograms for outlier detection in linear time. Knowledge and Infor- mation Systems, pages 1-25, 2018.Google ScholarGoogle Scholar
  61. K. Yamanishi and J.-i. Takeuchi. A unifying framework for detecting outliers and change points from nonstationary time series data. In International Confer- ence on Knowledge Discovery and Data Mining (KDD), pages 676-681, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. K. Yamanishi, J.-I. Takeuchi, G. Williams, and P. Milne. On-line unsupervised outlier detection using nite mixtures with discounting learning algorithms. In International Conference on Knowledge Discovery and Data Mining (KDD), pages 320-324, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. D. Yang, E. A. Rundensteiner, and M. O. Ward. Neighbor-based pattern detection for windows over streaming data. In Advances in DB Tech., pages 529- 540, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. L. Yu, N. Wang, and X. Meng. Real-time forest re detection with wireless sensor networks. In International Conference on Wireless Communications, Networking and Mobile Computing, volume 2, pages 1214-1217, 2005.Google ScholarGoogle Scholar
  65. J. Zhang, Q. Gao, and H. Wang. Spot: A system for detecting projected outliers from high-dimensional data streams. In IEEE International Conference on Data Engineering, pages 1628-1631. IEEE, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: an efficient data clustering method for very large databases. In ACM SIGMOD Record, volume 25, pages 103-114. ACM, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. Y. Zhu and D. Shasha. Statstream: Statistical monitoring of thousands of data streams in real time. In International Conference on Very Large Data Bases (VLDB), pages 358-369. VLDB Endowment, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. A Survey on Anomaly detection in Evolving Data: [with Application to Forest Fire Risk Prediction]
    Index terms have been assigned to the content through auto-classification.

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM SIGKDD Explorations Newsletter
      ACM SIGKDD Explorations Newsletter  Volume 20, Issue 1
      June 2018
      59 pages
      ISSN:1931-0145
      EISSN:1931-0153
      DOI:10.1145/3229329
      Issue’s Table of Contents

      Copyright © 2018 Authors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 29 May 2018

      Check for updates

      Qualifiers

      • research-article

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader