Abstract
Traditionally most of the anomaly detection algorithms have been designed for 'static' datasets, in which all the observations are available at one time. In non-stationary environments on the other hand, the same algorithms cannot be applied as the underlying data distributions change constantly and the same models are not valid. Hence, we need to devise adaptive models that take into account the dynamically changing characteristics of environments and detect anomalies in 'evolving' data. Over the last two decades, many algorithms have been proposed to detect anomalies in evolving data. Some of them consider scenarios where a sequence of objects (called data streams) with one or multiple features evolves over time. Whereas the others concentrate on more complex scenarios, where streaming objects with one or multiple features have causal/non-causal relationships with each other. The latter can be represented as evolving graphs. In this paper, we categorize existing strategies for detecting anomalies in both scenarios including the state-of-the-art techniques. Since label information is mostly unavailable in real-world applications when data evolves, we review the unsupervised approaches in this paper. We then present an interesting application example, i.e., forest re risk prediction, and conclude the paper with future research directions in this eld for researchers and industry.
- Australian bureau of meteorology weather stations. http://www.bom.gov.au/vic/forecasts/ re-map.shtml.Google Scholar
- Black saturday bush res. https://en.wikipedia.org/wiki/black saturday bush- res.Google Scholar
- Emergency management victoria strategic action plan. https://www.emv.vic.gov.au/plans/strategic-actionplan/.Google Scholar
- The human cost of natural disasters 2015: a global perspective, http://reliefweb.int/report/world/humancost- natural-disasters-2015-global-perspective.Google Scholar
- The united nations office for disaster risk reduction, http://www.unisdr.org/archive/42814.Google Scholar
- C. C. Aggarwal. Outlier Analysis. Springer, 2013. Google ScholarDigital Library
- C. C. Aggarwal, J. Han, J. Wang, and P. S. Yu. A framework for clustering evolving data streams. In International Conference on Very Large Data Bases (VLDB), pages 81-92, 2003. Google ScholarDigital Library
- M. Agyemang, K. Barker, and R. Alhajj. A comprehensive survey of numeric and symbolic outlier mining techniques. Intelligent Data Analysis, 10(6):521-538, 2006. Google ScholarDigital Library
- L. Akoglu, H. Tong, and D. Koutra. Graph based anomaly detection and description: a survey. Data Min- ing and Knowledge Discovery, 29(3):626-688, 2015. Google ScholarDigital Library
- F. Angiulli and F. Fassetti. Detecting distance-based outliers in streams of data. In Internaitonal Conference on Information and Knowledge Management (CIKM), pages 811-820, 2007. Google ScholarDigital Library
- F. Angiulli and C. Pizzuti. Fast outlier detection in high dimensional spaces. In European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), volume 2, pages 15-26, 2002. Google ScholarDigital Library
- M. Araujo, S. Papadimitriou, S. Günnemann, C. Faloutsos, P. Basu, A. Swami, E. E. Papalexakis, and D. Koutra. Com2: fast automatic discovery of temporal (comet) communities. In Proceedings of the 18th Paci c-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 271-283. Springer, 2014.Google ScholarCross Ref
- I. Assent, P. Kranen, C. Baldauf, and T. Seidl. Anyout: Anytime outlier detection on streaming data. In Database Systems for Advanced Applications (DAS- FAA), pages 228-242, 2012. Google ScholarDigital Library
- V. Barnett and T. Lewis. Outliers in statistical data, volume 3. Wiley New York, 1994.Google Scholar
- L. Becchetti, C. Castillo, D. Donato, S. Leonardi, and R. A. Baeza-Yates. Link-based characterization and detection of web spam. In Proceedings of the 2nd Interna- tional Workshop on Adversarial Information Retrieval on the Web (AIRWeb), pages 1-8, 2006.Google Scholar
- R. J. Beckman and R. D. Cook. Outliers. Technomet- rics, 25(2):119-149, 1983.Google Scholar
- M. Berlingerio, D. Koutra, T. Eliassi-Rad, and C. Faloutsos. Netsimile: a scalable approach to size-independent network similarity. arXiv preprint arXiv:1209.2684, 2012.Google Scholar
- A. Bifet and R. Gavalda. Learning from time-changing data with adaptive windowing. In SIAM International Conference on Data Mining, pages 443-448, 2007.Google ScholarCross Ref
- M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. LOF: identifying density-based local outliers. In ACM SIGMOD, volume 29, pages 93-104, 2000. Google ScholarDigital Library
- F. Cao, M. Ester, W. Qian, and A. Zhou. Densitybased clustering over an evolving data stream with noise. In SIAM International Conference on Data Min- ing (SDM), pages 328-339, 2006.Google Scholar
- L. Cao, D. Yang, Q. Wang, Y. Yu, J. Wang, and E. A. Rundensteiner. Scalable distance-based outlier detection over high-volume data streams. In International Conference on Data Engineering (ICDE), pages 76-87, 2014.Google ScholarCross Ref
- V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Computing Surveys, 41(3):1- 58, 2009. Google ScholarDigital Library
- V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection for discrete sequences: A survey. IEEE Trans- actions on Knowledge and Data Engineering (TKDE), 24(5):823-839, 2012. Google ScholarDigital Library
- Y. Chen and L. Tu. Density-based clustering for realtime stream data. In International Conference on Knowledge Discovery and Data Mining (KDD), pages 133-142. ACM, 2007. Google ScholarDigital Library
- M. Chenaghlou, M. Moshtaghi, C. Leckie, and M. Salehi. An efficient method for anomaly detection in non-stationary data streams. In IEEE Global Commu- nications Conference (GLOBECOM), pages 1-6. IEEE, 2017.Google Scholar
- M. Chenaghlou, M. Moshtaghi, C. Leckie, and M. Salehi. Online clustering for evolving data streams with online anomaly detection. In Paci c-Asia Con- ference on Knowledge Discovery and Data Mining. Springer, 2018.Google Scholar
- Q. Ding, N. Katenka, P. Barford, E. Kolaczyk, and M. Crovella. Intrusion as (anti) social communication: characterization and detection. In Proceedings of the 18th ACM International Conference on Knowledge Dis- covery and Data Mining (SIGKDD), pages 886-894. ACM, 2012. Google ScholarDigital Library
- M. Elahi, K. Li, W. Nisar, X. Lv, and H. Wang. Ef- cient clustering-based outlier detection algorithm for dynamic data stream. In International Conference on Fuzzy Systems and Knowledge Discovery, volume 5, pages 298-304, 2008. Google ScholarDigital Library
- M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In International Confer- ence on Knowledge Discovery and Data Mining (KDD), volume 96, pages 226-231, 1996. Google ScholarDigital Library
- C. A. Farris, C. Pezeshki, and L. F. Neuenschwander. A comparison of re probability maps derived from gis modeling and direct simulation techniques. In Joint Fire Science Conference and Workshop, pages 131-138, 1999.Google Scholar
- M. A. Finney. The challenge of quantitative risk analysis for wildland re. Forest Ecology and Management, 211(1):97-108, 2005.Google ScholarCross Ref
- S. Guha, N. Mishra, G. Roy, and O. Schrijvers. Robust random cut forest based anomaly detection on streams. In International Conference on Machine Learning, pages 2712-2721, 2016. Google ScholarDigital Library
- M. Gupta, J. Gao, Y. Sun, and J. Han. Integrating community matching and outlier detection for mining evolutionary community outliers. In Proceedings of the 18th ACM International Conference on Knowledge Discov- ery and Data Mining (SIGKDD), pages 859-867. ACM, 2012. Google ScholarDigital Library
- D. M. Hawkins. Identi cation of outliers, volume 11. Springer, 1980.Google ScholarCross Ref
- K. Henderson, T. Eliassi-Rad, C. Faloutsos, L. Akoglu, L. Li, K. Maruhashi, B. A. Prakash, and H. Tong. Metric forensics: a multi-level approach for mining volatile graphs. In Proceedings of the 16th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), pages 163-172. ACM, 2010. Google ScholarDigital Library
- E. M. Knorr and R. T. Ng. A Uni ed Notion of Outliers: Properties and Computation. In ACM Interna- tional Conference on Knowledge Discovery and Data Minning (KDD), pages 219-222, 1997. Google ScholarDigital Library
- E. M. Knorr, R. T. Ng, and V. Tucakov. Distancebased outliers: algorithms and applications. Interna- tional Journal on Very Large Data Bases (VLDB), 8(3- 4):237-253, 2000. Google ScholarDigital Library
- E. M. Knox and R. T. Ng. Algorithms for mining distancebased outliers in large datasets. In International Conference on Very Large Data Bases (VLDB), pages 392-403, 1998. Google ScholarDigital Library
- M. Kontaki, A. Gounaris, A. N. Papadopoulos, K. Tsichlas, and Y. Manolopoulos. Continuous monitoring of distance-based outliers over data streams. In International Conference on Data Engineering (ICDE), pages 135-146, 2011. Google ScholarDigital Library
- D. Koutra, N. Shah, J. T. Vogelstein, B. Gallagher, and C. Faloutsos. Delta con: Principled massive-graph similarity function with attribution. ACM Transactions on Knowledge Discovery from Data (TKDD), 10(3):28, 2016. Google ScholarDigital Library
- P. Kranen, I. Assent, C. Baldauf, and T. Seidl. The clustree: indexing micro-clusters for anytime stream mining. Knowledge and Information Systems, 29(2):249- 272, 2011. Google ScholarDigital Library
- H.-P. Kriegel, P. Kröger, and A. Zimek. Outlier detection techniques. In Tutorial at the International Confer- ence on Knowledge Discovery and Data Mining (KDD), 2010.Google Scholar
- A. G. McArthur. Fire behaviour in eucalypt forests. 1967.Google Scholar
- C. Miller and A. A. Ager. A review of recent advances in risk analysis for wild re management. International journal of wildland re, 22(1):1-14, 2013.Google Scholar
- M. Mongiovi, P. Bogdanov, R. Ranca, E. E. Papalexakis, C. Faloutsos, and A. K. Singh. Netspot: Spotting signi cant anomalous regions on dynamic networks. In Proceedings of the 13th SIAM International Conference on Data Mining (SDM), pages 28-36. SIAM, 2013.Google Scholar
- M. Moshtaghi, J. C. Bezdek, T. C. Havens, C. Leckie, S. Karunasekera, S. Rajasegarar, and M. Palaniswami. Streaming analysis in wireless sensor networks. Wireless Communications and Mobile Computing, 14(9):905- 921, 2014. Google ScholarDigital Library
- M. Moshtaghi, J. C. Bezdek, C. Leckie, S. Karunasekera, and M. Palaniswami. Evolving Fuzzy Rules for Anomaly Detection in Data Streams. IEEE Transac- tions on Fuzzy Systems, 2014.Google Scholar
- I. Noble, A. Gill, and G. Bary. Mcarthur's re-danger meters expressed as equations. Australian Journal of Ecology, 5(2):201-203, 1980.Google ScholarCross Ref
- E. E. Papalexakis, C. Faloutsos, and N. D. Sidiropoulos. Parcube: Sparse parallelizable tensor decompositions. In Machine Learning and Knowledge Discovery in Databases, pages 521-536. Springer, 2012.Google ScholarDigital Library
- D. Pokrajac, A. Lazarevic, and L. J. Latecki. Incremental local outlier detection for data streams. In Sympo- sium on Computational Intelligence and Data Mining (CIDM), pages 504-515. IEEE, 2007.Google Scholar
- S. Ramaswamy, R. Rastogi, and K. Shim. Efficient algorithms for mining outliers from large data sets. In ACM International Conference in Management of Data (SIGMOD), volume 29, pages 427-438, 2000. Google ScholarDigital Library
- L. Rashidi, A. Kan, J. Bailey, J. Chan, C. Leckie, W. Liu, S. Rajasegarar, and K. Ramamohanarao. Node re-ordering as a means of anomaly detection in timeevolving graphs. In Joint European Conference on Ma- chine Learning and Knowledge Discovery in Databases, pages 162-178. Springer, 2016.Google ScholarCross Ref
- L. Rashidi, S. Rajasegarar, and C. Leckie. An embedding scheme for detecting anomalous block structured graphs. In Paci c-Asia Conference on Knowledge Dis- covery and Data Mining, pages 215-227. Springer, 2015.Google Scholar
- J. Ren and R. Ma. Density-based data streams clustering over sliding windows. In International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), volume 5, pages 248-252. IEEE, 2009. Google ScholarDigital Library
- R. A. Rossi, B. Gallagher, J. Neville, and K. Henderson. Modeling dynamic behavior in large evolving graphs. In Proceedings of the 6th ACM International Conference on Web Search and Data Mining, pages 667-676. ACM, 2013. Google ScholarDigital Library
- P. J. Rousseeuw and A. M. Leroy. Robust regression and outlier detection, volume 589. John Wiley & Sons, 2005.Google Scholar
- M. Salehi, C. Leckie, J. C. Bezdek, T. Vaithianathan, and X. Zhang. Fast memory efficient local outlier detection in data streams. IEEE Transactions on Knowledge and Data Engineering, 28(12):3246-3260, 2016. Google ScholarDigital Library
- M. Salehi, C. A. Leckie, M. Moshtaghi, and T. Vaithianathan. A relevance weighted ensemble model for anomaly detection in switching data streams. In Paci c-Asia Conference on Knowledge Discovery and Data Mining, pages 461-473. Springer, 2014.Google Scholar
- M. Salehi, L. I. Rusu, T. Lynar, and A. Phan. Dynamic and robust wild re risk prediction system: an unsupervised approach. In SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 245- 254. ACM, 2016. Google ScholarDigital Library
- S. Sathe and C. C. Aggarwal. Subspace histograms for outlier detection in linear time. Knowledge and Infor- mation Systems, pages 1-25, 2018.Google Scholar
- K. Yamanishi and J.-i. Takeuchi. A unifying framework for detecting outliers and change points from nonstationary time series data. In International Confer- ence on Knowledge Discovery and Data Mining (KDD), pages 676-681, 2002. Google ScholarDigital Library
- K. Yamanishi, J.-I. Takeuchi, G. Williams, and P. Milne. On-line unsupervised outlier detection using nite mixtures with discounting learning algorithms. In International Conference on Knowledge Discovery and Data Mining (KDD), pages 320-324, 2000. Google ScholarDigital Library
- D. Yang, E. A. Rundensteiner, and M. O. Ward. Neighbor-based pattern detection for windows over streaming data. In Advances in DB Tech., pages 529- 540, 2009. Google ScholarDigital Library
- L. Yu, N. Wang, and X. Meng. Real-time forest re detection with wireless sensor networks. In International Conference on Wireless Communications, Networking and Mobile Computing, volume 2, pages 1214-1217, 2005.Google Scholar
- J. Zhang, Q. Gao, and H. Wang. Spot: A system for detecting projected outliers from high-dimensional data streams. In IEEE International Conference on Data Engineering, pages 1628-1631. IEEE, 2008. Google ScholarDigital Library
- T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: an efficient data clustering method for very large databases. In ACM SIGMOD Record, volume 25, pages 103-114. ACM, 1996. Google ScholarDigital Library
- Y. Zhu and D. Shasha. Statstream: Statistical monitoring of thousands of data streams in real time. In International Conference on Very Large Data Bases (VLDB), pages 358-369. VLDB Endowment, 2002. Google ScholarDigital Library
Index Terms
- A Survey on Anomaly detection in Evolving Data: [with Application to Forest Fire Risk Prediction]
Recommendations
Unsupervised Anomaly Detection in Stream Data with Online Evolving Spiking Neural Networks
AbstractUnsupervised anomaly discovery in stream data is a research topic with many practical applications. However, in many cases, it is not easy to collect enough training data with labeled anomalies for supervised learning of an anomaly ...
Evolving anomaly detection for network streaming data
AbstractNetwork security has always been a concern because it remains to be an unresolved problem. Unlike signature-based methods, anomaly-based methods can detect novel attacks and thus have gained increasing attention over the past decades. However, as ...
Adaptive anomaly detection with evolving connectionist systems
Special issue: Network and information security: A computational intelligence approachAnomaly detection holds great potential for detecting previously unknown attacks. In order to be effective in a practical environment, anomaly detection systems have to be capable of online learning and handling concept drift. In this paper, a new ...
Comments