ABSTRACT
In this work a method for detecting distance-based outliers in data streams is presented. We deal with the sliding window model, where outlier queries are performed in order to detect anomalies in the current window. Two algorithms are presented. The first one exactly answers outlier queries, but has larger space requirements. The second algorithm is directly derived from the exact one, has limited memory requirements and returns an approximate answer based on accurate estimations with a statistical guarantee. Several experiments have been accomplished, confirming the effectiveness of the proposed approach and the high quality of approximate solutions.
- C. C. Aggarwal and P. S. Yu. Outlier detection for high dimensional data. In Proc. Int. Conference on Managment of Data (SIGMOD'01), 2001. Google ScholarDigital Library
- Charu C. Aggarwal. On abnormality detection in spuriously populated data streams. In SIAM Data Mining, 2005.Google ScholarCross Ref
- F. Angiulli, S. Basta, and C. Pizzuti. Distance-based detection and prediction of outliers. IEEE Transaction on Knowledge and Data Engineering, 18(2):145--160, February 2006. Google ScholarDigital Library
- F. Angiulli and C. Pizzuti. Fast outlier detection in large high-dimensional data sets. In Proc. Int. Conf. on Principles of Data Mining and Knowledge Discovery (PKDD'02), pages 15--26, 2002. Google ScholarDigital Library
- F. Angiulli and C. Pizzuti. Outlier mining in large high-dimensional data sets. IEEE Transaction on Knowledge and Data Engineering, 17(2):203--215, February 2005. Google ScholarDigital Library
- A. Arning, C. Aggarwal, and P. Raghavan. A linear method for deviation detection in large databases. In Proc. Int. Conf. on Knowledge Discovery and Data Mining (KDD'96), pages 164--169, 1996.Google Scholar
- Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, and Jennifer Widom. Models and issues in data stream systems. In PODS, pages 1--16, 2002. Google ScholarDigital Library
- V. Barnett and T. Lewis. Outliers in Statistical Data. John Wiley & Sons, 1994.Google Scholar
- S. D. Bay and M. Schwabacher. Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In Proc. Int. Conf. on Knowledge Discovery and Data Mining (KDD'03), 2003. Google ScholarDigital Library
- N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The r*-tree: An efficient and robust access method for points and rectangles. In Proc. of the SIGMOD Conference, pages 322--331, 1990. Google ScholarDigital Library
- M. M. Breunig, H. Kriegel, R. T. Ng, and J. Sander. Lof: Identifying density-based local outliers. In Proc. Int. Conf. on Managment of Data (SIGMOD'00), 2000. Google ScholarDigital Library
- Edgar Chávez, Gonzalo Navarro, Ricardo A. Baeza-Yates, and José L. Marroquín. Searching in metric spaces. ACM Comput. Surv., 33(3):273--321, 2001. Google ScholarDigital Library
- Defense Advanced Research Projects Agency DARPA. Intrusion detection evaluation. In http://www.ll.mit.edu/IST/ideval/index.html.Google Scholar
- E. Eskin, A. Arnold, M. Prerau, L. Portnoy, and S. Stolfo. A geometric framework for unsupervised anomaly detection: Detecting intrusions in unlabeled data. In Applications of Data Mining in Computer Security, Kluwer, 2002.Google Scholar
- Lukasz Golab and M. Tamer &3214;zsu. Issues in data stream management. SIGMOD Record, 32(2):5--14, 2003. Google ScholarDigital Library
- W. Jin, A. K. H. Tung, and J. Han. Mining top-n local outliers in large databases. In Proc. ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'01), 2001. Google ScholarDigital Library
- E. Knorr and R. Ng. Algorithms for mining distance-based outliers in large datasets. In Proc. Int. Conf. on Very Large Databases (VLDB98), pages 392--403, 1998. Google ScholarDigital Library
- E. Knorr and R. Ng. Finding intensional knowledge of distance-based outliers. In Proc. Int. Conf. on Very Large Databases (VLDB99), pages 211--222, 1999. Google ScholarDigital Library
- E. Knorr, R. Ng, and V. Tucakov. Distance-based outlier: algorithms and applications. VLDB Journal, 8(3-4):237--253, 2000. Google ScholarDigital Library
- Donald Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching. Addison-Wesley, 1997. Google ScholarDigital Library
- A. Lazarevic, L. Ertöz, V. Kumar, A. Ozgur, and J. Srivastava. A comparative study of anomaly detection schemes in network intrusion detection. In Proc. of the SIAM Int. Conf. on Data Mining, 2003.Google ScholarCross Ref
- S. Papadimitriou, H. Kitagawa, P. B. Gibbons, and C. Faloutsos. Loci: Fast outlier detection using the local correlation integral. In Proc. Int. Conf. on Data Enginnering (ICDE), pages 315--326, 2003.Google ScholarCross Ref
- Spiros Papadimitriou, Hiroyuki Kitagawa, Phillip B. Gibbons, and Christos Faloutsos. Loci: Fast outlier detection using the local correlation integral. In ICDE, pages 315--326, 2003.Google ScholarCross Ref
- S. Ramaswamy, R. Rastogi, and K. Shim. Efficient algorithms for mining outliers from large data sets. In Proc. Int. Conf. on Managment of Data (SIGMOD'00), pages 427--438, 2000. Google ScholarDigital Library
- S. Subramaniam, T. Palpanas, D. Papadopoulos, V. Kalogeraki, and D. Gunopulos. Online outlier detection in sensor data using non-parametric models. In International Conference on Very Large Data Bases, Seoul, Korea, September 12--15 2006. Google ScholarDigital Library
- O. Watanabe. Simple sampling techniques for discovery science. TIEICE: IEICE Transactions on Communications/Electronics/Information and Systems, E83-D(1):19--26, 2000.Google Scholar
- Kenji Yamanishi, Jun ichi Takeuchi, Graham J. Williams, and Peter Milne. On-line unsupervised outlier detection using finite mixtures with discounting learning algorithms. In KDD, pages 320--324, 2000. Google ScholarDigital Library
Index Terms
- Detecting distance-based outliers in streams of data
Recommendations
Distance-based outlier queries in data streams: the novel task and algorithms
This work proposes a method for detecting distance-based outliers in data streams under the sliding window model. The novel notion of one-time outlier query is introduced in order to detect anomalies in the current window at arbitrary points-in-time. ...
A Fast kNN-Based Approach for Time Sensitive Anomaly Detection over Data Streams
Computational Science – ICCS 2019AbstractAnomaly detection is an important data mining method aiming to discover outliers that show significant diversion from their expected behavior. A widely used criteria for determining outliers is based on the number of their neighboring elements, ...
Very efficient mining of distance-based outliers
CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge managementIn this work a novel algorithm, named DOLPHIN, for detecting distance-based outliers is presented.
The proposed algorithm performs only two sequential scans of the dataset. It needs to store into main memory a portion of the dataset, to efficiently ...
Comments