ABSTRACT
Outlier detection is a well-established area of statistics, but most existing outlier detection techniques are designed for applications in which the entire dataset is available for random access. A typical technique constructs a standard data distribution or model and identifies the data points that deviate from the model as outliers. Such techniques are not suitable for online data streams, where the entire dataset, because of its unbounded volume, is not available for random access. Moreover, the data distribution in a data stream changes over time, which challenges existing techniques that assume a constant standard data distribution for the entire dataset. In addition, data streams are characterized by uncertainty, which adds further complexity. In this paper we propose an adaptive, online outlier detection technique that addresses these characteristics of data streams, called Adaptive Outlier Detection for Data Streams (A-ODDS), which identifies outliers with respect to all received data points as well as temporally close data points. The temporally close data points are selected based on time and on changes in the data distribution. We also present an efficient online implementation of the technique and a performance study showing the superiority of A-ODDS over existing techniques in terms of accuracy and execution time on a real-life dataset collected from meteorological applications.
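The idea of judging each arriving point against both all received data and temporally close data can be illustrated with a minimal Python sketch. This is not the A-ODDS algorithm: the z-score test, the fixed-size sliding window (standing in for the adaptive selection of temporally close points), and every name and parameter below (`StreamingOutlierSketch`, `window_size`, `threshold`, `warmup`) are assumptions made purely for illustration.

```python
from collections import deque
import math


class StreamingOutlierSketch:
    """Illustrative only: flags a point that deviates from BOTH the
    global history and a window of recent points. Not A-ODDS itself."""

    def __init__(self, window_size=50, threshold=3.0, warmup=10):
        # Hypothetical parameters; a fixed-size window is a simplification.
        self.window = deque(maxlen=window_size)  # temporally close points
        self.threshold = threshold               # z-score cut-off
        self.warmup = warmup                     # points seen before flagging
        self.n = 0        # count of all received points
        self.mean = 0.0   # running mean over all received points
        self.m2 = 0.0     # running sum of squared deviations (Welford)

    @staticmethod
    def _zscore(x, mean, std):
        if std == 0.0:
            return 0.0 if x == mean else math.inf
        return abs(x - mean) / std

    def _global_std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def _window_stats(self):
        k = len(self.window)
        mean = sum(self.window) / k
        var = sum((v - mean) ** 2 for v in self.window) / (k - 1) if k > 1 else 0.0
        return mean, math.sqrt(var)

    def update(self, x):
        """Score x against the data seen so far, then absorb it.
        Only one pass over the stream is made; no random access is needed."""
        is_outlier = False
        if self.n >= self.warmup:
            global_z = self._zscore(x, self.mean, self._global_std())
            w_mean, w_std = self._window_stats()
            local_z = self._zscore(x, w_mean, w_std)
            is_outlier = global_z > self.threshold and local_z > self.threshold

        # Welford's online update of the global mean and variance.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        self.window.append(x)  # the oldest point drops out automatically
        return is_outlier
```

Calling `update` once per arriving reading returns a flag for that reading while keeping memory bounded by the window size. Replacing the fixed window with one that is resized when the distribution changes would move the sketch closer to the adaptive behavior described above.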
Index Terms
- Online outlier detection for data streams