DOI: 10.1145/2076623.2076635
research-article

Online outlier detection for data streams

Published: 21 September 2011

ABSTRACT

Outlier detection is a well-established area of statistics, but most existing outlier detection techniques are designed for applications where the entire dataset is available for random access. A typical outlier detection technique constructs a standard data distribution or model and identifies the data points that deviate from the model as outliers. Evidently, these techniques are not suitable for online data streams, where the entire dataset, due to its unbounded volume, is not available for random access. Moreover, the data distribution in data streams changes over time, which challenges existing outlier detection techniques that assume a constant standard data distribution for the entire dataset. In addition, data streams are characterized by uncertainty, which imposes further complexity. In this paper we propose an adaptive, online outlier detection technique that addresses the aforementioned characteristics of data streams, called Adaptive Outlier Detection for Data Streams (A-ODDS), which identifies outliers with respect to all the received data points as well as temporally close data points. The temporally close data points are selected based on time and change of data distribution. We also present an efficient, online implementation of the technique and a performance study showing the superiority of A-ODDS over existing techniques in terms of accuracy and execution time on a real-life dataset collected from meteorological applications.
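
The idea of judging each arriving point against both all received data and temporally close data can be illustrated with a minimal sketch. This is not the A-ODDS algorithm, whose details the abstract does not give. It assumes a plain z-score test, a fixed-size window as a stand-in for the "temporally close" points (the abstract says these are chosen by time and distribution change), and a rule that flags a point only when it deviates under both views. The class name, parameters, and thresholds are all hypothetical.

```python
from collections import deque
import math


class StreamOutlierSketch:
    """Illustrative only: z-score test against global and recent statistics."""

    def __init__(self, window=50, threshold=3.0, warmup=10):
        self.window = deque(maxlen=window)  # assumed stand-in for temporally close points
        self.threshold = threshold          # hypothetical z-score cutoff
        self.warmup = warmup                # points to observe before flagging anything
        self.n = 0        # number of points received so far
        self.mean = 0.0   # running mean of all points (Welford's method)
        self.m2 = 0.0     # running sum of squared deviations from the mean

    @staticmethod
    def _zscore(x, mean, std):
        # With zero spread, any value different from the mean is infinitely far away.
        if std == 0.0:
            return 0.0 if x == mean else math.inf
        return abs(x - mean) / std

    def update(self, x):
        """Score x against the data seen so far, then absorb it. Returns True if flagged."""
        flagged = False
        if self.n >= self.warmup:
            global_std = math.sqrt(self.m2 / self.n)
            w_mean = sum(self.window) / len(self.window)
            w_std = math.sqrt(
                sum((v - w_mean) ** 2 for v in self.window) / len(self.window)
            )
            # Assumption: a point is an outlier only if it deviates in both views.
            flagged = (
                self._zscore(x, self.mean, global_std) > self.threshold
                and self._zscore(x, w_mean, w_std) > self.threshold
            )
        # One-pass update of the global statistics; no past data is stored for them.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        self.window.append(x)
        return flagged


detector = StreamOutlierSketch()
readings = [20.0, 20.5, 19.5, 20.2, 19.8] * 4 + [35.0, 20.1]
flags = [detector.update(r) for r in readings]
```

The global statistics are updated in a single pass, so memory use is bounded by the window size regardless of stream length. In this sketch a flagged point is still absorbed into both sets of statistics; a real detector would likely handle that differently.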

Published in

IDEAS '11: Proceedings of the 15th Symposium on International Database Engineering & Applications
September 2011
274 pages
ISBN: 9781450306270
DOI: 10.1145/2076623

Copyright © 2011 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 74 of 210 submissions, 35%
