DOI: 10.1145/1321440.1321552
research-article

Detecting distance-based outliers in streams of data

Published: 06 November 2007

ABSTRACT

This work presents a method for detecting distance-based outliers in data streams. We adopt the sliding window model, in which outlier queries are issued to detect anomalies in the current window. Two algorithms are presented. The first answers outlier queries exactly but has larger space requirements. The second is derived directly from the exact one, has limited memory requirements, and returns an approximate answer based on accurate estimations with a statistical guarantee. Experiments confirm the effectiveness of the proposed approach and the high quality of the approximate solutions.
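
To make the setting concrete, the following is a minimal sketch of an outlier query over a sliding window. It is not either of the paper's algorithms: it is a naive quadratic baseline, and it assumes the common definition under which a point is a distance-based outlier when fewer than k other points in the window lie within distance R of it. The class name `SlidingWindow`, the count-based window, and the parameters `size`, `radius`, and `k` are all assumptions introduced for illustration.

```python
from collections import deque
from math import dist  # Euclidean distance, Python 3.8+


class SlidingWindow:
    """Keeps the most recent `size` points of a stream (count-based window)."""

    def __init__(self, size):
        # A bounded deque drops the oldest point when a new one arrives.
        self.points = deque(maxlen=size)

    def insert(self, point):
        self.points.append(tuple(point))

    def outliers(self, radius, k):
        """Return the window points with fewer than k neighbors within radius.

        Naive O(n^2) scan over the current window; assumed definition,
        not the exact or approximate algorithm from the paper.
        """
        pts = list(self.points)
        result = []
        for i, p in enumerate(pts):
            neighbors = sum(
                1 for j, q in enumerate(pts) if i != j and dist(p, q) <= radius
            )
            if neighbors < k:
                result.append(p)
        return result


# Hypothetical one-dimensional stream; the first value expires from the window.
window = SlidingWindow(size=5)
for x in [1.0, 1.1, 0.9, 1.2, 8.0, 1.05]:
    window.insert((x,))

flagged = window.outliers(radius=0.5, k=2)
print(flagged)
```

In this toy stream only the point at 8.0 has no neighbors within the radius, so it is the one reported for the current window. A baseline like this stores every window point and rescans on each query, which is the kind of cost that motivates trading space for exactness or accepting an approximate answer.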


    • Published in

      CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
      November 2007
      1048 pages
      ISBN: 9781595938039
      DOI: 10.1145/1321440

      Copyright © 2007 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
