skip to main content
10.1145/1516360.1516422acmotherconferencesArticle/Chapter ViewAbstractPublication PagesedbtConference Proceedingsconference-collections
research-article
Free Access

Neighbor-based pattern detection for windows over streaming data

Published:24 March 2009Publication History

ABSTRACT

The discovery of complex patterns such as clusters, outliers, and associations from huge volumes of streaming data has been recognized as critical for many domains. However, pattern detection with sliding window semantics, as required by applications ranging from stock market analysis to moving object tracking remains largely unexplored. Applying static pattern detection algorithms from scratch to every window is prohibitively expensive due to their high algorithmic complexity. This work tackles this problem by developing the first solution for incremental detection of neighbor-based patterns specific to sliding window scenarios. The specific pattern types covered in this work include density-based clusters and distance-based outliers. Incremental pattern computation in highly dynamic streaming environments is challenging, because purging a large amount of to-be-expired data from previously formed patterns may cause complex pattern changes including migration, splitting, merging and termination of these patterns. Previous incremental neighbor-based pattern detection algorithms, which were typically not designed to handle sliding windows, such as incremental DBSCAN, are not able to solve this problem efficiently in terms of both CPU and memory consumption. To overcome this, we exploit the "predictability" property of sliding windows to elegantly discount the effect of expiring objects on the remaining pattern structures. Our solution achieves minimal CPU utilization, while still keeping the memory utilization linear in the number of objects in the window. Our comprehensive experimental study, using both synthetic as well as real data from domains of stock trades and moving object monitoring, demonstrates superiority of our proposed strategies over alternate methods in both CPU and memory utilization.

References

  1. C. C. Aggarwal. On abnormality detection in spuriously populated data streams. In SDM, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  2. C. C. Aggarwal, J. Han, J. Wang, and P. S. Yu. A framework for clustering evolving data streams. In VLDB, pages 81--92, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. F. Angiulli and F. Fassetti. Detecting distance-based outliers in streams of data. In CIKM, pages 811--820, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. M. Ankerst, M. M. Breunig, H.-P. Kriegel, and J. Sander. Optics: Ordering points to identify the clustering structure. In SIGMOD. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. A. Arasu, S. Babu, and J. Widom. The cql continuous query language: semantic foundations and query execution. VLDB J., 15(2):121--142, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. A. Arasu and J. Widom. Resource sharing in continuous sliding-window aggregates. In VLDB, pages 336--347, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. B. Babcock, M. Datar, R. Motwani, and L. O'Callaghan. Maintaining variance and k-medians over data stream windows. In PODS, pages 234--243, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. Beringer and E. Hüllermeier. Online clustering of parallel data streams. Data Knowl. Eng., 58(2):180--204, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. Lof: identifying density-based local outliers. SIGMOD Rec., 29(2):93--104, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. F. Cao, M. Ester, W. Qian, and A. Zhou. Density-based clustering over an evolving data stream with noise. In SDM, 2006.Google ScholarGoogle ScholarCross RefCross Ref
  11. Y. Chen and L. Tu. Density-based clustering for real-time stream data. In KDD, pages 133--142, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. B.-R. Dai, J.-W. Huang, M.-Y. Yeh, and M.-S. Chen. Adaptive clustering for multiple evolving streams. IEEE Trans. Knowl. Data Eng., 18(9):1166--1180, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. J. N. Entzminger, C. A. Fowler, and W. J. Kenneally. Jointstars and gmti: Past, present and future. IEEE Transactions on Aerospace and Electronic Systems, 35(2):748--762, april 1999.Google ScholarGoogle ScholarCross RefCross Ref
  14. M. Ester, H.-P. Kriegel, J. Sander, M. Wimmer, and X. Xu. Incremental clustering for mining in a data warehousing environment. In A. Gupta, O. Shmueli, and J. Widom, editors, VLDB, pages 323--333, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, pages 226--231, 1996.Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams: Theory and practice. IEEE Trans. Knowl. Data Eng., 15(3):515--528, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams. In FOCS, pages 359--366, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. J. A. Hartigan and M. A. Wong. A k-means clustering algorithm. Applied Statistics, 28(1).Google ScholarGoogle Scholar
  19. I. INETATS. Stock trade traces. http://www.inetats.com/.Google ScholarGoogle Scholar
  20. E. M. Knorr and R. T. Ng. Algorithms for mining distance-based outliers in large datasets. In VLDB, pages 392--403, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. J. Munkres. Topology. Prentice Hall, 2000.Google ScholarGoogle Scholar
  22. I. Ruts and P. J. Rousseeuw. Computing depth contours of bivariate point clouds. Comput. Stat. Data Anal., 23(1):153--168, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. S. Subramaniam, T. Palpanas, D. Papadopoulos, V. Kalogeraki, and D. Gunopulos. Online outlier detection in sensor data using non-parametric models. In VLDB, pages 187--198, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. D. Yang. Neighbor-based pattern detection for windows over streaming data. WPI Technical Report, 2008. http://users.wpi.edu/~diyang/str_patt_detect.pdf.Google ScholarGoogle Scholar
  25. T. Zhang, R. Ramakrishnan, and M. Livny. Birch: an efficient data clustering method for very large databases. SIGMOD Record, vol. 25(2), p. 103--14, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in
  • Published in

    cover image ACM Other conferences
    EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
    March 2009
    1180 pages
    ISBN:9781605584225
    DOI:10.1145/1516360

    Copyright © 2009 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 24 March 2009

    Permissions

    Request permissions about this article.

    Request Permissions

    Check for updates

    Qualifiers

    • research-article

    Acceptance Rates

    Overall Acceptance Rate7of10submissions,70%

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader