ABSTRACT
The discovery of complex patterns such as clusters, outliers, and associations from huge volumes of streaming data has been recognized as critical for many domains. However, pattern detection with sliding window semantics, as required by applications ranging from stock market analysis to moving object tracking remains largely unexplored. Applying static pattern detection algorithms from scratch to every window is prohibitively expensive due to their high algorithmic complexity. This work tackles this problem by developing the first solution for incremental detection of neighbor-based patterns specific to sliding window scenarios. The specific pattern types covered in this work include density-based clusters and distance-based outliers. Incremental pattern computation in highly dynamic streaming environments is challenging, because purging a large amount of to-be-expired data from previously formed patterns may cause complex pattern changes including migration, splitting, merging and termination of these patterns. Previous incremental neighbor-based pattern detection algorithms, which were typically not designed to handle sliding windows, such as incremental DBSCAN, are not able to solve this problem efficiently in terms of both CPU and memory consumption. To overcome this, we exploit the "predictability" property of sliding windows to elegantly discount the effect of expiring objects on the remaining pattern structures. Our solution achieves minimal CPU utilization, while still keeping the memory utilization linear in the number of objects in the window. Our comprehensive experimental study, using both synthetic as well as real data from domains of stock trades and moving object monitoring, demonstrates superiority of our proposed strategies over alternate methods in both CPU and memory utilization.
- C. C. Aggarwal. On abnormality detection in spuriously populated data streams. In SDM, 2005.Google ScholarCross Ref
- C. C. Aggarwal, J. Han, J. Wang, and P. S. Yu. A framework for clustering evolving data streams. In VLDB, pages 81--92, 2003. Google ScholarDigital Library
- F. Angiulli and F. Fassetti. Detecting distance-based outliers in streams of data. In CIKM, pages 811--820, 2007. Google ScholarDigital Library
- M. Ankerst, M. M. Breunig, H.-P. Kriegel, and J. Sander. Optics: Ordering points to identify the clustering structure. In SIGMOD. Google ScholarDigital Library
- A. Arasu, S. Babu, and J. Widom. The cql continuous query language: semantic foundations and query execution. VLDB J., 15(2):121--142, 2006. Google ScholarDigital Library
- A. Arasu and J. Widom. Resource sharing in continuous sliding-window aggregates. In VLDB, pages 336--347, 2004. Google ScholarDigital Library
- B. Babcock, M. Datar, R. Motwani, and L. O'Callaghan. Maintaining variance and k-medians over data stream windows. In PODS, pages 234--243, 2003. Google ScholarDigital Library
- J. Beringer and E. Hüllermeier. Online clustering of parallel data streams. Data Knowl. Eng., 58(2):180--204, 2006. Google ScholarDigital Library
- M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. Lof: identifying density-based local outliers. SIGMOD Rec., 29(2):93--104, 2000. Google ScholarDigital Library
- F. Cao, M. Ester, W. Qian, and A. Zhou. Density-based clustering over an evolving data stream with noise. In SDM, 2006.Google ScholarCross Ref
- Y. Chen and L. Tu. Density-based clustering for real-time stream data. In KDD, pages 133--142, 2007. Google ScholarDigital Library
- B.-R. Dai, J.-W. Huang, M.-Y. Yeh, and M.-S. Chen. Adaptive clustering for multiple evolving streams. IEEE Trans. Knowl. Data Eng., 18(9):1166--1180, 2006. Google ScholarDigital Library
- J. N. Entzminger, C. A. Fowler, and W. J. Kenneally. Jointstars and gmti: Past, present and future. IEEE Transactions on Aerospace and Electronic Systems, 35(2):748--762, april 1999.Google ScholarCross Ref
- M. Ester, H.-P. Kriegel, J. Sander, M. Wimmer, and X. Xu. Incremental clustering for mining in a data warehousing environment. In A. Gupta, O. Shmueli, and J. Widom, editors, VLDB, pages 323--333, 1998. Google ScholarDigital Library
- M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, pages 226--231, 1996.Google ScholarDigital Library
- S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams: Theory and practice. IEEE Trans. Knowl. Data Eng., 15(3):515--528, 2003. Google ScholarDigital Library
- S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams. In FOCS, pages 359--366, 2000. Google ScholarDigital Library
- J. A. Hartigan and M. A. Wong. A k-means clustering algorithm. Applied Statistics, 28(1).Google Scholar
- I. INETATS. Stock trade traces. http://www.inetats.com/.Google Scholar
- E. M. Knorr and R. T. Ng. Algorithms for mining distance-based outliers in large datasets. In VLDB, pages 392--403, 1998. Google ScholarDigital Library
- J. Munkres. Topology. Prentice Hall, 2000.Google Scholar
- I. Ruts and P. J. Rousseeuw. Computing depth contours of bivariate point clouds. Comput. Stat. Data Anal., 23(1):153--168, 1996. Google ScholarDigital Library
- S. Subramaniam, T. Palpanas, D. Papadopoulos, V. Kalogeraki, and D. Gunopulos. Online outlier detection in sensor data using non-parametric models. In VLDB, pages 187--198, 2006. Google ScholarDigital Library
- D. Yang. Neighbor-based pattern detection for windows over streaming data. WPI Technical Report, 2008. http://users.wpi.edu/~diyang/str_patt_detect.pdf.Google Scholar
- T. Zhang, R. Ramakrishnan, and M. Livny. Birch: an efficient data clustering method for very large databases. SIGMOD Record, vol. 25(2), p. 103--14, 1996. Google ScholarDigital Library
Recommendations
Continuous Nearest Neighbor Queries over Sliding Windows
This paper studies continuous monitoring of nearest neighbor (NN) queries over sliding window streams. According to this model, data points continuously stream in the system, and they are considered valid only while they belong to a sliding window that ...
Mining neighbor-based patterns in data streams
Discovery of complex patterns such as clusters, outliers, and associations from huge volumes of streaming data has been recognized as critical for many application domains. However, little research effort has been made toward detecting patterns within ...
EclatDS: An efficient sliding window based frequent pattern mining method for data streams
Mining frequent patterns over data streams is an interesting problem due to its wide application area. The researchers in this field have been facing two key challenges, namely reduction in runtime and memory usage. In this study, a novel method for ...
Comments