Neighbor-based pattern detection for windows over streaming data

Authors:
Di Yang

Worcester Polytechnic Institute, Worcester, MA

Worcester Polytechnic Institute, Worcester, MA
View Profile

,
Elke A. Rundensteiner

Worcester Polytechnic Institute, Worcester, MA

Worcester Polytechnic Institute, Worcester, MA
View Profile

,
Matthew O. Ward

Worcester Polytechnic Institute, Worcester, MA

Worcester Polytechnic Institute, Worcester, MA
View Profile

EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database TechnologyMarch 2009Pages 529–540https://doi.org/10.1145/1516360.1516422

Published:24 March 2009Publication History

EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology

Pages 529–540

ABSTRACT

The discovery of complex patterns such as clusters, outliers, and associations from huge volumes of streaming data has been recognized as critical for many domains. However, pattern detection with sliding window semantics, as required by applications ranging from stock market analysis to moving object tracking remains largely unexplored. Applying static pattern detection algorithms from scratch to every window is prohibitively expensive due to their high algorithmic complexity. This work tackles this problem by developing the first solution for incremental detection of neighbor-based patterns specific to sliding window scenarios. The specific pattern types covered in this work include density-based clusters and distance-based outliers. Incremental pattern computation in highly dynamic streaming environments is challenging, because purging a large amount of to-be-expired data from previously formed patterns may cause complex pattern changes including migration, splitting, merging and termination of these patterns. Previous incremental neighbor-based pattern detection algorithms, which were typically not designed to handle sliding windows, such as incremental DBSCAN, are not able to solve this problem efficiently in terms of both CPU and memory consumption. To overcome this, we exploit the "predictability" property of sliding windows to elegantly discount the effect of expiring objects on the remaining pattern structures. Our solution achieves minimal CPU utilization, while still keeping the memory utilization linear in the number of objects in the window. Our comprehensive experimental study, using both synthetic as well as real data from domains of stock trades and moving object monitoring, demonstrates superiority of our proposed strategies over alternate methods in both CPU and memory utilization.

References

C. C. Aggarwal. On abnormality detection in spuriously populated data streams. In SDM, 2005.Google ScholarCross Ref
C. C. Aggarwal, J. Han, J. Wang, and P. S. Yu. A framework for clustering evolving data streams. In VLDB, pages 81--92, 2003. Google ScholarDigital Library
F. Angiulli and F. Fassetti. Detecting distance-based outliers in streams of data. In CIKM, pages 811--820, 2007. Google ScholarDigital Library
M. Ankerst, M. M. Breunig, H.-P. Kriegel, and J. Sander. Optics: Ordering points to identify the clustering structure. In SIGMOD. Google ScholarDigital Library
A. Arasu, S. Babu, and J. Widom. The cql continuous query language: semantic foundations and query execution. VLDB J., 15(2):121--142, 2006. Google ScholarDigital Library
A. Arasu and J. Widom. Resource sharing in continuous sliding-window aggregates. In VLDB, pages 336--347, 2004. Google ScholarDigital Library
B. Babcock, M. Datar, R. Motwani, and L. O'Callaghan. Maintaining variance and k-medians over data stream windows. In PODS, pages 234--243, 2003. Google ScholarDigital Library
J. Beringer and E. Hüllermeier. Online clustering of parallel data streams. Data Knowl. Eng., 58(2):180--204, 2006. Google ScholarDigital Library
M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. Lof: identifying density-based local outliers. SIGMOD Rec., 29(2):93--104, 2000. Google ScholarDigital Library
F. Cao, M. Ester, W. Qian, and A. Zhou. Density-based clustering over an evolving data stream with noise. In SDM, 2006.Google ScholarCross Ref
Y. Chen and L. Tu. Density-based clustering for real-time stream data. In KDD, pages 133--142, 2007. Google ScholarDigital Library
B.-R. Dai, J.-W. Huang, M.-Y. Yeh, and M.-S. Chen. Adaptive clustering for multiple evolving streams. IEEE Trans. Knowl. Data Eng., 18(9):1166--1180, 2006. Google ScholarDigital Library
J. N. Entzminger, C. A. Fowler, and W. J. Kenneally. Jointstars and gmti: Past, present and future. IEEE Transactions on Aerospace and Electronic Systems, 35(2):748--762, april 1999.Google ScholarCross Ref
M. Ester, H.-P. Kriegel, J. Sander, M. Wimmer, and X. Xu. Incremental clustering for mining in a data warehousing environment. In A. Gupta, O. Shmueli, and J. Widom, editors, VLDB, pages 323--333, 1998. Google ScholarDigital Library
M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, pages 226--231, 1996.Google ScholarDigital Library
S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams: Theory and practice. IEEE Trans. Knowl. Data Eng., 15(3):515--528, 2003. Google ScholarDigital Library
S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams. In FOCS, pages 359--366, 2000. Google ScholarDigital Library
J. A. Hartigan and M. A. Wong. A k-means clustering algorithm. Applied Statistics, 28(1).Google Scholar
I. INETATS. Stock trade traces. http://www.inetats.com/.Google Scholar
E. M. Knorr and R. T. Ng. Algorithms for mining distance-based outliers in large datasets. In VLDB, pages 392--403, 1998. Google ScholarDigital Library
J. Munkres. Topology. Prentice Hall, 2000.Google Scholar
I. Ruts and P. J. Rousseeuw. Computing depth contours of bivariate point clouds. Comput. Stat. Data Anal., 23(1):153--168, 1996. Google ScholarDigital Library
S. Subramaniam, T. Palpanas, D. Papadopoulos, V. Kalogeraki, and D. Gunopulos. Online outlier detection in sensor data using non-parametric models. In VLDB, pages 187--198, 2006. Google ScholarDigital Library
D. Yang. Neighbor-based pattern detection for windows over streaming data. WPI Technical Report, 2008. http://users.wpi.edu/~diyang/str_patt_detect.pdf.Google Scholar
T. Zhang, R. Ramakrishnan, and M. Livny. Birch: an efficient data clustering method for very large databases. SIGMOD Record, vol. 25(2), p. 103--14, 1996. Google ScholarDigital Library

Recommendations

Continuous Nearest Neighbor Queries over Sliding Windows

This paper studies continuous monitoring of nearest neighbor (NN) queries over sliding window streams. According to this model, data points continuously stream in the system, and they are considered valid only while they belong to a sliding window that ...
Read More
Mining neighbor-based patterns in data streams

Discovery of complex patterns such as clusters, outliers, and associations from huge volumes of streaming data has been recognized as critical for many application domains. However, little research effort has been made toward detecting patterns within ...
Read More
EclatDS: An efficient sliding window based frequent pattern mining method for data streams

Mining frequent patterns over data streams is an interesting problem due to its wide application area. The researchers in this field have been facing two key challenges, namely reduction in runtime and memory usage. In this study, a novel method for ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
March 2009
1180 pages
ISBN:9781605584225
DOI:10.1145/1516360
Editors:
Martin Kersten
CWI, The Netherlands
,
Boris Novikov
University of Saint Petersburg, Russia
,
Jens Teubner
ETH Zurich, Switzerland
,
Vladimir Polutin
HP Labs, Russia
,
Stefan Manegold
CWI, The Netherlands
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 24 March 2009
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate7of10submissions,70%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 80
  Total Citations
  View Citations
- 831
  Total Downloads
- Downloads (Last 12 months)76
- Downloads (Last 6 weeks)6
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Neighbor-based pattern detection for windows over streaming data

EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology

ABSTRACT

References

Cited By

Recommendations

Continuous Nearest Neighbor Queries over Sliding Windows

Mining neighbor-based patterns in data streams

EclatDS: An efficient sliding window based frequent pattern mining method for data streams