Abstract
Identifying the top-k most frequent elements is one of the many problems associated with data streams analysis. It is a well-known and difficult problem, especially if the analysis is to be performed and maintained up to date in near real time. Analyzing data streams in time sliding window model is of particular interest as only the most recent, more relevant events are considered. Approximate answers are usually adequate when dealing with this problem. This paper presents a new and innovative algorithm, the Filtered Space-Saving with Sliding Window Algorithm (FSW) that addresses this problem by introducing in the Filtered Space Saving (FSS) algorithm an approximated time sliding window counter. The algorithm provides the top-k list of elements, their frequency and an error estimate for each frequency value within the sliding window. It provides strong guarantees on the results, depending on the elements real frequencies. Experimental results detail performance on real life cases.
Similar content being viewed by others
References
Aggarwal C (ed) (2007) Data streams: models and algorithms, Springer, Berlin
Babcock B, Datar M, Motwani R, O’Callaghan L (2003) Maintaining variance and k-medians over data stream windows. In: Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pp 234–243
Bertsekas D (1995) Dynamic programming and optimal control. vol 1. Athena Scientific, Nashua
Braverman V, Ostrovsky R (2007) Smooth histograms for sliding window, in FOCS 2007, pp 283–293
Cohen E, Strauss M (2006) Maintaining time-decaying stream aggregates. J Algorithms 59(1):19–36
Cormode G, Hadjieleftheriou M (2010) Methods for finding frequent items in data streams. VLDB J, vol 19
Cormode G, Muthukrishnan S (2003) What’s hot and what’s not: tracking most frequent items dynamically. In: Proceedings of the 22nd ACM PODS symposium on principles of database systems, pp 296–306
Datar M, Gionis A, Indyk P, Motwani R (2002) Maintaining stream statistics over sliding windows. SIAM J Comput 31(6):1794–1813
Demaine E, López-Ortiz A, Munro J (2002) Frequency estimation of internet packet streams with limited space. In: Proceedings of the 10th ESA Annual European Symposium on Algorithms, pp 348–360
Dimitropoulos X, Hurley P, Kind A (2008) Probabilistic Lossy Counting: an efficient algorithm for finding heavy hitters, ACM SIGCOMM Comput Commun Rev, vol 38, no. 1
Estan C, Varghese G (2003) New directions in traffic measurement and accounting: focusing on the elephants, ignoring the mice. ACM Trans Comput Syst 21(3):270–313
Ganguly S (2003) Counting distinct items over update streams. Theoret Comput Sci 378(3):211–222
Gibbons P, Tirthapura S (2002) Distributed streams algorithms for sliding windows. In: Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures, pp 63–72
Gilbert A, Guha S, Indyk P, Kotidis Y, Muthukrishnan S, Strauss M (2002) Fast, small-space algorithms for approximate histogram maintenance In: Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, pp 389–398
Golab L, DeHaan D, Demaine E, Lopez-Ortiz A, Munro J, (2003), Identifying frequent items in sliding windows over on-line packet streams. In: Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement table of contents, pp 173–178
Homem N, Carvalho J (2010) Estimating Top-k destinations in data streams, computational intelligence for knowledge-based systems design. Springer, Berlin/Heidelberg, pp 290–299
Homem N, Carvalho J (2010) Finding top-k elements in data streams. Inform Sci 180(24):4958–4974
Hu T, Sung S, Xiong H, Fu Q (2008) Discovery of maximum length frequent itemsets. Inf Sci 178:69–87
Hua-Fu Li, Suh-Yin Lee (2009) Mining frequent itemsets over data streams using efficient window sliding techniques. Expert Syst Appl 36(2):1466–1477
Lee L, Ting H (2006) A simpler and more efficient deterministic scheme for finding frequent items over sliding windows. In: Proceedings of the 25th ACM SIGMOD-SIGACT-SIGART symposium on principles of satabase systems, pp 290–297
Manerikar N, Palpanas T (2009) Frequent items in streaming data: an experimental evaluation of the state-of-the-art. Data Knowl Eng 68(4):415–430
Manku G, Motwani R (2002) Approximate frequency counts over data streams. In: Proceedings of the 28th ACM VLDB international conference on very large data bases, pp 346–357
Matias Y, Vitter J, Wang M (1998) Wavelet-based histograms for selectivity estimation. In: Proceedings of the 1998 ACM SIGMOD international conference on management of data, pp 448–459
Matias Y, Vitter J, Wang M (2000) Dynamic maintenance of wavelet-based histograms. VLDB, pp 101–110
Metwally A, Agrawal D, Abbadi A (2005) Efficient computation of frequent and Top-k elements in data streams. Technical Report 2005-23. University of California, Santa Barbara
Muthukrishnan S (2005) Data streams: algorithms and applications, foundations and trends. Theoret Comput Sci 1(2):117–236
Qiao L, Agrawal D, Abbadi A (2003) Supporting sliding window queries for continuous data streams. In: Proceedings of the 15th international conference on scientific and statistical database management, pp 85–94
Tanbeer S, Ahmed C, Jeong B, Lee Y (2009a) Efficient single-pass frequent pattern mining using a prefix-tree. Inf Sci 179(5):559–583
Tanbeer S, Ahmed C, Jeong B, Lee Y (2009b) Sliding window-based frequent pattern mining over data streams. Inf Sci 179(22):3843–3865
Tantono F, Manerikar N, Palpanas T (2008) Efficiently discovering recent frequent items in data streams In: Scientific and statistical database management, Lecture Notes in Computer Science, pp 222–239
Zhu Y, Shasha D (2002) Statistical monitoring of thousands of data streams in real time. In: Proceedings of the 28th international conference on very large data bases, pp 358–369
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Homem, N., Carvalho, J.P. Finding top-k elements in a time-sliding window. Evolving Systems 2, 51–70 (2011). https://doi.org/10.1007/s12530-010-9020-z
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s12530-010-9020-z