Skip to main content
Log in

Finding top-k elements in a time-sliding window

  • Original Paper
  • Published:
Evolving Systems Aims and scope Submit manuscript

Abstract

Identifying the top-k most frequent elements is one of the many problems associated with data streams analysis. It is a well-known and difficult problem, especially if the analysis is to be performed and maintained up to date in near real time. Analyzing data streams in time sliding window model is of particular interest as only the most recent, more relevant events are considered. Approximate answers are usually adequate when dealing with this problem. This paper presents a new and innovative algorithm, the Filtered Space-Saving with Sliding Window Algorithm (FSW) that addresses this problem by introducing in the Filtered Space Saving (FSS) algorithm an approximated time sliding window counter. The algorithm provides the top-k list of elements, their frequency and an error estimate for each frequency value within the sliding window. It provides strong guarantees on the results, depending on the elements real frequencies. Experimental results detail performance on real life cases.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Similar content being viewed by others

References

  • Aggarwal C (ed) (2007) Data streams: models and algorithms, Springer, Berlin

  • Babcock B, Datar M, Motwani R, O’Callaghan L (2003) Maintaining variance and k-medians over data stream windows. In: Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pp 234–243

  • Bertsekas D (1995) Dynamic programming and optimal control. vol 1. Athena Scientific, Nashua

  • Braverman V, Ostrovsky R (2007) Smooth histograms for sliding window, in FOCS 2007, pp 283–293

  • Cohen E, Strauss M (2006) Maintaining time-decaying stream aggregates. J Algorithms 59(1):19–36

    Google Scholar 

  • Cormode G, Hadjieleftheriou M (2010) Methods for finding frequent items in data streams. VLDB J, vol 19

  • Cormode G, Muthukrishnan S (2003) What’s hot and what’s not: tracking most frequent items dynamically. In: Proceedings of the 22nd ACM PODS symposium on principles of database systems, pp 296–306

  • Datar M, Gionis A, Indyk P, Motwani R (2002) Maintaining stream statistics over sliding windows. SIAM J Comput 31(6):1794–1813

    Google Scholar 

  • Demaine E, López-Ortiz A, Munro J (2002) Frequency estimation of internet packet streams with limited space. In: Proceedings of the 10th ESA Annual European Symposium on Algorithms, pp 348–360

  • Dimitropoulos X, Hurley P, Kind A (2008) Probabilistic Lossy Counting: an efficient algorithm for finding heavy hitters, ACM SIGCOMM Comput Commun Rev, vol 38, no. 1

  • Estan C, Varghese G (2003) New directions in traffic measurement and accounting: focusing on the elephants, ignoring the mice. ACM Trans Comput Syst 21(3):270–313

    Article  Google Scholar 

  • Ganguly S (2003) Counting distinct items over update streams. Theoret Comput Sci 378(3):211–222

    Google Scholar 

  • Gibbons P, Tirthapura S (2002) Distributed streams algorithms for sliding windows. In: Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures, pp 63–72

  • Gilbert A, Guha S, Indyk P, Kotidis Y, Muthukrishnan S, Strauss M (2002) Fast, small-space algorithms for approximate histogram maintenance In: Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, pp 389–398

  • Golab L, DeHaan D, Demaine E, Lopez-Ortiz A, Munro J, (2003), Identifying frequent items in sliding windows over on-line packet streams. In: Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement table of contents, pp 173–178

  • Homem N, Carvalho J (2010) Estimating Top-k destinations in data streams, computational intelligence for knowledge-based systems design. Springer, Berlin/Heidelberg, pp 290–299

  • Homem N, Carvalho J (2010) Finding top-k elements in data streams. Inform Sci 180(24):4958–4974

    Google Scholar 

  • Hu T, Sung S, Xiong H, Fu Q (2008) Discovery of maximum length frequent itemsets. Inf Sci 178:69–87

    Article  MathSciNet  Google Scholar 

  • Hua-Fu Li, Suh-Yin Lee (2009) Mining frequent itemsets over data streams using efficient window sliding techniques. Expert Syst Appl 36(2):1466–1477

  • Lee L, Ting H (2006) A simpler and more efficient deterministic scheme for finding frequent items over sliding windows. In: Proceedings of the 25th ACM SIGMOD-SIGACT-SIGART symposium on principles of satabase systems, pp 290–297

  • Manerikar N, Palpanas T (2009) Frequent items in streaming data: an experimental evaluation of the state-of-the-art. Data Knowl Eng 68(4):415–430

    Article  Google Scholar 

  • Manku G, Motwani R (2002) Approximate frequency counts over data streams. In: Proceedings of the 28th ACM VLDB international conference on very large data bases, pp 346–357

  • Matias Y, Vitter J, Wang M (1998) Wavelet-based histograms for selectivity estimation. In: Proceedings of the 1998 ACM SIGMOD international conference on management of data, pp 448–459

  • Matias Y, Vitter J, Wang M (2000) Dynamic maintenance of wavelet-based histograms. VLDB, pp 101–110

  • Metwally A, Agrawal D, Abbadi A (2005) Efficient computation of frequent and Top-k elements in data streams. Technical Report 2005-23. University of California, Santa Barbara

  • Muthukrishnan S (2005) Data streams: algorithms and applications, foundations and trends. Theoret Comput Sci 1(2):117–236

    Google Scholar 

  • Qiao L, Agrawal D, Abbadi A (2003) Supporting sliding window queries for continuous data streams. In: Proceedings of the 15th international conference on scientific and statistical database management, pp 85–94

  • Tanbeer S, Ahmed C, Jeong B, Lee Y (2009a) Efficient single-pass frequent pattern mining using a prefix-tree. Inf Sci 179(5):559–583

    Article  MATH  MathSciNet  Google Scholar 

  • Tanbeer S, Ahmed C, Jeong B, Lee Y (2009b) Sliding window-based frequent pattern mining over data streams. Inf Sci 179(22):3843–3865

    Article  MathSciNet  Google Scholar 

  • Tantono F, Manerikar N, Palpanas T (2008) Efficiently discovering recent frequent items in data streams In: Scientific and statistical database management, Lecture Notes in Computer Science, pp 222–239

  • Zhu Y, Shasha D (2002) Statistical monitoring of thousands of data streams in real time. In: Proceedings of the 28th international conference on very large data bases, pp 358–369

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Nuno Homem.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Homem, N., Carvalho, J.P. Finding top-k elements in a time-sliding window. Evolving Systems 2, 51–70 (2011). https://doi.org/10.1007/s12530-010-9020-z

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s12530-010-9020-z

Keywords

Navigation