Skip to main content
Erschienen in: World Wide Web 5/2017

24.01.2017

Novel structures for counting frequent items in time decayed streams

verfasst von: Shanshan Wu, Huaizhong Lin, Leong Hou U, Yunjun Gao, Dongming Lu

Erschienen in: World Wide Web | Ausgabe 5/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Identifying frequently occurring items is a fundamental building block in many data stream applications. A great deal of work for efficiently identifying frequent items has been studied on the landmark and sliding window models. In this work, we revisit this problem on a new streaming model based on the time decay, where the importance of every arrival item is decreased over the time. To address the importance changes over time, we propose an innovative heap structure, named Quasi-heap, which maintains the item order using a lazy update mechanism. Two approximation algorithm, Space Saving with Quasi-heap (SSQ) and Filtered Space Saving with Quasi-heap (FSSQ), are proposed to find the frequently occurring items based on the Quasi-heap structure. To achieve better accuracy of frequency estimation for all the items in the stream, we introduce a new count-min-min (CMM) sketch structure, which can estimate the count of an item with almost error free. Extensive experiments conducted on both real-world and synthetic data demonstrate the superiority of proposed methods in terms of both efficiency (i.e., response time) and effectiveness (i.e., accuracy).

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Fußnoten
1
Frequent Itemset Mining Dataset Repository, available at http://​fimi.​cs.​helsinki.​fi/​data/​ (last accessed on 17 November, 2016)
 
Literatur
1.
Zurück zum Zitat Aouad, L. M., Le-Khac, N. A., Kechadi, T. M.: Performance study of distributed apriori-like frequent itemsets mining. Knowl. Inf. Syst. 23(1), 55–72 (2010)CrossRef Aouad, L. M., Le-Khac, N. A., Kechadi, T. M.: Performance study of distributed apriori-like frequent itemsets mining. Knowl. Inf. Syst. 23(1), 55–72 (2010)CrossRef
2.
Zurück zum Zitat Boley, M., Grosskreutz, H.: Approximating the number of frequent sets in dense data. Knowl. Inf. Syst. 21(1), 65–89 (2009)CrossRef Boley, M., Grosskreutz, H.: Approximating the number of frequent sets in dense data. Knowl. Inf. Syst. 21(1), 65–89 (2009)CrossRef
3.
Zurück zum Zitat Brijs, T., Swinnen, G., Vanhoof, K., Wets, G.: Using association rules for product assortment decisions: a case study. In: SIGKDD, pp. 254–260. ACM (1999) Brijs, T., Swinnen, G., Vanhoof, K., Wets, G.: Using association rules for product assortment decisions: a case study. In: SIGKDD, pp. 254–260. ACM (1999)
4.
Zurück zum Zitat Chakrabarti, A., Cormode, G., McGregor, A.: A near-optimal algorithm for computing the entropy of a stream. In: ACM-SIAM Symposium on Discrete Algorithms, pp. 328–335. Society for Industrial and Applied Mathematics (2007) Chakrabarti, A., Cormode, G., McGregor, A.: A near-optimal algorithm for computing the entropy of a stream. In: ACM-SIAM Symposium on Discrete Algorithms, pp. 328–335. Society for Industrial and Applied Mathematics (2007)
5.
Zurück zum Zitat Chang, J. H., Lee, W. S.: Finding recent frequent itemsets adaptively over online data streams. In: SIGKDD, pp. 487–492. ACM (2003) Chang, J. H., Lee, W. S.: Finding recent frequent itemsets adaptively over online data streams. In: SIGKDD, pp. 487–492. ACM (2003)
6.
Zurück zum Zitat Charikar, M., Chen, K., Farach-Colton, M.: Finding frequent items in data streams. In: Automata, Languages and Programming, pp. 693–703. Springer (2002) Charikar, M., Chen, K., Farach-Colton, M.: Finding frequent items in data streams. In: Automata, Languages and Programming, pp. 693–703. Springer (2002)
8.
Zurück zum Zitat Chen, L., Zhang, S., Tu, L.: An algorithm for mining frequent items on data stream using fading factor. In: COMPSAC, vol. 2, pp. 172–177. IEEE (2009) Chen, L., Zhang, S., Tu, L.: An algorithm for mining frequent items on data stream using fading factor. In: COMPSAC, vol. 2, pp. 172–177. IEEE (2009)
9.
Zurück zum Zitat Chen, L., Zou, L. J., Tu, L.: A clustering algorithm for multiple data streams based on spectral component similarity. Inform. Sci. 183(1), 35–47 (2012)CrossRef Chen, L., Zou, L. J., Tu, L.: A clustering algorithm for multiple data streams based on spectral component similarity. Inform. Sci. 183(1), 35–47 (2012)CrossRef
10.
Zurück zum Zitat Cormode, G., Hadjieleftheriou, M.: Finding the frequent items in streams of data. Commun. ACM 52(10), 97–105 (2009)CrossRef Cormode, G., Hadjieleftheriou, M.: Finding the frequent items in streams of data. Commun. ACM 52(10), 97–105 (2009)CrossRef
11.
Zurück zum Zitat Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms 55(1), 58–75 (2005)MathSciNetCrossRefMATH Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms 55(1), 58–75 (2005)MathSciNetCrossRefMATH
12.
Zurück zum Zitat Cormode, G., Muthukrishnan, S.: What’s hot and what’s not: tracking most frequent items dynamically. ACM Trans. Database Syst. 30(1), 249–278 (2005)CrossRef Cormode, G., Muthukrishnan, S.: What’s hot and what’s not: tracking most frequent items dynamically. ACM Trans. Database Syst. 30(1), 249–278 (2005)CrossRef
13.
Zurück zum Zitat Cormode, G., Shkapenyuk, V., Srivastava, D., Xu, B.: Forward decay: a practical time decay model for streaming systems. In: ICDE, pp. 138–149. IEEE (2009) Cormode, G., Shkapenyuk, V., Srivastava, D., Xu, B.: Forward decay: a practical time decay model for streaming systems. In: ICDE, pp. 138–149. IEEE (2009)
14.
15.
Zurück zum Zitat Golab, L., DeHaan, D., Demaine, E. D., Lopez-Ortiz, A., Munro, J. I.: Identifying frequent items in sliding windows over on-line packet streams. In: SIGCOMM, pp. 173–178. ACM (2003) Golab, L., DeHaan, D., Demaine, E. D., Lopez-Ortiz, A., Munro, J. I.: Identifying frequent items in sliding windows over on-line packet streams. In: SIGCOMM, pp. 173–178. ACM (2003)
16.
Zurück zum Zitat Homem, N., Carvalho, J. P.: Finding top-k elements in data streams. Inform. Sci. 180(24), 4958–4974 (2010)CrossRef Homem, N., Carvalho, J. P.: Finding top-k elements in data streams. Inform. Sci. 180(24), 4958–4974 (2010)CrossRef
17.
Zurück zum Zitat Jin, C., Qian, W., Sha, C., Yu, J. X., Zhou, A.: Dynamically maintaining frequent items over a data stream. In: CIKM, pp. 287–294. ACM (2003) Jin, C., Qian, W., Sha, C., Yu, J. X., Zhou, A.: Dynamically maintaining frequent items over a data stream. In: CIKM, pp. 287–294. ACM (2003)
18.
Zurück zum Zitat Karp, R. M., Shenker, S., Papadimitriou, C. H.: A simple algorithm for finding frequent elements in streams and bags. ACM Trans. Database Syst. 28(1), 51–55 (2003)CrossRef Karp, R. M., Shenker, S., Papadimitriou, C. H.: A simple algorithm for finding frequent elements in streams and bags. ACM Trans. Database Syst. 28(1), 51–55 (2003)CrossRef
19.
Zurück zum Zitat Li, H. F., Huang, H. Y., Lee, S. Y.: Fast and memory efficient mining of high-utility itemsets from data streams: with and without negative item profits. Knowl. Inf. Syst. 28(3), 495–522 (2011)CrossRef Li, H. F., Huang, H. Y., Lee, S. Y.: Fast and memory efficient mining of high-utility itemsets from data streams: with and without negative item profits. Knowl. Inf. Syst. 28(3), 495–522 (2011)CrossRef
20.
Zurück zum Zitat Lim, Y., Choi, J., Kang, U.: Fast, accurate, and space-efficient tracking of time-weighted frequent items from data streams. In: CIKM, pp. 1109–1118. ACM (2014) Lim, Y., Choi, J., Kang, U.: Fast, accurate, and space-efficient tracking of time-weighted frequent items from data streams. In: CIKM, pp. 1109–1118. ACM (2014)
21.
Zurück zum Zitat Lin, Z., Jiang, B., Pei, J., Jiang, D.: Mining discriminative items in multiple data streams. World Wide Web Journal 13(4), 497–522 (2010)CrossRef Lin, Z., Jiang, B., Pei, J., Jiang, D.: Mining discriminative items in multiple data streams. World Wide Web Journal 13(4), 497–522 (2010)CrossRef
22.
Zurück zum Zitat Manerikar, N., Palpanas, T.: Frequent items in streaming data: an experimental evaluation of the state-of-the-art. Data Knowl. Eng. 68(4), 415–430 (2009)CrossRef Manerikar, N., Palpanas, T.: Frequent items in streaming data: an experimental evaluation of the state-of-the-art. Data Knowl. Eng. 68(4), 415–430 (2009)CrossRef
23.
Zurück zum Zitat Manku, G. S., Motwani, R.: Approximate Frequency Counts over Data Streams. In: VLDB, pp. 346–357. VLDB Endowment (2002) Manku, G. S., Motwani, R.: Approximate Frequency Counts over Data Streams. In: VLDB, pp. 346–357. VLDB Endowment (2002)
24.
Zurück zum Zitat Mei, Q. L., Chen, L.: An algorithm for mining frequent stream data items using hash function and fading factor. In: Applied Mechanics and Materials, vol. 130, pp. 2661–2665. Trans Tech Publ (2012) Mei, Q. L., Chen, L.: An algorithm for mining frequent stream data items using hash function and fading factor. In: Applied Mechanics and Materials, vol. 130, pp. 2661–2665. Trans Tech Publ (2012)
25.
Zurück zum Zitat Metwally, A., Agrawal, D., Abbadi, A. E.: An integrated efficient solution for computing frequent and top-k elements in data streams. ACM Trans. Database Syst. 31(3), 1095–1133 (2006)CrossRef Metwally, A., Agrawal, D., Abbadi, A. E.: An integrated efficient solution for computing frequent and top-k elements in data streams. ACM Trans. Database Syst. 31(3), 1095–1133 (2006)CrossRef
26.
Zurück zum Zitat Shaker, A., Senge, R., Hüllermeier, E.: Evolving fuzzy pattern trees for binary classification on data streams. Inform. Sci. 220, 34–45 (2013)CrossRef Shaker, A., Senge, R., Hüllermeier, E.: Evolving fuzzy pattern trees for binary classification on data streams. Inform. Sci. 220, 34–45 (2013)CrossRef
27.
Zurück zum Zitat Tantono, F. I., Manerikar, N., Palpanas, T.: Efficiently discovering recent frequent items in data streams. In: Scientific and Statistical Database Management, pp. 222–239. Springer (2008) Tantono, F. I., Manerikar, N., Palpanas, T.: Efficiently discovering recent frequent items in data streams. In: Scientific and Statistical Database Management, pp. 222–239. Springer (2008)
28.
Zurück zum Zitat Tong, Y., Zhang, X., Chen, L.: Tracking frequent items over distributed probabilistic data. World Wide Web Journal, 1–26 (2015) Tong, Y., Zhang, X., Chen, L.: Tracking frequent items over distributed probabilistic data. World Wide Web Journal, 1–26 (2015)
29.
Zurück zum Zitat Wei, Z., Liu, X., Li, F., Shang, S., Du, X., Wen, J.: Matrix sketching over sliding windows. In: SIGMOD, pp. 1465–1480 (2016) Wei, Z., Liu, X., Li, F., Shang, S., Du, X., Wen, J.: Matrix sketching over sliding windows. In: SIGMOD, pp. 1465–1480 (2016)
30.
Zurück zum Zitat Woo, H. J., Lee, W. S.: Estmax: Tracing maximal frequent item sets instantly over online transactional data streams. IEEE Trans. Knowl. Data Eng. 21(10), 1418–1431 (2009)CrossRef Woo, H. J., Lee, W. S.: Estmax: Tracing maximal frequent item sets instantly over online transactional data streams. IEEE Trans. Knowl. Data Eng. 21(10), 1418–1431 (2009)CrossRef
31.
Zurück zum Zitat Wu, S., Lin, H., U, L.H., Gao, Y., Lu, D.: Finding frequent items in time decayed data streams. In: Apweb, pp. 17–29 (2016) Wu, S., Lin, H., U, L.H., Gao, Y., Lu, D.: Finding frequent items in time decayed data streams. In: Apweb, pp. 17–29 (2016)
32.
Zurück zum Zitat Zhang, S., Chen, L., Tu, L.: Frequent items mining on data stream based on time fading factor. In: AICI, vol. 4, pp. 336–340. IEEE (2009) Zhang, S., Chen, L., Tu, L.: Frequent items mining on data stream based on time fading factor. In: AICI, vol. 4, pp. 336–340. IEEE (2009)
33.
Zurück zum Zitat Zhang, S., Chen, L., Tu, L.: Frequent items mining on data stream using hash-table and heap. In: ICIS, vol. 1, pp. 141–145. IEEE (2009) Zhang, S., Chen, L., Tu, L.: Frequent items mining on data stream using hash-table and heap. In: ICIS, vol. 1, pp. 141–145. IEEE (2009)
Metadaten
Titel
Novel structures for counting frequent items in time decayed streams
verfasst von
Shanshan Wu
Huaizhong Lin
Leong Hou U
Yunjun Gao
Dongming Lu
Publikationsdatum
24.01.2017
Verlag
Springer US
Erschienen in
World Wide Web / Ausgabe 5/2017
Print ISSN: 1386-145X
Elektronische ISSN: 1573-1413
DOI
https://doi.org/10.1007/s11280-017-0433-5

Weitere Artikel der Ausgabe 5/2017

World Wide Web 5/2017 Zur Ausgabe