nach oben

World Wide Web

Erschienen in:

24.01.2017

Novel structures for counting frequent items in time decayed streams

verfasst von: Shanshan Wu, Huaizhong Lin, Leong Hou U, Yunjun Gao, Dongming Lu

Erschienen in: World Wide Web | Ausgabe 5/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Identifying frequently occurring items is a fundamental building block in many data stream applications. A great deal of work for efficiently identifying frequent items has been studied on the landmark and sliding window models. In this work, we revisit this problem on a new streaming model based on the time decay, where the importance of every arrival item is decreased over the time. To address the importance changes over time, we propose an innovative heap structure, named Quasi-heap, which maintains the item order using a lazy update mechanism. Two approximation algorithm, Space Saving with Quasi-heap (SSQ) and Filtered Space Saving with Quasi-heap (FSSQ), are proposed to find the frequently occurring items based on the Quasi-heap structure. To achieve better accuracy of frequency estimation for all the items in the stream, we introduce a new count-min-min (CMM) sketch structure, which can estimate the count of an item with almost error free. Extensive experiments conducted on both real-world and synthetic data demonstrate the superiority of proposed methods in terms of both efficiency (i.e., response time) and effectiveness (i.e., accuracy).

Vorheriger Artikel Distributed stream join under workload variance

Nächster Artikel Discovery of probabilistic nearest neighbors in traffic-aware spatial networks

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Frequent Itemset Mining Dataset Repository, available at http://fimi.cs.helsinki.fi/data/ (last accessed on 17 November, 2016)

Aouad, L. M., Le-Khac, N. A., Kechadi, T. M.: Performance study of distributed apriori-like frequent itemsets mining. Knowl. Inf. Syst. 23(1), 55–72 (2010)CrossRef

Boley, M., Grosskreutz, H.: Approximating the number of frequent sets in dense data. Knowl. Inf. Syst. 21(1), 65–89 (2009)CrossRef

Brijs, T., Swinnen, G., Vanhoof, K., Wets, G.: Using association rules for product assortment decisions: a case study. In: SIGKDD, pp. 254–260. ACM (1999)

Chakrabarti, A., Cormode, G., McGregor, A.: A near-optimal algorithm for computing the entropy of a stream. In: ACM-SIAM Symposium on Discrete Algorithms, pp. 328–335. Society for Industrial and Applied Mathematics (2007)

Chang, J. H., Lee, W. S.: Finding recent frequent itemsets adaptively over online data streams. In: SIGKDD, pp. 487–492. ACM (2003)

Charikar, M., Chen, K., Farach-Colton, M.: Finding frequent items in data streams. In: Automata, Languages and Programming, pp. 693–703. Springer (2002)

Chen, L., Mei, Q.: Mining frequent items in data stream using time fading model. Inform. Sci. 257, 54–69 (2014)MathSciNetCrossRefMATH

Chen, L., Zhang, S., Tu, L.: An algorithm for mining frequent items on data stream using fading factor. In: COMPSAC, vol. 2, pp. 172–177. IEEE (2009)

Chen, L., Zou, L. J., Tu, L.: A clustering algorithm for multiple data streams based on spectral component similarity. Inform. Sci. 183(1), 35–47 (2012)CrossRef

10.

Cormode, G., Hadjieleftheriou, M.: Finding the frequent items in streams of data. Commun. ACM 52(10), 97–105 (2009)CrossRef

11.

Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms 55(1), 58–75 (2005)MathSciNetCrossRefMATH

12.

Cormode, G., Muthukrishnan, S.: What’s hot and what’s not: tracking most frequent items dynamically. ACM Trans. Database Syst. 30(1), 249–278 (2005)CrossRef

13.

Cormode, G., Shkapenyuk, V., Srivastava, D., Xu, B.: Forward decay: a practical time decay model for streaming systems. In: ICDE, pp. 138–149. IEEE (2009)

14.

Floyd, R. W.: Algorithm 245: Treesort. Commun. ACM 7(12), 701 (1964)CrossRef

15.

Golab, L., DeHaan, D., Demaine, E. D., Lopez-Ortiz, A., Munro, J. I.: Identifying frequent items in sliding windows over on-line packet streams. In: SIGCOMM, pp. 173–178. ACM (2003)

16.

Homem, N., Carvalho, J. P.: Finding top-k elements in data streams. Inform. Sci. 180(24), 4958–4974 (2010)CrossRef

17.

Jin, C., Qian, W., Sha, C., Yu, J. X., Zhou, A.: Dynamically maintaining frequent items over a data stream. In: CIKM, pp. 287–294. ACM (2003)

18.

Karp, R. M., Shenker, S., Papadimitriou, C. H.: A simple algorithm for finding frequent elements in streams and bags. ACM Trans. Database Syst. 28(1), 51–55 (2003)CrossRef

19.

Li, H. F., Huang, H. Y., Lee, S. Y.: Fast and memory efficient mining of high-utility itemsets from data streams: with and without negative item profits. Knowl. Inf. Syst. 28(3), 495–522 (2011)CrossRef

20.

Lim, Y., Choi, J., Kang, U.: Fast, accurate, and space-efficient tracking of time-weighted frequent items from data streams. In: CIKM, pp. 1109–1118. ACM (2014)

21.

Lin, Z., Jiang, B., Pei, J., Jiang, D.: Mining discriminative items in multiple data streams. World Wide Web Journal 13(4), 497–522 (2010)CrossRef

22.

Manerikar, N., Palpanas, T.: Frequent items in streaming data: an experimental evaluation of the state-of-the-art. Data Knowl. Eng. 68(4), 415–430 (2009)CrossRef

23.

Manku, G. S., Motwani, R.: Approximate Frequency Counts over Data Streams. In: VLDB, pp. 346–357. VLDB Endowment (2002)

24.

Mei, Q. L., Chen, L.: An algorithm for mining frequent stream data items using hash function and fading factor. In: Applied Mechanics and Materials, vol. 130, pp. 2661–2665. Trans Tech Publ (2012)

25.

Metwally, A., Agrawal, D., Abbadi, A. E.: An integrated efficient solution for computing frequent and top-k elements in data streams. ACM Trans. Database Syst. 31(3), 1095–1133 (2006)CrossRef

26.

Shaker, A., Senge, R., Hüllermeier, E.: Evolving fuzzy pattern trees for binary classification on data streams. Inform. Sci. 220, 34–45 (2013)CrossRef

27.

Tantono, F. I., Manerikar, N., Palpanas, T.: Efficiently discovering recent frequent items in data streams. In: Scientific and Statistical Database Management, pp. 222–239. Springer (2008)

28.

Tong, Y., Zhang, X., Chen, L.: Tracking frequent items over distributed probabilistic data. World Wide Web Journal, 1–26 (2015)

29.

Wei, Z., Liu, X., Li, F., Shang, S., Du, X., Wen, J.: Matrix sketching over sliding windows. In: SIGMOD, pp. 1465–1480 (2016)

30.

Woo, H. J., Lee, W. S.: Estmax: Tracing maximal frequent item sets instantly over online transactional data streams. IEEE Trans. Knowl. Data Eng. 21(10), 1418–1431 (2009)CrossRef

31.

Wu, S., Lin, H., U, L.H., Gao, Y., Lu, D.: Finding frequent items in time decayed data streams. In: Apweb, pp. 17–29 (2016)

32.

Zhang, S., Chen, L., Tu, L.: Frequent items mining on data stream based on time fading factor. In: AICI, vol. 4, pp. 336–340. IEEE (2009)

33.

Zhang, S., Chen, L., Tu, L.: Frequent items mining on data stream using hash-table and heap. In: ICIS, vol. 1, pp. 141–145. IEEE (2009)

Titel: Novel structures for counting frequent items in time decayed streams
verfasst von: Shanshan Wu
Huaizhong Lin
Leong Hou U
Yunjun Gao
Dongming Lu
Publikationsdatum: 24.01.2017
Verlag: Springer US
Erschienen in: World Wide Web / Ausgabe 5/2017
Print ISSN: 1386-145X
Elektronische ISSN: 1573-1413
DOI: https://doi.org/10.1007/s11280-017-0433-5

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Weitere Artikel der Ausgabe 5/2017

Dual graph regularized NMF model for social event detection from Flickr data

Efficient computation of distance labeling for decremental updates in large dynamic graphs

Latent semantic diagnosis in traditional chinese medicine

Reverse direction-based surrounder queries for mobile recommendations

A semantic based Web page classification strategy using multi-layered domain ontology

Tide-tree: A self-tuning indexing scheme for hybrid storage system