Top

Journal of Intelligent Information Systems

Published in:

01-02-2014

Mining top-k frequent patterns over data streams sliding window

Author: Hui Chen

Published in: Journal of Intelligent Information Systems | Issue 1/2014

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

Frequent pattern mining in data streams is an important research topic in the data mining community. In previous studies, a minimum support threshold was assumed to be available for mining frequent patterns. However, setting such a threshold is typically difficult. Hence, it is more reasonable to ask users to set a bound on the result size. The present study considers mining top-k frequent patterns from data streams using a sliding window technique. A single-pass algorithm, called MSWTP, is developed for the generation of top-k frequent patterns without a threshold. In the method, the content of the transactions in the sliding window is incrementally maintained in a summary data structure, named SWTP-tree, by scanning the stream only once. To make the mining operation efficient, insignificant patterns are distinguished from others by applying the Chernoff bound. Two kinds of obsolete pattern and one kind of insignificant pattern are periodically pruned from the pattern tree. Whenever necessary, the k most frequent patterns can be selected from SWTP-tree in order of their descending frequency. The performance of the proposed technique is evaluated via simulation experiments. The results show that the proposed method is both efficient and scalable, and that it outperforms comparable algorithms.

previous article Content based image retrieval via a transductive model

next article Human perspective to anomaly detection for cybersecurity

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Agrawal, C.C. (2009). On high dimensional projected clustering of uncertain data streams. In Proceedings of the IEEE 25th International Conference on Data Engineering, ICDE’09 (pp. 1152–1154). IEEE Press.

Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In Proceedings of the 20th international conference on Very Large Data Bases, VLDB (pp. 487–499).

Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J. (2002). Models and issues in data stream systems. In Proceedings of the 21st ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems (pp. 1–16). ACM Press.

Chang, J.H. & Lee, W.S. (2006). Finding recently frequent itemsets adaptively over online transactional data streams. Information Systems, 31(8), 849–869.CrossRef

Chen, H., Shu, L.C., Xia, J., Deng, Q. (2012a). Mining frequent patterns in a varying-size sliding window of online transactional data streams. Information Sciences, 215, 15–36.CrossRefMathSciNet

Chen, L., Zou, L.-J., Tu, L. (2012b). A clustering algorithm for multiple data streams based on spectral component similarity. Information Sciences, 183(1), 35–47.CrossRef

Cheung, Y.-L. & Fu, A.W.-C. (2004). Mining frequent itemsets without support threshold: with and without item constraints. IEEE Transactions on Knowledge and Data Engineering, 16(9), 1052–1069.CrossRef

Fu, A.W.-C. & Wong, R.C.-W. (2006). Mining top-k frequent itemsets from data streams. Data Mining and Knowledge Discovery, 13(2), 193–217.CrossRefMathSciNet

Giannella, C., Han, J., Pei, J., Yan, X., Yu, P.S. (2003). Mining frequent patterns in data streams at multiple time granularities. Next generations on data mining (pp. 191–212).

Han, J., Pei, J., Yin, Y. (2000). Mining frequent patterns without candidate generation. In Proceedings of the 2000 ACM SIGMOD international conference of management of data (pp. 1–12). ACM Press.

Han, J., Wang, J., Lu, Y., Tzvetkov, P. (2002). Mining top-k frequent closed patterns without minimum support. In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM’02) (pp. 211–218). IEEE Press.

Hashemi, S., Yang, Y., Mirzamomen, Z., Kangavari, M. (2009). Adapted one-versus-all decision trees for data stream classification. IEEE Transaction on Knowledge and Data Engineering, 21(5), 624–637.CrossRef

Homem, N. & Carvalho, J.P. (2010). Finding top-k elements in data streams. Information Sciences, 180(24), 4958–4974.CrossRef

Jea, K.-F. & Li, C.-W. (2010). A sliding-window based adaptive approximating method to discover recent frequent itemsets from data streams. Lecture Notes in Engineering and Computer Science, 2180(1), 532–539.

Karp, R.M., Shenker, S., Papadimitriou, C.H. (2003). A simple algorithm for finding frequent elements in streams and bags. ACM Transactions on Database Systems (TODS), 28(1), 51–55.CrossRef

Kranen, P., Assent, I., Balduaf, C., Seidl, T. (2011). The clustree: indexing micro-clusters for anytime stream mining. Knowledge and Information Systems, 29(2), 249–272.CrossRef

Lam, H.T. & Calders, T. (2010). Mining top-k frequent items in a data stream with flexible sliding windows. In Proceedings of 16th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 283–292). ACM Press.

Law, Y.-N., Wang, H., Zaniolo, C. (2011). Rational languages and data models for continuous queries on sequence and data streams. ACM Transactions on Database Systems, 36(2), 8:1–8:32.CrossRef

Leung, C.K.-S. & Khan, Q.I. (2006). Dstree: a tree structure for the mining of frequent sets from data streams. In Proceedings of the 6th International Conference on Data Mining (ICDM’06) (pp. 928–932). IEEE Press.

Li, H.-F. (2009). A sliding window method for finding top-k path traversal patterns over streaming web click-sequences. Expert Systems with Applications, 36, 4382–4386.CrossRef

Li, H.-F., Shan, M.-K., Lee, S.-Y. (2008). Dsm-fi: an efficient algorithm for mining frequent itemsets in data streams. Knowledge and Information Systems, 17(1), 79–97.CrossRef

Manku, G.S. & Motwani, R. (2002). Approximate frequency counts over data streams. In Proceedings of the 28th VLDB conference (pp. 346–357). VLDB.

Metwally, A., Agrawal, D., Abbadi, A.E. (2005). Efficient computation of frequent and top-k elements in data streams. In Proceedings of the 10th international conference on database theory (pp. 398–412). Springer Press.

Mozafari, B. & Hetal Thakkar, C.Z. (2008). Verifying and mining frequent patterns from large windows over data streams. In Proceedings of the IEEE 24th International Conference on Data Engineering, ICDE’08, (pp. 179–188). IEEE Press.

Mozafari, B., Thakkar, H., Zaniolo, C. (2008). Verifying and mining frequent patterns from large windows over data streams. In Proceedings of IEEE 24th international conference of data engineering (pp. 179–188). IEEE Press.

Park, N.H., Oh, S.H., Lee, W.S. (2010). Anomaly intrusion detection by clustering transactional audit streams in a host computer. Information Sciences, 180(12), 2375–2389.CrossRef

Tanbeer, S.K., Ahmed, C.F., Jeong, B.-S., Lee, Y.-K. (2009). Sliding window-based frequent pattern mining over data streams. Information Sciences, 179(22), 3843–3865.CrossRefMathSciNet

Tsai, P.S.M. (2010). Mining top-k frequent closed itemsets over data streams using the sliding window model. Expert Systems with Applications, 37, 6968–6973.CrossRef

Wong, R.C.-W. & Fu, A.W.-C. (2005). Mining top-k itemsets over sliding window based zipfian distribution. In Proceedings of the 2005 SIAM international conference on data mining. SIAM Press.

Yu, J.X., Chong, Z., Lu, H., Zhangd, Z., Zhou, A. (2006). A false negative approach to mining frequent itemsets from high speed transactional data streams. Information Sciences, 176, 1986–2015.CrossRef

Title: Mining top-k frequent patterns over data streams sliding window
Author: Hui Chen
Publication date: 01-02-2014
Publisher: Springer US
Published in: Journal of Intelligent Information Systems / Issue 1/2014
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI: https://doi.org/10.1007/s10844-013-0265-4

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Other articles of this Issue 1/2014

Human perspective to anomaly detection for cybersecurity

Hierarchical directory mapping for category-constrained meta-search

Bayesian analysis of GUHA hypotheses

Cost-sensitive three-way email spam filtering

Clustering-based diversity improvement in top-N recommendation

Dealing with trajectory streams by clustering and mathematical transforms

Premium Partner