Skip to main content
Top
Published in: Journal of Intelligent Information Systems 1/2014

01-02-2014

Mining top-k frequent patterns over data streams sliding window

Author: Hui Chen

Published in: Journal of Intelligent Information Systems | Issue 1/2014

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Frequent pattern mining in data streams is an important research topic in the data mining community. In previous studies, a minimum support threshold was assumed to be available for mining frequent patterns. However, setting such a threshold is typically difficult. Hence, it is more reasonable to ask users to set a bound on the result size. The present study considers mining top-k frequent patterns from data streams using a sliding window technique. A single-pass algorithm, called MSWTP, is developed for the generation of top-k frequent patterns without a threshold. In the method, the content of the transactions in the sliding window is incrementally maintained in a summary data structure, named SWTP-tree, by scanning the stream only once. To make the mining operation efficient, insignificant patterns are distinguished from others by applying the Chernoff bound. Two kinds of obsolete pattern and one kind of insignificant pattern are periodically pruned from the pattern tree. Whenever necessary, the k most frequent patterns can be selected from SWTP-tree in order of their descending frequency. The performance of the proposed technique is evaluated via simulation experiments. The results show that the proposed method is both efficient and scalable, and that it outperforms comparable algorithms.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
go back to reference Agrawal, C.C. (2009). On high dimensional projected clustering of uncertain data streams. In Proceedings of the IEEE 25th International Conference on Data Engineering, ICDE’09 (pp. 1152–1154). IEEE Press. Agrawal, C.C. (2009). On high dimensional projected clustering of uncertain data streams. In Proceedings of the IEEE 25th International Conference on Data Engineering, ICDE’09 (pp. 1152–1154). IEEE Press.
go back to reference Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In Proceedings of the 20th international conference on Very Large Data Bases, VLDB (pp. 487–499). Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In Proceedings of the 20th international conference on Very Large Data Bases, VLDB (pp. 487–499).
go back to reference Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J. (2002). Models and issues in data stream systems. In Proceedings of the 21st ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems (pp. 1–16). ACM Press. Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J. (2002). Models and issues in data stream systems. In Proceedings of the 21st ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems (pp. 1–16). ACM Press.
go back to reference Chang, J.H. & Lee, W.S. (2006). Finding recently frequent itemsets adaptively over online transactional data streams. Information Systems, 31(8), 849–869.CrossRef Chang, J.H. & Lee, W.S. (2006). Finding recently frequent itemsets adaptively over online transactional data streams. Information Systems, 31(8), 849–869.CrossRef
go back to reference Chen, H., Shu, L.C., Xia, J., Deng, Q. (2012a). Mining frequent patterns in a varying-size sliding window of online transactional data streams. Information Sciences, 215, 15–36.CrossRefMathSciNet Chen, H., Shu, L.C., Xia, J., Deng, Q. (2012a). Mining frequent patterns in a varying-size sliding window of online transactional data streams. Information Sciences, 215, 15–36.CrossRefMathSciNet
go back to reference Chen, L., Zou, L.-J., Tu, L. (2012b). A clustering algorithm for multiple data streams based on spectral component similarity. Information Sciences, 183(1), 35–47.CrossRef Chen, L., Zou, L.-J., Tu, L. (2012b). A clustering algorithm for multiple data streams based on spectral component similarity. Information Sciences, 183(1), 35–47.CrossRef
go back to reference Cheung, Y.-L. & Fu, A.W.-C. (2004). Mining frequent itemsets without support threshold: with and without item constraints. IEEE Transactions on Knowledge and Data Engineering, 16(9), 1052–1069.CrossRef Cheung, Y.-L. & Fu, A.W.-C. (2004). Mining frequent itemsets without support threshold: with and without item constraints. IEEE Transactions on Knowledge and Data Engineering, 16(9), 1052–1069.CrossRef
go back to reference Fu, A.W.-C. & Wong, R.C.-W. (2006). Mining top-k frequent itemsets from data streams. Data Mining and Knowledge Discovery, 13(2), 193–217.CrossRefMathSciNet Fu, A.W.-C. & Wong, R.C.-W. (2006). Mining top-k frequent itemsets from data streams. Data Mining and Knowledge Discovery, 13(2), 193–217.CrossRefMathSciNet
go back to reference Giannella, C., Han, J., Pei, J., Yan, X., Yu, P.S. (2003). Mining frequent patterns in data streams at multiple time granularities. Next generations on data mining (pp. 191–212). Giannella, C., Han, J., Pei, J., Yan, X., Yu, P.S. (2003). Mining frequent patterns in data streams at multiple time granularities. Next generations on data mining (pp. 191–212).
go back to reference Han, J., Pei, J., Yin, Y. (2000). Mining frequent patterns without candidate generation. In Proceedings of the 2000 ACM SIGMOD international conference of management of data (pp. 1–12). ACM Press. Han, J., Pei, J., Yin, Y. (2000). Mining frequent patterns without candidate generation. In Proceedings of the 2000 ACM SIGMOD international conference of management of data (pp. 1–12). ACM Press.
go back to reference Han, J., Wang, J., Lu, Y., Tzvetkov, P. (2002). Mining top-k frequent closed patterns without minimum support. In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM’02) (pp. 211–218). IEEE Press. Han, J., Wang, J., Lu, Y., Tzvetkov, P. (2002). Mining top-k frequent closed patterns without minimum support. In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM’02) (pp. 211–218). IEEE Press.
go back to reference Hashemi, S., Yang, Y., Mirzamomen, Z., Kangavari, M. (2009). Adapted one-versus-all decision trees for data stream classification. IEEE Transaction on Knowledge and Data Engineering, 21(5), 624–637.CrossRef Hashemi, S., Yang, Y., Mirzamomen, Z., Kangavari, M. (2009). Adapted one-versus-all decision trees for data stream classification. IEEE Transaction on Knowledge and Data Engineering, 21(5), 624–637.CrossRef
go back to reference Homem, N. & Carvalho, J.P. (2010). Finding top-k elements in data streams. Information Sciences, 180(24), 4958–4974.CrossRef Homem, N. & Carvalho, J.P. (2010). Finding top-k elements in data streams. Information Sciences, 180(24), 4958–4974.CrossRef
go back to reference Jea, K.-F. & Li, C.-W. (2010). A sliding-window based adaptive approximating method to discover recent frequent itemsets from data streams. Lecture Notes in Engineering and Computer Science, 2180(1), 532–539. Jea, K.-F. & Li, C.-W. (2010). A sliding-window based adaptive approximating method to discover recent frequent itemsets from data streams. Lecture Notes in Engineering and Computer Science, 2180(1), 532–539.
go back to reference Karp, R.M., Shenker, S., Papadimitriou, C.H. (2003). A simple algorithm for finding frequent elements in streams and bags. ACM Transactions on Database Systems (TODS), 28(1), 51–55.CrossRef Karp, R.M., Shenker, S., Papadimitriou, C.H. (2003). A simple algorithm for finding frequent elements in streams and bags. ACM Transactions on Database Systems (TODS), 28(1), 51–55.CrossRef
go back to reference Kranen, P., Assent, I., Balduaf, C., Seidl, T. (2011). The clustree: indexing micro-clusters for anytime stream mining. Knowledge and Information Systems, 29(2), 249–272.CrossRef Kranen, P., Assent, I., Balduaf, C., Seidl, T. (2011). The clustree: indexing micro-clusters for anytime stream mining. Knowledge and Information Systems, 29(2), 249–272.CrossRef
go back to reference Lam, H.T. & Calders, T. (2010). Mining top-k frequent items in a data stream with flexible sliding windows. In Proceedings of 16th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 283–292). ACM Press. Lam, H.T. & Calders, T. (2010). Mining top-k frequent items in a data stream with flexible sliding windows. In Proceedings of 16th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 283–292). ACM Press.
go back to reference Law, Y.-N., Wang, H., Zaniolo, C. (2011). Rational languages and data models for continuous queries on sequence and data streams. ACM Transactions on Database Systems, 36(2), 8:1–8:32.CrossRef Law, Y.-N., Wang, H., Zaniolo, C. (2011). Rational languages and data models for continuous queries on sequence and data streams. ACM Transactions on Database Systems, 36(2), 8:1–8:32.CrossRef
go back to reference Leung, C.K.-S. & Khan, Q.I. (2006). Dstree: a tree structure for the mining of frequent sets from data streams. In Proceedings of the 6th International Conference on Data Mining (ICDM’06) (pp. 928–932). IEEE Press. Leung, C.K.-S. & Khan, Q.I. (2006). Dstree: a tree structure for the mining of frequent sets from data streams. In Proceedings of the 6th International Conference on Data Mining (ICDM’06) (pp. 928–932). IEEE Press.
go back to reference Li, H.-F. (2009). A sliding window method for finding top-k path traversal patterns over streaming web click-sequences. Expert Systems with Applications, 36, 4382–4386.CrossRef Li, H.-F. (2009). A sliding window method for finding top-k path traversal patterns over streaming web click-sequences. Expert Systems with Applications, 36, 4382–4386.CrossRef
go back to reference Li, H.-F., Shan, M.-K., Lee, S.-Y. (2008). Dsm-fi: an efficient algorithm for mining frequent itemsets in data streams. Knowledge and Information Systems, 17(1), 79–97.CrossRef Li, H.-F., Shan, M.-K., Lee, S.-Y. (2008). Dsm-fi: an efficient algorithm for mining frequent itemsets in data streams. Knowledge and Information Systems, 17(1), 79–97.CrossRef
go back to reference Manku, G.S. & Motwani, R. (2002). Approximate frequency counts over data streams. In Proceedings of the 28th VLDB conference (pp. 346–357). VLDB. Manku, G.S. & Motwani, R. (2002). Approximate frequency counts over data streams. In Proceedings of the 28th VLDB conference (pp. 346–357). VLDB.
go back to reference Metwally, A., Agrawal, D., Abbadi, A.E. (2005). Efficient computation of frequent and top-k elements in data streams. In Proceedings of the 10th international conference on database theory (pp. 398–412). Springer Press. Metwally, A., Agrawal, D., Abbadi, A.E. (2005). Efficient computation of frequent and top-k elements in data streams. In Proceedings of the 10th international conference on database theory (pp. 398–412). Springer Press.
go back to reference Mozafari, B. & Hetal Thakkar, C.Z. (2008). Verifying and mining frequent patterns from large windows over data streams. In Proceedings of the IEEE 24th International Conference on Data Engineering, ICDE’08, (pp. 179–188). IEEE Press. Mozafari, B. & Hetal Thakkar, C.Z. (2008). Verifying and mining frequent patterns from large windows over data streams. In Proceedings of the IEEE 24th International Conference on Data Engineering, ICDE’08, (pp. 179–188). IEEE Press.
go back to reference Mozafari, B., Thakkar, H., Zaniolo, C. (2008). Verifying and mining frequent patterns from large windows over data streams. In Proceedings of IEEE 24th international conference of data engineering (pp. 179–188). IEEE Press. Mozafari, B., Thakkar, H., Zaniolo, C. (2008). Verifying and mining frequent patterns from large windows over data streams. In Proceedings of IEEE 24th international conference of data engineering (pp. 179–188). IEEE Press.
go back to reference Park, N.H., Oh, S.H., Lee, W.S. (2010). Anomaly intrusion detection by clustering transactional audit streams in a host computer. Information Sciences, 180(12), 2375–2389.CrossRef Park, N.H., Oh, S.H., Lee, W.S. (2010). Anomaly intrusion detection by clustering transactional audit streams in a host computer. Information Sciences, 180(12), 2375–2389.CrossRef
go back to reference Tanbeer, S.K., Ahmed, C.F., Jeong, B.-S., Lee, Y.-K. (2009). Sliding window-based frequent pattern mining over data streams. Information Sciences, 179(22), 3843–3865.CrossRefMathSciNet Tanbeer, S.K., Ahmed, C.F., Jeong, B.-S., Lee, Y.-K. (2009). Sliding window-based frequent pattern mining over data streams. Information Sciences, 179(22), 3843–3865.CrossRefMathSciNet
go back to reference Tsai, P.S.M. (2010). Mining top-k frequent closed itemsets over data streams using the sliding window model. Expert Systems with Applications, 37, 6968–6973.CrossRef Tsai, P.S.M. (2010). Mining top-k frequent closed itemsets over data streams using the sliding window model. Expert Systems with Applications, 37, 6968–6973.CrossRef
go back to reference Wong, R.C.-W. & Fu, A.W.-C. (2005). Mining top-k itemsets over sliding window based zipfian distribution. In Proceedings of the 2005 SIAM international conference on data mining. SIAM Press. Wong, R.C.-W. & Fu, A.W.-C. (2005). Mining top-k itemsets over sliding window based zipfian distribution. In Proceedings of the 2005 SIAM international conference on data mining. SIAM Press.
go back to reference Yu, J.X., Chong, Z., Lu, H., Zhangd, Z., Zhou, A. (2006). A false negative approach to mining frequent itemsets from high speed transactional data streams. Information Sciences, 176, 1986–2015.CrossRef Yu, J.X., Chong, Z., Lu, H., Zhangd, Z., Zhou, A. (2006). A false negative approach to mining frequent itemsets from high speed transactional data streams. Information Sciences, 176, 1986–2015.CrossRef
Metadata
Title
Mining top-k frequent patterns over data streams sliding window
Author
Hui Chen
Publication date
01-02-2014
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 1/2014
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-013-0265-4

Other articles of this Issue 1/2014

Journal of Intelligent Information Systems 1/2014 Go to the issue

Premium Partner