Skip to main content
Erschienen in: Knowledge and Information Systems 3/2014

01.09.2014 | Regular Paper

An approach of support approximation to discover frequent patterns from concept-drifting data streams based on concept learning

verfasst von: Chao-Wei Li, Kuen-Fang Jea

Erschienen in: Knowledge and Information Systems | Ausgabe 3/2014

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In an online data stream, the composition and distribution of the data may change over time, which is a phenomenon known as concept drift. The occurrence of concept drift can affect considerably the performance of a data stream mining method, especially in relation to mining accuracy. In this paper, we study the problem of mining frequent patterns from transactional data streams in the presence of concept drift, considering the important issue of mining accuracy preservation. In terms of frequent-pattern mining, we give the definitions of concept and concept drift with respect to streaming data; moreover, we present a categorization for concept drift. The concept of streaming data is considered the relationships of frequency between different patterns. Accordingly, we devise approaches to describe the concept concretely and to learn the concept through frequency relationship modeling. Based on concept learning, we propose a method of support approximation for discovering data stream frequent patterns. Our analyses and experimental results have shown that in several studied cases of concept drift, the proposed method not only performs efficiently in terms of time and memory but also preserves mining accuracy well on concept-drifting data streams.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th international conference on very large data bases. Santiago de Chile, Chile, pp 487–499 Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th international conference on very large data bases. Santiago de Chile, Chile, pp 487–499
2.
Zurück zum Zitat Agrawal R, Imieliński T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD international conference on management of data, Washington, DC, USA, pp 207–216 Agrawal R, Imieliński T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD international conference on management of data, Washington, DC, USA, pp 207–216
3.
Zurück zum Zitat Babcock B, Babu S, Datar M, Motwani R, Widom J (2002) Models and issues in data stream systems. In: Proceedings of the 21st ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems. Madison, WI, USA, pp 1–16 Babcock B, Babu S, Datar M, Motwani R, Widom J (2002) Models and issues in data stream systems. In: Proceedings of the 21st ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems. Madison, WI, USA, pp 1–16
4.
Zurück zum Zitat Branke J (1995) Evolutionary algorithms for neural network design and training. In: Proceedings of the 1st Nordic workshop on genetic algorithms and its applications. Vaasa, Finland, pp 145–163 Branke J (1995) Evolutionary algorithms for neural network design and training. In: Proceedings of the 1st Nordic workshop on genetic algorithms and its applications. Vaasa, Finland, pp 145–163
5.
Zurück zum Zitat Chang JH, Lee WS (2004) A sliding window method for finding recently frequent itemsets over online data streams. J Inf Sci Eng 20(4):753–762 Chang JH, Lee WS (2004) A sliding window method for finding recently frequent itemsets over online data streams. J Inf Sci Eng 20(4):753–762
6.
Zurück zum Zitat Chang JH, Lee WS (2006) Finding recently frequent itemsets adaptively over online transactional data streams. Inf Syst 31(8):849–869CrossRef Chang JH, Lee WS (2006) Finding recently frequent itemsets adaptively over online transactional data streams. Inf Syst 31(8):849–869CrossRef
8.
Zurück zum Zitat Cheng J, Ke Y, Ng W (2008) A survey on algorithms for mining frequent itemsets over data streams. Knowl Inf Syst 16(1):1–27MathSciNetCrossRef Cheng J, Ke Y, Ng W (2008) A survey on algorithms for mining frequent itemsets over data streams. Knowl Inf Syst 16(1):1–27MathSciNetCrossRef
9.
Zurück zum Zitat Dai W, Yang Q, Xue GR, Yu Y (2008) Self-taught clustering. In: Proceedings of the 25th international conference on machine learning. Helsinki, Finland, pp 200–207 Dai W, Yang Q, Xue GR, Yu Y (2008) Self-taught clustering. In: Proceedings of the 25th international conference on machine learning. Helsinki, Finland, pp 200–207
10.
Zurück zum Zitat Dematos G, Boyd MS, Kermanshahi B, Kohzadi N, Kaastra I (1996) Feedforward versus recurrent neural networks for forecasting monthly Japanese yen exchange rates. Asia Pac Financ Markets 3(1):59–75 Dematos G, Boyd MS, Kermanshahi B, Kohzadi N, Kaastra I (1996) Feedforward versus recurrent neural networks for forecasting monthly Japanese yen exchange rates. Asia Pac Financ Markets 3(1):59–75
11.
Zurück zum Zitat Gaber MM, Zaslavsky A, Krishnaswamy S (2010) Data stream mining. In: Data mining and knowledge discovery handbook, Springer 2010, pp 759–787 Gaber MM, Zaslavsky A, Krishnaswamy S (2010) Data stream mining. In: Data mining and knowledge discovery handbook, Springer 2010, pp 759–787
12.
Zurück zum Zitat Giannella C, Han J, Pei J, Yan X, Yu PS (2004) Mining frequent patterns in data streams at multiple time granularities. In: Data mining: next generation challenges and future directions. MIT Press, Cambridge, pp 191–212 Giannella C, Han J, Pei J, Yan X, Yu PS (2004) Mining frequent patterns in data streams at multiple time granularities. In: Data mining: next generation challenges and future directions. MIT Press, Cambridge, pp 191–212
13.
Zurück zum Zitat Hagan MT, Demuth HB, Beale MH (1996) Neural network design, 1st edn. PWS Publishing Company, Boston Hagan MT, Demuth HB, Beale MH (1996) Neural network design, 1st edn. PWS Publishing Company, Boston
14.
Zurück zum Zitat Hallas M, Dorffner G (1998) A comparative study on feedforward and recurrent neural networks in time series prediction using gradient descent learning. In: Proceedings of the 14th European meetings on cybernetics and systems research. Vienna, Austria, pp 644–647 Hallas M, Dorffner G (1998) A comparative study on feedforward and recurrent neural networks in time series prediction using gradient descent learning. In: Proceedings of the 14th European meetings on cybernetics and systems research. Vienna, Austria, pp 644–647
15.
Zurück zum Zitat Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques, 3rd edn. Morgan Kaufmann, San Francisco Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques, 3rd edn. Morgan Kaufmann, San Francisco
16.
Zurück zum Zitat Han J, Pei J, Yin Y (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min. Knowl. Discov 8(1):53–87MathSciNetCrossRef Han J, Pei J, Yin Y (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min. Knowl. Discov 8(1):53–87MathSciNetCrossRef
17.
Zurück zum Zitat Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw. 2(5):359–366CrossRef Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw. 2(5):359–366CrossRef
18.
Zurück zum Zitat Jea KF, Chang MY (2008) Discovering frequent itemsets by support approximation and itemset clustering. Data Knowl. Eng. 65(1):90–107CrossRef Jea KF, Chang MY (2008) Discovering frequent itemsets by support approximation and itemset clustering. Data Knowl. Eng. 65(1):90–107CrossRef
19.
Zurück zum Zitat Jea KF, Li CW (2009) Discovering frequent itemsets over transactional data streams through an efficient and stable approximate approach. Expert Syst. Appl. 36(10):12323–12331CrossRef Jea KF, Li CW (2009) Discovering frequent itemsets over transactional data streams through an efficient and stable approximate approach. Expert Syst. Appl. 36(10):12323–12331CrossRef
20.
Zurück zum Zitat Jiang N, Gruenwald L (2006) CFI-Stream: mining closed frequent itemsets in data streams. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining. Philadelphia, PA, USA, pp 592–597 Jiang N, Gruenwald L (2006) CFI-Stream: mining closed frequent itemsets in data streams. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining. Philadelphia, PA, USA, pp 592–597
21.
Zurück zum Zitat Jin R, Agrawal G (2005) An algorithm for in-core frequent itemset mining on streaming data. In: Proceedings of the 5th international conference on data mining. Houston, Texas, USA, pp 210–217 Jin R, Agrawal G (2005) An algorithm for in-core frequent itemset mining on streaming data. In: Proceedings of the 5th international conference on data mining. Houston, Texas, USA, pp 210–217
22.
Zurück zum Zitat Karp RM, Shenker S, Papadimitriou CH (2003) A simple algorithm for finding frequent elements in streams and bags. ACM Trans Database Syst 28(1):51–55CrossRef Karp RM, Shenker S, Papadimitriou CH (2003) A simple algorithm for finding frequent elements in streams and bags. ACM Trans Database Syst 28(1):51–55CrossRef
23.
Zurück zum Zitat Leung CKS, Khan QI (2006) DSTree: a tree structure for the mining of frequent sets from data streams. In: Proceedings of the 6th international conference on data mining. Hong Kong, China, pp 928–932 Leung CKS, Khan QI (2006) DSTree: a tree structure for the mining of frequent sets from data streams. In: Proceedings of the 6th international conference on data mining. Hong Kong, China, pp 928–932
24.
Zurück zum Zitat Li CW, Jea KF (2011) An adaptive approximation method to discover frequent itemsets over sliding-window-based data streams. Expert Syst Appl 38(10):13386–13404CrossRef Li CW, Jea KF (2011) An adaptive approximation method to discover frequent itemsets over sliding-window-based data streams. Expert Syst Appl 38(10):13386–13404CrossRef
25.
Zurück zum Zitat Li HF, Lee SY (2009) Mining frequent itemsets over data streams using efficient window sliding techniques. Expert Syst Appl 36(2):1466–1477CrossRef Li HF, Lee SY (2009) Mining frequent itemsets over data streams using efficient window sliding techniques. Expert Syst Appl 36(2):1466–1477CrossRef
26.
Zurück zum Zitat Li HF, Shan MK, Lee SY (2008) DSM-FI: an efficient algorithm for mining frequent itemsets in data streams. Knowl Inf Syst 17(1):79–97CrossRef Li HF, Shan MK, Lee SY (2008) DSM-FI: an efficient algorithm for mining frequent itemsets in data streams. Knowl Inf Syst 17(1):79–97CrossRef
27.
Zurück zum Zitat Lin CH, Chiu DY, Wu YH, Chen ALP (2005) Mining frequent itemsets from data streams with a time-sensitive sliding window. In: Proceedings of the 5th SIAM international conference on data mining. Newport Beach, California, pp 68–79 Lin CH, Chiu DY, Wu YH, Chen ALP (2005) Mining frequent itemsets from data streams with a time-sensitive sliding window. In: Proceedings of the 5th SIAM international conference on data mining. Newport Beach, California, pp 68–79
29.
Zurück zum Zitat Manku GS, Motwani R (2002) Approximate frequency counts over data streams. In: Proceedings of the 28th international conference on very large data bases, Hong Kong, China, pp 346–357 Manku GS, Motwani R (2002) Approximate frequency counts over data streams. In: Proceedings of the 28th international conference on very large data bases, Hong Kong, China, pp 346–357
30.
Zurück zum Zitat Minku LL, White AP, Yao X (2010) The impact of diversity on online ensemble learning in the presence of concept drift. IEEE Trans Knowl Data Eng 22(05):730–742CrossRef Minku LL, White AP, Yao X (2010) The impact of diversity on online ensemble learning in the presence of concept drift. IEEE Trans Knowl Data Eng 22(05):730–742CrossRef
31.
Zurück zum Zitat Narasimhamurthy A, Kuncheva LI (2007) A framework for generating data to simulate changing environments. In: Proceedings of the 25th IASTED international multi-conference on artificial intelligence and applications. Innsbruck, Austria, pp 384–389 Narasimhamurthy A, Kuncheva LI (2007) A framework for generating data to simulate changing environments. In: Proceedings of the 25th IASTED international multi-conference on artificial intelligence and applications. Innsbruck, Austria, pp 384–389
32.
Zurück zum Zitat Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans.Knowl Data Eng 22(10):1345–1359CrossRef Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans.Knowl Data Eng 22(10):1345–1359CrossRef
33.
Zurück zum Zitat Park JS, Chen MS, Yu PS (1995) An effective hash-based algorithm for mining association rules. In: Proceedings of the 1995 ACM SIGMOD international conference on management of data, San Jose, California, pp 175–186 Park JS, Chen MS, Yu PS (1995) An effective hash-based algorithm for mining association rules. In: Proceedings of the 1995 ACM SIGMOD international conference on management of data, San Jose, California, pp 175–186
34.
Zurück zum Zitat Raina R, Battle A, Lee H, Packer B, Ng AY (2007) Self-taught learning: transfer learning from unlabeled data. In: Proceedings of the 24th international conference on machine learning. Corvallis, Oregon, USA, pp 759–766 Raina R, Battle A, Lee H, Packer B, Ng AY (2007) Self-taught learning: transfer learning from unlabeled data. In: Proceedings of the 24th international conference on machine learning. Corvallis, Oregon, USA, pp 759–766
35.
Zurück zum Zitat Ramamurthy S, Bhatnagar R (2007) Tracking recurrent concept drift in streaming data using ensemble classifiers. In: Proceedings of the 6th international conference on machine learning and applications. Washington, DC, USA, pp 404–409 Ramamurthy S, Bhatnagar R (2007) Tracking recurrent concept drift in streaming data using ensemble classifiers. In: Proceedings of the 6th international conference on machine learning and applications. Washington, DC, USA, pp 404–409
36.
Zurück zum Zitat Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323(6088):533–536CrossRef Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323(6088):533–536CrossRef
37.
Zurück zum Zitat Silvestri C, Orlando S (2007) Approximate mining of frequent patterns on streams. Intell Data Anal 11(1):49–73 Silvestri C, Orlando S (2007) Approximate mining of frequent patterns on streams. Intell Data Anal 11(1):49–73
38.
Zurück zum Zitat Tanbeer SK, Ahmed CF, Jeong BS, Lee YK (2009) Sliding window-based frequent pattern mining over data streams. Inf Sci 179(22):3843–3865MathSciNetCrossRef Tanbeer SK, Ahmed CF, Jeong BS, Lee YK (2009) Sliding window-based frequent pattern mining over data streams. Inf Sci 179(22):3843–3865MathSciNetCrossRef
39.
Zurück zum Zitat Wang ET, Chen ALP (2009) A novel hash-based approach for mining frequent itemsets over data streams requiring less memory space. Data Min Knowl Discov 19(1):132–172MathSciNetCrossRef Wang ET, Chen ALP (2009) A novel hash-based approach for mining frequent itemsets over data streams requiring less memory space. Data Min Knowl Discov 19(1):132–172MathSciNetCrossRef
40.
Zurück zum Zitat Wang H, Fan W, Yu PS, Han J (2003) Mining concept-drifting data streams using ensemble classifiers. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining. Washington, DC, USA, pp 226–235 Wang H, Fan W, Yu PS, Han J (2003) Mining concept-drifting data streams using ensemble classifiers. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining. Washington, DC, USA, pp 226–235
41.
Zurück zum Zitat Wang P, Wang H, Wu X, Wang W, Shi B (2005) On reducing classifier granularity in mining concept-drifting data streams. In: Proceedings of the 5th international conference on data mining. Houston, Texas, USA, pp 474–481 Wang P, Wang H, Wu X, Wang W, Shi B (2005) On reducing classifier granularity in mining concept-drifting data streams. In: Proceedings of the 5th international conference on data mining. Houston, Texas, USA, pp 474–481
42.
Zurück zum Zitat Yu JX, Chong Z, Lu H, Zhang Z, Zhou A (2006) A false negative approach to mining frequent itemsets from high speed transactional data streams. Inf Sci 176(14):1986–2015CrossRef Yu JX, Chong Z, Lu H, Zhang Z, Zhou A (2006) A false negative approach to mining frequent itemsets from high speed transactional data streams. Inf Sci 176(14):1986–2015CrossRef
43.
Zurück zum Zitat Zadrozny B (2004) Learning and evaluating classifiers under sample selection bias. In: Proceedings of the 21st international conference on machine learning, Banff, Alberta, Canada, pp 114 Zadrozny B (2004) Learning and evaluating classifiers under sample selection bias. In: Proceedings of the 21st international conference on machine learning, Banff, Alberta, Canada, pp 114
44.
Zurück zum Zitat Zhu Y, Shasha D (2002) StatStream: statistical monitoring of thousands of data streams in real time. In: Proceedings of 28th international conference on very large data bases. Hong Kong, China, pp 358–369 Zhu Y, Shasha D (2002) StatStream: statistical monitoring of thousands of data streams in real time. In: Proceedings of 28th international conference on very large data bases. Hong Kong, China, pp 358–369
Metadaten
Titel
An approach of support approximation to discover frequent patterns from concept-drifting data streams based on concept learning
verfasst von
Chao-Wei Li
Kuen-Fang Jea
Publikationsdatum
01.09.2014
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 3/2014
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-013-0656-4

Weitere Artikel der Ausgabe 3/2014

Knowledge and Information Systems 3/2014 Zur Ausgabe

Premium Partner