Published in: Knowledge and Information Systems 5/2021

15-02-2021 | Regular Paper

Mining discriminative itemsets in data streams using the tilted-time window model

Authors: Majid Seyfi, Richi Nayak, Yue Xu, Shlomo Geva



Abstract

A discriminative itemset is an itemset that is frequent in the target data stream and whose frequency there is much higher than its frequency in the rest of the data streams in the dataset. Discriminative itemsets describe the distinguishing features between data streams. Mining them matters in settings where transactions arrive continuously, at high speed and in large volume. Compared with frequent itemset mining in a single data stream, discriminative itemset mining poses additional challenges because the Apriori property (every subset of a frequent itemset is itself frequent) does not carry over: a subset of a discriminative itemset need not be discriminative. We propose an efficient and highly accurate method for mining discriminative itemsets in data streams using a tilted-time window model. The proposed single-pass H-DISSparse algorithm is built on several well-defined characteristics intended to improve the accuracy of the approximate itemset frequencies held in the tilted-time window model. The data structures are adjusted dynamically in offline time intervals to reflect the discriminative itemset frequencies over different time periods in unsynchronized data streams. Empirical analysis shows that the proposed method is efficient in time and space on fast-growing big data streams.
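To make the definition concrete, the following is a minimal brute-force sketch in Python, not the H-DISSparse algorithm and not a streaming method. It treats an itemset as discriminative when its relative frequency in the target stream reaches a support threshold and is at least a given multiple of its relative frequency in the other stream. The function names and the two thresholds, `min_support` and `min_ratio`, are assumptions made for illustration; the abstract does not specify how "frequent" and "much higher" are quantified.

```python
from itertools import combinations


def relative_frequency(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def discriminative_itemsets(target, rest, min_support, min_ratio):
    """Enumerate all itemsets over the target's items (exponential; demo only).

    `min_support` and `min_ratio` are hypothetical thresholds chosen for
    this illustration, not parameters taken from the paper.
    """
    items = sorted(set().union(*target))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            f_target = relative_frequency(itemset, target)
            if f_target < min_support:
                continue  # not frequent in the target stream
            f_rest = relative_frequency(itemset, rest)
            # An itemset absent from the other stream has an unbounded ratio.
            ratio = float("inf") if f_rest == 0 else f_target / f_rest
            if ratio >= min_ratio:
                result[itemset] = ratio
    return result


# Toy data: {a, b} occurs together in the target, but only apart elsewhere.
target = [frozenset("ab"), frozenset("ab"), frozenset("ab"), frozenset("c")]
rest = [frozenset("a"), frozenset("b"), frozenset("ac"), frozenset("bc")]

found = discriminative_itemsets(target, rest, min_support=0.5, min_ratio=2.0)
for itemset, ratio in found.items():
    print(sorted(itemset), ratio)
```

On this toy data only {a, b} qualifies: {a} and {b} are each frequent in the target, but their frequency ratio is 1.5, below the assumed threshold of 2. This is the situation the abstract refers to: a discriminative itemset can have non-discriminative subsets, so candidates cannot be pruned the way Apriori prunes by support.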


Metadata
Title
Mining discriminative itemsets in data streams using the tilted-time window model
Authors
Majid Seyfi
Richi Nayak
Yue Xu
Shlomo Geva
Publication date
15-02-2021
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 5/2021
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-021-01550-y
