Skip to main content
Top

2017 | OriginalPaper | Chapter

Mining Strongly Closed Itemsets from Data Streams

Authors : Daniel Trabold, Tamás Horváth

Published in: Discovery Science

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

We consider the problem of mining strongly closed itemsets from transactional data streams. Compactness and stability against changes in the input are two characteristic features of this kind of itemsets that make them appealing for different applications. Utilizing their algebraic and algorithmic properties, we propose an algorithm based on reservoir sampling for approximating this type of itemsets in the landmark streaming setting, prove its correctness, and show empirically that it yields a considerable speed-up over a straightforward naive algorithm without any significant loss in precision and recall. As a motivating application, we experimentally demonstrate the suitability of strongly closed itemsets to concept drift detection in transactional data streams.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Footnotes
1
Due to space limitations we omit frequency constraints in this short version.
 
2
We note that Hoeffding’s inequality applies to samples without replacement as well [5]. A tighter bound can be derived from Serfling’s inequality [12]. The improvement becomes however marginal with increasing data stream length.
 
4
We are going to present further practical applications (e.g., computer aided product configuration) in the long version of this paper.
 
Literature
1.
go back to reference Boley, M., Horváth, T., Poigné, A., Wrobel, S.: Listing closed sets of strongly accessible set systems with applications to data mining. Theoret. Comput. Sci. 411(3), 691–700 (2010)MathSciNetCrossRefMATH Boley, M., Horváth, T., Poigné, A., Wrobel, S.: Listing closed sets of strongly accessible set systems with applications to data mining. Theoret. Comput. Sci. 411(3), 691–700 (2010)MathSciNetCrossRefMATH
2.
go back to reference Boley, M., Horváth, T., Wrobel, S.: Efficient discovery of interesting patterns based on strong closedness. Stat. Anal. Data Mining 2(5–6), 346–360 (2009)MathSciNetCrossRef Boley, M., Horváth, T., Wrobel, S.: Efficient discovery of interesting patterns based on strong closedness. Stat. Anal. Data Mining 2(5–6), 346–360 (2009)MathSciNetCrossRef
3.
go back to reference Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., Bouchachia, A.: A survey on concept drift adaptation. ACM Comput. Surv. 46(4), 44:1–44:37 (2014)CrossRefMATH Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., Bouchachia, A.: A survey on concept drift adaptation. ACM Comput. Surv. 46(4), 44:1–44:37 (2014)CrossRefMATH
6.
go back to reference Iwanuma, K., Yamamoto, Y., Fukuda, S.: An on-line approximation algorithm for mining frequent closed itemsets based on incremental intersection. In: Proceedings of the 19th International Conference on Extending Database Technology, pp. 704–705 (2016) Iwanuma, K., Yamamoto, Y., Fukuda, S.: An on-line approximation algorithm for mining frequent closed itemsets based on incremental intersection. In: Proceedings of the 19th International Conference on Extending Database Technology, pp. 704–705 (2016)
7.
go back to reference Knuth, D.E.: The Art of Computer Programming. Seminumerical Algorithms, vol. 2. Addison-Wesley, Reading (1997)MATH Knuth, D.E.: The Art of Computer Programming. Seminumerical Algorithms, vol. 2. Addison-Wesley, Reading (1997)MATH
8.
go back to reference Lichman, M.: UCI machine learning repository (2013) Lichman, M.: UCI machine learning repository (2013)
9.
go back to reference Liu, X., Guan, J., Hu, P.: Mining frequent closed itemsets from a landmark window over online data streams. Comput. Math. Appl. 57(6), 927–936 (2009)CrossRefMATH Liu, X., Guan, J., Hu, P.: Mining frequent closed itemsets from a landmark window over online data streams. Comput. Math. Appl. 57(6), 927–936 (2009)CrossRefMATH
10.
go back to reference Manku, G.S., Motwani, R.: Approximate frequency counts over data streams. In: Proceedings of the 28th International Conference on Very Large Data Bases (VLDB), pp. 346–357. VLDB Endowment (2002) Manku, G.S., Motwani, R.: Approximate frequency counts over data streams. In: Proceedings of the 28th International Conference on Very Large Data Bases (VLDB), pp. 346–357. VLDB Endowment (2002)
11.
go back to reference Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Efficient mining of association rules using closed itemset lattices. Inf. Syst. 24(1), 25–46 (1999)CrossRefMATH Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Efficient mining of association rules using closed itemset lattices. Inf. Syst. 24(1), 25–46 (1999)CrossRefMATH
12.
14.
go back to reference Yen, S.J., Wu, C.W., Lee, Y.S., Tseng, V.S., Hsieh, C.H.: A fast algorithm for mining frequent closed itemsets over stream sliding window. In: IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 996–1002 (2011) Yen, S.J., Wu, C.W., Lee, Y.S., Tseng, V.S., Hsieh, C.H.: A fast algorithm for mining frequent closed itemsets over stream sliding window. In: IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 996–1002 (2011)
Metadata
Title
Mining Strongly Closed Itemsets from Data Streams
Authors
Daniel Trabold
Tamás Horváth
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-67786-6_18

Premium Partner