Skip to main content

2017 | OriginalPaper | Buchkapitel

Mining Strongly Closed Itemsets from Data Streams

verfasst von : Daniel Trabold, Tamás Horváth

Erschienen in: Discovery Science

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We consider the problem of mining strongly closed itemsets from transactional data streams. Compactness and stability against changes in the input are two characteristic features of this kind of itemsets that make them appealing for different applications. Utilizing their algebraic and algorithmic properties, we propose an algorithm based on reservoir sampling for approximating this type of itemsets in the landmark streaming setting, prove its correctness, and show empirically that it yields a considerable speed-up over a straightforward naive algorithm without any significant loss in precision and recall. As a motivating application, we experimentally demonstrate the suitability of strongly closed itemsets to concept drift detection in transactional data streams.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
Due to space limitations we omit frequency constraints in this short version.
 
2
We note that Hoeffding’s inequality applies to samples without replacement as well [5]. A tighter bound can be derived from Serfling’s inequality [12]. The improvement becomes however marginal with increasing data stream length.
 
4
We are going to present further practical applications (e.g., computer aided product configuration) in the long version of this paper.
 
Literatur
1.
Zurück zum Zitat Boley, M., Horváth, T., Poigné, A., Wrobel, S.: Listing closed sets of strongly accessible set systems with applications to data mining. Theoret. Comput. Sci. 411(3), 691–700 (2010)MathSciNetCrossRefMATH Boley, M., Horváth, T., Poigné, A., Wrobel, S.: Listing closed sets of strongly accessible set systems with applications to data mining. Theoret. Comput. Sci. 411(3), 691–700 (2010)MathSciNetCrossRefMATH
2.
Zurück zum Zitat Boley, M., Horváth, T., Wrobel, S.: Efficient discovery of interesting patterns based on strong closedness. Stat. Anal. Data Mining 2(5–6), 346–360 (2009)MathSciNetCrossRef Boley, M., Horváth, T., Wrobel, S.: Efficient discovery of interesting patterns based on strong closedness. Stat. Anal. Data Mining 2(5–6), 346–360 (2009)MathSciNetCrossRef
3.
Zurück zum Zitat Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., Bouchachia, A.: A survey on concept drift adaptation. ACM Comput. Surv. 46(4), 44:1–44:37 (2014)CrossRefMATH Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., Bouchachia, A.: A survey on concept drift adaptation. ACM Comput. Surv. 46(4), 44:1–44:37 (2014)CrossRefMATH
4.
5.
6.
Zurück zum Zitat Iwanuma, K., Yamamoto, Y., Fukuda, S.: An on-line approximation algorithm for mining frequent closed itemsets based on incremental intersection. In: Proceedings of the 19th International Conference on Extending Database Technology, pp. 704–705 (2016) Iwanuma, K., Yamamoto, Y., Fukuda, S.: An on-line approximation algorithm for mining frequent closed itemsets based on incremental intersection. In: Proceedings of the 19th International Conference on Extending Database Technology, pp. 704–705 (2016)
7.
Zurück zum Zitat Knuth, D.E.: The Art of Computer Programming. Seminumerical Algorithms, vol. 2. Addison-Wesley, Reading (1997)MATH Knuth, D.E.: The Art of Computer Programming. Seminumerical Algorithms, vol. 2. Addison-Wesley, Reading (1997)MATH
8.
Zurück zum Zitat Lichman, M.: UCI machine learning repository (2013) Lichman, M.: UCI machine learning repository (2013)
9.
Zurück zum Zitat Liu, X., Guan, J., Hu, P.: Mining frequent closed itemsets from a landmark window over online data streams. Comput. Math. Appl. 57(6), 927–936 (2009)CrossRefMATH Liu, X., Guan, J., Hu, P.: Mining frequent closed itemsets from a landmark window over online data streams. Comput. Math. Appl. 57(6), 927–936 (2009)CrossRefMATH
10.
Zurück zum Zitat Manku, G.S., Motwani, R.: Approximate frequency counts over data streams. In: Proceedings of the 28th International Conference on Very Large Data Bases (VLDB), pp. 346–357. VLDB Endowment (2002) Manku, G.S., Motwani, R.: Approximate frequency counts over data streams. In: Proceedings of the 28th International Conference on Very Large Data Bases (VLDB), pp. 346–357. VLDB Endowment (2002)
11.
Zurück zum Zitat Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Efficient mining of association rules using closed itemset lattices. Inf. Syst. 24(1), 25–46 (1999)CrossRefMATH Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Efficient mining of association rules using closed itemset lattices. Inf. Syst. 24(1), 25–46 (1999)CrossRefMATH
12.
14.
Zurück zum Zitat Yen, S.J., Wu, C.W., Lee, Y.S., Tseng, V.S., Hsieh, C.H.: A fast algorithm for mining frequent closed itemsets over stream sliding window. In: IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 996–1002 (2011) Yen, S.J., Wu, C.W., Lee, Y.S., Tseng, V.S., Hsieh, C.H.: A fast algorithm for mining frequent closed itemsets over stream sliding window. In: IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 996–1002 (2011)
Metadaten
Titel
Mining Strongly Closed Itemsets from Data Streams
verfasst von
Daniel Trabold
Tamás Horváth
Copyright-Jahr
2017
DOI
https://doi.org/10.1007/978-3-319-67786-6_18

Premium Partner