Skip to main content
Top
Published in: Datenbank-Spektrum 1/2012

01-03-2012 | Fachbeitrag

An Index-Inspired Algorithm for Anytime Classification on Evolving Data Streams

Authors: Philipp Kranen, Ira Assent, Thomas Seidl

Published in: Datenbank-Spektrum | Issue 1/2012

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Due to the ever growing presence of data streams there has been a considerable amount of research on stream data mining over the past years. Anytime algorithms are particularly well suited for stream mining, since they flexibly use all available time on streams of varying data rates, and are also shown to outperform traditional budget approaches on constant streams. In this article we present an index-inspired algorithm for Bayesian anytime classification on evolving data streams and show its performance on benchmark data sets.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Show more products
Footnotes
1
Similarity search on large data sets can be sped up to some extend by indexes and other appropriate methods.
 
2
Refining 26 classes for the letter data set takes less than 26 pages due a root compression technique employed in [22].
 
Literature
1.
go back to reference Arai B, Das G, Gunopulos D, Koudas N (2009) Anytime measures for top-k algorithms on exact and fuzzy data sets. VLDB J 18(2):407–427 CrossRef Arai B, Das G, Gunopulos D, Koudas N (2009) Anytime measures for top-k algorithms on exact and fuzzy data sets. VLDB J 18(2):407–427 CrossRef
2.
go back to reference Beckmann N, Kriegel HP, Schneider R, Seeger B (1990) The R*-tree: an efficient and robust access method for points and rectangles. In: SIGMOD, pp 322–331 Beckmann N, Kriegel HP, Schneider R, Seeger B (1990) The R*-tree: an efficient and robust access method for points and rectangles. In: SIGMOD, pp 322–331
3.
go back to reference Boddy MS (1991) Anytime problem solving using dynamic programming. In: AAAI, pp 738–743 Boddy MS (1991) Anytime problem solving using dynamic programming. In: AAAI, pp 738–743
4.
go back to reference Borodin A, El-Yaniv R (1998) Online computation and competitive analysis. Cambridge University Press, Cambridge MATH Borodin A, El-Yaniv R (1998) Online computation and competitive analysis. Cambridge University Press, Cambridge MATH
5.
go back to reference Dean T, Boddy MS (1988) An analysis of time-dependent planning. In: AAAI, pp 49–54 Dean T, Boddy MS (1988) An analysis of time-dependent planning. In: AAAI, pp 49–54
6.
go back to reference DeCoste D (2002) Anytime interval-valued outputs for kernel machines: fast support vector machine classification via distance geometry. In: ICML, pp 99–106 DeCoste D (2002) Anytime interval-valued outputs for kernel machines: fast support vector machine classification via distance geometry. In: ICML, pp 99–106
7.
go back to reference DeCoste D (2003) Anytime query-tuned kernel machines via Cholesky factorization. In: SDM, pp 186–193 DeCoste D (2003) Anytime query-tuned kernel machines via Cholesky factorization. In: SDM, pp 186–193
8.
go back to reference DeCoste D, Mazzoni D (2003) Fast query-optimized kernel machine classification via incremental approximate nearest support vectors. In: ICML, pp 115–122 DeCoste D, Mazzoni D (2003) Fast query-optimized kernel machine classification via incremental approximate nearest support vectors. In: ICML, pp 115–122
9.
go back to reference Dempster AP, Laird NML, Rubin DB (1977) Maximum likelihood from incomplete data via the em algorithm. J R Stat Soc B 39(1):1–38 MathSciNetMATH Dempster AP, Laird NML, Rubin DB (1977) Maximum likelihood from incomplete data via the em algorithm. J R Stat Soc B 39(1):1–38 MathSciNetMATH
10.
go back to reference Esmeir S, Markovitch S (2011) Anytime learning of anycost classifiers. Mach Learn (25th Anniversary) 82(3):445–473 CrossRef Esmeir S, Markovitch S (2011) Anytime learning of anycost classifiers. Mach Learn (25th Anniversary) 82(3):445–473 CrossRef
11.
go back to reference Flores MJ, Gámez JA, Martínez AM, Puerta JM (2009) Gaode and haode: two proposals based on aode to deal with continuous variables. In: ICML, pp 40–47 Flores MJ, Gámez JA, Martínez AM, Puerta JM (2009) Gaode and haode: two proposals based on aode to deal with continuous variables. In: ICML, pp 40–47
13.
go back to reference Grass J, Zilberstein S (1996) Anytime algorithm development tools. SIGART Bull 7(2):20–27 CrossRef Grass J, Zilberstein S (1996) Anytime algorithm development tools. SIGART Bull 7(2):20–27 CrossRef
14.
go back to reference Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: ACM SIGMOD, pp 47–57 Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: ACM SIGMOD, pp 47–57
15.
go back to reference Keogh EJ, Pazzani MJ (2002) Learning the structure of augmented Bayesian classifiers. Int J Artif Intell Tools 11(4):587–601 CrossRef Keogh EJ, Pazzani MJ (2002) Learning the structure of augmented Bayesian classifiers. Int J Artif Intell Tools 11(4):587–601 CrossRef
16.
go back to reference Kranen P, Assent I, Baldauf C, Seidl T (2011) The clustree: indexing micro-clusters for anytime stream mining. Knowl Inf Syst J 29:249–272 CrossRef Kranen P, Assent I, Baldauf C, Seidl T (2011) The clustree: indexing micro-clusters for anytime stream mining. Knowl Inf Syst J 29:249–272 CrossRef
17.
go back to reference Kranen P, Günnemann S, Fries S, Seidl T (2010) MC-tree: improving Bayesian anytime classification. In: SSDBM. Lecture notes in computer science, pp 252–269 Kranen P, Günnemann S, Fries S, Seidl T (2010) MC-tree: improving Bayesian anytime classification. In: SSDBM. Lecture notes in computer science, pp 252–269
18.
go back to reference Kranen P, Krieger R, Denker S, Seidl T (2010) Bulk loading hierarchical mixture models for efficient stream classification. In: PAKDD, pp 325–334 Kranen P, Krieger R, Denker S, Seidl T (2010) Bulk loading hierarchical mixture models for efficient stream classification. In: PAKDD, pp 325–334
19.
go back to reference Kranen P, Seidl T (2009) Harnessing the strengths of anytime algorithms for constant data streams. Data Min Knowl Discov 19(2):245–260 MathSciNetCrossRef Kranen P, Seidl T (2009) Harnessing the strengths of anytime algorithms for constant data streams. Data Min Knowl Discov 19(2):245–260 MathSciNetCrossRef
20.
21.
go back to reference Likhachev M, Gordon GJ, Thrun S (2003) ARA*: anytime A* with provable bounds on sub-optimality. In: NIPS. Likhachev M, Gordon GJ, Thrun S (2003) ARA*: anytime A* with provable bounds on sub-optimality. In: NIPS.
22.
go back to reference Seidl T, Assent I, Kranen P, Krieger R, Herrmann J (2009) Indexing density models for incremental learning and anytime classification on data streams. In: EDBT/ICDT, pp 311–322 Seidl T, Assent I, Kranen P, Krieger R, Herrmann J (2009) Indexing density models for incremental learning and anytime classification on data streams. In: EDBT/ICDT, pp 311–322
23.
go back to reference Shieh J, Keogh E (2010) Polishing the right apple: anytime classification also benefits data streams with constant arrival times. In: ICDM, pp 461–470 Shieh J, Keogh E (2010) Polishing the right apple: anytime classification also benefits data streams with constant arrival times. In: ICDM, pp 461–470
24.
go back to reference Turaga DS, Verscheure O, Chaudhari UV, Amini L (2006) Resource management for networked classifiers in distributed stream mining systems. In: ICDM, pp 1102–1107 Turaga DS, Verscheure O, Chaudhari UV, Amini L (2006) Resource management for networked classifiers in distributed stream mining systems. In: ICDM, pp 1102–1107
25.
go back to reference Ueno K, Xi X, Keogh EJ, Lee DJ (2006) Anytime classification using the nearest neighbor algorithm with applications to stream mining. In: ICDM, pp 623–632 Ueno K, Xi X, Keogh EJ, Lee DJ (2006) Anytime classification using the nearest neighbor algorithm with applications to stream mining. In: ICDM, pp 623–632
26.
go back to reference Vlachos M, Lin J, Keogh EJ, Gunopulos D (2003) A wavelet-based anytime algorithm for k-means clustering of time series. In: Workshop on clustering high dimensionality data and its applications. Vlachos M, Lin J, Keogh EJ, Gunopulos D (2003) A wavelet-based anytime algorithm for k-means clustering of time series. In: Workshop on clustering high dimensionality data and its applications.
27.
go back to reference Wang H, Fan W, Yu PS, Han J (2003) Mining concept-drifting data streams using ensemble classifiers. In: KDD, pp 226–235 Wang H, Fan W, Yu PS, Han J (2003) Mining concept-drifting data streams using ensemble classifiers. In: KDD, pp 226–235
28.
go back to reference Webb GI, Boughton JR, Wang Z (2005) Not so naive Bayes: aggregating one-dependence estimators. Mach Learn 58(1):5–24 MATHCrossRef Webb GI, Boughton JR, Wang Z (2005) Not so naive Bayes: aggregating one-dependence estimators. Mach Learn 58(1):5–24 MATHCrossRef
29.
go back to reference Yang Y, Webb GI, Cerquides J, Korb KB, Boughton JR, Ting KM (2007) To select or to weigh: a comparative study of linear combination schemes for superparent-one-dependence estimators. IEEE Trans Knowl Data Eng 19(12):1652–1665 CrossRef Yang Y, Webb GI, Cerquides J, Korb KB, Boughton JR, Ting KM (2007) To select or to weigh: a comparative study of linear combination schemes for superparent-one-dependence estimators. IEEE Trans Knowl Data Eng 19(12):1652–1665 CrossRef
30.
go back to reference Yang Y, Webb GI, Korb KB, Ting KM (2007) Classifying under computational resource constraints: anytime classification using probabilistic estimators. Mach Learn 69(1):35–53 CrossRef Yang Y, Webb GI, Korb KB, Ting KM (2007) Classifying under computational resource constraints: anytime classification using probabilistic estimators. Mach Learn 69(1):35–53 CrossRef
31.
go back to reference Zheng F, Webb GI (2006) Efficient lazy elimination for averaged one-dependence estimators. In: ICML, pp 1113–1120 CrossRef Zheng F, Webb GI (2006) Efficient lazy elimination for averaged one-dependence estimators. In: ICML, pp 1113–1120 CrossRef
32.
go back to reference Zheng F, Webb GI (2007) Finding the right family: parent and child selection for averaged one-dependence estimators. In: ECML PKDD, pp 490–501 Zheng F, Webb GI (2007) Finding the right family: parent and child selection for averaged one-dependence estimators. In: ECML PKDD, pp 490–501
33.
go back to reference Zilberstein S (1996) Using anytime algorithms in intelligent systems. AI Mag 17(3):73–83 Zilberstein S (1996) Using anytime algorithms in intelligent systems. AI Mag 17(3):73–83
Metadata
Title
An Index-Inspired Algorithm for Anytime Classification on Evolving Data Streams
Authors
Philipp Kranen
Ira Assent
Thomas Seidl
Publication date
01-03-2012
Publisher
Springer-Verlag
Published in
Datenbank-Spektrum / Issue 1/2012
Print ISSN: 1618-2162
Electronic ISSN: 1610-1995
DOI
https://doi.org/10.1007/s13222-012-0083-9

Other articles of this Issue 1/2012

Datenbank-Spektrum 1/2012 Go to the issue

Premium Partner