Published in: Neural Computing and Applications 7/2011

1 October 2011 | ICONIP2009

On-line learning from streaming data with delayed attributes: a comparison of classifiers and strategies

Authors: Mónica Millán-Giraldo, J. Salvador Sánchez, V. Javier Traver


Abstract

In many real applications, data are not all available at the same time, or it is not affordable to process them all in a batch process, but rather, instances arrive sequentially in a stream. The scenario of streaming data introduces new challenges to the machine learning community, since difficult decisions have to be made. The problem addressed in this paper is that of classifying incoming instances for which one attribute arrives only after a given delay. In this formulation, many open issues arise, such as how to classify the incomplete instance, whether to wait for the delayed attribute before performing any classification, or when and how to update a reference set. Three different strategies are proposed which address these issues differently. Orthogonally to these strategies, three classifiers of different characteristics are used. Keeping on-line learning strategies independent of the classifiers facilitates system design and contrasts with the common alternative of carefully crafting an ad hoc classifier. To assess how good learning is under these different strategies and classifiers, they are compared using learning curves and final classification errors for fifteen data sets. Results indicate that learning in this stringent context of streaming data and delayed attributes can successfully take place even with simple on-line strategies. Furthermore, active strategies behave generally better than more conservative passive ones. Regarding the classifiers, it was found that simple instance-based classifiers such as the well-known nearest neighbor may outperform more elaborate classifiers such as the support vector machines, especially if some measure of classification confidence is considered in the process.
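To make the problem setting concrete, the following is a minimal sketch of classifying a stream in which the last attribute of each instance arrives late. It is not the authors' method: the abstract does not specify the three strategies, so everything here is an assumption made for illustration, including the names `nearest_label` and `run_stream`, the choice to classify at arrival time using only the attributes already available, the fixed `delay` measured in stream steps, and the choice to add each completed instance to the reference set with its predicted label.

```python
import math
from collections import deque


def nearest_label(reference, x, dims):
    """Return the label of the nearest reference instance (1-NN),
    measuring Euclidean distance only over the attribute indices in `dims`."""
    best_label, best_dist = None, math.inf
    for features, label in reference:
        dist = math.sqrt(sum((features[i] - x[i]) ** 2 for i in dims))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def run_stream(reference, stream, delay):
    """Hypothetical early-classification loop (an illustrative assumption).

    reference: list of (feature_tuple, label) pairs, all attributes known.
    stream:    iterable of complete feature tuples; the last attribute is
               treated as unavailable until `delay` steps after arrival.
    Returns the list of predictions made at arrival time.
    """
    n_attrs = len(reference[0][0])
    early_dims = range(n_attrs - 1)   # attributes available on arrival
    pending = deque()                 # (arrival_step, features, predicted_label)
    predictions = []
    for step, x in enumerate(stream):
        # Instances whose delayed attribute has now arrived are complete,
        # so they join the reference set with their predicted label.
        while pending and step - pending[0][0] >= delay:
            _, full_x, predicted = pending.popleft()
            reference.append((full_x, predicted))
        # Classify the new instance without waiting for its last attribute.
        predicted = nearest_label(reference, x, early_dims)
        predictions.append(predicted)
        pending.append((step, x, predicted))
    return predictions


reference_set = [((0.0, 0.0), "a"), ((1.0, 1.0), "b")]
incoming = [(0.1, 5.0), (0.9, 0.0), (0.2, 0.0)]
print(run_stream(reference_set, incoming, delay=1))
```

A strategy that waits for the delayed attribute before classifying, or one that admits an instance to the reference set only when some confidence measure is high enough, would change the body of the loop but not the classifier, which reflects the separation between strategies and classifiers that the abstract describes.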


Metadata
Title
On-line learning from streaming data with delayed attributes: a comparison of classifiers and strategies
Authors
Mónica Millán-Giraldo
J. Salvador Sánchez
V. Javier Traver
Publication date
1 October 2011
Publisher
Springer-Verlag
Published in
Neural Computing and Applications / Issue 7/2011
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-010-0402-8
