
2012 | OriginalPaper | Book chapter

8. Instance-Based Classification and Regression on Data Streams

Written by: Ammar Shaker, Eyke Hüllermeier

Published in: Learning in Non-Stationary Environments

Publisher: Springer New York


Abstract

In order to be useful and effectively applicable in dynamically evolving environments, machine learning methods have to meet several requirements, including the ability to analyze incoming data in an online, incremental manner, to observe tight time and memory constraints, and to appropriately respond to changes of the data characteristics and underlying distributions. This paper advocates an instance-based learning algorithm for that purpose, both for classification and regression problems. This algorithm has a number of desirable properties that are not, at least not as a whole, shared by currently existing alternatives. Notably, our method is very flexible and thus able to adapt to an evolving environment quickly, a point of utmost importance in the data stream context. At the same time, the algorithm is relatively robust and thus applicable to streams with different characteristics.

Footnotes
1
Of course, this maxim disregards other criteria, such as the complexity of the method.
 
2
This choice of \(k_{\text{test}}\) aims at including in the test environment the similarity environments of all examples in the similarity environment of \(x_0\); of course, it does not guarantee to do so.
 
3
Note that, if this error, p, is estimated from the last k instances, the variance of this estimation is \(\approx p(1 - p)/k\). Moreover, the estimate is unbiased, provided that the error remained constant during the last k time steps. The value k = 20 provides a good trade-off between bias and precision.
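As a rough illustration of the variance formula in this footnote, the following sketch computes the standard deviation \(\sqrt{p(1 - p)/k}\) of an error rate estimated from the last k instances. The function name and the sample error rate p = 0.2 are assumptions chosen for illustration; they do not come from the chapter.

```python
import math


def error_estimate_std(p: float, k: int) -> float:
    """Standard deviation of an error-rate estimate from k Bernoulli trials.

    Uses the approximation Var ~ p * (1 - p) / k quoted in the footnote.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if k <= 0:
        raise ValueError("k must be positive")
    return math.sqrt(p * (1.0 - p) / k)


if __name__ == "__main__":
    # p = 0.2 is a hypothetical error rate, used only to compare window sizes.
    for k in (5, 20, 100):
        print(k, round(error_estimate_std(0.2, k), 4))
```

A larger k lowers the variance, but the estimate then stays unbiased only if the error remained constant over a longer span, which is the trade-off the footnote refers to.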
 
5
To make the transformation more robust toward outliers, it makes sense to replace max and min by appropriate percentiles of the empirical distribution.
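The idea in this footnote can be sketched as follows, assuming the transformation in question is a min-max rescaling to [0, 1] (the page excerpt does not show it). The helper names and the 5th/95th percentile defaults are hypothetical choices for illustration, not values taken from the chapter.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile of pre-sorted values, q in [0, 100]."""
    if not sorted_vals:
        raise ValueError("need at least one value")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def robust_rescale(values, lower_q=5.0, upper_q=95.0):
    """Rescale values to [0, 1], using percentiles in place of min and max.

    Values outside the percentile range are clipped to 0 or 1, so a single
    extreme outlier no longer compresses all other values.
    """
    ordered = sorted(values)
    lo = percentile(ordered, lower_q)
    hi = percentile(ordered, upper_q)
    if hi == lo:
        # Degenerate case: no spread between the two percentiles.
        return [0.0 for _ in values]
    return [min(1.0, max(0.0, (v - lo) / (hi - lo))) for v in values]


if __name__ == "__main__":
    data = list(range(1, 21)) + [1000]  # one extreme outlier
    print(robust_rescale(data))
```

With plain min and max, the outlier 1000 would push every other value in this example close to 0; with percentiles, the bulk of the data still spreads over the unit interval.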
 
Metadata
Title
Instance-Based Classification and Regression on Data Streams
Written by
Ammar Shaker
Eyke Hüllermeier
Copyright year
2012
Publisher
Springer New York
DOI
https://doi.org/10.1007/978-1-4419-8020-5_8