2016 | OriginalPaper | Book chapter

On the Impact of Class Imbalance in GP Streaming Classification with Label Budgets

Written by: Sara Khanchi, Malcolm I. Heywood, Nur Zincir-Heywood

Published in: Genetic Programming

Publisher: Springer International Publishing

Abstract

Streaming data scenarios introduce a set of requirements that do not exist under the supervised learning paradigms typically employed for classification. Specific examples include anytime operation, non-stationary processes, and limited label budgets. From the perspective of class imbalance, this implies that it is not even possible to guarantee that all classes are present in the samples of data used to construct a model. Moreover, when decisions are made regarding what subset of data to sample, no label information is available; labels are provided only after sampling. This is a more challenging task than that encountered under non-streaming (offline) scenarios, where the training partition already contains label information. In this work, we investigate the utility of different protocols for sampling from the stream under the above constraints. A uniform sampling protocol was previously shown to be reasonably effective for both evolutionary and non-evolutionary streaming classifiers. Here, we introduce a scheme that uses the current ‘champion’ classifier to bias the sampling of training instances during the course of the stream. The resulting streaming framework for genetic programming is more effective at sampling minor classes and therefore at reacting to changes in the underlying process responsible for generating the data stream.
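
The contrast between the two sampling protocols can be illustrated with a minimal sketch. The abstract does not specify how the champion biases sampling, so the rule used here (round-robin over the champion's predicted classes) is an assumption made purely for illustration, not the authors' method. The function names, the `champion` callable, and the window/budget parameters are likewise hypothetical. In both protocols, indices are chosen without access to true labels; labels would be requested only for the chosen indices.

```python
import random
from collections import defaultdict


def uniform_sample(window, budget, rng):
    """Pick up to `budget` indices from the window uniformly at random.

    No label information is used when choosing.
    """
    return rng.sample(range(len(window)), min(budget, len(window)))


def champion_biased_sample(window, budget, champion, rng):
    """Pick up to `budget` indices, spread across the champion's predicted classes.

    ASSUMPTION: balancing over *predicted* labels is one plausible reading of
    "bias the sampling"; the source does not state the actual rule.
    `champion` is a hypothetical callable mapping an instance to a class label.
    """
    # Group window positions by the class the champion predicts for them.
    by_prediction = defaultdict(list)
    for index, instance in enumerate(window):
        by_prediction[champion(instance)].append(index)

    # Shuffle within each group so the choice inside a predicted class is random.
    groups = list(by_prediction.values())
    for group in groups:
        rng.shuffle(group)

    # Round-robin over predicted classes so that rarely predicted classes
    # are not crowded out by the majority prediction.
    chosen = []
    while len(chosen) < budget and any(groups):
        for group in groups:
            if group and len(chosen) < budget:
                chosen.append(group.pop())
    return chosen
```

For example, with a window of 100 instances of which the champion predicts 5 as a minor class, a budget of 10 yields 5 minor-class requests under this sketch, whereas uniform sampling would request 0.5 on average. Whether those predictions are correct is only known once the labels arrive.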

Footnotes
1. Earlier work with SBB under streaming data assumed that label information could be used to ensure the data subset was always balanced [15].
2. Valid class labels appear over the interval [1, ..., C].
3. Given the later benchmarking parameterization, this corresponds to no more than 25 generations.
4. Previous studies had compared StreamGP under the uniform sampling protocol to non-evolutionary streaming algorithms [16, 17].
6. Electricity and Cover Type are available from: http://moa.cms.waikato.ac.nz/datasets/.
7. Any more than five resulted in negligible improvement [16].
8. Violin plots were used to establish that the distributions did not conform to a normal distribution. Space precludes their inclusion.

References
1. Bifet, A.: Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams. Frontiers in Artificial Intelligence and Applications, vol. 207. IOS Press, Amsterdam (2010)
2. Bifet, A., Read, J., Žliobaitė, I., Pfahringer, B., Holmes, G.: Pitfalls in benchmarking data stream classification and how to avoid them. In: Blockeel, H., Kersting, K., Nijssen, S., Železný, F. (eds.) ECML PKDD 2013, Part I. LNCS, vol. 8188, pp. 465–479. Springer, Heidelberg (2013)
3. Brameier, M., Banzhaf, W.: Evolving teams of predictors with linear genetic programming. Genet. Program. Evolvable Mach. 2(4), 381–408 (2001)
4. Dempsey, I., O’Neill, M., Brabazon, A.: Grammatical Evolution. In: Dempsey, I., O’Neill, M., Brabazon, A. (eds.) Foundations in Grammatical Evolution for Dynamic Environments. SCI, vol. 194, pp. 9–24. Springer, Heidelberg (2009)
5. Ditzler, G., Roveri, M., Alippi, C., Polikar, R.: Learning in nonstationary environments: a survey. IEEE Comput. Intell. Mag. 10(4), 12–25 (2015)
6. Fan, W., Huang, Y., Wang, H., Yu, P.S.: Active mining of data streams. In: SIAM International Conference on Data Mining, pp. 457–461 (2004)
7.
8.
9. Heywood, M.I.: Evolutionary model building under streaming data for classification tasks: opportunities and challenges. Genet. Program. Evolvable Mach. 16(3), 283–326 (2015)
10. Lichodzijewski, P., Heywood, M.I.: Managing team-based problem solving with symbiotic bid-based genetic programming. In: ACM Genetic and Evolutionary Computation Conference, pp. 363–370 (2008)
11. Lichodzijewski, P., Heywood, M.I.: Symbiosis, complexification and simplicity under GP. In: ACM Genetic and Evolutionary Computation Conference, pp. 853–860 (2010)
12. Polikar, R., Alippi, C.: Guest editorial: learning in nonstationary and evolving environments. IEEE Trans. Neural Netw. Learn. Syst. 25(1), 9–11 (2014)
13. Thomason, R., Soule, T.: Novel ways of improving cooperation and performance in ensemble classifiers. In: ACM Genetic and Evolutionary Computation Conference, pp. 1708–1715 (2007)
14. Žliobaitė, I., Bifet, A., Pfahringer, B., Holmes, G.: Active learning with drifting streaming data. IEEE Trans. Neural Netw. Learn. Syst. 25(1), 27–54 (2014)
15. Vahdat, A., Atwater, A., McIntyre, A.R., Heywood, M.I.: On the application of GP to streaming data classification tasks with label budgets. In: ACM GECCO (Companion), pp. 1287–1294 (2014)
16. Vahdat, A., Morgan, J., McIntyre, A., Heywood, M., Zincir-Heywood, A.: Evolving GP classifiers for streaming data tasks with concept change and label budgets: a benchmarking study. In: Gandomi, A.H., Alavi, A.H., Ryan, C. (eds.) Handbook of Genetic Programming Applications, pp. 451–480. Springer, Switzerland (2015)
17. Vahdat, A., Morgan, J., McIntyre, A., Heywood, M., Zincir-Heywood, A.: Tapped delay lines for GP streaming data classification with label budgets. In: Machado, P., et al. (eds.) Genetic Programming. LNCS, vol. 9025, pp. 126–138. Springer, Switzerland (2015)
18. Wagner, N., Michalewicz, Z., Khouja, M., McGregor, R.R.: Time series forecasting for dynamic environments: the DyFor genetic program model. IEEE Trans. Evol. Comput. 11(4), 433–452 (2007)
19. Wu, S., Banzhaf, W.: Rethinking multilevel selection in genetic programming. In: ACM Genetic and Evolutionary Computation Conference, pp. 1403–1410 (2011)
20. Zhu, X., Zhang, P., Lin, X., Shi, Y.: Active learning from stream data using optimal weight classifier ensemble. IEEE Trans. Syst. Man Cybern.: Part B 40(6), 1607–1621 (2010)

Metadata
Title: On the Impact of Class Imbalance in GP Streaming Classification with Label Budgets
Written by: Sara Khanchi, Malcolm I. Heywood, Nur Zincir-Heywood
Copyright year: 2016
DOI: https://doi.org/10.1007/978-3-319-30668-1_3