Skip to main content

2019 | OriginalPaper | Buchkapitel

Continuous Variable Binning Algorithm to Maximize Information Value Using Genetic Algorithm

verfasst von : Nattawut Vejkanchana, Pramote Kuacharoen

Erschienen in: Applied Informatics

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Binning (bucketing or discretization) is a commonly used data pre-processing technique for continuous predictive variables in machine learning. There are guidelines for good binning which can be treated as constraints. However, there are also statistics which should be optimized. Therefore, we view the binning problem as a constrained optimization problem. This paper presents a novel supervised binning algorithm for binary classification problems using a genetic algorithm, named GAbin, and demonstrates usage on a well-known dataset. It is inspired by the way that human bins continuous variables. To bin a variable, first, we choose output shapes (e.g., monotonic or best bins in the middle). Second, we define constraints (e.g., minimum samples in each bin). Finally, we try to maximize key statistics to assess the quality of the output bins. The algorithm automates these steps. Results from the algorithm are in the user-desired shapes and satisfy the constraints. The experimental results reveal that the proposed GAbin provides competitive results when compared to other binning algorithms. Moreover, GAbin maximizes information value and can satisfy user-desired constraints such as monotonicity or output shape controls.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Siddiqi, N.: Credit Risk Scorecards, pp. 79–82. Wiley, Hoboken (2013) Siddiqi, N.: Credit Risk Scorecards, pp. 79–82. Wiley, Hoboken (2013)
2.
Zurück zum Zitat Thomas, L., Edelman, D., Crook, J.: Credit scoring and its applications, pp. 131–139. SIAM, Society for industrial and applied mathematics, Philadelphia (2002) Thomas, L., Edelman, D., Crook, J.: Credit scoring and its applications, pp. 131–139. SIAM, Society for industrial and applied mathematics, Philadelphia (2002)
3.
Zurück zum Zitat Refaat, M.: Credit Risk Scorecards: Development and Implementation Using SAS. Lulu.com, Raleigh (2011) Refaat, M.: Credit Risk Scorecards: Development and Implementation Using SAS. Lulu.com, Raleigh (2011)
4.
Zurück zum Zitat Kerber, R.: ChiMerge: discretization of numeric attributes. In: The Tenth National Conference on Artificial Intelligence, San Jose, California (1992) Kerber, R.: ChiMerge: discretization of numeric attributes. In: The Tenth National Conference on Artificial Intelligence, San Jose, California (1992)
5.
Zurück zum Zitat Fayyad, U.M., Irani, K.B.: Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. In: IJCAI (1993) Fayyad, U.M., Irani, K.B.: Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. In: IJCAI (1993)
7.
Zurück zum Zitat Kurgan, L., Cios, K.: CAIM discretization algorithm. IEEE Trans. Knowl. Data Eng. 16(2), 145–153 (2004)CrossRef Kurgan, L., Cios, K.: CAIM discretization algorithm. IEEE Trans. Knowl. Data Eng. 16(2), 145–153 (2004)CrossRef
8.
Zurück zum Zitat Tsai, C., Lee, C., Yang, W.: A discretization algorithm based on class-attribute. Inf. Sci. 178(3), 714–731 (2008)CrossRef Tsai, C., Lee, C., Yang, W.: A discretization algorithm based on class-attribute. Inf. Sci. 178(3), 714–731 (2008)CrossRef
9.
Zurück zum Zitat Gonzalez-Abril, L., Cuberos, F., Velasco, F., Ortega, J.: Ameva: an autonomous discretization algorithm. Expert Syst. Appl. 36(3), 5327–5332 (2009)CrossRef Gonzalez-Abril, L., Cuberos, F., Velasco, F., Ortega, J.: Ameva: an autonomous discretization algorithm. Expert Syst. Appl. 36(3), 5327–5332 (2009)CrossRef
12.
Zurück zum Zitat Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach, 3rd edn, pp. 126–129. Prentice Hall, Upper Saddle River (2010)MATH Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach, 3rd edn, pp. 126–129. Prentice Hall, Upper Saddle River (2010)MATH
13.
Zurück zum Zitat Coello, C.A.C.: Constraint-handling Techniques used with evolutionary algorithms. In: The Genetic and Evolutionary Computation Conference Companion, Kyoto, Japan (2018) Coello, C.A.C.: Constraint-handling Techniques used with evolutionary algorithms. In: The Genetic and Evolutionary Computation Conference Companion, Kyoto, Japan (2018)
Metadaten
Titel
Continuous Variable Binning Algorithm to Maximize Information Value Using Genetic Algorithm
verfasst von
Nattawut Vejkanchana
Pramote Kuacharoen
Copyright-Jahr
2019
DOI
https://doi.org/10.1007/978-3-030-32475-9_12

Premium Partner