2018 | OriginalPaper | Book chapter

Over-Sampling Algorithm Based on VAE in Imbalanced Classification

Authors: Chunkai Zhang, Ying Zhou, Yingyang Chen, Yepeng Deng, Xuan Wang, Lifeng Dong, Haoyu Wei

Published in: Cloud Computing – CLOUD 2018

Publisher: Springer International Publishing

Abstract

Imbalanced classification violates the assumption that samples are uniformly distributed across classes: the classes differ in sample size, sample distribution, and misclassification cost. Traditional classifiers tend to ignore the important minority samples because they are rare. Over-sampling algorithms use various methods to add minority samples to the training set and thereby raise the recognition rate of the minority class. However, existing over-sampling methods are too coarse to substantially improve the classification of minority samples: they cannot make full use of the information contained in the original samples, yet they increase training time by adding extra samples. In this paper, we propose to exploit the distribution of the minority samples: we use a variational auto-encoder (VAE) to fit their probability distribution without any prior assumption about its form, and then expand the minority-class sample set accordingly. The experimental results demonstrate the effectiveness of the proposed algorithm.
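The abstract describes the method only at a high level. As a rough illustration, here is a minimal sketch of VAE-based over-sampling in Python with NumPy. It rests on assumptions of our own, not on the paper: a linear Gaussian encoder, a linear decoder with a unit-variance Gaussian likelihood, full-batch gradient descent with hand-derived gradients, and generating exactly enough synthetic points to match the majority-class count. The names `train_linear_vae` and `vae_oversample` are hypothetical. The VAE is trained on the minority samples alone by minimizing the negative evidence lower bound (squared reconstruction error plus KL(q(z|x) || N(0, I))). New minority samples are then obtained by drawing z from the N(0, I) prior and decoding it.

```python
import numpy as np


def train_linear_vae(X, latent_dim=2, lr=0.02, epochs=3000, seed=0):
    """Fit a minimal VAE (hypothetical sketch) and return the decoder weights.

    Assumptions: linear encoder/decoder, Gaussian p(x|z) with unit variance,
    so the reconstruction term is 0.5 * squared error; gradients hand-derived.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    We = rng.normal(0, 0.1, (d, latent_dim)); be = np.zeros(latent_dim)
    Wv = rng.normal(0, 0.1, (d, latent_dim)); bv = np.zeros(latent_dim)
    Wd = rng.normal(0, 0.1, (latent_dim, d)); bd = np.zeros(d)
    for _ in range(epochs):
        # Encoder q(z|x) = N(mu, diag(exp(logv))).
        mu = X @ We + be
        logv = X @ Wv + bv
        std = np.exp(0.5 * logv)
        # Reparameterization trick: z = mu + std * eps, eps ~ N(0, I).
        eps = rng.standard_normal(mu.shape)
        Z = mu + std * eps
        Xh = Z @ Wd + bd
        # Gradients of [0.5 * ||Xh - X||^2 + KL(q(z|x) || N(0, I))] / n.
        dXh = (Xh - X) / n
        dZ = dXh @ Wd.T
        dmu = dZ + mu / n
        dlogv = dZ * eps * 0.5 * std + 0.5 * (np.exp(logv) - 1.0) / n
        grads = [X.T @ dmu, dmu.sum(0), X.T @ dlogv, dlogv.sum(0),
                 Z.T @ dXh, dXh.sum(0)]
        # Plain gradient descent, updating each parameter array in place.
        for p, g in zip([We, be, Wv, bv, Wd, bd], grads):
            p -= lr * g
    return Wd, bd


def vae_oversample(X, y, latent_dim=2, seed=0):
    """Balance a two-class data set with VAE-generated minority samples."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    # Assumption: generate enough points to match the majority class size.
    n_new = counts.max() - counts.min()
    X_min = X[y == minority]
    # Standardize so the unit-variance decoder assumption is reasonable.
    mean, scale = X_min.mean(0), X_min.std(0) + 1e-8
    Wd, bd = train_linear_vae((X_min - mean) / scale, latent_dim, seed=seed)
    # Sample z from the prior N(0, I) and decode (decoder mean only).
    z = np.random.default_rng(seed + 1).standard_normal((n_new, latent_dim))
    X_new = (z @ Wd + bd) * scale + mean
    y_new = np.full(n_new, minority, dtype=y.dtype)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])


# Usage on a toy 200:20 imbalanced data set.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 0.5, (20, 2))])
y = np.array([0] * 200 + [1] * 20)
X_bal, y_bal = vae_oversample(X, y)
print(X_bal.shape, np.bincount(y_bal))  # (400, 2) [200 200]
```

Because the decoder in this sketch is linear, it can only model roughly Gaussian minority clusters. A practical implementation would more likely use nonlinear encoder and decoder networks trained with an automatic-differentiation framework, which comes closer to the assumption-free distribution fitting the abstract describes.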

Metadata
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-94295-7_23