2018 | OriginalPaper | Chapter

A Novel Synthetic Over-Sampling Technique for Imbalanced Classification of Gene Expressions Using Autoencoders and Swarm Optimization

Authors: Maisa Daoud, Michael Mayo

Published in: AI 2018: Advances in Artificial Intelligence

Publisher: Springer International Publishing

Abstract

A new synthetic minority class over-sampling approach for binary (normal/cancer) classification of microarray gene expression data is proposed. The idea is to exploit a previously trained autoencoder in combination with the Particle Swarm Optimisation algorithm to generate new synthetic examples of the minority class for solving the class imbalance problem. Experiments using two different autoencoder representation sizes (500 and 30) and two base classifiers (Support Vector Machine and naïve Bayes) show that the proposed method is able to generate discriminating representations that outperformed state-of-the-art methods such as Synthetic Minority Class Over-sampling Technique and Density-Based Synthetic Minority Class Over-sampling Technique in many test cases.
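
For orientation, the SMOTE baseline named in the abstract builds each synthetic minority example by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a minimal, standard-library illustration of that interpolation idea only. It is not the authors' autoencoder and Particle Swarm Optimisation method, whose fitness function and settings the abstract does not give. The function name `smote`, its parameters, and the toy data are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation.

    Illustrative sketch: `minority` is a list of equal-length numeric
    feature vectors; the name and signature are hypothetical.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Interpolate at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (made-up values, not gene expression data).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point lies on a line segment between two existing minority samples, it stays inside the region those samples already span. The abstract's proposal instead generates examples with a previously trained autoencoder and Particle Swarm Optimisation.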

Metadata
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-030-03991-2_55
