
2018 | OriginalPaper | Book chapter

A Novel Synthetic Over-Sampling Technique for Imbalanced Classification of Gene Expressions Using Autoencoders and Swarm Optimization

Written by: Maisa Daoud, Michael Mayo

Published in: AI 2018: Advances in Artificial Intelligence

Publisher: Springer International Publishing


Abstract

A new synthetic minority-class over-sampling approach is proposed for binary (normal/cancer) classification of microarray gene expression data. The idea is to combine a previously trained autoencoder with the Particle Swarm Optimisation algorithm to generate new synthetic examples of the minority class and thereby address the class imbalance problem. Experiments using two autoencoder representation sizes (500 and 30) and two base classifiers (Support Vector Machine and naïve Bayes) show that the proposed method generates discriminating representations that outperform state-of-the-art methods, such as the Synthetic Minority Class Over-sampling Technique and the Density-Based Synthetic Minority Class Over-sampling Technique, in many test cases.
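
For orientation, the kind of interpolation-based over-sampling that the abstract names as a baseline can be sketched as follows. This is a minimal illustration of the general SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours); it is not the authors' autoencoder and swarm-optimisation method, whose details the abstract does not give. The function name `smote_like_oversample`, its parameters, and the toy data are assumptions made for this example only.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each a random interpolation between a
    minority sample and one of its k nearest minority neighbours.

    Hypothetical helper for illustration; not taken from the chapter.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy two-feature minority class (assumed data); real microarray data
# typically has thousands of features per sample.
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote_like_oversample(minority, n_new=5)
```

Because every synthetic point lies on a segment between two existing minority samples, the generated data stays inside the region already covered by the minority class.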


Metadata
Title
A Novel Synthetic Over-Sampling Technique for Imbalanced Classification of Gene Expressions Using Autoencoders and Swarm Optimization
Written by
Maisa Daoud
Michael Mayo
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-030-03991-2_55
