2018 | Original Paper | Book chapter

An Improved Measurement of the Imbalanced Dataset

Authors: Chunkai Zhang, Ying Zhou, Yingyang Chen, Changqing Qi, Xuan Wang, Lifeng Dong

Published in: Cloud Computing – CLOUD 2018

Publisher: Springer International Publishing

Abstract

Imbalanced classification is a classification problem that violates the assumption of a uniform distribution of samples. Traditional measurements of an imbalanced dataset consider only the imbalance in sample size and ignore the distribution information, which has a greater impact on classification performance. As a result, the traditional measurements are only weakly related to classification performance. This paper proposes an improved measurement for imbalanced datasets, based on the idea that a sample surrounded by more samples of its own class is easier to classify. For each sample of each class, the proposed method calculates, under weighted k-NN, the average number of its k nearest neighbors that belong to the same class in different subsets. The product of these average values is then taken as the measurement of the dataset, and it is a good indicator of the relationship between the distribution of samples and the classification results. The experimental results show that the proposed measurement correlates more strongly with the classification results and shows the classification difficulty of datasets more clearly.
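The core idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the weighting scheme of the weighted k-NN or how the subsets are formed, so the version below assumes plain (unweighted) Euclidean k-NN over the whole dataset and uses the fraction, rather than the raw count, of same-class neighbors. The function name `same_class_neighbor_score` and the parameter `k` are hypothetical names chosen for illustration.

```python
import numpy as np


def same_class_neighbor_score(X, y, k=5):
    """Illustrative score: product over classes of the mean fraction of
    same-class points among each sample's k nearest neighbors.

    Assumptions (not from the paper): unweighted Euclidean k-NN,
    no subset partitioning, and k smaller than the number of samples.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Pairwise squared Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor

    # Indices of the k nearest neighbors of every sample.
    nn = np.argsort(dist, axis=1)[:, :k]

    # Fraction of each sample's neighbors that share its class label.
    same = (y[nn] == y[:, None]).mean(axis=1)

    # Average that fraction within each class, then multiply the averages.
    per_class = {c: float(same[y == c].mean()) for c in np.unique(y)}
    score = float(np.prod(list(per_class.values())))
    return score, per_class


# Two well-separated clusters: every neighbor shares the sample's class.
X_easy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
y_easy = [0, 0, 0, 1, 1, 1]
easy_score, _ = same_class_neighbor_score(X_easy, y_easy, k=2)

# One minority sample buried among majority samples.
X_hard = [[0, 0], [1, 0], [2, 0], [3, 0], [1.5, 0.1]]
y_hard = [0, 0, 0, 0, 1]
hard_score, _ = same_class_neighbor_score(X_hard, y_hard, k=2)
```

Under these assumptions a score near 1 means that samples of every class sit mostly among their own class, while a score near 0 means that at least one class is largely surrounded by other classes. Taking the product rather than the overall mean keeps a large, well-clustered majority class from hiding a poorly placed minority class.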

Metadata
Title
An Improved Measurement of the Imbalanced Dataset
Authors
Chunkai Zhang
Ying Zhou
Yingyang Chen
Changqing Qi
Xuan Wang
Lifeng Dong
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-94295-7_25
