2023 | OriginalPaper | Chapter

Comparative Analysis Between Macro and Micro-Accuracy in Imbalance Dataset for Movie Review Classification

Authors: Nur Suhailayani Suhaimi, Zalinda Othman, Mohd Ridzwan Yaakub

Published in: Proceedings of Seventh International Congress on Information and Communication Technology

Publisher: Springer Nature Singapore

Abstract

Multi-class classification is an active and exploratory area of data science. Measuring the accuracy of a multi-class classifier, however, remains a challenge that merits detailed study. Because multi-class accuracy may be lower on an imbalanced dataset, this paper analyzes the use of macro- and micro-accuracy in classifying text data with multi-class labels. The research focuses on movie-review text classified by three multi-class classifiers: Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF). Five performance measures are analyzed with respect to micro- and macro-accuracy: recall, precision, F-score, sensitivity, and specificity. The comparative analysis yielded a significant result: on the imbalanced dataset, the average micro-accuracy (87.3%) was 14.8 percentage points higher than the average macro-accuracy (72.5%). The results also showed a significant gap between the balanced and imbalanced datasets. As future work, the flexibility of class labels in multi-class settings may be studied to observe how the classifier's learning behavior changes.
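
The abstract does not spell out how micro- and macro-averaging are computed, so the following is only a minimal sketch under a common convention: macro-averaging takes the unweighted mean of per-class scores, while micro-averaging pools the counts of all classes before computing the score. The confusion matrix, the function names, and the choice of recall as the example measure are illustrative assumptions, not the authors' data or code.

```python
# Toy 3-class confusion matrix (rows = true class, columns = predicted class).
# The counts are invented for illustration; they are not from the paper.
confusion = [
    [90, 5, 5],  # majority class, 100 samples
    [4, 5, 1],   # minority class, 10 samples
    [3, 2, 5],   # minority class, 10 samples
]


def per_class_recall(cm):
    """Recall of each class: correct predictions / true samples of that class."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]


def macro_recall(cm):
    """Unweighted mean of per-class recall, so every class counts equally."""
    recalls = per_class_recall(cm)
    return sum(recalls) / len(recalls)


def micro_recall(cm):
    """Recall from pooled counts, so every sample counts equally."""
    true_positives = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return true_positives / total


print(f"macro recall: {macro_recall(confusion):.3f}")
print(f"micro recall: {micro_recall(confusion):.3f}")
```

In this toy matrix the majority class is recognized well and the two minority classes poorly, so the pooled (micro) figure comes out higher than the per-class (macro) mean. That is the same direction as the gap the abstract reports for the imbalanced dataset, although the numbers here are unrelated to the paper's results.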

Metadata
Title
Comparative Analysis Between Macro and Micro-Accuracy in Imbalance Dataset for Movie Review Classification
Authors
Nur Suhailayani Suhaimi
Zalinda Othman
Mohd Ridzwan Yaakub
Copyright Year
2023
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-19-2394-4_8