Published in: Arabian Journal for Science and Engineering 11/2019

31-07-2019 | Research Article - Computer Engineering and Computer Science

Hybrid Filter–Wrapper Feature Selection Method for Sentiment Classification

Authors: Gunjan Ansari, Tanvir Ahmad, Mohammad Najmud Doja

Abstract

Feature selection (FS) has emerged as a recent challenge in sentiment classification. Filter- and wrapper-based FS methods are applied in this domain to reduce the size of the feature set and to increase classifier accuracy. This paper proposes a hybrid filter-wrapper method for selecting relevant features. A feature subset is first selected from the original feature set using computationally fast rank-based FS methods. The selected features are then refined using two alternative wrapper approaches: in the first, recursive feature elimination is applied to select the optimal feature set; in the second, an evolutionary method based on binary particle swarm optimization finalizes the feature subset. The two proposed techniques are compared on five datasets from different domains used in sentiment analysis. Simple and efficient machine learning algorithms (Naïve Bayes, support vector machine and logistic regression) are used to evaluate the performance of the hybrid FS techniques. Finally, the proposed hybrid FS technique is assessed by comparing its results with state-of-the-art methods. The results show that the proposed method achieves better accuracy with fewer features.
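
The two-stage idea in the abstract (a fast rank-based filter followed by a wrapper that refines the surviving features) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a chi-square ranking as the filter and swaps in a greedy backward-elimination wrapper, scored by leave-one-out accuracy of a small Bernoulli Naive Bayes, as a simpler stand-in for recursive feature elimination or binary particle swarm optimization. All function names (`chi2_scores`, `nb_predict`, `loo_accuracy`, `hybrid_select`) and the toy reviews are hypothetical and exist only for illustration.

```python
import math

def chi2_scores(docs, labels):
    """Chi-square score of each term's presence against a binary label."""
    n, n_pos = len(docs), sum(labels)
    scores = {}
    for term in sorted({t for d in docs for t in d}):
        n11 = sum(1 for d, y in zip(docs, labels) if term in d and y == 1)
        n10 = sum(1 for d, y in zip(docs, labels) if term in d and y == 0)
        n01 = n_pos - n11            # term absent, positive class
        n00 = (n - n_pos) - n10      # term absent, negative class
        denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
        scores[term] = n * (n11 * n00 - n10 * n01) ** 2 / denom if denom else 0.0
    return scores

def nb_predict(train_docs, train_labels, doc, feats):
    """Bernoulli Naive Bayes with Laplace smoothing, restricted to `feats`."""
    best_cls, best_lp = None, -math.inf
    for cls in (0, 1):
        cls_docs = [d for d, y in zip(train_docs, train_labels) if y == cls]
        lp = math.log((len(cls_docs) + 1) / (len(train_docs) + 2))
        for f in feats:
            p = (sum(1 for d in cls_docs if f in d) + 1) / (len(cls_docs) + 2)
            lp += math.log(p if f in doc else 1 - p)
        if lp > best_lp:
            best_cls, best_lp = cls, lp
    return best_cls

def loo_accuracy(docs, labels, feats):
    """Leave-one-out accuracy of the classifier using only `feats`."""
    hits = 0
    for i in range(len(docs)):
        train_docs = docs[:i] + docs[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        hits += nb_predict(train_docs, train_labels, docs[i], feats) == labels[i]
    return hits / len(docs)

def hybrid_select(docs, labels, k):
    """Filter to the top-k chi-square terms, then refine with a wrapper."""
    # Stage 1 (filter): rank terms by chi-square and keep the k best.
    scores = chi2_scores(docs, labels)
    selected = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    # Stage 2 (wrapper, stand-in for RFE/BPSO): drop one feature at a time,
    # lowest-ranked first, whenever accuracy does not fall.
    best = loo_accuracy(docs, labels, selected)
    improved = True
    while improved and len(selected) > 1:
        improved = False
        for f in reversed(selected):
            trial = [g for g in selected if g != f]
            acc = loo_accuracy(docs, labels, trial)
            if acc >= best:
                selected, best, improved = trial, acc, True
                break
    return selected, best

# Toy reviews invented for illustration (1 = positive, 0 = negative).
docs = [
    {"great", "plot", "movie"}, {"great", "acting", "film"},
    {"love", "movie", "great"}, {"wonderful", "film", "love"},
    {"boring", "plot", "movie"}, {"awful", "acting", "film"},
    {"boring", "film", "awful"}, {"terrible", "movie", "boring"},
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

scores = chi2_scores(docs, labels)
filter_only = sorted(scores, key=lambda t: (-scores[t], t))[:6]
selected, accuracy = hybrid_select(docs, labels, k=6)
print("filter stage:", filter_only)
print("after wrapper:", selected, "LOO accuracy:", accuracy)
```

The wrapper only ever removes features from the filtered set and accepts a removal only when the cross-validated accuracy does not drop, so the refined subset is never larger and never scores worse than the filter-only subset on this measure. A real experiment would replace the toy classifier and data with the classifiers and datasets named in the abstract.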

Metadata
Title: Hybrid Filter–Wrapper Feature Selection Method for Sentiment Classification
Authors: Gunjan Ansari, Tanvir Ahmad, Mohammad Najmud Doja
Publication date: 31-07-2019
Publisher: Springer Berlin Heidelberg
Published in: Arabian Journal for Science and Engineering, Issue 11/2019
Print ISSN: 2193-567X
Electronic ISSN: 2191-4281
DOI: https://doi.org/10.1007/s13369-019-04064-6
