Swarm intelligence based optimal feature selection for enhanced predictive sentiment accuracy on twitter

Multimedia Tools and Applications

Abstract

Micro-blog content carries considerable uncertainty, primarily because the data are noisy and heterogeneous, may be structured or unstructured, and can be high-dimensional, ambiguous, vague or imprecise. This makes feature engineering for sentiment prediction arduous. Population-based meta-heuristics, especially nature-inspired ones, have been proposed in various pertinent studies for feature selection because they can accept a less optimal solution with some probability and thereby avoid becoming stuck in local optima. This research demonstrates the use of two such swarm intelligence algorithms, binary grey wolf and binary moth flame, for feature optimization to enhance sentiment classification accuracy. The study is conducted on tweets from two benchmark Twitter corpora (SemEval 2016 and SemEval 2017). The tweets are first analyzed using the conventional term frequency-inverse document frequency (TF-IDF) statistical weighting filter for feature extraction, and subsequently using the swarm-based algorithms. Five baseline classifiers are trained on the features: naïve Bayes, support vector machine, k-nearest neighbor, multilayer perceptron and decision tree. The results validate that the population-based meta-heuristic algorithms for feature subset selection outperform the baseline supervised learning algorithms. For the binary grey wolf algorithm, an average accuracy improvement of 9.4% is observed with an approximate 20.5% average reduction in features. For the binary moth flame algorithm, an average accuracy improvement of 10.6% is observed with an approximate 40% average reduction in features. The highest accuracy, 76.5%, is observed for the support vector machine with the binary grey wolf optimizer on the SemEval 2016 benchmark dataset.
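To make the idea of swarm-based feature subset selection concrete, the following is a minimal sketch of a binary grey wolf search over feature masks. It is not the authors' implementation. The toy data generator, the nearest-centroid classifier standing in for the five baseline classifiers, the fitness weighting of 0.9, the sigmoid thresholding step and all function names are assumptions made for illustration only.

```python
import math
import random


def make_toy_data(rng, n_samples=60, n_informative=3, n_noise=9):
    """Hypothetical two-class data: only the first n_informative columns carry signal."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        signal = [rng.gauss(2.0 * label, 1.0) for _ in range(n_informative)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(signal + noise)
        y.append(label)
    return X, y


def centroid_error(X, y, mask):
    """Hold-out error of a nearest-centroid classifier on the selected columns."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 1.0  # an empty subset cannot classify anything
    half = len(X) // 2
    centroids = {}
    for c in (0, 1):
        rows = [X[i] for i in range(half) if y[i] == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    wrong = 0
    for i in range(half, len(X)):
        dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(cols, centroids[c]))
                for c in (0, 1)}
        wrong += min(dist, key=dist.get) != y[i]
    return wrong / (len(X) - half)


def fitness(X, y, mask, weight=0.9):
    """Assumed objective: weighted sum of error and fraction of features kept."""
    return weight * centroid_error(X, y, mask) + (1 - weight) * sum(mask) / len(mask)


def binary_grey_wolf(X, y, n_wolves=8, n_iter=30, seed=0):
    """Search for a low-fitness 0/1 feature mask; lower fitness is better."""
    rng = random.Random(seed)
    dim = len(X[0])
    wolves = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_wolves)]
    wolves[0] = [1] * dim  # keep the full feature set as a reference point
    best = min(wolves, key=lambda w: fitness(X, y, w))[:]
    for t in range(n_iter):
        ranked = sorted(wolves, key=lambda w: fitness(X, y, w))
        leaders = [w[:] for w in ranked[:3]]  # alpha, beta and delta wolves
        if fitness(X, y, leaders[0]) < fitness(X, y, best):
            best = leaders[0][:]
        a = 2.0 * (1 - t / n_iter)  # exploration factor shrinks from 2 towards 0
        for w in wolves:
            for d in range(dim):
                pulls = []
                for lead in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    pulls.append(lead[d] - A * abs(C * lead[d] - w[d]))
                x = sum(pulls) / 3
                # Squash the continuous position into a probability of keeping the bit.
                keep = 1 / (1 + math.exp(-10 * (x - 0.5)))
                w[d] = 1 if keep >= rng.random() else 0
    final = min(wolves + [best], key=lambda w: fitness(X, y, w))
    return final[:], fitness(X, y, final)


if __name__ == "__main__":
    X, y = make_toy_data(random.Random(42))
    mask, score = binary_grey_wolf(X, y)
    print("selected features:", [j for j, bit in enumerate(mask) if bit])
    print("fitness:", round(score, 3))
```

Because the full feature set is seeded into the initial pack and the best mask seen so far is retained, the returned mask never scores worse than using all features under this assumed objective. In a setting like the one described above, the columns would be TF-IDF term weights and the error term would come from one of the baseline classifiers.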

Notes

  1. http://alt.qcri.org/semeval2016/task4/

  2. http://alt.qcri.org/semeval2017/task4/

  3. https://www.noslang.com/dictionary

  4. http://en.wikipedia.org/wiki/List_of_emoticons

  5. Natural Language Toolkit: https://www.nltk.org/

Author information

Corresponding author

Correspondence to Akshi Kumar.

Cite this article

Kumar, A., Jaiswal, A. Swarm intelligence based optimal feature selection for enhanced predictive sentiment accuracy on twitter. Multimed Tools Appl 78, 29529–29553 (2019). https://doi.org/10.1007/s11042-019-7278-0

  • DOI: https://doi.org/10.1007/s11042-019-7278-0
