
2018 | Original Paper | Book chapter

Studying the Effects of Text Preprocessing and Ensemble Methods on Sentiment Analysis of Brazilian Portuguese Tweets

Authors: Fernando Barbosa Gomes, Juan Manuel Adán-Coello, Fernando Ernesto Kintschner

Published in: Statistical Language and Speech Processing

Publisher: Springer International Publishing


Abstract

The analysis of social media posts can provide people and organizations with useful feedback on user experience. This task requires computational tools because of the massive amount of content and the speed at which it is generated. In this article we study the effects of text preprocessing heuristics and ensembles of machine learning algorithms on the accuracy and polarity bias of classifiers performing sentiment analysis on short text messages. An experimental evaluation on a dataset of Brazilian Portuguese tweets shows that these strategies significantly increase classification accuracy, particularly when the ensembles include a deep neural net, but do not always reduce polarity bias.
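To make the two ingredients named in the abstract concrete, here is a minimal Python sketch of tweet preprocessing followed by a majority-vote ensemble. The abstract does not list the heuristics or the combination rule actually used in the chapter, so everything below is an assumption for illustration: the function names `preprocess` and `majority_vote` are hypothetical, the normalization rules are common choices for tweets rather than the authors' own, and the classifier outputs are made up.

```python
import re
from collections import Counter


def preprocess(tweet):
    """Apply a few common, illustrative normalization heuristics (assumed, not the chapter's)."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " URL ", text)  # replace links with a placeholder token
    text = re.sub(r"@\w+", " USER ", text)         # replace user mentions with a placeholder token
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)     # squeeze runs of 3+ identical characters to 2
    return " ".join(text.split())                  # collapse whitespace


def majority_vote(labels):
    """Return the label predicted by the most classifiers."""
    return Counter(labels).most_common(1)[0][0]


# Made-up predictions from three hypothetical classifiers for one tweet.
votes = ["positive", "negative", "positive"]
print(preprocess("Amoooo esse filme!!! @joao http://t.co/abc"))
print(majority_vote(votes))
```

Majority voting is only one way to combine classifiers; weighted voting or stacking would fit the same interface.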


Footnotes
3
Fivefold cross-validation is a method in which the dataset is split into five equal parts (folds). In each of five rounds, four folds (80% of the data) are used to train the algorithm and the remaining fold (20%) is used to test its accuracy. The held-out fold is rotated so that every sample is used for testing exactly once and for training four times.
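The fold rotation described in this footnote can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' code: the helper name `five_fold_splits` is hypothetical, and the samples are taken in their original order, whereas in practice the data are usually shuffled (and often stratified by class) before splitting.

```python
def five_fold_splits(n_samples, k=5):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]                 # the held-out fold (about 20%)
        train = indices[:start] + indices[start + size:]   # the other folds (about 80%)
        yield train, test
        start += size


splits = list(five_fold_splits(10))
for fold, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold}: train={train} test={test}")
```

The accuracy reported for a classifier is then typically the mean of the five per-fold test accuracies.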
 
Metadata
Title
Studying the Effects of Text Preprocessing and Ensemble Methods on Sentiment Analysis of Brazilian Portuguese Tweets
Authors
Fernando Barbosa Gomes
Juan Manuel Adán-Coello
Fernando Ernesto Kintschner
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-030-00810-9_15