
2020 | OriginalPaper | Chapter

Semi-Supervised Sentiment Analysis of Portuguese Tweets with Random Walk in Feature Sample Networks

Authors : Pedro Gengo, Filipe A. N. Verri

Published in: Intelligent Systems

Publisher: Springer International Publishing


Abstract

Nowadays, a huge amount of data is generated daily around the world, and many machine learning tasks require labeled data, which are sometimes unavailable. Manually labeling such an amount of data may consume a lot of time and resources. One way to overcome this limitation is to learn from both labeled and unlabeled data, which is known as semi-supervised learning. In this paper, we use a positive-unlabeled (PU) learning technique called Random Walk in Feature-Sample Networks (RWFSN) to perform semi-supervised sentiment analysis of Brazilian Portuguese tweets. Sentiment analysis is an important machine learning task that can be performed by classifying the polarity of texts. Although RWFSN achieves excellent performance in many PU learning problems, it has two major limitations when applied to our problem: it assumes that samples are long texts (many features) and that the class prior probabilities are known. We address these limitations by augmenting the data representation in the feature space and by adding a validation set to better estimate the class priors. As a result, we identified unlabeled samples of the positive class with a precision of around 70% at higher labeled ratios, but with a high standard deviation, showing the impact of data variance on the results. Moreover, given the properties of the RWFSN method, we provide interpretability of the results by pointing out the most relevant features of the task.
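The abstract describes RWFSN only at a high level. As a rough illustration of the general idea of random-walk PU learning on a bipartite feature-sample graph, the sketch below runs a random walk with restart from the labeled positive samples and then uses an assumed class prior to decide how many unlabeled samples to mark as positive. This is not the authors' RWFSN algorithm, whose exact dynamics are not given here; the function names `rwr_scores` and `label_unlabeled`, the restart probability `alpha`, and the reading of `prior` as the fraction of positives among the unlabeled samples are all hypothetical choices made for the example.

```python
import numpy as np


def rwr_scores(X, positive_idx, alpha=0.15, tol=1e-10, max_iter=1000):
    """Hypothetical sketch: random walk with restart on a sample-feature graph.

    X: (n_samples, n_features) nonnegative matrix, e.g. binary bag-of-words.
    positive_idx: indices of the labeled positive samples (restart set).
    Returns the visiting probability of each sample node.
    """
    n, m = X.shape
    # Bipartite adjacency: samples are nodes 0..n-1, features are n..n+m-1.
    A = np.zeros((n + m, n + m))
    A[:n, n:] = X
    A[n:, :n] = X.T
    deg = A.sum(axis=1, keepdims=True)
    # Row-stochastic transition matrix; isolated nodes get an all-zero row.
    P = np.divide(A, deg, out=np.zeros_like(A), where=deg > 0)
    # Restart distribution: uniform over the labeled positive samples.
    r = np.zeros(n + m)
    r[positive_idx] = 1.0 / len(positive_idx)
    p = r.copy()
    for _ in range(max_iter):
        p_next = (1 - alpha) * (P.T @ p) + alpha * r
        converged = np.abs(p_next - p).sum() < tol
        p = p_next
        if converged:
            break
    return p[:n]


def label_unlabeled(X, positive_idx, prior, alpha=0.15):
    """Mark the top-scoring unlabeled samples as positive.

    prior: ASSUMED fraction of positives among the unlabeled samples; this is
    the quantity the abstract says must be known or estimated.
    """
    n = X.shape[0]
    scores = rwr_scores(X, positive_idx, alpha)
    unlabeled = np.setdiff1d(np.arange(n), positive_idx)
    k = int(round(prior * len(unlabeled)))
    # Rank unlabeled samples by visiting probability, highest first.
    ranked = unlabeled[np.argsort(-scores[unlabeled])]
    return np.sort(ranked[:k])


# Toy binary document-term matrix: rows are tweets, columns are terms.
X = np.array([
    [1, 1, 0, 0],  # 0: labeled positive
    [1, 0, 0, 0],  # 1: labeled positive
    [1, 1, 0, 0],  # 2: unlabeled, shares terms with the positives
    [0, 0, 1, 1],  # 3: unlabeled, no shared terms
    [0, 1, 0, 0],  # 4: unlabeled, shares a term with the positives
    [0, 0, 1, 0],  # 5: unlabeled, no shared terms
])
print(label_unlabeled(X, [0, 1], prior=0.5))  # prints [2 4]
```

In this toy case, samples 3 and 5 never receive probability mass because they share no terms with the labeled positives, so they score zero. The example also hints at why a short tweet is harder than a long text: with few terms per sample, the graph is sparse and many unlabeled samples are weakly connected to the positives, and why a wrong `prior` directly changes how many samples get labeled.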


Metadata
Title
Semi-Supervised Sentiment Analysis of Portuguese Tweets with Random Walk in Feature Sample Networks
Authors
Pedro Gengo
Filipe A. N. Verri
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-61377-8_42
