2020 | OriginalPaper | Book chapter

PolSentiLex: Sentiment Detection in Socio-Political Discussions on Russian Social Media

Authors: Olessia Koltsova, Svetlana Alexeeva, Sergei Pashakhin, Sergei Koltsov

Published in: Artificial Intelligence and Natural Language

Publisher: Springer International Publishing

Abstract

We present PolSentiLex, a freely available Russian-language sentiment lexicon designed to detect sentiment in user-generated content related to social and political issues. The lexicon was generated from a database of posts and comments by the top 2,000 LiveJournal bloggers, published over one year (\(\sim\)1.5 million posts and 20 million comments). Following a topic modeling approach, we extracted 85,898 documents that were used to retrieve domain-specific terms. This term list was then merged with several external sources. Together, they formed a lexicon (16,399 units) that was marked up using a crowdsourcing strategy. A sample of native Russian speakers (n = 105) was asked to assess the sentiment of words given the context of their use (randomly paired), as well as the prevailing sentiment of the respective texts. In total, we received 59,208 complete annotations for both texts and words. We experimented with several versions of the marked-up lexicon, and the final version was tested for quality against the only other freely available Russian-language lexicon and against three machine learning algorithms. All experiments were run on two different collections. They show that, in terms of \(\text{F}_{\text{macro}}\), lexicon-based approaches outperform machine learning by 11%, and that our lexicon outperforms the alternative one by 11% on the first collection and by 7% on the negative scale of the second collection, while showing similar quality on the positive scale and being three times smaller. Our lexicon also outperforms or is comparable to the best existing sentiment analysis results for other types of Russian-language texts.
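For readers unfamiliar with the evaluation setup, the following is a minimal sketch of how a lexicon-based classifier can be scored with a macro-averaged F-measure. It is not the authors' implementation: the toy lexicon entries, their weights, the mean-weight decision rule, and the function names `lexicon_label` and `f_macro` are all assumptions made for illustration, and the abstract does not specify how PolSentiLex scores are aggregated.

```python
# Hypothetical toy lexicon: lemma -> sentiment weight. These entries and
# weights are illustrative assumptions, not taken from PolSentiLex.
TOY_LEXICON = {"хороший": 1.0, "успех": 1.0, "плохой": -1.0, "кризис": -1.0}


def lexicon_label(tokens, lexicon=TOY_LEXICON):
    """Label a tokenized text by the sign of its mean lexicon weight.

    Assumed decision rule: texts with no lexicon hits, or with a mean
    weight of exactly zero, are labeled "neutral".
    """
    weights = [lexicon[t] for t in tokens if t in lexicon]
    if not weights:
        return "neutral"
    mean = sum(weights) / len(weights)
    if mean > 0:
        return "positive"
    if mean < 0:
        return "negative"
    return "neutral"


def f_macro(gold, predicted):
    """Unweighted mean of per-class F1 over classes seen in gold or predicted."""
    classes = sorted(set(gold) | set(predicted))
    scores = []
    for c in classes:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == c and p == c)
        fp = sum(1 for g, p in pairs if g != c and p == c)
        fn = sum(1 for g, p in pairs if g == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up example texts (already lemmatized) and gold labels.
texts = [["хороший", "успех"], ["плохой", "кризис"],
         ["хороший", "кризис"], ["погода"]]
gold = ["positive", "negative", "negative", "neutral"]
predicted = [lexicon_label(t) for t in texts]
score = f_macro(gold, predicted)
```

Because the macro average weights every class equally, a rare class such as negative texts counts as much as a frequent one, which is one reason the measure is often used on imbalanced sentiment collections.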

Metadata
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-59082-6_1