
2021 | OriginalPaper | Chapter

Improving Sentiment Classification in Low-Resource Bengali Language Utilizing Cross-Lingual Self-supervised Learning


Abstract

One of the barriers to sentiment analysis research in low-resource languages such as Bengali is the lack of annotated data. Manual annotation requires resources that are scarce in low-resource languages. We present a cross-lingual hybrid methodology that utilizes machine translation and prior sentiment information to generate accurate pseudo-labels. Using these pseudo-labels, a supervised ML classifier is trained for sentiment classification. We compare the performance of the proposed self-supervised methodology with that of existing Bengali and English sentiment classification methods that do not require labeled data. We observe that the self-supervised hybrid methodology improves macro F1 scores by 15%–25% over these methods. The results suggest that the proposed framework can improve the performance of sentiment classification in low-resource languages that lack labeled data.
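The pipeline described in the abstract (translate, pseudo-label with prior sentiment information, train a supervised classifier, report macro F1) can be illustrated with a minimal sketch. This is not the author's implementation. It assumes that English translations are already available (the toy pairs below are hand-written stand-ins, not output of any MT system), that a tiny hypothetical English lexicon (`LEXICON`) stands in for the prior sentiment information, and that a from-scratch multinomial Naive Bayes stands in for the supervised ML classifier. The names `pseudo_label`, `margin`, `NaiveBayes`, and `macro_f1` are illustrative only.

```python
import math
from collections import Counter

# Hypothetical toy English lexicon standing in for "prior sentiment information".
LEXICON = {"good": 1.0, "great": 1.0, "like": 1.0,
           "bad": -1.0, "boring": -1.0, "hate": -1.0}


def lexicon_score(english_text):
    """Sum of lexicon polarities over whitespace tokens of the English translation."""
    return sum(LEXICON.get(tok, 0.0) for tok in english_text.lower().split())


def pseudo_label(pairs, margin=1.0):
    """Label (Bengali, English-translation) pairs whose score clears +/- margin.

    Pairs with a weak or neutral score are left out rather than guessed.
    """
    labeled = []
    for bengali, english in pairs:
        score = lexicon_score(english)
        if score >= margin:
            labeled.append((bengali, "pos"))
        elif score <= -margin:
            labeled.append((bengali, "neg"))
    return labeled


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over whitespace tokens."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc.split())
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)
        best, best_lp = None, -math.inf
        for c in self.classes:
            # Log prior plus smoothed log likelihood of each token.
            lp = math.log(self.prior[c] / self.n)
            for w in doc.split():
                lp += math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
            if lp > best_lp:
                best, best_lp = c, lp
        return best


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)


# Toy unlabeled Bengali texts with hand-written English stand-ins for MT output.
PAIRS = [
    ("সিনেমাটা খুব ভালো", "the movie is very good"),
    ("গল্পটা ভালো", "the story is good"),
    ("সিনেমাটা খুব খারাপ", "the movie is very bad"),
    ("গল্পটা বিরক্তিকর", "the story is boring"),
    ("সিনেমাটা", "the movie"),  # neutral score: left unlabeled
]

labeled = pseudo_label(PAIRS)
model = NaiveBayes().fit([d for d, _ in labeled], [y for _, y in labeled])

# Small held-out set with gold labels, used only for evaluation.
test_docs = ["ভালো", "খারাপ", "গল্পটা খুব খারাপ", "সিনেমাটা ভালো"]
gold = ["pos", "neg", "neg", "pos"]
preds = [model.predict(d) for d in test_docs]
print(preds, macro_f1(gold, preds))
```

Leaving low-margin items unlabeled trades coverage for pseudo-label precision; the `margin` threshold is an assumed knob here, not a value reported in the paper. Macro F1 averages per-class F1 without weighting by class size, so a minority class counts as much as a majority one.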

Metadata
Title: Improving Sentiment Classification in Low-Resource Bengali Language Utilizing Cross-Lingual Self-supervised Learning
Author: Salim Sazzed
Copyright Year: 2021
DOI: https://doi.org/10.1007/978-3-030-80599-9_20
