Published in: Social Network Analysis and Mining 1/2022

01.12.2022 | Original Article

Abusive Bangla comments detection on Facebook using transformer-based deep learning models

Authors: Tanjim Taharat Aurpa, Rifat Sadik, Md Shoaib Ahmed

Abstract

In the era of social networking, user-generated content floods platforms like Facebook every second. Harmful content, including threats and sexual harassment, is therefore easier to observe and identify there than in traditional media. Online content with extreme toxicity can lead to online harassment, profanity, personal attacks, and bullying. As Bangla is the seventh most spoken language worldwide, the use of Bangla on Facebook has risen in recent times. The use of abusive Bangla comments on Facebook has also increased alarmingly, but research on the problem remains scarce. In this work, we concentrate on identifying abusive Bangla comments on social media (Facebook) so that they can be filtered out at an early stage of posting. To classify abusive comments swiftly and precisely, we apply transformer-based deep neural network models. We employ two pre-trained language architectures, BERT (Bidirectional Encoder Representations from Transformers) and ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately). We conducted this work with a novel dataset comprising 44,001 comments from numerous Facebook posts. To evaluate the proposed models, we report average accuracy, precision, recall, and F1-score. The results show that our BERT and ELECTRA architectures perform well, with 85.00% and 84.92% test accuracy, respectively.
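
The evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not say which averaging scheme the authors used, so the macro average below is an assumption, and the function name `macro_metrics` and the toy labels (1 for abusive, 0 for non-abusive) are hypothetical and not taken from the paper's dataset.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging is an assumption made for illustration; the abstract
    only says that "average" scores are reported.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    def safe_div(num, den):
        # Treat an undefined ratio (empty denominator) as 0.0.
        return num / den if den else 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        precision = safe_div(tp[label], tp[label] + fp[label])
        recall = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(safe_div(2 * precision * recall, precision + recall))

    n_labels = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n_labels,
        "recall": sum(recalls) / n_labels,
        "f1": sum(f1s) / n_labels,
    }


# Hypothetical toy labels: 1 = abusive, 0 = non-abusive.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(macro_metrics(y_true, y_pred))
```

Reporting precision and recall alongside accuracy matters for abuse detection because a classifier can reach high accuracy on an imbalanced dataset while still missing most abusive comments.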

Metadata
Title
Abusive Bangla comments detection on Facebook using transformer-based deep learning models
Authors
Tanjim Taharat Aurpa
Rifat Sadik
Md Shoaib Ahmed
Publication date
01.12.2022
Publisher
Springer Vienna
Published in
Social Network Analysis and Mining / Issue 1/2022
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-021-00852-x
