research-article

HateCircle and Unsupervised Hate Speech Detection Incorporating Emotion and Contextual Semantics

Published: 24 March 2023

Abstract

The explosive growth of social media has greatly expanded freedom of speech online. A worldwide platform for human voices makes it possible to attack other users without facing consequences and to flout social etiquette, which has led to an inevitable increase in hate speech. English hate speech detection is now a popular research area, but the prevalence of implicit hate content in regional languages demands effective language-independent models. The proposed research is the first unsupervised Hindi and Bengali hate content detection framework, and it consists of three significant components: HateCircle, hate tweet classification, and code-switch data preparation algorithms. The novel HateCircle method detects the hate orientation of each term using co-occurrence patterns of words, contextual semantics, and emotion analysis. The efficient multiclass hate tweet classification algorithm uses parts-of-speech tagging, Euclidean distance, and geometric median methods. Because detection of hate content is more efficient in the native script than in the Roman script, a transliteration algorithm is also proposed for code-switch data preparation. The experiments evaluate combinations of various lexicons with our enriched hate lexicon, achieving a maximum F1-score of 0.74 for the Hindi dataset and 0.88 for the Bengali dataset. The HateCircle and hate tweet detection framework is evaluated with our proposed parts-of-speech tagging and geometric median detection methods. Results reveal that this framework also achieves a maximum accuracy of 0.73 for the Hindi dataset and 0.78 for the Bengali dataset. The experimental results indicate that contextual semantic hate speech detection with language independence can help offset the growth of implicit abusive text on social media.
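
The abstract names Euclidean distance and the geometric median but does not say how they are computed. As a rough illustration only, the sketch below approximates the geometric median of a set of 2-D points with Weiszfeld's iteration, a standard technique that is assumed here and not taken from the paper. The function name `geometric_median`, the 2-D point representation, and the tolerance values are all hypothetical.

```python
import math


def geometric_median(points, tol=1e-9, max_iter=1000):
    """Approximate the geometric median of 2-D points (Weiszfeld's iteration).

    Assumption: this is a generic textbook procedure, not the paper's code.
    """
    # Start from the centroid of the points.
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)

    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)  # Euclidean distance to the estimate
            if d < 1e-12:
                # Simplification: skip a point that coincides with the
                # estimate so that we never divide by zero.
                continue
            w = 1.0 / d
            num_x += w * px
            num_y += w * py
            denom += w
        if denom == 0.0:
            break  # every point coincides with the current estimate
        new_x, new_y = num_x / denom, num_y / denom
        moved = math.hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if moved < tol:
            break
    return x, y


# Four points placed symmetrically around the origin.
symmetric = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
center = geometric_median(symmetric)

# Three collinear points: the estimate settles on the middle point.
collinear = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
middle = geometric_median(collinear)
```

Unlike the mean, the geometric median minimizes the sum of Euclidean distances to the points, so a single distant point pulls it far less. In the collinear example the mean lies at x = 2, while the estimate stays near the middle point.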


    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 4
      April 2023
      682 pages
      ISSN: 2375-4699
      EISSN: 2375-4702
      DOI: 10.1145/3588902


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 24 March 2023
      • Online AM: 19 December 2022
      • Accepted: 1 December 2022
      • Revised: 24 October 2022
      • Received: 1 July 2021