research-article

HateCircle and Unsupervised Hate Speech Detection Incorporating Emotion and Contextual Semantics

Authors:
Sayani Ghosal
Netaji Subhas University of Technology East Campus (erstwhile A.I.A.C.T.R.), Guru Gobind Singh Indraprastha University, Delhi, India
ORCID: 0000-0002-0979-0788

Amita Jain
Netaji Subhas University of Technology, Delhi, India
ORCID: 0000-0003-0891-3675

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 4, Article No.: 108, pp 1–28, https://doi.org/10.1145/3576913

Published: 24 March 2023

Abstract

The explosive growth of social media has greatly expanded freedom of speech online. A worldwide platform for human voices also makes it possible to attack other users without facing consequences and to flout social etiquette, resulting in an inevitable increase in hate speech. English hate speech detection is now a popular research area, but the prevalence of implicit hate content in regional languages calls for effective language-independent models. The proposed research is the first unsupervised Hindi and Bengali hate content detection framework, and it consists of three components: HateCircle, hate tweet classification, and code-switch data preparation algorithms. The novel HateCircle method detects the hate orientation of each term from word co-occurrence patterns, contextual semantics, and emotion analysis. The multiclass hate tweet classification algorithm uses parts-of-speech tagging, Euclidean distance, and the geometric median. Because hate content is detected more effectively in the native script than in the Roman script, a transliteration algorithm is also proposed for code-switch data preparation. The experiments evaluate combinations of various lexicons with our enriched hate lexicon, which achieve a maximum F1-score of 0.74 for the Hindi datasets and 0.88 for the Bengali datasets. The HateCircle and hate tweet detection framework is evaluated with our proposed parts-of-speech tagging and geometric median detection methods, and it achieves a maximum accuracy of 0.73 for the Hindi dataset and 0.78 for the Bengali dataset. The experimental results indicate that contextual semantic hate speech detection with a language-independent design can help counter the growth of implicit abusive text on social media.
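The abstract names the geometric median and Euclidean distance as building blocks of the classification step but does not say how term points are constructed or how the median is mapped to a class. As a generic illustration only, and not the authors' implementation, the sketch below computes the geometric median of a set of 2D points with the standard Weiszfeld iteration. The function name `geometric_median`, its parameters, and the sample points are all assumptions made for this example.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def geometric_median(points: Sequence[Point],
                     tol: float = 1e-9,
                     max_iter: int = 1000) -> Point:
    """Approximate the point minimising the summed Euclidean distance
    to all input points (Weiszfeld iteration).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not points:
        raise ValueError("need at least one point")

    # Start from the centroid (arithmetic mean) of the points.
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)

    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)  # Euclidean distance
            if d < tol:
                # Simplification: skip a point that coincides with the
                # current estimate to avoid dividing by zero.
                continue
            w = 1.0 / d
            num_x += w * px
            num_y += w * py
            denom += w
        if denom == 0.0:
            # Every point coincides with the current estimate.
            return (x, y)
        new_x, new_y = num_x / denom, num_y / denom
        if math.hypot(new_x - x, new_y - y) < tol:
            return (new_x, new_y)
        x, y = new_x, new_y
    return (x, y)


if __name__ == "__main__":
    # Made-up coordinates: four clustered points and one far outlier.
    sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (100.0, 100.0)]
    print(geometric_median(sample))
```

Unlike the arithmetic mean, the geometric median is pulled only weakly by a single distant point, which is one common reason to prefer it when summarising noisy point sets.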

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Published in
ACM Transactions on Asian and Low-Resource Language Information Processing Volume 22, Issue 4
April 2023
682 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3588902
Editor: Imed Zitouni, Google, USA
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 24 March 2023
- Online AM: 19 December 2022
- Accepted: 1 December 2022
- Revised: 24 October 2022
- Received: 1 July 2021
Author Tags
Low-resource languages
hate speech detection
emotion analysis
contextual semantics
parts-of-speech tagging
code-switch script
Indian languages
social media

Article Metrics
- Total Citations: 2
- Total Downloads: 665
- Downloads (Last 12 months): 466
- Downloads (Last 6 weeks): 35