nach oben

Social Network Analysis and Mining

Erschienen in:

01.12.2024 | Review Paper

A survey of hate speech detection in Indian languages

verfasst von: Arpan Nandi, Kamal Sarkar, Arjun Mallick, Arkadeep De

Erschienen in: Social Network Analysis and Mining | Ausgabe 1/2024

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

With the enormous increase in accessibility of high-speed internet, the number of social media users is increasing rapidly. Due to a lack of proper regulations and ethics, social media platforms are often contaminated by posts and comments containing abusive language and offensive remarks toward individuals, groups, races, religions, and communities. A single remark often triggers a huge chain of reactions with similar abusiveness, or even more. To prevent such occurrences, there is a need for automated systems that can detect abusive texts and hate speeches and remove them immediately. However, most existing research works are limited only to globally popular languages like English. Since India is a nation of many diverse languages and multiple religions, nowadays abusive posts and remarks in Indian languages (monolingual or code-mixed form) are not infrequent on social media platforms. Although resources such as hate speech lexicon and annotated datasets are limited for Indian languages, most research works on hate speech detection in such languages used traditional machine learning and deep learning methods for this task. However, multilingualism and code-mixing make hate speech detection in Indian languages more challenging. Given these facts, this paper mainly focuses on reviewing the latest impactful research works on hate speech detection in Indian languages. In this paper, we have analyzed and compared the latest research works on hate speech detection in Indian languages in terms of various aspects—datasets used, feature extraction and classification methods applied, and the results achieved.

Vorheriger Artikel Effect of social media sentiment on donations received by NPOS

Nächster Artikel A novel influence quantification model on Instagram using data science approach for targeted business advertising and better digital marketing outcomes

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Akhter S, et al ( 2018) Social media bullying detection using machine learning on Bangla text. In: 2018 10th International conference on electrical and computer engineering (ICECE). IEEE, pp 385–388

Alrehili A (2019) Automatic hate speech detection on social media: a brief survey. In: 2019 IEEE/ACS 16th international conference on computer systems and applications (AICCSA). IEEE, pp 1–6

Al Kuwatly H, Wich M, Groh G (2020) Identifying and measuring annotator bias based on annotators’ demographic characteristics. In: Proceedings of the 4th Workshop on online abuse and harms, pp 184–190

Anusha M, Shashirekha H (2020) An ensemble model for hate speech and offensive content identification in Indo-European languages. In: FIRE (Working Notes), pp 253–259

Barnwal S, Kumar R, Pamula R (2022) IIT DHANBAD CODECHAMPS at SemEval-2022 task 5: MAMI—multimedia automatic misogyny identification. In: Proceedings of the 16th international workshop on semantic evaluation (SemEval-2022). Association for Computational Linguistics, Seattle, pp 733–735. https://doi.org/10.18653/v1/2022.semeval-1.101

Bharathi B, Varsha J ( 2022) Ssncse nlp@ tamilnlp-acl2022: transformer based approach for detection of abusive comment for Tamil language. In: Proceedings of the 2nd workshop on speech and language technologies for Dravidian languages, pp 158–164

Bhattacharya S, Singh S, Kumar R, Bansal A, Bhagat A, Dawer Y, Lahiri B, Ojha AK (2020) Developing a multilingual annotated corpus of misogyny and aggression. arXiv preprint arXiv:2003.07428

Biradar S, Saumya S et al (2022) Fighting hate speech from bilingual Hinglish speaker’s perspective, a transformer-and translation-based approach. Soc Network Anal Min 12(1):1–10

Bohra A, Vijay D, Singh V, Akhtar SS, Shrivastava M (2018) A dataset of Hindi–English code-mixed social media text for hate speech detection. In: Proceedings of the 2nd workshop on computational modeling of people’s opinions, personality, and emotions in social media. Association for Computational Linguistics, New Orleans, Louisiana, pp 36–41. https://doi.org/10.18653/v1/W18-1105

Chakravarthi BR (2022) Hope speech detection in Youtube comments. Soc Network Anal Min 12(1):1–19

Chakravarthi BR, Priyadharshini R, Muralidaran V, Jose N, Suryawanshi S, Sherly E, McCrae JP (2022) Dravidiancodemix: sentiment analysis and offensive language identification dataset for Dravidian languages in code-mixed text. Lang Resour Eval 56(3):765–806CrossRef

Chakravarthi BR, Priyadharshini R, Jose N, Mandl T, Kumaresan PK, Ponnusamy R, Hariharan R, McCrae JP, Sherly E, et al (2021) Findings of the shared task on offensive language identification in Tamil, Malayalam, and Kannada. In: Proceedings of the 1st workshop on speech and language technologies for Dravidian languages, pp 133–145

Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) Smote: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357CrossRef

Das M, Saha P, Mathew B, Mukherjee A (2022) Hatecheckhin: Evaluating Hindi hate speech detection models. arXiv preprint arXiv:2205.00328

Del Vigna12 F, Cimino23 A, Dell’Orletta F, Petrocchi M, Tesconi M (2017) Hate me, hate me not: hate speech detection on facebook. In: Proceedings of the 1st Italian conference on cybersecurity (ITASEC17), pp 86–95

Dhanya L, Balakrishnan K (2021) Hate speech detection in Asian languages: A Survey. In: 2021 International conference on communication, control and information sciences (ICCISc) 1:1–5 (IEEE)

Dowlagar S, Mamidi R (2021) A survey of recent neural network models on code-mixed Indian hate speech data. In: Forum for information retrieval evaluation, pp 67–74

Dutta S, Majumder U, Naskar SK ( 2021) sdutta at comma@ icon: a CNN-LSTM model for hate detection. In: Proceedings of the 18th international conference on natural language processing: shared task on multilingual gender biased and communal language identification, pp 53–57

Eshan SC, Hasan MS (2017) An application of machine learning to detect abusive bengali text. In: 2017 20th International conference of computer and information technology (ICCIT). IEEE, pp 1–6

Grave E, Bojanowski P, Gupta P, Joulin A, Mikolov T (2018a) Learning Word Vectors for 157 Languages. https://doi.org/10.48550/ARXIV.1802.06893

Grave E, Bojanowski P, Gupta P, Joulin A, Mikolov T (2018b) Learning word vectors for 157 languages. arXiv preprint arXiv:1802.06893

Guest E, Vidgen B, Mittos A, Sastry N, Tyson G, Margetts H (2021) An expert annotated dataset for the detection of online misogyny. In: Proceedings of the 16th conference of the European chapter of the association for computational linguistics: main volume, pp 1336–1350

Himabindu GSSN, Rao R, Sethia D (2022) A self-attention hybrid emoji prediction model for code-mixed language: (Hinglish). Social Network Anal Min 12(1):137CrossRef

Ishmam AM, Sharmin S (2019) Hateful speech detection in public facebook pages for the Bengali language. In: 2019 18th IEEE international conference on machine learning and applications (ICMLA). IEEE, pp 555–560

Islam M, Hossain MS, Akhter N ( 2022) Hate speech detection using machine learning in Bengali languages. In: 2022 6th International conference on intelligent computing and control systems (ICICCS). IEEE, pp 1349–1354

Jemima PP, Majumder BR, Ghosh BK, Hoda F (2022) Hate speech detection using machine learning. In: 2022 7th international conference on communication and electronics systems (ICCES). IEEE, pp 1274–1277

Jha VK, Hrudya P, Vinu P, Vijayan V, Prabaharan P (2020) Dhot-repository and classification of offensive tweets in the Hindi language. Procedia Comput Sci 171:2324–2333CrossRef

Joshi R, Karnavat R, Jirapure K, Joshi R (2021) Evaluation of deep learning models for hostility detection in Hindi text. In: 2021 6th International conference for convergence in technology (I2CT). IEEE, pp 1–5

Kamble S, Joshi A (2018) Hate speech detection from code-mixed Hindi–English tweets using deep learning models. arXiv preprint arXiv:1811.05145

Karim MR, Dey SK, Islam T, Sarker S, Menon MH, Hossain K, Hossain MA, Decker S (2021) Deephateexplainer: explainable hate speech detection in under-resourced Bengali language. In: 2021 IEEE 8th international conference on data science and advanced analytics (DSAA). IEEE, pp 1–10

Khan H, Phillips JL (2021) Language agnostic model: detecting islamophobic content on social media. In: Proceedings of the 2021 ACM southeast conference, pp 229–233

Kumar R, Lahiri B, Ojha AK (2021) Aggressive and offensive language identification in Hindi, Bangla, and English: a comparative study. SN Comput Sci 2(1):1–20CrossRef

Kumar R, Reganti AN, Bhatia A, Maheshwari T (2018) Aggression-annotated corpus of Hindi–English code-mixed data. arXiv preprint arXiv:1803.09402

Kumar T, Mahrishi M, Sharma G (2023) Emotion recognition in Hindi text using multilingual Bert transformer. Multimed Tools Appl 1–22

Kumar R, Ojha AK, Malmasi S, Zampieri M ( 2018) Benchmarking aggression identification in social media. In: Proceedings of the 1st workshop on trolling, aggression and cyberbullying (TRAC-2018), pp 1–11

Kumar R, Ojha AK, Malmasi S, Zampieri M (2020) Evaluating aggression identification in social media. In: Proceedings of the 2nd workshop on trolling, aggression and cyberbullying, pp 1–5

Kumaresan PK, Sakuntharaj R, Thavareesan S, Navaneethakrishnan S, Madasamy AK, Chakravarthi BR, McCrae JP (2021) Findings of shared task on offensive language identification in Tamil and Malayalam. In: Forum for information retrieval evaluation, pp 16–18

Mandl T, Modha S, Majumder P, Patel D, Dave M, Mandlia C, Patel A ( 2019) Overview of the hasoc track at fire 2019: hate speech and offensive content identification in Indo-European languages. In: Proceedings of the 11th annual meeting of the forum for information retrieval evaluation, pp 14–17

Mandl T, Modha S, Kumar MA, Chakravarthi BR ( 2020) Overview of the hasoc track at fire 2020: hate speech and offensive language identification in Tamil, Malayalam, Hindi, English and German. In: Forum for information retrieval evaluation, pp 29–32

Masud S, Charaborty T (2023) Political mud slandering and power dynamics during Indian assembly elections. Soc Network Anal Min 13(1):108CrossRef

Mathew B, Illendula A, Saha P, Sarkar S, Goyal P, Mukherjee A (2020) Hate begets hate: a temporal study of hate speech. Proc ACM Hum–Comput Interaction 4( CSCW2):1–24

Mathur P, Shah R, Sawhney R, Mahata D (2018) Detecting offensive tweets in Hindi–English code-switched language. In: Proceedings of the 6th international workshop on natural language processing for social media, pp 18–26

Meetei LS, Singh TD, Borgohain SK, Bandyopadhyay S (2021) Low resource language specific pre-processing and features for sentiment analysis task. Lang Resour Eval 55(4):947–969CrossRef

Mikolov T, Chen K, Corrado G, Dean, J (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781

Mridha MF, Wadud MAH, Hamid MA, Monowar MM, Abdullah-Al-Wadud M, Alamri A (2021) L-boost: identifying offensive texts from social media post in Bengali. IEEE Access 9:164681–164699CrossRef

Mundra S, Mittal N (2022) Fa-net: fused attention-based network for Hindi English code-mixed offensive text classification. Soc Network Anal Min 12(1):100CrossRef

Mundra S, Mittal N (2023) Cmhe-an: code mixed hybrid embedding based attention network for aggression identification in Hindi English code-mixed text. Multimed Tools Appl 82(8):11337–11364CrossRef

Naseem U, Razzak I, Eklund PW (2021) A survey of pre-processing techniques to improve short-text quality: a case study on hate speech detection on twitter. Multimed Tools Appl 80(28):35239–35266CrossRef

Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359CrossRef

Patil H, Velankar A, Joshi R (2022) L3cube-mahahate: A tweet-based marathi hate speech detection dataset and bert models. In: Proceedings of the 3rd workshop on threat, aggression and cyberbullying (TRAC 2022), pp 1– 9

Pavlopoulos J, Sorensen J, Laugier L, Androutsopoulos I (2021) Semeval-2021 task 5: toxic spans detection. In: Proceedings of the 15th international workshop on semantic evaluation (SemEval-2021), pp 59–69

Poletto F, Basile V, Sanguinetti M, Bosco C, Patti V (2021) Resources and benchmark corpora for hate speech detection: a systematic review. Lang Resourc Eval 55(2):477–523CrossRef

Rahman AI, Akhand Z-E, Noor MAU, Islam J, Mahtab M, Mehedi MHK, Rasel AA, et al (2022) Comparative analysis on joint modeling of emotion and abuse detection in Bangla language. In: International conference on advances in computing and data sciences. Springer, pp 199–209

Rani P, Suryawanshi S, Goswami K, Chakravarthi BR, Fransen T, McCrae JP (2020) A comparative study of different state-of-the-art hate speech detection methods in Hindi–English code-mixed data. In: Proceedings of the 2nd workshop on trolling, aggression and cyberbullying, pp 42–48

Remon NI, Tuli NH, Akash RD( 2022) Bengali hate speech detection in public facebook pages. In: 2022 International conference on innovations in science, engineering and technology (ICISET). IEEE, pp 169–173

Roy PK, Bhawal S, Subalalitha CN (2022) Hate speech and offensive language detection in Dravidian languages using deep ensemble framework. Comput Speech Lang 75:101386CrossRef

Roy A, Kapil P, Basak K, Ekbal A(2018) An ensemble approach for aggression identification in english and hindi text. In: Proceedings of the 1st workshop on trolling, aggression and cyberbullying (TRAC-2018), pp 66–73

Samghabadi NS, Patwa P, Pykl S, Mukherjee P, Das A, Solorio T( 2020) Aggression and misogyny detection using bert: a multi-task approach. In: Proceedings of the 2nd workshop on trolling, aggression and cyberbullying, pp 126–131

Sap M, Card D, Gabriel S, Choi Y, Smith NA ( 2019) The risk of racial bias in hate speech detection. In: Proceedings of the 57th annual meeting of the association for computational linguistics, pp 1668–1678

Sarkar K (2018) Using character n-gram features and multinomial naïve bayes for sentiment polarity detection in Bengali tweets. In: 2018 5th International conference on emerging applications of information technology (EAIT), pp 1–4

Sarker M, Hossain MF, Liza FR, Sakib SN, Al Farooq A ( 2022) A machine learning approach to classify anti-social Bengali comments on social media. In: 2022 International conference on advancement in electrical and electronic engineering (ICAEEE). IEEE, pp 1–6

Schmidt A, Wiegand M (2017) A survey on hate speech detection using natural language processing. In: Proceedings of the 5th international workshop on natural language processing for social media, pp 1–10

Sengupta A, Bhattacharjee SK, Akhtar MS, Chakraborty T (2022) Does aggression lead to hate? Detecting and reasoning offensive traits in Hinglish code-mixed texts. Neurocomputing 488:598–617CrossRef

Sharma A, Kabra A, Jain M (2022) Ceasing hate with moh: Hate speech detection in Hindi–English code-switched language. Inf Process Manag 59(1):102760CrossRef

Sreelakshmi K, Premjith B, Soman K (2020) Detection of hate speech text in Hindi–English code-mixed data. Procedia Comput Sci 171:737–744CrossRef

Subramanian M, Ponnusamy R, Benhur S, Shanmugavadivel K, Ganesan A, Ravi D, Shanmugasundaram GK, Priyadharshini R, Chakravarthi BR (2022) Offensive language detection in Tamil youtube comments by adapters and cross-domain knowledge transfer. Comput Speech Lang 76:101404CrossRef

Subramanian M, Adhithiya G, Gowthamkrishnan S, Deepti R (2022) Detecting offensive Tamil texts using machine learning and multilingual transformer models. In: 2022 International conference on smart technologies and systems for next generation computing (ICSTSN). IEEE, pp 1–6

Thomson M, Murfi H, Ardaneswari G (2023) Bert-based hybrid deep learning with text augmentation for sentiment analysis of Indonesian hotel reviews. In: DATA, pp 468–473

Vashistha N, Zubiaga A (2020) Online multilingual hate speech detection: experimenting with Hindi and English social media. Information 12(1):5CrossRef

Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30:1

Zampieri M, Ranasinghe T, Chaudhari M, Gaikwad S, Krishna P, Nene M, Paygude S (2022) Predicting the type and target of offensive social media posts in Marathi. Soc Network Anal Min 12(1):77CrossRef

Zhang L, Liu B ( 2012) Sentiment analysis and opinion mining. In: Encyclopedia of machine learning and data mining

Zimmerman S, Kruschwitz U, Fox C (2018) Improving hate speech detection with deep learning ensembles. In: Proceedings of the 11th international conference on language resources and evaluation (LREC 2018)

Titel: A survey of hate speech detection in Indian languages
verfasst von: Arpan Nandi
Kamal Sarkar
Arjun Mallick
Arkadeep De
Publikationsdatum: 01.12.2024
Verlag: Springer Vienna
Erschienen in: Social Network Analysis and Mining / Ausgabe 1/2024
Print ISSN: 1869-5450
Elektronische ISSN: 1869-5469
DOI: https://doi.org/10.1007/s13278-024-01223-y

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Weitere Artikel der Ausgabe 1/2024

Lgt: long-range graph transformer for early rumor detection

Enhancing cyberbullying detection: a comparative study of ensemble CNN–SVM and BERT models

Sentiment analysis of tweets on social security and medicare

Classifying informative tweets using feature enhanced pre-trained language model

Development of intelligent system based on synthesis of affective signals and deep neural networks to foster mental health of the Indian virtual community

CLUS-BET: improving influence propagation and classification in networks using a novel seed selection technique

Premium Partner