Published in: Social Network Analysis and Mining 1/2024

01.12.2024 | Review Paper

A survey of hate speech detection in Indian languages

Authors: Arpan Nandi, Kamal Sarkar, Arjun Mallick, Arkadeep De

Abstract

With the enormous growth in access to high-speed internet, the number of social media users is rising rapidly. Owing to a lack of proper regulation and ethical norms, social media platforms are often contaminated by posts and comments containing abusive language and offensive remarks directed at individuals, groups, races, religions, and communities. A single remark can trigger a long chain of reactions that are equally abusive, or even more so. To prevent such occurrences, automated systems are needed that can detect abusive text and hate speech and remove it immediately. However, most existing research is limited to globally dominant languages such as English. India is a nation of many diverse languages and multiple religions, and abusive posts and remarks in Indian languages, in monolingual or code-mixed form, are now common on social media platforms. Although resources such as hate speech lexicons and annotated datasets remain limited for Indian languages, most research on hate speech detection in these languages has applied traditional machine learning and deep learning methods. Multilingualism and code-mixing, however, make hate speech detection in Indian languages more challenging. Given these facts, this paper focuses on reviewing recent impactful research on hate speech detection in Indian languages. We analyze and compare these works along several aspects: the datasets used, the feature extraction and classification methods applied, and the results achieved.
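The works reviewed here are compared by their feature extraction and classification methods. As a rough illustration of what such a pipeline looks like, here is a minimal sketch assuming character n-gram features and a multinomial Naive Bayes classifier, a classical pairing for noisy, code-mixed text; it is not the method of any specific surveyed paper. The function `char_ngrams`, the class `CharNgramNaiveBayes`, and the toy training sentences are hypothetical, and only the Python standard library is used.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=2, n_max=4):
    """Return overlapping character n-grams (n_min..n_max) of a lowercased, space-padded text."""
    padded = f" {text.lower()} "
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


class CharNgramNaiveBayes:
    """Multinomial Naive Bayes over character n-gram counts, with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class, used for the priors
        self.gram_counts = defaultdict(Counter)      # n-gram counts per class
        for text, label in zip(texts, labels):
            self.gram_counts[label].update(char_ngrams(text))
        self.vocab = set()
        for counts in self.gram_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(counts.values()) for c, counts in self.gram_counts.items()}
        return self

    def log_scores(self, text):
        """Unnormalized log-posterior for each class."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / n_docs)
            denom = self.totals[c] + self.alpha * v
            for g in char_ngrams(text):
                if g in self.vocab:  # n-grams never seen in training are skipped
                    score += math.log((self.gram_counts[c][g] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy data (romanized Hindi-English code-mixed and English), for illustration only.
train_texts = [
    "tu pagal hai idiot",
    "you are a total idiot",
    "bakwas insaan, shut up",
    "bahut accha video bhai",
    "great match today",
    "nice song, thanks for sharing",
]
train_labels = ["abusive"] * 3 + ["not_abusive"] * 3

model = CharNgramNaiveBayes(alpha=1.0).fit(train_texts, train_labels)
for query in ["tum pagal ho idiot", "accha song bhai"]:
    print(query, "->", model.predict(query))
```

Character rather than word features are used here because romanized code-mixed text has no fixed spelling, so variants of the same word still share many n-grams. A practical system would need a large annotated dataset and held-out evaluation; the sketch only shows the feature-extraction-plus-classifier structure.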

Metadata
Title
A survey of hate speech detection in Indian languages
Authors
Arpan Nandi
Kamal Sarkar
Arjun Mallick
Arkadeep De
Publication date
01.12.2024
Publisher
Springer Vienna
Published in
Social Network Analysis and Mining / Issue 1/2024
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-024-01223-y
