ABSTRACT
The HASOC track is dedicated to the evaluation of technology for finding Offensive Language and Hate Speech. HASOC is creating a multilingual data corpus, mainly for English and under-resourced languages (Hindi and Marathi). This paper presents one HASOC subtrack with two tasks. In 2021, we organized the classification task for English, Hindi, and Marathi. The first task consists of two classification subtasks: Subtask 1A is a binary classification of tweets into offensive and non-offensive, and Subtask 1B is a fine-grained classification of tweets into Hate, Profane, and Offensive. Task 2 consists of classifying tweets given additional context in the form of the preceding conversation. During the shared task, 65 teams submitted 652 runs. This overview paper briefly presents the task descriptions, the data, and the results obtained from the participants' submissions.
Index Terms
- Overview of the HASOC Subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages and Conversational Hate Speech