ABSTRACT
With the growth of user-generated content, particularly on social media networks, the amount of hate speech is also steadily increasing. There is therefore a need to detect such hateful content automatically and curb such wrongful activity. While relevant research has been done independently on code-mixed social media text and on hate speech detection, this paper addresses the identification of hate speech in code-mixed social media text. We perform experiments on an available code-mixed dataset for hate speech detection using two architectures: a sub-word level LSTM model and a hierarchical LSTM model with attention based on phonemic sub-words.
Index Terms
- Hate Speech Detection in Hindi-English Code-Mixed Social Media Text