DOI: 10.1145/3297001.3297048
short-paper

Hate Speech Detection in Hindi-English Code-Mixed Social Media Text

Published: 03 January 2019

ABSTRACT

With the increase in user-generated content, particularly on social media networks, the amount of hate speech is also steadily increasing, so there is a need to automatically detect such hateful content and curb the wrongful activities. While relevant research has been done independently on code-mixed social media text and on hate speech detection, this paper addresses the identification of hate speech in code-mixed social media text. We perform experiments on an available code-mixed dataset for hate speech detection using two architectures: a sub-word level LSTM model and a hierarchical LSTM model with attention based on phonemic sub-words.
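As a rough illustration of what "hierarchical with attention" can mean, the sketch below applies dot-product attention pooling at two levels: sub-word vectors are pooled into a word vector, and word vectors are pooled into a single text vector. This is a generic sketch, not the paper's implementation: the LSTM encoders are omitted, toy vectors stand in for their hidden states, and the names `attention_pool`, `hierarchical_pool`, `subword_context`, and `word_context` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, context):
    """Weight each hidden state by its dot product with a context vector.

    Returns the weighted sum of the hidden states and the attention weights.
    In a trained model the context vector would be a learned parameter.
    """
    scores = [sum(h_j * c_j for h_j, c_j in zip(h, context)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return pooled, weights


def hierarchical_pool(text, subword_context, word_context):
    """Pool sub-word vectors into word vectors, then word vectors into one vector.

    `text` is a list of words; each word is a list of sub-word vectors
    (stand-ins for sub-word level encoder outputs, an assumption of this sketch).
    """
    word_vectors = [attention_pool(word, subword_context)[0] for word in text]
    return attention_pool(word_vectors, word_context)


# Toy input: two words, the first with two sub-word vectors, the second with one.
toy_text = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
text_vector, word_weights = hierarchical_pool(toy_text, [0.0, 0.0], [0.0, 0.0])
```

In a full model the resulting text vector would feed a classifier layer; the attention weights indicate which words or sub-words contributed most to it.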

Published in

CODS-COMAD '19: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data
January 2019
380 pages
ISBN: 9781450362078
DOI: 10.1145/3297001

Copyright © 2019 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• short-paper
• Research
• Refereed limited

Acceptance Rates

CODS-COMAD '19 Paper Acceptance Rate: 62 of 198 submissions, 31%
Overall Acceptance Rate: 197 of 680 submissions, 29%
