short-paper

Hate Speech Detection in Hindi-English Code-Mixed Social Media Text

Authors:
T. Y.S.S. Santosh, IIT Kharagpur, Kharagpur, West Bengal, India
K. V.S. Aravind, IIT Kharagpur, Kharagpur, West Bengal, India

CODS-COMAD '19: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data, January 2019, Pages 310–313. https://doi.org/10.1145/3297001.3297048

Published: 03 January 2019

ABSTRACT

With the increase in user-generated content, particularly on social media networks, the amount of hate speech is also steadily increasing. There is therefore a need to automatically detect such hateful content and curb the associated harmful activity. While relevant research has been done independently on code-mixed social media text and on hate speech detection, this paper addresses the task of identifying hate speech in code-mixed social media text. We perform experiments on an available code-mixed dataset for hate speech detection using two architectures: a sub-word level LSTM model and a Hierarchical LSTM model with attention based on phonemic sub-words.
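The abstract names a Hierarchical LSTM with attention over phonemic sub-words but does not spell out how sub-words are formed or pooled. The sketch below illustrates only the hierarchical attention idea, under stated assumptions: phonemic sub-words are approximated by a hypothetical `cv_subwords` heuristic (a run of non-vowels followed by a run of vowels on romanized text), embeddings come from a hypothetical hash-based `toy_embedding` instead of learned vectors, and the LSTM encoders are omitted, so attention pools raw sub-word and word vectors directly. None of the function names or parameters come from the paper.

```python
import hashlib
import math
import re


def cv_subwords(word: str) -> list[str]:
    """Split a romanized word into consonant-vowel style units.

    Assumed heuristic (not the paper's exact rule): each unit is a possibly
    empty run of non-vowels followed by a run of vowels; leftover trailing
    consonants are attached to the last unit. 'y' counts as a consonant.
    """
    word = word.lower()
    units = re.findall(r"[^aeiou]*[aeiou]+", word)
    tail = word[len("".join(units)):]
    if tail:
        if units:
            units[-1] += tail
        else:
            units = [tail]
    return units


def toy_embedding(unit: str, dim: int = 8) -> list[float]:
    """Hypothetical stand-in for a learned embedding: deterministic values in [-1, 1]."""
    digest = hashlib.sha256(unit.encode("utf-8")).digest()
    return [digest[i % len(digest)] / 127.5 - 1.0 for i in range(dim)]


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Dot-product attention: weight each vector by softmax(v . context), then sum."""
    if not vectors:
        raise ValueError("attention_pool needs at least one vector")
    scores = [sum(a * b for a, b in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def encode_word(word, word_context, dim=8):
    """Sub-word level: pool the embeddings of a word's units into one word vector."""
    vecs = [toy_embedding(u, dim) for u in cv_subwords(word)]
    pooled, _ = attention_pool(vecs, word_context)
    return pooled


def encode_sentence(text, word_context, sentence_context, dim=8):
    """Word level: pool word vectors into one sentence vector for a classifier."""
    word_vecs = [encode_word(w, word_context, dim) for w in text.split()]
    pooled, _ = attention_pool(word_vecs, sentence_context)
    return pooled


if __name__ == "__main__":
    print(cv_subwords("bahut"))  # ['ba', 'hut']
    vec = encode_sentence("yeh bahut bura hai", [0.1] * 8, [0.2] * 8)
    print(len(vec))  # 8
```

The single context vector per level follows the general style of hierarchical attention networks; in a trained model the embeddings and context vectors would be learned, and LSTM hidden states over the sub-word and word sequences would replace the raw vectors pooled here.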

Index Terms

1. Human-centered computing
  1. Collaborative and social computing
    1. Collaborative and social computing theory, concepts and paradigms
      1. Social media
      2. Social tagging

Published in
CODS-COMAD '19: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data
January 2019
380 pages
ISBN: 9781450362078
DOI: 10.1145/3297001
General Chairs: Lipika Dey (TCS Innovation Labs), Surajit Chaudhury (Microsoft Research)
Program Chairs: Raghu Krishnapuram (Robert Bosch Center, IISc Bangalore), Parag Singla (IIT Delhi)
Publications Chair: Rishiraj Saha Roy (Max Planck Institute for Informatics)
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
code-mixing
deep learning
hate speech
Qualifiers
- short-paper
- Research
- Refereed limited
Acceptance Rates
CODS-COMAD '19 Paper Acceptance Rate: 62 of 198 submissions, 31%. Overall Acceptance Rate: 197 of 680 submissions, 29%.
Article Metrics
- Total Citations: 53
- Total Downloads: 713
- Downloads (Last 12 months): 58
- Downloads (Last 6 weeks): 2