Published in: Social Network Analysis and Mining 1/2022

01.12.2022 | Original Article

FA-Net: fused attention-based network for Hindi English code-mixed offensive text classification

Authors: Shikha Mundra, Namita Mittal

Abstract

Widespread use of social media platforms such as Twitter, Facebook, and YouTube allows opinions and suggestions to be shared across countries. However, these platforms are often misused to disseminate hate speech and offensive content. Moreover, in a multilingual society such as India, many users resort to code-mixing when typing on social media. We therefore focus on Hindi-English (Hi-En) code-mixed hate speech and offensive text classification. Numerous approaches have emerged recently, and most of them stack CNN and LSTM layers to extract local and sequential semantic features. However, such stacked arrangements diminish the combined effect of local and sequential features. In addition, deep frameworks suffer from the vanishing gradient problem. In this work, we propose a local- and sequential-knowledge-aware Fused Attention-based Network (FA-Net), which introduces an attention-based fusion mechanism for collective and mutual learning between local and sequential features. The proposed network is shallower and wider than existing architectures. It has three building blocks: Code Mixed Hybrid Embedding (CMHE), Locally Driven Sequential Attention-2 (LDSA-2), and Locally Driven Sequential Attention-3 (LDSA-3). CMHE is built from customized Hi-En code-mixed data so that the network is initialized with relevant weights. LDSA-2 and LDSA-3 equip the model to build a comprehensive representation that carries past, future, and local contextual knowledge with respect to any position in the sentence. Extensive experiments on two benchmark datasets show that FA-Net outperforms other state-of-the-art approaches.
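The idea of fusing local and sequential features side by side, rather than stacking one extractor on top of the other, can be illustrated with a small toy sketch. It is not the authors' FA-Net: the abstract gives no layer details, so every function name below is hypothetical, a sliding-window mean stands in for a CNN filter, a forward and backward running state stands in for a BiLSTM, and the window widths 2 and 3 are chosen only to echo the block names LDSA-2 and LDSA-3 (the abstract does not say what those suffixes mean).

```python
import math
import random


def local_features(x, width):
    """Mean over a sliding window of `width` tokens (toy stand-in for a CNN filter)."""
    dim = len(x[0])
    out = []
    for t in range(len(x)):
        window = x[t:t + width]  # the window is shorter at the right edge
        out.append([sum(tok[j] for tok in window) / len(window) for j in range(dim)])
    return out


def sequential_features(x, decay=0.5):
    """Forward (past) plus backward (future) running states (toy stand-in for a BiLSTM)."""
    def scan(seq):
        state, states = [0.0] * len(seq[0]), []
        for tok in seq:
            state = [math.tanh(decay * s + v) for s, v in zip(state, tok)]
            states.append(state)
        return states

    fwd = scan(x)
    bwd = scan(x[::-1])[::-1]
    return [[f + b for f, b in zip(fs, bs)] for fs, bs in zip(fwd, bwd)]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys):
    """Scaled dot-product attention: each local feature attends over all sequential states."""
    dim = len(keys[0])
    contexts, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        w = softmax(scores)
        contexts.append([sum(w[s] * keys[s][j] for s in range(len(keys))) for j in range(dim)])
        weights.append(w)
    return contexts, weights


def fused_block(x, width):
    """One parallel block: local and sequential branches read the same input, then are fused."""
    loc = local_features(x, width)
    seq = sequential_features(x)
    context, weights = attend(loc, seq)
    fused = [l_t + c_t for l_t, c_t in zip(loc, context)]  # per-token concatenation
    pooled = [sum(tok[j] for tok in fused) / len(fused) for j in range(len(fused[0]))]
    return pooled, weights


def sentence_vector(x):
    """Two blocks side by side (breadth instead of depth), concatenated."""
    v2, _ = fused_block(x, 2)
    v3, _ = fused_block(x, 3)
    return v2 + v3


random.seed(0)
tokens = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(6)]  # 6 tokens, 4-dim embeddings
vec = sentence_vector(tokens)
_, attn = fused_block(tokens, 2)
```

Both branches read the same embeddings directly, so neither feature type is filtered through the other before fusion, and the two blocks run in parallel rather than in sequence. In a trained model the stand-ins would be replaced by learned convolutional and recurrent layers, and the pooled vector would feed a classifier.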

Metadata
Title
FA-Net: fused attention-based network for Hindi English code-mixed offensive text classification
Authors
Shikha Mundra
Namita Mittal
Publication date
01.12.2022
Publisher
Springer Vienna
Published in
Social Network Analysis and Mining / Issue 1/2022
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-022-00929-1
