
Social Network Analysis and Mining

1 December 2022 | Original Article

FA-Net: fused attention-based network for Hindi English code-mixed offensive text classification

Authors: Shikha Mundra, Namita Mittal

Published in: Social Network Analysis and Mining | Issue 1/2022


Abstract

The widespread use of social media platforms such as Twitter, Facebook, and YouTube allows opinions and suggestions to be shared across countries. However, these platforms are often misused to disseminate hate speech and offensive content. Moreover, in a multilingual society such as India, many users resort to code-mixing when typing on social media. We therefore focus on Hindi-English (Hi-En) code-mixed hate speech and offensive text classification. Numerous approaches have recently emerged, and most of them stack CNN and LSTM layers to extract local and sequential semantic features. However, such stacked arrangements diminish the combined effect of local and sequential features. In addition, deep frameworks suffer from the vanishing gradient problem. Therefore, we propose a local and sequential knowledge-aware Fused Attention-based Network (FA-Net), which fuses attention mechanisms to enable collective and mutual learning between local and sequential features. Compared with existing architectures, FA-Net is shallower and wider. It has three building blocks: Code-Mixed Hybrid Embedding (CMHE), Locally Driven Sequential Attention-2 (LDSA-2), and Locally Driven Sequential Attention-3 (LDSA-3). CMHE is built from customized Hi-En code-mixed data so that the network is initialized with relevant weights. LDSA-2 and LDSA-3 enable the model to build a comprehensive representation containing past, future, and local contextual knowledge with respect to any position in the sentence. Extensive experiments on two benchmark datasets show that FA-Net outperforms other state-of-the-art methods.
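The abstract contrasts stacking (CNN output fed into an LSTM) with computing local and sequential features side by side and fusing them with attention. The pure-Python sketch below illustrates only that general idea, under loudly stated assumptions: a uniform-kernel windowed mean stands in for a learned convolution, a toy element-wise recurrence stands in for each LSTM direction, and plain scaled dot-product attention in both directions stands in for the fused attention. It is not the authors' LDSA-2, LDSA-3, or CMHE design, no weights are learned, and every function name is hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def local_features(xs: List[Vector], window: int = 3) -> List[Vector]:
    """Local (n-gram-like) features: tanh of the mean over a zero-padded window.
    Assumption: stands in for a 1-D convolution with a uniform kernel."""
    half = window // 2
    d = len(xs[0])
    out = []
    for t in range(len(xs)):
        acc = [0.0] * d
        for j in range(t - half, t + half + 1):
            if 0 <= j < len(xs):
                acc = [a + v for a, v in zip(acc, xs[j])]
        out.append([math.tanh(a / window) for a in acc])
    return out


def recurrent_pass(xs: List[Vector], decay: float = 0.5) -> List[Vector]:
    """Toy recurrence h_t = tanh(decay * h_{t-1} + x_t), element-wise.
    Assumption: stands in for one direction of an LSTM."""
    h = [0.0] * len(xs[0])
    states = []
    for x in xs:
        h = [math.tanh(decay * hp + xv) for hp, xv in zip(h, x)]
        states.append(h)
    return states


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over a sequence (values = keys)."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    return [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(len(query))]


def fused_representation(xs: List[Vector]) -> List[Vector]:
    """Compute local and sequential features in parallel (not stacked) and fuse
    them with attention in both directions; returns one vector per token."""
    local = local_features(xs)
    past = recurrent_pass(xs)                # left-to-right (past) context
    future = recurrent_pass(xs[::-1])[::-1]  # right-to-left (future) context
    seq = [[p + f for p, f in zip(pv, fv)] for pv, fv in zip(past, future)]
    fused = []
    for t in range(len(xs)):
        local_to_seq = attend(local[t], seq)  # local query over sequential states
        seq_to_local = attend(seq[t], local)  # sequential query over local states
        # List concatenation: past + future + local + both attention summaries.
        fused.append(past[t] + future[t] + local[t] + local_to_seq + seq_to_local)
    return fused


# Toy 2-dimensional "embeddings" for a 4-token sentence (made-up numbers).
tokens = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2], [0.0, 0.9]]
reps = fused_representation(tokens)
print(len(reps), len(reps[0]))  # prints: 4 10

# Max-pool over tokens to get one sentence vector a classifier layer could consume.
sentence_vec = [max(col) for col in zip(*reps)]
```

Because both branches read the same embeddings directly, the sketch gains breadth rather than depth: each token's output concatenates past, future, and local context plus the two cross-attention summaries, instead of passing local features through a further recurrent layer.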



Publisher: Springer Vienna
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI: https://doi.org/10.1007/s13278-022-00929-1
