
Social Network Analysis and Mining


1 December 2021 | Original Article

Progressive domain adaptation for detecting hate speech on social media with small training set and its application to COVID-19 concerned posts

Authors: Md Abul Bashar, Richi Nayak, Khanh Luong, Thirunavukarasu Balasubramaniam

Published in: Social Network Analysis and Mining | Issue 1/2021


Abstract

In this era of information and experience, microblogging sites are commonly used to express people's feelings, including fear, panic, hate and abuse. Monitoring and controlling abuse on social media, especially during pandemics such as COVID-19, can help keep public sentiment and morale positive. Developing machine-learning methods for fear and hate detection requires labelled data. However, when circumstances change suddenly, as in a pandemic, obtaining labelled data is expensive, and acquiring it in a short time is impractical. Related labelled hate data from other domains or previous incidents may be available. However, the predictive accuracy of hate detection models trained on such data decreases significantly if the data distribution of the target domain, where the prediction will be applied, is different. To address this problem, we propose a novel concept of unsupervised progressive domain adaptation based on a deep-learning language model generated through multiple text datasets. We showcase the efficacy of the proposed method in hate speech and fear detection on a collection of tweets gathered during COVID-19, for which labelled information is unavailable.



High-performance computing facilities used in this research were provided by the eResearch Office, Queensland University of Technology, Brisbane, Australia.


Title: Progressive domain adaptation for detecting hate speech on social media with small training set and its application to COVID-19 concerned posts
Authors: Md Abul Bashar, Richi Nayak, Khanh Luong, Thirunavukarasu Balasubramaniam
Publication date: 1 December 2021
Publisher: Springer Vienna
Published in: Social Network Analysis and Mining / Issue 1/2021
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI: https://doi.org/10.1007/s13278-021-00780-w
