01-12-2023 | Original Article

Identifying COVID-19 English informative tweets using limited labelled data

Authors: Srinivasulu Kothuru, A. Santhanavijayan

Published in: Social Network Analysis and Mining | Issue 1/2023

Abstract

Identifying COVID-19 informative tweets is very useful for building monitoring systems that track the latest updates. Existing approaches to identifying informative tweets rely on large amounts of labelled data to achieve good performance. As labelling is an expensive and laborious process, there is a need for approaches that can identify COVID-19 informative tweets using limited labelled data. In this paper, we propose a simple yet novel labelled-data-efficient approach that achieves a state-of-the-art (SOTA) F1-score of 91.23 on the WNUT COVID-19 dataset using just 1000 tweets (14.3% of the full training set). Our approach starts with limited labelled data, augments it using data augmentation methods, and then fine-tunes the model on the augmented dataset. This is the first work to approach the task of identifying COVID-19 English informative tweets with limited labelled data while achieving new SOTA performance.
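The abstract's pipeline — start from a small labelled set, expand it with data augmentation, then fine-tune on the expanded set — can be sketched in a few lines. The paper does not specify its augmentation methods here, so this illustration uses a simple EDA-style random-swap/random-deletion augmenter as a stand-in; the function name `augment_tweet` and all parameters are hypothetical, not from the paper.

```python
import random

def augment_tweet(tokens, n_aug=4, p_delete=0.1, seed=0):
    """EDA-style augmentation stand-in: for each copy, randomly swap two
    tokens, then randomly delete tokens with probability p_delete."""
    rng = random.Random(seed)
    copies = []
    for _ in range(n_aug):
        t = tokens[:]
        if len(t) >= 2:                      # random swap of two positions
            i, j = rng.sample(range(len(t)), 2)
            t[i], t[j] = t[j], t[i]
        # random deletion, always keeping at least one token
        t = [w for w in t if rng.random() > p_delete] or [rng.choice(tokens)]
        copies.append(t)
    return copies

# a (tweet, label) pair standing in for the limited labelled set;
# label 1 = informative
labelled = [("confirmed 120 new covid-19 cases in the city today", 1)]

augmented = [(" ".join(t), y)
             for text, y in labelled
             for t in augment_tweet(text.split())]

# the fine-tuning set is the original tweets plus their augmented copies
train_set = labelled + augmented
```

In the actual approach, `train_set` would then be fed to a standard fine-tuning loop for a pre-trained transformer classifier; the augmentation multiplies each of the 1000 labelled tweets into several training examples without any extra annotation cost.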


Metadata
Title
Identifying COVID-19 English informative tweets using limited labelled data
Authors
Srinivasulu Kothuru
A. Santhanavijayan
Publication date
01-12-2023
Publisher
Springer Vienna
Published in
Social Network Analysis and Mining / Issue 1/2023
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-023-01025-8
