2022 | OriginalPaper | Chapter

COVID-19 Informative Tweets Identification Through Word-by-Word Lexicon Replacement Using Pretrained Biomedical Corpus

Abstract

The coronavirus pandemic has contributed to the spread of fake news and misleading information on social media, especially Twitter. Consequently, the task of identifying whether a tweet is informative or uninformative has attracted the attention of many researchers. The literature shows a heavy reliance on transformer architectures. However, because the task requires closer attention to domain-specific terms, lexicon-based expansion may be needed. Hence, this paper proposes a word-by-word lexicon replacement method for the task of informative tweet extraction. A pretrained medical word embedding model is used to perform the replacement, and multiple replacement conditions are employed. Different feature space representations are then applied to the new tweet documents containing the replaced terms. Lastly, a Logistic Regression (LR) classifier is used to classify the documents as Informative or Uninformative. On the benchmark dataset of WNUT-2020 Task 2, the proposed replacement method performed better than the same pipeline without replacement, obtaining an F1-score of 0.91. This result highlights the usefulness of the proposed replacement method for improving the classification accuracy of informative tweet identification.
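The abstract does not specify the embedding model's vocabulary, the exact replacement conditions, or the feature representations, so the following is only a minimal Python sketch of the pipeline's shape under stated assumptions. Toy three-dimensional vectors in a hypothetical `EMBEDDINGS` table stand in for the pretrained biomedical embeddings; a hypothetical `LEXICON` lists the domain terms a word may be replaced with; a token is replaced by its most similar lexicon term only if it has an embedding, is not already a lexicon term, and the cosine similarity reaches an assumed threshold of 0.9. Bag-of-words counts serve as the feature space, and a from-scratch logistic regression trained by batch gradient ascent stands in for the paper's LR classifier. All names, vectors, and training tweets are illustrative, not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical toy stand-in for pretrained biomedical word embeddings.
EMBEDDINGS = {
    "virus": [0.9, 0.1, 0.0],
    "coronavirus": [0.95, 0.15, 0.05],
    "sick": [0.1, 0.9, 0.1],
    "infected": [0.15, 0.95, 0.1],
    "deaths": [0.0, 0.2, 0.9],
    "fatalities": [0.05, 0.25, 0.95],
}
# Hypothetical domain lexicon: the only terms a word may be replaced with.
LEXICON = {"coronavirus", "infected", "fatalities"}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def replace_words(tokens, threshold=0.9):
    """Word-by-word replacement under ASSUMED conditions: the token has an
    embedding, is not already a lexicon term, and its most similar lexicon
    term reaches `threshold` cosine similarity."""
    out = []
    for tok in tokens:
        if tok in LEXICON or tok not in EMBEDDINGS:
            out.append(tok)  # keep lexicon terms and out-of-vocabulary words
            continue
        best = max(sorted(LEXICON), key=lambda term: cosine(EMBEDDINGS[tok], EMBEDDINGS[term]))
        sim = cosine(EMBEDDINGS[tok], EMBEDDINGS[best])
        out.append(best if sim >= threshold else tok)
    return out


def tokenize(text):
    return text.lower().split()


def featurize(tokens, vocab):
    """Bag-of-words counts; words outside the training vocabulary are dropped."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_lr(X, y, lr=0.5, epochs=200):
    """Unregularised logistic regression fitted by batch gradient ascent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = yi - sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj + lr * g for wj, g in zip(w, grad_w)]
        b += lr * grad_b
    return w, b


# Hypothetical training tweets (1 = Informative, 0 = Uninformative).
train = [
    ("coronavirus fatalities rising", 1),
    ("infected patients hospital", 1),
    ("lol stay home", 0),
    ("so bored today", 0),
]
docs = [replace_words(tokenize(text)) for text, _ in train]
vocab = sorted({w for d in docs for w in d})
w, b = train_lr([featurize(d, vocab) for d in docs], [label for _, label in train])


def predict(text):
    x = featurize(replace_words(tokenize(text)), vocab)
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return "Informative" if p >= 0.5 else "Uninformative"


print(replace_words(tokenize("Virus deaths reported")))  # ['coronavirus', 'fatalities', 'reported']
print(predict("Virus deaths reported"))  # Informative
```

In this toy setup, the query "Virus deaths reported" shares no tokens with the training vocabulary until replacement maps "virus" and "deaths" onto the seen terms "coronavirus" and "fatalities", which is the kind of vocabulary bridging the replacement step is meant to provide. A real pipeline would load actual pretrained biomedical vectors and would more likely use a library classifier such as scikit-learn's `LogisticRegression`.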

Metadata
Title
COVID-19 Informative Tweets Identification Through Word-by-Word Lexicon Replacement Using Pretrained Biomedical Corpus
Author
Rami Naim Mohammed Yousuf
Copyright Year
2022
DOI
https://doi.org/10.1007/978-3-031-08087-6_17
