
2022 | OriginalPaper | Book chapter

COVID-19 Informative Tweets Identification Through Word-by-Word Lexicon Replacement Using Pretrained Biomedical Corpus


Abstract

The coronavirus pandemic has contributed to the spread of a large volume of fake news and misleading information on social media, especially Twitter. Consequently, the task of identifying whether a tweet is informative or uninformative has attracted the attention of many researchers. Existing work relies heavily on transformer architectures. However, because the task depends on domain-specific terms, lexicon-based expansion may be needed. This paper therefore proposes a word-by-word lexicon replacement method for informative tweet extraction. A pretrained medical word embedding model is used to perform the replacement, and multiple replacement conditions are employed. Different feature space representations are then applied to the new tweet documents containing the replaced terms. Finally, a Logistic Regression (LR) classifier classifies the documents as Informative or Uninformative. On the WNUT-2020 Task 2 benchmark dataset, applying the proposed replacement method yielded better performance than classification without replacement, reaching an F1-score of 0.91. This result highlights the usefulness of the proposed replacement method for improving the classification accuracy of informative tweet identification.
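The abstract does not spell out the replacement conditions, so the following is only a minimal sketch of what word-by-word lexicon replacement could look like. It assumes that each tweet token found in the embedding vocabulary is swapped for its nearest neighbor (by cosine similarity) in a biomedical embedding space, but only when that similarity reaches a threshold; all other tokens are kept as-is. The vectors in `TOY_EMBEDDINGS`, the `threshold` value, and the function names are hypothetical stand-ins, not the chapter's actual model or conditions.

```python
import math
import re
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL toy vectors standing in for a pretrained biomedical word
# embedding; a real model would be loaded from disk and cover far more
# words and dimensions.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "covid": [0.90, 0.10, 0.00],
    "coronavirus": [0.88, 0.12, 0.02],
    "death": [0.10, 0.90, 0.10],
    "fatality": [0.12, 0.88, 0.10],
    "hospital": [0.00, 0.20, 0.95],
}


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbor(word: str, emb: Dict[str, List[float]]) -> Optional[Tuple[str, float]]:
    """Return the most similar *other* vocabulary word and its cosine score."""
    best: Optional[Tuple[str, float]] = None
    for other, vec in emb.items():
        if other == word:
            continue  # never "replace" a word with itself
        score = cosine(emb[word], vec)
        if best is None or score > best[1]:
            best = (other, score)
    return best


def replace_words(text: str, emb: Dict[str, List[float]], threshold: float = 0.7) -> List[str]:
    """Word-by-word replacement under two ASSUMED conditions:
    1. the token must exist in the embedding vocabulary, and
    2. its nearest neighbor must have cosine similarity >= threshold.
    Tokens failing either condition are kept unchanged."""
    out: List[str] = []
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        if token in emb:
            match = nearest_neighbor(token, emb)
            if match is not None and match[1] >= threshold:
                out.append(match[0])
                continue
        out.append(token)
    return out


if __name__ == "__main__":
    tweet = "New COVID death reported at the hospital"
    print(" ".join(replace_words(tweet, TOY_EMBEDDINGS)))
    # prints: new coronavirus fatality reported at the hospital
```

Here "covid" and "death" are replaced because their nearest neighbors are very close in the toy space, "hospital" is kept because its best match falls below the threshold, and out-of-vocabulary words such as "reported" pass through untouched. In the pipeline the abstract describes, the rewritten token sequences would next be converted into a feature representation and passed to a Logistic Regression classifier; that stage is omitted from this sketch.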


Metadata
Title
COVID-19 Informative Tweets Identification Through Word-by-Word Lexicon Replacement Using Pretrained Biomedical Corpus
Authored by
Rami Naim Mohammed Yousuf
Copyright year
2022
DOI
https://doi.org/10.1007/978-3-031-08087-6_17