Published in: Social Network Analysis and Mining, Issue 1/2024

1 December 2024 | Original Article

Classifying informative tweets using feature enhanced pre-trained language model

Authors: Prakash Babu Yandrapati, R. Eswari



Abstract

Classifying tweets that contain valuable information about COVID-19 is crucial for developing monitoring systems that provide the latest updates. Existing approaches to informative tweet classification consider only the last-layer vector of a special token, ignoring the vectors of the other tokens and the token vectors from earlier layers. This paper addresses that drawback by proposing a novel approach that (i) uses all the token vectors from the last four layers and (ii) leverages additional information in the form of POS tags and informative words. Experimental results show that the proposed approach outperforms all existing approaches, achieving an accuracy of 92% and an F1-score of 92.01% on the COVID-19 informative tweets dataset. The novelty of this paper lies in combining token vectors from the last four layers with additional information, in the form of POS tags and informative words from COVID-19 informative tweets, for classification.
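
The feature construction described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the token vectors, POS tags, and informative words are actually combined, so everything here is an assumption made for illustration: mean pooling over all token vectors of the last four layers, relative POS-tag frequencies, and a single informative-word ratio, concatenated into one feature vector. The names `POS_TAGS`, `INFORMATIVE_WORDS`, and `build_features` are hypothetical, and in practice the hidden states would come from a pre-trained language model rather than from plain lists.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical tag set and keyword list, chosen only for illustration;
# they are not taken from the paper.
POS_TAGS = ("NOUN", "VERB", "NUM", "PROPN", "ADJ")
INFORMATIVE_WORDS = frozenset({"cases", "deaths", "confirmed", "tested", "positive"})


def pool_last_four_layers(hidden_states: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Average every token vector from the last four layers into one vector.

    hidden_states[layer][token] is one token vector (a sequence of floats).
    Mean pooling is an assumption; the abstract only says that all token
    vectors from the last four layers are used.
    """
    vectors = [vec for layer in hidden_states[-4:] for vec in layer]
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def pos_features(pos_tags: Sequence[str]) -> List[float]:
    """Relative frequency of each tag in POS_TAGS within the tweet."""
    counts = Counter(pos_tags)
    total = max(len(pos_tags), 1)
    return [counts[tag] / total for tag in POS_TAGS]


def informative_word_feature(tokens: Sequence[str]) -> List[float]:
    """Fraction of tokens that appear in the INFORMATIVE_WORDS list."""
    hits = sum(1 for tok in tokens if tok.lower() in INFORMATIVE_WORDS)
    return [hits / max(len(tokens), 1)]


def build_features(hidden_states, tokens, pos_tags) -> List[float]:
    """Concatenate pooled token vectors with POS and informative-word features."""
    return (
        pool_last_four_layers(hidden_states)
        + pos_features(pos_tags)
        + informative_word_feature(tokens)
    )


if __name__ == "__main__":
    # Toy input: 6 layers, 4 tokens, 3 dimensions per token vector.
    toy_states = [[[float(layer)] * 3 for _ in range(4)] for layer in range(6)]
    toy_tokens = ["120", "new", "cases", "confirmed"]
    toy_tags = ["NUM", "ADJ", "NOUN", "VERB"]
    print(build_features(toy_states, toy_tokens, toy_tags))
```

The resulting vector could then be passed to any classifier. Whether the paper pools, concatenates, or learns a weighting over layers cannot be determined from the abstract alone.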


Metadata
Title
Classifying informative tweets using feature enhanced pre-trained language model
Authors
Prakash Babu Yandrapati
R. Eswari
Publication date
1 December 2024
Publisher
Springer Vienna
Published in
Social Network Analysis and Mining / Issue 1/2024
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-024-01204-1
