Published in: World Wide Web 1/2023

16-03-2022

Identifying informative tweets during a pandemic via a topic-aware neural language model

Authors: Wang Gao, Lin Li, Xiaohui Tao, Jing Zhou, Jun Tao


Abstract

Every epidemic affects the lives of many people around the world and leads to terrible consequences. Recently, many tweets about the COVID-19 pandemic have been shared publicly on social media platforms. Analyzing these tweets can help emergency response organizations prioritize their tasks and make better decisions. However, most of these tweets are non-informative, which makes it challenging to build an automated system that detects useful information in social media. Furthermore, existing methods ignore unlabeled data and topic background knowledge, both of which can provide additional semantic information. In this paper, we propose a novel Topic-Aware BERT (TABERT) model to address these challenges. First, TABERT leverages a topic model to extract the latent topics of tweets. Second, a flexible framework combines the topic information with the output of BERT. Finally, we adopt adversarial training to achieve semi-supervised learning, so that a large amount of unlabeled data can be used to improve the inner representations of the model. Experimental results on a dataset of COVID-19 English tweets show that our model outperforms classic and state-of-the-art baselines.
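
The second step of the abstract, combining topic information with the output of BERT, can be illustrated with a toy sketch. The abstract does not say how the two are fused, so the concatenation used here is an assumption, not the authors' method. The vectors are random stand-ins for a BERT sentence embedding and a topic distribution, and the names `combine`, `classify`, `HIDDEN`, `TOPICS`, and `CLASSES` are hypothetical.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine(text_vec, topic_dist):
    """Assumed fusion: concatenate the text embedding and the topic distribution."""
    return list(text_vec) + list(topic_dist)


def classify(features, weights, bias):
    """Linear layer followed by softmax over the classes."""
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


rng = random.Random(0)
HIDDEN, TOPICS, CLASSES = 8, 4, 2  # toy sizes, not taken from the paper

# Stand-in for the BERT output of one tweet (random, not a real embedding).
text_vec = [rng.uniform(-1, 1) for _ in range(HIDDEN)]
# Stand-in for the topic model output: a probability distribution over topics.
topic_dist = softmax([rng.uniform(-1, 1) for _ in range(TOPICS)])

features = combine(text_vec, topic_dist)

# Untrained classifier weights, for illustration only.
weights = [
    [rng.uniform(-0.1, 0.1) for _ in range(HIDDEN + TOPICS)]
    for _ in range(CLASSES)
]
bias = [0.0] * CLASSES

# Probabilities for the two classes (informative vs. non-informative).
probs = classify(features, weights, bias)
print(len(features), probs)
```

In a real system the stand-in vectors would come from a pretrained BERT encoder and a fitted topic model, and the classifier would be trained. The adversarial semi-supervised step mentioned in the abstract is not covered by this sketch.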


Metadata
Title
Identifying informative tweets during a pandemic via a topic-aware neural language model
Authors
Wang Gao
Lin Li
Xiaohui Tao
Jing Zhou
Jun Tao
Publication date
16-03-2022
Publisher
Springer US
Published in
World Wide Web / Issue 1/2023
Print ISSN: 1386-145X
Electronic ISSN: 1573-1413
DOI
https://doi.org/10.1007/s11280-022-01034-1
