2021 | OriginalPaper | Chapter

Token-Level Multilingual Epidemic Dataset for Event Extraction

Authors: Stephen Mutuvi, Emanuela Boros, Antoine Doucet, Gaël Lejeune, Adam Jatowt, Moses Odeo

Published in: Linking Theory and Practice of Digital Libraries

Publisher: Springer International Publishing

Abstract

In this paper, we present a dataset and a baseline evaluation for multilingual epidemic event extraction. We experiment with a multilingual news dataset that we annotate at the token level, following a tagging scheme commonly used in event extraction systems. We approach the task of extracting epidemic events by first detecting the relevant documents in a large collection of news reports. Event extraction (disease names and locations) is then performed on the detected relevant documents. Preliminary experiments on the entire dataset and on the ground-truth relevant documents showed promising results and established a stronger baseline for epidemiological event extraction.
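The abstract does not name the exact tagging scheme, labels, or models, so the following is only a minimal sketch under stated assumptions. It assumes a BIO (begin/inside/outside) token-level scheme with hypothetical labels `DIS` (disease) and `LOC` (location), and shows how span annotations map to one tag per token, how tags decode back to spans, and how a two-stage pipeline (document relevance filter, then token tagging) can be wired together. The keyword filter `keyword_relevance` and the `toy_tagger` are illustrative stand-ins, not the authors' classifiers.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical span annotation: (start_token, end_token_exclusive, label).
Span = Tuple[int, int, str]


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Encode span annotations as one BIO tag per token (assumed scheme)."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags into spans; stray I- tags without a B- are ignored."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        if start is not None and not (tag.startswith("I-") and tag[2:] == label):
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def extract_events(
    documents: List[List[str]],
    is_relevant: Callable[[List[str]], bool],
    tag_tokens: Callable[[List[str]], List[str]],
) -> List[List[Tuple[str, str]]]:
    """Stage 1 keeps relevant documents; stage 2 tags tokens and returns entities."""
    results = []
    for tokens in documents:
        if not is_relevant(tokens):
            continue  # irrelevant documents never reach the tagger
        spans = bio_to_spans(tag_tokens(tokens))
        results.append([(" ".join(tokens[s:e]), lab) for s, e, lab in spans])
    return results


# Toy stand-ins (hypothetical): a keyword list instead of trained models.
DISEASES = {"cholera", "measles"}


def keyword_relevance(tokens: List[str]) -> bool:
    return any(t.lower() in DISEASES for t in tokens)


def toy_tagger(tokens: List[str]) -> List[str]:
    # Tags known disease words only; a real system would use a sequence labeller.
    return ["B-DIS" if t.lower() in DISEASES else "O" for t in tokens]


if __name__ == "__main__":
    tokens = ["Cholera", "outbreak", "reported", "in", "Nairobi", "County"]
    tags = spans_to_bio(tokens, [(0, 1, "DIS"), (4, 6, "LOC")])
    print(tags)  # ['B-DIS', 'O', 'O', 'O', 'B-LOC', 'I-LOC']
    docs = [tokens, ["Stock", "markets", "rallied"]]
    print(extract_events(docs, keyword_relevance, toy_tagger))  # [[('Cholera', 'DIS')]]
```

Keeping the relevance filter and the tagger as separate callables mirrors the two-stage design described above: either stage can be swapped for a trained model without touching the other.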
Metadata
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-86324-1_6