Published in: International Journal on Digital Libraries 4/2021

09-10-2021

Unified approach to retrospective event detection for event-based epidemic intelligence

Author: Marco Fisichella


Abstract

Inferring the magnitude and occurrence of real-world events from natural language text is a crucial task in various domains. In public health in particular, state-of-the-art document-centric and token-centric event detection approaches have not kept pace with the growing need for more robust event detection. In this paper, we propose UPHED, a unified approach that combines document-centric and token-centric event detection techniques in an unsupervised manner, so that both rare (aperiodic) and recurring (periodic) events can be detected using a generative model for the public health domain. We evaluate the efficiency of our approach, as well as its effectiveness in two real-world case studies, with respect to the quality of the document clusters. Our results show a precision of 60% and a recall of 71%, measured on manually annotated real-world data. Finally, we compare our work with the well-established rule-based system MedISys and find that UPHED can be used cooperatively with MedISys, not only to detect similar anomalies but also to deliver more information about the specific outbreaks of the reported diseases.
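For readers who want to relate the two reported figures, the following is a minimal sketch of how precision and recall are conventionally computed from counts, and of how they combine into a harmonic mean (F1). The abstract reports only precision and recall; the F1 value and the example counts below are illustrative assumptions, not results from the paper, and the function names are hypothetical.

```python
from typing import Tuple


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_from(precision, recall)


# Precision and recall as stated in the abstract; the F1 value is derived here,
# not reported by the paper.
derived_f1 = f1_from(0.60, 0.71)

# Hypothetical counts, chosen only to illustrate the definitions.
example = precision_recall_f1(tp=6, fp=4, fn=2)

if __name__ == "__main__":
    print(f"F1 derived from reported precision/recall: {derived_f1:.3f}")
    print(f"Example (precision, recall, F1): {example}")
```

Under these definitions, the reported precision and recall correspond to a harmonic mean of roughly 0.65.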

Metadata
Title
Unified approach to retrospective event detection for event-based epidemic intelligence
Author
Marco Fisichella
Publication date
09-10-2021
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Digital Libraries / Issue 4/2021
Print ISSN: 1432-5012
Electronic ISSN: 1432-1300
DOI
https://doi.org/10.1007/s00799-021-00308-9
