Chinese social media analysis for disease surveillance

  • Original Article
Personal and Ubiquitous Computing

Abstract

Seasonal flu is reported to cause hundreds of thousands of deaths worldwide every year, with an estimated 250,000–500,000 deaths annually. Other diseases, such as chickenpox and malaria, also pose serious threats to people’s physical and mental health. Effective techniques for disease surveillance are therefore in high demand. Social media analysis has recently come to be regarded as an efficient way to achieve this goal, and it is feasible because a growing number of people post their health information on social media such as blogs and personal websites. Previous work on social media analysis has focused mainly on English-language material and has hardly considered Chinese material, which hinders the application of such techniques to Chinese people. In this paper, we propose a new method of Chinese social media analysis for disease surveillance. More specifically, we compare different kinds of classification methods and then propose a new way to process Chinese text data. Chinese Sina micro-blog data collected from September to December 2013 are used to validate the effectiveness of the proposed method. The results show an average classification precision of 87.49 %. Compared with the data from the authority, the Chinese National Influenza Center, we can predict the outbreak time of flu 5 days earlier.
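The abstract does not state how the 5-day lead over the official data was measured. One common way to estimate such a lead, shown here only as a minimal sketch and not as the authors' method, is to shift the official series against the daily count of flu-related posts and pick the shift with the highest Pearson correlation. The function names (`pearson`, `best_lead`) and both daily series are hypothetical; the numbers are synthetic toy values, not data from the paper.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def best_lead(social, official, max_lead):
    """Return (lead, r): the shift in days at which social[t] correlates
    best with official[t + lead], and the correlation at that shift."""
    best = (0, float("-inf"))
    for lead in range(max_lead + 1):
        # Compare only the overlapping part of the two series.
        n = min(len(social), len(official) - lead)
        if n < 3:
            break
        r = pearson(social[:n], official[lead:lead + n])
        if r > best[1]:
            best = (lead, r)
    return best


# Synthetic toy series (assumption): daily counts of flu-related posts,
# and an "official" series that is the same curve delayed by 5 days.
social = [1, 1, 2, 4, 8, 12, 8, 4, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
official = [1] * 5 + social[:-5]

lead, r = best_lead(social, official, max_lead=10)
print(f"best lead: {lead} days (r = {r:.3f})")
```

On this toy input the search recovers the 5-day delay that was built into the synthetic official series. With real data, both series would first need to be aligned to the same reporting interval, since official surveillance figures are often published weekly rather than daily.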

Acknowledgments

This research is supported in part by the National Natural Science Foundation of China (No. 61440054), the Fundamental Research Funds for the Central Universities of China (No. 216-274213), the Natural Science Foundation of Hubei, China (No. 2014CFA048), and the Outstanding Academic Talents Startup Funds of Wuhan University (No. 216-410100003 and 216-410100004).

Author information

Corresponding author

Correspondence to Xiaohui Cui.

About this article

Cite this article

Cui, X., Yang, N., Wang, Z. et al. Chinese social media analysis for disease surveillance. Pers Ubiquit Comput 19, 1125–1132 (2015). https://doi.org/10.1007/s00779-015-0877-5

  • DOI: https://doi.org/10.1007/s00779-015-0877-5
