research-article

Nowcasting Events from the Social Web with Statistical Learning

Published: 01 September 2012

Abstract

We present a general methodology for inferring the occurrence and magnitude of an event or phenomenon by exploring the large volume of unstructured textual information on the social part of the Web. Using geo-tagged user posts from the microblogging service Twitter as our input data, we investigate two case studies. The first is a benchmark problem, in which actual levels of rainfall at a given location and time are inferred from the content of tweets. The second is a real-life task, in which we infer regional influenza-like illness rates in an effort to detect an emerging epidemic disease in a timely manner. Our analysis builds on a statistical learning framework that performs sparse learning via the bootstrapped version of LASSO to select a consistent subset of textual features from a large number of candidates. In both case studies, the selected features show a close semantic correlation with the target topics, and inference, conducted by regression, achieves significant performance, especially given the short length (approximately one year) of Twitter's data time series.
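The feature-selection step named in the abstract, a bootstrapped LASSO, can be illustrated with a minimal sketch. This is not the authors' implementation: the coordinate-descent solver, the function names, the penalty `lam`, the number of bootstrap rounds, the selection threshold, and the synthetic data (which stand in for word-frequency features and a target such as a rainfall or illness rate) are all assumptions chosen for illustration. The idea shown is only the general one: fit LASSO on resampled copies of the data and keep the features that receive a nonzero weight in every fit.

```python
import random


def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`; exact zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=50):
    """Minimise (1/2n)||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant-zero column carries no information
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


def bootstrapped_lasso_support(X, y, lam, n_bootstraps=16, threshold=1.0, seed=0):
    """Indices of features with nonzero weight in at least `threshold` of the fits."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_bootstraps):
        # Resample rows with replacement, then fit LASSO on the resample.
        idx = [rng.randrange(n) for _ in range(n)]
        w = lasso_coordinate_descent([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if w[j] != 0.0:
                counts[j] += 1
    return [j for j in range(p) if counts[j] >= threshold * n_bootstraps]


def make_toy_data(n=100, p=8, seed=1):
    """Hypothetical data: only features 0 and 3 influence the target."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3.0 * row[0] - 2.0 * row[3] + rng.gauss(0, 0.1) for row in X]
    return X, y


X, y = make_toy_data()
selected = bootstrapped_lasso_support(X, y, lam=0.1)
print("selected feature indices:", selected)
```

On this toy problem the two informative features should survive every bootstrap round while the noise features are filtered out. In a text setting the columns of `X` would be per-term frequencies, and the surviving terms would then feed the regression that produces the final estimate.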


          • Published in

            ACM Transactions on Intelligent Systems and Technology  Volume 3, Issue 4
            September 2012
            410 pages
            ISSN:2157-6904
            EISSN:2157-6912
            DOI:10.1145/2337542

            Copyright © 2012 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 1 September 2012
            • Accepted: 1 September 2011
            • Revised: 1 August 2011
            • Received: 1 April 2011


            Qualifiers

            • research-article
            • Research
            • Refereed
