Abstract
We present a general methodology for inferring the occurrence and magnitude of an event or phenomenon from the large volume of unstructured text on the social part of the Web. Using geo-tagged user posts from the microblogging service Twitter as input data, we investigate two case studies. The first is a benchmark problem, in which actual levels of rainfall at a given location and time are inferred from the content of tweets. The second is a real-life task, in which we infer regional Influenza-like Illness rates with the aim of detecting an emerging epidemic in a timely manner. Our analysis builds on a statistical learning framework that performs sparse learning via the bootstrapped version of LASSO to select a consistent subset of textual features from a large pool of candidates. In both case studies, the selected features are semantically close to the target topics, and inference, conducted by regression, performs well, especially given the short length (approximately one year) of the Twitter time series.
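The feature-selection step described above can be illustrated with a minimal sketch. It assumes the strict-intersection form of bootstrapped LASSO (Bolasso): fit LASSO on several bootstrap resamples and keep only the features that receive a nonzero weight every time, then refit a regression on the survivors. The abstract does not state the variant, thresholds, or regularization values actually used, so the function names (`lasso_cd`, `bolasso_support`), the synthetic data standing in for tweet term frequencies, and all parameter values are illustrative assumptions, not the authors' implementation.

```python
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t (the proximal step for the L1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100, tol=1e-9):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + lam*||w||_1, no intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw, with w starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_w
            max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return w


def bolasso_support(X, y, lam, n_boot=16, seed=0):
    """Intersect the LASSO supports obtained on n_boot bootstrap resamples."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    support = set(range(p))
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        w = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        support &= {j for j in range(p) if abs(w[j]) > 1e-8}
    return sorted(support)


# Synthetic stand-in for the real data (an assumption for illustration):
# 8 candidate features, of which only features 0 and 3 drive the target.
rng = random.Random(42)
n_samples, n_features = 120, 8
X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
y = [3.0 * row[0] - 2.0 * row[3] + rng.gauss(0.0, 0.5) for row in X]

selected = bolasso_support(X, y, lam=0.3)

# Refit without the penalty (lam=0 reduces to least squares) on the selected columns.
X_sel = [[row[j] for j in selected] for row in X]
weights = lasso_cd(X_sel, y, lam=0.0)
print(selected, weights)
```

Intersecting supports across resamples is what makes the selection stable: a spurious feature has to pass the L1 threshold in every resample to survive, whereas a single LASSO fit can admit it by chance. The unpenalized refit then removes the shrinkage bias from the retained weights.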
Index Terms
- Nowcasting Events from the Social Web with Statistical Learning