ABSTRACT
With the widespread growth of social network tools and platforms, analyzing and understanding societal response and crowd reaction to important and emerging social issues and events through social media data is an increasingly important problem. However, the unstructured and noisy nature of social media data poses numerous challenges to achieving this goal effectively and efficiently. The large volume of the underlying data presents a further fundamental challenge. Furthermore, in many application scenarios it is often interesting, and in some cases critical, to discover patterns and trends based on geographical and/or temporal partitions, and to keep track of how they change over time.
This brings up the interesting problem of spatio-temporal sentiment analysis from large-scale social media data. This paper investigates this problem through a data science project called "US Election 2016, What Twitter Says". The objective is to discover sentiment on Twitter towards either the Democratic or the Republican Party at US county and state levels over arbitrary temporal intervals, using a large collection of geotagged tweets from the 6 months leading up to the 2016 US Presidential Election. Our results demonstrate that, by integrating and developing a combination of machine learning and data management techniques, it is possible to do this at scale with effective outcomes. The results of our project have the potential to be adapted to other interesting social issues, such as building neighborhood happiness and health indicators.
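To make the stated objective concrete, the following minimal sketch shows one way to aggregate sentiment by region and party over an arbitrary time interval. It is an illustration only and not the system described in the paper: the record layout, the sample values, and the function name `mean_sentiment` are all assumptions, and the sentiment scores are taken as already computed by some classifier.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (day, state, party, sentiment score in [-1, 1]).
# The values are made up for illustration; they are not data from the paper.
TWEETS = [
    (date(2016, 10, 1), "UT", "democratic", 0.5),
    (date(2016, 10, 2), "UT", "democratic", -0.5),
    (date(2016, 10, 2), "UT", "republican", 0.25),
    (date(2016, 10, 3), "NV", "republican", -0.75),
    (date(2016, 11, 1), "UT", "republican", 0.75),
]


def mean_sentiment(tweets, start, end):
    """Return the average score per (state, party) for days in [start, end]."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, state, party, score in tweets:
        if start <= day <= end:  # inclusive interval filter
            totals[(state, party)] += score
            counts[(state, party)] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Query one arbitrary interval: all of October 2016.
october = mean_sentiment(TWEETS, date(2016, 10, 1), date(2016, 10, 31))
```

A linear scan like this is only practical for small inputs; answering such queries at the scale the abstract describes would need precomputed summaries or indexes, which this sketch does not attempt.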
Index Terms
- Compass: Spatio Temporal Sentiment Analysis of US Election What Twitter Says!