DOI: 10.1145/3097983.3098053
research-article

Compass: Spatio Temporal Sentiment Analysis of US Election What Twitter Says!

Published: 13 August 2017

ABSTRACT

With the widespread growth of social network tools and platforms, analyzing and understanding societal response and crowd reaction to important and emerging social issues and events through social media data is an increasingly important problem. However, there are numerous challenges to realizing this goal effectively and efficiently, due to the unstructured and noisy nature of social media data. The large volume of the underlying data also presents a fundamental challenge. Furthermore, in many application scenarios, it is often interesting, and in some cases critical, to discover patterns and trends based on geographical and/or temporal partitions, and to keep track of how they change over time.

This brings up the interesting problem of spatio-temporal sentiment analysis from large-scale social media data. This paper investigates this problem through a data science project called "US Election 2016, What Twitter Says". The objective is to discover sentiment on Twitter towards either the Democratic or the Republican Party at US county and state levels over arbitrary temporal intervals, using a large collection of geotagged tweets from a period of 6 months leading up to the US Presidential Election in 2016. Our results demonstrate that by integrating and developing a combination of machine learning and data management techniques, it is possible to do this at scale with effective outcomes. The results of our project have the potential to be adapted to address other interesting social issues, such as building neighborhood happiness and health indicators.
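To make the shape of the query in the objective concrete, the following is a minimal sketch, not the paper's implementation. It assumes each tweet has already been assigned a state, a county, a target party, and a +1/-1 sentiment label by some upstream classifier. The `ScoredTweet` schema, the `mean_sentiment` function, and the sample records are all hypothetical names and data introduced here for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ScoredTweet:
    """A geotagged tweet already labeled by a sentiment classifier (hypothetical schema)."""
    timestamp: datetime
    state: str      # e.g. "UT"
    county: str     # e.g. "Salt Lake"
    party: str      # "D" or "R": the party the tweet is about
    sentiment: int  # +1 positive, -1 negative


def mean_sentiment(tweets, start, end, level="state"):
    """Average sentiment per (region, party) for tweets with start <= timestamp < end."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of sentiment, tweet count]
    for t in tweets:
        if not (start <= t.timestamp < end):
            continue  # outside the requested temporal interval
        # A county name is only unique within its state, so key counties by both.
        region = t.state if level == "state" else (t.state, t.county)
        bucket = totals[(region, t.party)]
        bucket[0] += t.sentiment
        bucket[1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


# Made-up sample records, for illustration only.
SAMPLE = [
    ScoredTweet(datetime(2016, 10, 1, 9), "UT", "Salt Lake", "D", +1),
    ScoredTweet(datetime(2016, 10, 2, 9), "UT", "Salt Lake", "D", -1),
    ScoredTweet(datetime(2016, 10, 3, 9), "UT", "Utah", "R", +1),
    ScoredTweet(datetime(2016, 11, 5, 9), "UT", "Utah", "R", -1),
]

# State-level averages for October 2016 only.
october = mean_sentiment(SAMPLE, datetime(2016, 10, 1), datetime(2016, 11, 1))

# County-level averages over a longer interval.
by_county = mean_sentiment(
    SAMPLE, datetime(2016, 10, 1), datetime(2016, 12, 1), level="county"
)
```

This sketch scans every tweet on each query, which would not scale to a large collection. It only shows the inputs and outputs of a region-by-interval sentiment query, not how such a query would be answered at scale.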

Supplemental Material

paul_sentiment_analysis.mp4 (MP4, 349.9 MB)

Published in

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2017
2240 pages
ISBN: 9781450348874
DOI: 10.1145/3097983

Copyright © 2017 ACM

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

KDD '17 paper acceptance rate: 64 of 748 submissions, 9%
Overall acceptance rate: 1,133 of 8,635 submissions, 13%
