research-article

Twitter analytics: a big data management perspective

Published: 25 September 2014

Abstract

Since the inception of the Twitter microblogging platform in 2006, a myriad of research efforts have emerged studying different aspects of the Twittersphere. Each study exploits its own tools and mechanisms to capture, store, query and analyze Twitter data. Inevitably, platforms have been developed to replace this ad-hoc exploration with a more structured and methodological form of analysis. Another body of literature focuses on developing languages for querying tweets. This paper addresses issues around the big data nature of Twitter and emphasizes the need for new data management and query language frameworks that address limitations of existing systems. We review existing approaches that were developed to facilitate Twitter analytics, followed by a discussion of research issues and technical challenges in developing integrated solutions.

Published in

ACM SIGKDD Explorations Newsletter, Volume 16, Issue 1
Special issue on big data
June 2014
63 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/2674026

Copyright © 2014 Authors

Publisher: Association for Computing Machinery, New York, NY, United States
