Abstract
Since the inception of the Twitter microblogging platform in 2006, a myriad of research efforts have emerged studying different aspects of the Twittersphere. Each study uses its own tools and mechanisms to capture, store, query and analyze Twitter data. Inevitably, platforms have been developed to replace this ad hoc exploration with a more structured and methodical form of analysis. Another body of literature focuses on developing languages for querying tweets. This paper addresses issues arising from the big data nature of Twitter and emphasizes the need for new data management and query language frameworks that address the limitations of existing systems. We review existing approaches developed to facilitate Twitter analytics, then discuss research issues and technical challenges in developing integrated solutions.