ABSTRACT
Social networks are quickly becoming the primary medium for discussing what is happening around real-world events. The information generated on social platforms like Twitter can produce rich data streams that offer immediate insight into ongoing matters and the conversations around them. To tackle the problem of event detection, we model events as a list of clusters of trending entities over time. We describe a real-time system for discovering events that is modular in design and novel in scale and speed: it applies clustering on a large stream with millions of entities per minute and produces a dynamically updated set of events. To assess clustering methodologies, we build an evaluation dataset derived from a snapshot of the full Twitter Firehose and propose novel metrics for measuring clustering quality. Through experiments and system profiling, we highlight key results from the offline and online pipelines. Finally, we visualize a high-profile event on Twitter to show the importance of modeling the evolution of events, especially those detected from social data streams.
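The idea of an event as "a list of clusters of trending entities over time" can be illustrated with a minimal sketch. This is not the system described above: the `Event` class, the `update_events` function, the Jaccard-overlap linking rule, and the `threshold` value are all assumptions made for illustration, and the entity names are invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical event record: a time-ordered list of (minute, entity cluster)."""
    event_id: int
    clusters: list[tuple[int, frozenset[str]]] = field(default_factory=list)


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard overlap between two entity sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def update_events(events: list[Event], minute: int,
                  new_clusters: list[set[str]],
                  threshold: float = 0.3) -> list[Event]:
    """Attach each new cluster to the best-overlapping event, else open a new one.

    Assumed linking rule: greedy matching against each event's most recent
    cluster, with at most one new cluster per event per time step.
    """
    next_id = max((e.event_id for e in events), default=-1) + 1
    claimed: set[int] = set()  # events already extended in this time step
    for raw in new_clusters:
        cluster = frozenset(raw)
        best, best_score = None, threshold
        for event in events:
            if event.event_id in claimed:
                continue
            score = jaccard(event.clusters[-1][1], cluster)
            if score >= best_score:
                best, best_score = event, score
        if best is None:
            # No sufficiently similar event: start a new one.
            best = Event(next_id)
            next_id += 1
            events.append(best)
        best.clusters.append((minute, cluster))
        claimed.add(best.event_id)
    return events


# Invented example data: two time steps of already-clustered trending entities.
events: list[Event] = []
update_events(events, 0, [{"worldcup", "france", "croatia"}, {"oscars", "redcarpet"}])
update_events(events, 1, [{"worldcup", "france", "mbappe"}, {"earthquake", "alaska"}])
```

In this sketch the first event grows into a two-step chain because its entity set overlaps across minutes, while the unrelated cluster opens a new event. A production system would also need to expire stale events and handle clusters that merge or split, which this sketch ignores.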