DOI: 10.1145/3292500.3330689
Research article

Real-time Event Detection on Social Data Streams

Published: 25 July 2019

ABSTRACT

Social networks are quickly becoming the primary medium for discussing what is happening around real-world events. The information generated on social platforms like Twitter can produce rich data streams for immediate insights into ongoing matters and the conversations around them. To tackle the problem of event detection, we model events as a list of clusters of trending entities over time. We describe a real-time system for discovering events that is modular in design and novel in scale and speed: it applies clustering on a large stream with millions of entities per minute and produces a dynamically updated set of events. To assess clustering methodologies, we build an evaluation dataset derived from a snapshot of the full Twitter Firehose and propose novel metrics for measuring clustering quality. Through experiments and system profiling, we highlight key results from the offline and online pipelines. Finally, we visualize a high-profile event on Twitter to show the importance of modeling the evolution of events, especially those detected from social data streams.
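The abstract models an event as a list of clusters of trending entities over time, updated as the stream arrives. The following is a minimal sketch of how such a data model could be maintained; it is not the authors' system. It assumes that a separate clustering step already emits, once per minute, a list of clusters (each a frozenset of entity strings). Each new cluster is linked to the existing event whose most recent cluster overlaps it most, measured by Jaccard similarity at or above a hypothetical `threshold`; an unmatched cluster starts a new event. The names `EventTracker`, `Event`, `jaccard`, and the default threshold of 0.3 are illustrative assumptions.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a cluster is a set of trending entity names for one time step.
Cluster = FrozenSet[str]


def jaccard(a: Cluster, b: Cluster) -> float:
    """Overlap between two entity clusters (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Event:
    event_id: int
    # Chronological list of (minute, cluster) snapshots: the event "over time".
    history: List[Tuple[int, Cluster]] = field(default_factory=list)

    @property
    def latest(self) -> Cluster:
        return self.history[-1][1]


class EventTracker:
    """Hypothetical tracker that links per-minute entity clusters into events."""

    def __init__(self, threshold: float = 0.3) -> None:
        self.threshold = threshold  # assumed minimum Jaccard overlap to link
        self.events: Dict[int, Event] = {}
        self._ids = count()

    def update(self, minute: int, clusters: List[Cluster]) -> None:
        # Score every (new cluster, existing event) pair; keep those above threshold.
        candidates = [
            (jaccard(c, ev.latest), ci, eid)
            for ci, c in enumerate(clusters)
            for eid, ev in self.events.items()
        ]
        candidates = [t for t in candidates if t[0] >= self.threshold]
        # Greedy one-to-one matching, highest overlap first (ties broken by index).
        candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
        used_clusters, used_events = set(), set()
        for _, ci, eid in candidates:
            if ci in used_clusters or eid in used_events:
                continue
            self.events[eid].history.append((minute, clusters[ci]))
            used_clusters.add(ci)
            used_events.add(eid)
        # Any cluster that matched no existing event starts a new event.
        for ci, c in enumerate(clusters):
            if ci not in used_clusters:
                eid = next(self._ids)
                self.events[eid] = Event(eid, [(minute, c)])


tracker = EventTracker(threshold=0.3)
tracker.update(0, [frozenset({"a", "b", "c"})])
tracker.update(1, [frozenset({"b", "c", "d"}), frozenset({"x", "y"})])
for ev in tracker.events.values():
    print(ev.event_id, [sorted(c) for _, c in ev.history])
```

The greedy one-to-one matching keeps each per-minute update cheap and prevents two clusters from the same minute from both extending one event; an optimal assignment, such as the Hungarian method, could replace it at a higher computational cost.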


Published in
KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2019, 3305 pages
ISBN: 9781450362016
DOI: 10.1145/3292500

Copyright © 2019 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher
Association for Computing Machinery, New York, NY, United States


Acceptance Rates
KDD '19 Paper Acceptance Rate: 110 of 1,200 submissions, 9%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
