Abstract
Event detection has been one of the most important research topics in social media analysis. Most traditional approaches detect events based on fixed temporal and spatial resolutions, while in reality events of different scales usually occur simultaneously, that is, they span different intervals in time and space. In this paper, we propose a novel approach towards multiscale event detection using social media data, which takes into account different temporal and spatial scales of events in the data. Specifically, we explore the properties of the wavelet transform, which is a well-developed multiscale transform in signal processing, to enable automatic handling of the interaction between temporal and spatial scales. We then propose a novel algorithm to compute a data similarity graph at appropriate scales and detect events of different scales simultaneously by a single graph-based clustering process. Furthermore, we present a spatiotemporal statistical analysis of the noisy information present in the data stream, which allows us to define a novel term-filtering procedure for the proposed event detection algorithm and helps us study its behavior using simulated noisy data. Experimental results on both synthetically generated data and real-world data collected from Twitter demonstrate the meaningfulness and effectiveness of the proposed approach. Our framework further extends to numerous application domains that involve multiscale and multiresolution data analysis.
Notes
Throughout the paper, we use “scales” and “resolutions” interchangeably.
Since we are interested in local clusters, we apply the non-recursive version of the Louvain method which stops after the first iteration.
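As a rough illustration of this note, the sketch below implements only the local-moving phase of the Louvain method and omits the aggregation step that would start a second iteration. It reads "stops after the first iteration" as "stops once no single node move increases modularity", which is an assumption. The function name `louvain_first_pass` and the edge-list input format are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def louvain_first_pass(edges):
    """Local-moving phase of Louvain on an undirected weighted graph.

    edges: iterable of (u, v, weight) with u != v (hypothetical input format).
    Returns a dict mapping each node to a community label.
    """
    adj = defaultdict(dict)
    for u, v, w in edges:
        if u == u and u == v:
            raise ValueError("self-loops are not handled in this sketch")
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w

    degree = {n: sum(nbrs.values()) for n, nbrs in adj.items()}
    two_m = sum(degree.values())          # twice the total edge weight
    community = {n: n for n in adj}       # every node starts alone
    tot = dict(degree)                    # summed degree per community

    moved = True
    while moved:
        moved = False
        for node in adj:
            k = degree[node]
            current = community[node]
            tot[current] -= k             # take the node out of its community

            # Weight of links from this node to each neighbouring community.
            links = defaultdict(float)
            for nbr, w in adj[node].items():
                links[community[nbr]] += w

            # Gain (up to a constant factor) of joining community c:
            # links[c] - tot[c] * k / (2m). Staying put is the default.
            best = current
            best_gain = links.get(current, 0.0) - tot[current] * k / two_m
            for com, w_in in links.items():
                gain = w_in - tot[com] * k / two_m
                if gain > best_gain + 1e-12:
                    best, best_gain = com, gain

            tot[best] += k
            if best != current:
                community[node] = best
                moved = True
    return community


# Two triangles joined by a single bridge edge (toy data, not from the paper).
toy_edges = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
             ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
             ("c", "d", 1.0)]
toy_communities = louvain_first_pass(toy_edges)
```

On the toy graph the two triangles end up in separate communities, which is the kind of local cluster the note refers to. Running the full recursive method would additionally merge communities into coarser groups.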
One may think of applying LED with small values for \(T_t\) and \(T_d\) before grouping similar clusters together using a second clustering step. In fact, the second and further iterations of the Louvain method already offer such a grouping. Alternatively, a hierarchical clustering algorithm can be applied to the clusters obtained by LED. However, such a further grouping process does not usually lead to a clear interpretation in terms of the spatiotemporal scales of the resulting event clusters, and it is often difficult to decide when to stop the recursive process and output the final clusters.
When two tweets come from the same geographical cell, they would share the same time series for any common term. In this case, the correlation of DWT coefficients would always be 1 regardless of the level at which we compute the transform (or the temporal scale). This special case can be interpreted as only keeping the spatial constraint in LED but relaxing the temporal constraint.
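The special case in this note can be checked numerically. The sketch below assumes a Haar wavelet and compares approximation coefficients with the Pearson correlation; the paper's exact wavelet and coefficient choice are not stated in this note, so both are assumptions, and the helper names and toy series are hypothetical.

```python
import math


def haar_approximation(series, level):
    """Approximation coefficients after `level` Haar analysis steps."""
    approx = [float(x) for x in series]
    for _ in range(level):
        if len(approx) < 2 or len(approx) % 2 != 0:
            raise ValueError("series length must be divisible by 2**level")
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2.0)
                  for i in range(0, len(approx), 2)]
    return approx


def pearson(x, y):
    """Pearson correlation of two equal-length coefficient vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation undefined for constant coefficients")
    return cov / math.sqrt(var_x * var_y)


# Toy term-count series for one term (not data from the paper).
shared_series = [0, 1, 4, 2, 0, 0, 3, 5]   # same cell: identical series
other_series = [3, 5, 0, 0, 4, 2, 0, 1]    # a different cell

# Same cell: the two tweets see the same series at every level.
same_cell = [pearson(haar_approximation(shared_series, lvl),
                     haar_approximation(shared_series, lvl))
             for lvl in (1, 2)]

# Different cells: the correlation generally depends on the series and level.
different_cell = pearson(haar_approximation(shared_series, 1),
                         haar_approximation(other_series, 1))
```

Because the two inputs are identical in the same-cell case, the correlation equals 1 (up to floating-point rounding) at each level where it is defined, so the temporal scale no longer discriminates between such tweet pairs.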
The direct usage of the CSR tests for the whole input tweet stream would not be particularly informative since both of our algorithms construct a similarity graph between tweets where the edge weights (i.e., the similarities between tweets) are based on the terms that two tweets have in common. In this case, noise or event-irrelevant tweets would affect the construction of the graph only when two “noise” tweets have a term in common (i.e., resulting in the formation of an edge that connects event-irrelevant tweets in the tweet similarity graph).
F-measure is computed as \((1+\beta ^2) \cdot \frac{\textit{Precision} \cdot \textit{Recall}}{(\beta ^2 \cdot \textit{Precision}) + \textit{Recall}}.\)
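A minimal sketch of this formula follows. The function name `f_measure` and the choice to return 0 when the denominator vanishes are assumptions made for illustration, not conventions stated in the paper.

```python
def f_measure(precision, recall, beta=1.0):
    """F-measure: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision < 0 or recall < 0 or beta <= 0:
        raise ValueError("precision and recall must be >= 0 and beta > 0")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Assumption: report 0 when both precision and recall are 0.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator
```

With \(\beta = 1\) this reduces to the harmonic mean of precision and recall, and values of \(\beta\) above 1 weight recall more heavily than precision.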
References
Aggarwal CC, Subbian K (2012) Event detection in social streams. In: SIAM international conference on data mining (SDM), Anaheim, CA
Atefeh F, Khreich W (2013) A survey of techniques for event detection in Twitter. Comput Intell
Becker H, Naaman M, Gravano L (2009) Event identification in social media. In: ACM SIGMOD workshop on the web and databases (WebDB), Providence, RI
Becker H, Naaman M, Gravano L (2010) Learning similarity metrics for event identification in social media. In: The third ACM international conference on web search and data mining (WSDM), New York City, NY
Becker H, Naaman M, Gravano L (2011) Beyond trending topics: real-world event identification on Twitter. In: The fifth international AAAI conference on weblogs and social media (ICWSM), Barcelona
Berlingerio M, Calabrese F, Lorenzo GD, Dong X, Gkoufas Y, Mavroeidis D (2013) SaferCity: a system for detecting and analyzing incidents from social media. In: IEEE international conference on data mining (ICDM), Dallas, TX
Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech 10:P10008 (12pp)
Chen L, Roy A (2009) Event detection from Flickr data through wavelet-based spatial analysis. In: The 18th ACM conference on information and knowledge management (CIKM), Hong Kong
Cooper M, Foote J, Girgensohn A, Wilcox L (2005) Temporal event clustering for digital photo collections. ACM Trans Multimed Comput Commun Appl (TOMCCAP) 1(3):269–288
Cordeiro M (2012) Twitter event detection: combining wavelet analysis and topic inference summarization. In: Doctoral symposium on informatics engineering, Porto
Cressie N, Wikle CK (2011) Statistics for spatio-temporal data (Wiley series in probability and statistics). Wiley, New York
Daubechies I (1992) Ten lectures on wavelets. SIAM, Philadelphia
Lappas T, Vieira MR, Gunopulos D, Tsotras VJ (2012) On the spatiotemporal burstiness of terms. In: The 38th international conference on very large databases, Istanbul
Lee CH, Yang HC, Chien TF, Wen WS (2011) A novel approach for event detection by mining spatio-temporal information on microblogs. In: International conference on advances in social networks analysis and mining (ASONAM), Kaohsiung
Li C, Sun A, Datta A (2012a) Twevent: segment-based event detection from Tweets. In: The 21st ACM international conference on information and knowledge management (CIKM), Maui, HI
Li R, Lei KH, Khadiwala R, Chang KCC (2012b) TEDAS: a Twitter-based event detection and analysis system. In: The 28th IEEE international conference on data engineering (ICDE), Washington, DC
Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, Cambridge
Marcus A, Bernstein MS, Badar O, Karger DR, Madden S, Miller RC (2011) Twitinfo: aggregating and visualizing microblogs for event exploration. In: ACM CHI conference on human factors in computing systems, Vancouver
Newman MEJ (2006) Modularity and community structure in networks. Proc Natl Acad Sci USA 103(23):8577–8582
Ozdikis O, Senkul P, Oguztuzun H (2012) Semantic expansion of hashtags for enhanced event detection in Twitter. In: The first international workshop on online social systems (WOSS), Istanbul
Papadopoulos S, Zigkolis C, Kompatsiaris Y, Vakali A (2011) Cluster-based landmark and event detection for tagged photo collections. IEEE MultiMed 18(1):52–63
Parikh R, Karlapalem K (2013) ET: events from Tweets. In: The 22nd international conference on world wide web (WWW), Rio de Janeiro
Petrovic S, Osborne M, Lavrenko V (2010) Streaming first story detection with application to Twitter. In: The 11th annual conference of the North American chapter of the association for computational linguistics, Los Angeles, CA
Rattenbury T, Good N, Naaman M (2007) Towards automatic extraction of event and place semantics from Flickr tags. In: ACM SIGIR conference on research and development on information retrieval, Amsterdam
Reuter T, Papadopoulos S, Petkos G, Mezaris V, Kompatsiaris Y, Cimiano P, de Vries C, Geva S (2013) Social event detection at MediaEval 2013: challenges, datasets, and evaluation. In: MediaEval benchmarking initiative for multimedia evaluation (MediaEval) 2013 workshop, Barcelona
Ronhovde P, Chakrabarty S, Hu D, Sahu M, Sahu KK, Kelton KF, Mauro NA, Nussinov Z (2011) Detecting hidden spatial and spatio-temporal structures in glasses and complex physical systems by multiresolution network clustering. Eur Phys J E 34:105
Ronhovde P, Chakrabarty S, Hu D, Sahu M, Sahu KK, Kelton KF, Mauro NA, Nussinov Z (2012) Detection of hidden structures for arbitrary scales in complex physical systems. Sci Rep 2:329
Sakaki T, Okazaki M, Matsuo Y (2010) Earthquake shakes Twitter users: real-time event detection by social sensors. In: The 19th international conference on world wide web (WWW), Raleigh, NC
Sankaranarayanan J, Samet H, Teitler BE, Lieberman MD, Sperling J (2009) TwitterStand: news in Tweets. In: The 17th ACM SIGSPATIAL international conference on advances in geographic information systems, Seattle, WA
Sayyadi H, Hurst M, Maykov A (2009) Event detection and tracking in social streams. In: The third international AAAI conference on weblogs and social media (ICWSM), San Jose, CA
Sheikholeslami G, Chatterjee S, Zhang A (2000) WaveCluster: a multi-resolution clustering approach for very large spatial databases. Int J Very Large Data Bases 8(3–4):289–304
Sugitani T, Shirakawa M, Hara T, Nishio S (2013) Detecting local events by analyzing spatiotemporal locality of Tweets. In: The 27th international conference on advanced information networking and applications workshops (WAINA), Barcelona
Thom D, Bosch H, Koch S, Woerner M, Ertl T (2012) Spatiotemporal anomaly detection through visual analysis of geolocated Twitter messages. In: 2012 IEEE Pacific visualization symposium (PacificVis), Songdo
Tremblay N, Borgnat P (2012) Multiscale community mining in networks using spectral graph wavelets. arXiv:1212.0689
von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416
Walther M, Kaisser M (2013) Geo-spatial event detection in the Twitter stream. In: The 35th European conference on information retrieval (ECIR), Moscow
Weng J, Lee BS (2011) Event detection in Twitter. In: The fifth international AAAI conference on weblogs and social media (ICWSM), Barcelona
Witkin A (1983) Scale space filtering. In: International joint conference on artificial intelligence (IJCAI), Karlsruhe
Zaharieva M, Zeppelzauer M, Breiteneder C (2013) Automated social event detection in large photo collections. In: ACM international conference on multimedia retrieval, Dallas, TX
Zeimpekis D, Gallopoulos E (2006) TMG: a MATLAB toolbox for generating term-document matrices from text collections. In: Kogan J, Nicholas C, Teboulle M (eds) Grouping multidimensional data: recent advances in clustering. pp 187–210
Acknowledgments
X. Dong is supported by a Swiss National Science Foundation Mobility Fellowship. This work was done while X. Dong and D. Mavroeidis were at IBM Research - Ireland.
Additional information
Responsible editors: Joao Gama, Indre Zliobaite, Alipio Jorge, and Concha Bielza.
Cite this article
Dong, X., Mavroeidis, D., Calabrese, F. et al. Multiscale event detection in social media. Data Min Knowl Disc 29, 1374–1405 (2015). https://doi.org/10.1007/s10618-015-0421-2
DOI: https://doi.org/10.1007/s10618-015-0421-2