Multiscale event detection in social media

Data Mining and Knowledge Discovery

Abstract

Event detection has been one of the most important research topics in social media analysis. Most traditional approaches detect events at fixed temporal and spatial resolutions, whereas in reality events of different scales usually occur simultaneously; that is, they span different intervals in time and space. In this paper, we propose a novel approach to multiscale event detection using social media data, which takes into account the different temporal and spatial scales of events in the data. Specifically, we exploit the properties of the wavelet transform, a well-developed multiscale transform in signal processing, to handle the interaction between temporal and spatial scales automatically. We then propose a novel algorithm that computes a data similarity graph at appropriate scales and detects events of different scales simultaneously through a single graph-based clustering process. Furthermore, we present a spatiotemporal statistical analysis of the noisy information present in the data stream, which allows us to define a novel term-filtering procedure for the proposed event detection algorithm and helps us study its behavior using simulated noisy data. Experimental results on both synthetically generated data and real-world data collected from Twitter demonstrate the meaningfulness and effectiveness of the proposed approach. Our framework further extends to other application domains that involve multiscale and multiresolution data analysis.

Notes

  1. Throughout the paper, we use “scales” and “resolutions” interchangeably.

  2. Since we are interested in local clusters, we apply the non-recursive version of the Louvain method which stops after the first iteration.

  3. One may think of applying LED with small values for \(T_t\) and \(T_d\) and then grouping similar clusters together in a second clustering step. In fact, the second and subsequent iterations of the Louvain method already offer such a grouping. Alternatively, a hierarchical clustering algorithm can be applied to the clusters obtained by LED. However, such a further grouping process does not usually lead to a clear interpretation in terms of the spatiotemporal scales of the resulting event clusters, and it is often difficult to decide when to stop the recursive process and output the final clusters.

  4. When two tweets come from the same geographical cell, they share the same time series for any common term. In this case, the correlation of DWT coefficients is always 1, regardless of the level at which we compute the transform (i.e., the temporal scale). This special case can be interpreted as keeping only the spatial constraint in LED while relaxing the temporal constraint.

  5. Applying the CSR (complete spatial randomness) tests directly to the whole input tweet stream would not be particularly informative, since both of our algorithms construct a similarity graph between tweets whose edge weights (i.e., the similarities between tweets) are based on the terms that two tweets have in common. Noise, or event-irrelevant, tweets therefore affect the construction of the graph only when two “noise” tweets share a term (i.e., when an edge connecting event-irrelevant tweets forms in the tweet similarity graph).

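  6. F-measure is computed as \((1+\beta ^2) \cdot \frac{\textit{Precision} \cdot \textit{Recall}}{(\beta ^2 \cdot \textit{Precision}) + \textit{Recall}},\) where \(\beta\) sets the weight of recall relative to precision (\(\beta = 1\) gives their harmonic mean).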

  7. https://dev.twitter.com/streaming/overview/request-parameters#locations.

  8. http://en.wikipedia.org/wiki/Occupy_Wall_Street.
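
Note 2 refers to the non-recursive Louvain method, which stops after its first iteration. In that first pass, each node is repeatedly moved to the neighboring community with the largest modularity gain \(\Delta Q = k_{i,C}/m - \Sigma_{\mathrm{tot}}(C)\,k_i/(2m^2)\), where \(k_{i,C}\) is the edge weight from node \(i\) to community \(C\), \(k_i\) is the weighted degree of \(i\), \(\Sigma_{\mathrm{tot}}(C)\) is the summed degree of \(C\), and \(m\) is the total edge weight, until no single move helps; the communities are not then aggregated into super-nodes. The Python below is a minimal sketch of that pass under our own assumptions (an undirected weighted edge list, self-loops ignored, nodes visited in sorted order, and the hypothetical function name `louvain_first_pass`). It is not the authors' implementation, and its output can depend on the visiting order.

```python
from collections import defaultdict


def louvain_first_pass(edges, max_sweeps=100):
    """Local-moving (first) pass of the Louvain method on an undirected weighted graph.

    `edges` is an iterable of (u, v, weight) tuples; self-loops are ignored.
    Returns a dict mapping each node to a community label.
    """
    adj = defaultdict(dict)
    for u, v, w in edges:
        if u == v:
            continue  # self-loops skipped to keep the sketch short
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w

    nodes = sorted(adj)
    degree = {n: sum(adj[n].values()) for n in nodes}
    m = sum(degree.values()) / 2.0  # total edge weight
    community = {n: n for n in nodes}  # start from singleton communities
    if m == 0:
        return community
    tot = dict(degree)  # summed degree of the nodes in each community

    for _ in range(max_sweeps):
        moved = False
        for i in nodes:
            old = community[i]
            # Total edge weight from i to each neighboring community.
            links = defaultdict(float)
            for j, w in adj[i].items():
                links[community[j]] += w
            tot[old] -= degree[i]  # take i out of its current community
            scale = degree[i] / (2.0 * m * m)
            # Gain of inserting the isolated node i into a community c.
            best, best_gain = old, links.get(old, 0.0) / m - tot[old] * scale
            for c, k_ic in links.items():
                gain = k_ic / m - tot[c] * scale
                if gain > best_gain:
                    best, best_gain = c, gain
            tot[best] += degree[i]
            if best != old:
                community[i] = best
                moved = True
        if not moved:
            break  # no single move improves modularity: the first pass is done
    return community


if __name__ == "__main__":
    # Two triangles joined by one weak edge.
    demo = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
            (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]
    print(louvain_first_pass(demo))
```

On this toy graph the two triangles end up in separate communities; the labels themselves are just node identifiers inherited from the initial singleton assignment.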
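
Note 4 relies on correlating the DWT coefficients of term time series at a chosen level, i.e., temporal scale. The sketch below assumes Haar approximation coefficients and Pearson correlation; the wavelet and coefficients actually used in LED are not specified in this excerpt, and `haar_approximation` and `dwt_correlation` are hypothetical names. It shows that identical series correlate perfectly at every level, as the note states, and that two bursts one time step apart become perfectly correlated once the level is coarse enough to merge their time bins.

```python
import numpy as np


def haar_approximation(series, level):
    """Haar DWT approximation coefficients of a 1-D series after `level` steps."""
    coeffs = np.asarray(series, dtype=float)
    for _ in range(level):
        if len(coeffs) % 2:  # pad odd-length input by repeating the last sample
            coeffs = np.append(coeffs, coeffs[-1])
        # Sum adjacent pairs scaled by 1/sqrt(2): halves the time resolution.
        coeffs = (coeffs[0::2] + coeffs[1::2]) / np.sqrt(2.0)
    return coeffs


def dwt_correlation(x, y, level):
    """Pearson correlation of the Haar approximation coefficients of x and y at `level`."""
    cx, cy = haar_approximation(x, level), haar_approximation(y, level)
    if cx.size < 2 or np.std(cx) == 0 or np.std(cy) == 0:
        return float("nan")  # undefined for constant or single-coefficient inputs
    return float(np.corrcoef(cx, cy)[0, 1])


if __name__ == "__main__":
    x = [0, 0, 5, 0, 0, 0, 0, 0]  # term burst at time step 2
    y = [0, 0, 0, 5, 0, 0, 0, 0]  # the same burst one step later
    for lvl in range(3):
        print(lvl, round(dwt_correlation(x, x, lvl), 3), round(dwt_correlation(x, y, lvl), 3))
```

At level 0 the shifted bursts are weakly anti-correlated (exactly \(-1/7\)), while from level 1 onwards both land in the same coarse time bin and correlate perfectly.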
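
To make the argument of note 5 concrete, the sketch below builds a tweet similarity graph through an inverted index, so that only tweets sharing at least one term are ever connected. Representing tweets as term sets and weighting edges by Jaccard similarity are illustrative assumptions, not the similarity used by the paper's algorithms, and `build_similarity_graph` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations


def build_similarity_graph(tweets):
    """Return {(i, j): weight} for tweet pairs i < j that share at least one term.

    `tweets` is a list of term sets; the weight is the Jaccard similarity
    (an illustrative choice, not the paper's weighting).
    """
    # Inverted index: term -> ids of the tweets containing it (ascending order).
    postings = defaultdict(list)
    for tid, terms in enumerate(tweets):
        for term in terms:
            postings[term].append(tid)

    edges = {}
    for ids in postings.values():
        # Only pairs that co-occur in some posting list can ever become edges.
        for i, j in combinations(ids, 2):
            if (i, j) not in edges:
                a, b = tweets[i], tweets[j]
                edges[(i, j)] = len(a & b) / len(a | b)
    return edges


if __name__ == "__main__":
    tweets = [{"fire", "downtown"}, {"fire", "smoke"},
              {"lunch", "coffee"}, {"coffee", "monday"}, {"random", "words"}]
    print(build_similarity_graph(tweets))
```

Here the unrelated tweets 2 and 3 are linked only because both contain "coffee", which is the case in which noise affects the graph, while tweet 4 shares no term with any other tweet and stays isolated.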
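
The formula in note 6 translates directly into code. This is a minimal sketch: the default \(\beta = 1\) is an assumption, since the value used in the experiments is not given in this excerpt, and `f_measure` is a hypothetical name.

```python
def f_measure(precision, recall, beta=1.0):
    """F-measure: (1 + beta^2) * P * R / (beta^2 * P + R); returns 0.0 when the denominator is 0."""
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom


if __name__ == "__main__":
    print(round(f_measure(1.0, 0.5), 3))          # 0.667 (balanced F1)
    print(round(f_measure(1.0, 0.5, beta=2), 3))  # 0.556 (recall weighted more)
```

Raising \(\beta\) above 1 pulls the score towards recall, so the low recall of 0.5 in this example lowers the F2 value relative to F1.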

References

  • Aggarwal CC, Subbian K (2012) Event detection in social streams. In: SIAM international conference on data mining (SDM), Anaheim, CA

  • Atefeh F, Khreich W (2013) A survey of techniques for event detection in Twitter. Comput Intell

  • Becker H, Naaman M, Gravano L (2009) Event identification in social media. In: ACM SIGMOD workshop on the web and databases (WebDB), Providence, RI

  • Becker H, Naaman M, Gravano L (2010) Learning similarity metrics for event identification in social media. In: The third ACM international conference on web search and data mining (WSDM), New York City, NY

  • Becker H, Naaman M, Gravano L (2011) Beyond trending topics: real-world event identification on Twitter. In: The fifth international AAAI conference on weblogs and social media (ICWSM), Barcelona

  • Berlingerio M, Calabrese F, Lorenzo GD, Dong X, Gkoufas Y, Mavroeidis D (2013) SaferCity: a system for detecting and analyzing incidents from social media. In: IEEE international conference on data mining (ICDM), Dallas, TX

  • Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech 10:P10008 (12pp)

  • Chen L, Roy A (2009) Event detection from flickr data through wavelet-based spatial analysis. In: The 18th ACM conference on information and knowledge management (CIKM), Hong Kong

  • Cooper M, Foote J, Girgensohn A, Wilcox L (2005) Temporal event clustering for digital photo collections. ACM Trans Multimed Comput Commun Appl (TOMCCAP) 1(3):269–288

  • Cordeiro M (2012) Twitter event detection: combining wavelet analysis and topic inference summarization. In: Doctoral symposium on informatics engineering, Porto

  • Cressie N, Wikle CK (2011) Statistics for spatio-temporal data (Wiley series in probability and statistics). Wiley, New York

  • Daubechies I (1992) Ten lectures on wavelets. SIAM, Philadelphia

  • Lappas T, Vieira MR, Gunopulos D, Tsotras VJ (2012) On the spatiotemporal burstiness of terms. In: The 38th international conference on very large databases, Istanbul

  • Lee CH, Yang HC, Chien TF, Wen WS (2011) A novel approach for event detection by mining spatio-temporal information on microblogs. In: International conference on advances in social networks analysis and mining (ASONAM), Kaohsiung

  • Li C, Sun A, Datta A (2012a) Twevent: segment-based event detection from Tweets. In: The 21st ACM international conference on information and knowledge management (CIKM), Maui, HI

  • Li R, Lei KH, Khadiwala R, Chang KCC (2012b) TEDAS: a Twitter-based event detection and analysis system. In: The 28th IEEE international conference on data engineering (ICDE), Washington, DC

  • Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, Cambridge

  • Marcus A, Bernstein MS, Badar O, Karger DR, Madden S, Miller RC (2011) Twitinfo: aggregating and visualizing microblogs for event exploration. In: ACM CHI conference on human factors in computing systems, Vancouver

  • Newman MEJ (2006) Modularity and community structure in networks. Proc Natl Acad Sci USA 103(23):8577–8582

  • Ozdikis O, Senkul P, Oguztuzun H (2012) Semantic expansion of hashtags for enhanced event detection in Twitter. In: The first international workshop on online social systems (WOSS), Istanbul

  • Papadopoulos S, Zigkolis C, Kompatsiaris Y, Vakali A (2011) Cluster-based landmark and event detection for tagged photo collections. IEEE MultiMed 18(1):52–63

  • Parikh R, Karlapalem K (2013) ET: events from Tweets. In: The 22nd international conference on world wide web (WWW), Rio de Janeiro

  • Petrovic S, Osborne M, Lavrenko V (2010) Streaming first story detection with application to Twitter. In: The 11th annual conference of the North American chapter of the association for computational linguistics, Los Angeles, CA

  • Rattenbury T, Good N, Naaman M (2007) Towards automatic extraction of event and place semantics from Flickr tags. In: ACM SIGIR conference on research and development in information retrieval, Amsterdam

  • Reuter T, Papadopoulos S, Petkos G, Mezaris V, Kompatsiaris Y, Cimiano P, de Vries C, Geva S (2013) Social event detection at mediaeval 2013: challenges, datasets, and evaluation. In: Mediaeval benchmarking initiative for multimedia evaluation (MediaEval) 2013 workshop, Barcelona

  • Ronhovde P, Chakrabarty S, Hu D, Sahu M, Sahu KK, Kelton KF, Mauro NA, Nussinov Z (2011) Detecting hidden spatial and spatio-temporal structures in glasses and complex physical systems by multiresolution network clustering. Eur Phys J E 34:105

  • Ronhovde P, Chakrabarty S, Hu D, Sahu M, Sahu KK, Kelton KF, Mauro NA, Nussinov Z (2012) Detection of hidden structures for arbitrary scales in complex physical systems. Sci Rep 2:329

  • Sakaki T, Okazaki M, Matsuo Y (2010) Earthquake shakes Twitter users: real-time event detection by social sensors. In: The 19th international conference on world wide web (WWW), Raleigh, NC

  • Sankaranarayanan J, Samet H, Teitler BE, Lieberman MD, Sperling J (2009) TwitterStand: news in Tweets. In: The 17th ACM SIGSPATIAL international conference on advances in geographic information systems, Seattle, WA

  • Sayyadi H, Hurst M, Maykov A (2009) Event detection and tracking in social streams. In: The third international AAAI conference on weblogs and social media (ICWSM), San Jose, CA

  • Sheikholeslami G, Chatterjee S, Zhang A (2000) WaveCluster: a multi-resolution clustering approach for very large spatial databases. Int J Very Large Data Bases 8(3–4):289–304

  • Sugitani T, Shirakawa M, Hara T, Nishio S (2013) Detecting local events by analyzing spatiotemporal locality of Tweets. In: The 27th international conference on advanced information networking and applications workshops (WAINA), Barcelona

  • Thom D, Bosch H, Koch S, Woerner M, Ertl T (2012) Spatiotemporal anomaly detection through visual analysis of geolocated Twitter messages. In: 2012 IEEE Pacific visualization symposium (PacificVis), Songdo

  • Tremblay N, Borgnat P (2012) Multiscale community mining in networks using spectral graph wavelets. arXiv:1212.0689

  • von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416

  • Walther M, Kaisser M (2013) Geo-spatial event detection in the Twitter stream. In: The 35th European conference on information retrieval (ECIR), Moscow

  • Weng J, Lee BS (2011) Event detection in Twitter. In: The fifth international AAAI conference on weblogs and social media (ICWSM), Barcelona

  • Witkin A (1983) Scale space filtering. In: International joint conference on artificial intelligence (IJCAI), Karlsruhe

  • Zaharieva M, Zeppelzauer M, Breiteneder C (2013) Automated social event detection in large photo collections. In: ACM international conference on multimedia retrieval, Dallas, TX

  • Zeimpekis D, Gallopoulos E (2006) TMG: a MATLAB toolbox for generating term-document matrices from text collections. In: Kogan J, Nicholas C, Teboulle M (eds) Grouping multidimensional data: recent advances in clustering. pp 187–210

Acknowledgments

X. Dong is supported by a Swiss National Science Foundation Mobility Fellowship. This work was done while X. Dong and D. Mavroeidis were at IBM Research - Ireland.

Author information

Correspondence to Xiaowen Dong.

Additional information

Responsible editors: Joao Gama, Indre Zliobaite, Alipio Jorge, and Concha Bielza.

About this article

Cite this article

Dong, X., Mavroeidis, D., Calabrese, F. et al. Multiscale event detection in social media. Data Min Knowl Disc 29, 1374–1405 (2015). https://doi.org/10.1007/s10618-015-0421-2

  • DOI: https://doi.org/10.1007/s10618-015-0421-2
