ABSTRACT
Anomalous subgraph detection has been successfully applied to event detection in social media. However, the subgraph detection problem becomes challenging when the social media network incorporates abundant attributes, which leads to a multivariate network. The multivariate characteristic leaves most existing methods unable to tackle this problem effectively and efficiently, because it involves joint feature selection and subgraph detection. This combination has not been well addressed in the current literature, especially for dynamic multivariate networks in which attributes evolve over time.
This paper presents a generic framework, namely dynamic multivariate evolving anomalous subgraphs scanning (DMGraphScan), to address this problem in dynamic multivariate social media networks. We generalize traditional nonparametric statistics and propose a new class of scan statistic functions that measure the joint significance of evolving subgraphs and subsets of attributes, indicating ongoing or forthcoming events in dynamic multivariate networks. We reformulate each scan statistic function as a sequence of subproblems with provable guarantees, and then propose an efficient approximation algorithm for each subproblem. This algorithm relies on Lagrangian relaxation and on dynamic programming based on tree-shaped priors. As a case study, we conduct extensive experiments demonstrating the performance of the proposed approach on two real-world applications in different domains: flu outbreak detection and haze detection.
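The abstract does not spell out the scan statistic or the relaxation, so the following is only a simplified, hypothetical illustration of how the named pieces can fit together, not the DMGraphScan algorithm. It assumes: empirical p-values for each node-attribute pair; the Berk-Jones statistic as one member of the nonparametric scan statistic family; a single rooted tree standing in for the tree-shaped prior; and a grid search over a significance threshold α and a multiplier λ as a stand-in for the Lagrangian relaxation, which turns the score into linear node weights (significant counts minus λ times size) that a tree dynamic program can maximize exactly. The function names `kl_bernoulli`, `berk_jones`, `best_subtree`, and `scan` are hypothetical, and the sketch omits attribute-subset selection and temporal evolution.

```python
import math


def kl_bernoulli(x, y):
    """KL divergence between Bernoulli(x) and Bernoulli(y), using 0 * log(0) = 0."""
    total = 0.0
    if x > 0:
        total += x * math.log(x / y)
    if x < 1:
        total += (1 - x) * math.log((1 - x) / (1 - y))
    return total


def berk_jones(n_alpha, n, alpha):
    """Berk-Jones score N * K(N_alpha / N, alpha); zero unless the subset is enriched."""
    if n == 0 or n_alpha / n <= alpha:
        return 0.0
    return n * kl_bernoulli(n_alpha / n, alpha)


def best_subtree(children, root, weight):
    """Maximum-weight connected subtree of a rooted tree via post-order DP.

    best[v] = weight[v] + sum of best[c] over children c with best[c] > 0;
    the optimum is the largest best[v], with v as the subtree's top node.
    """
    order, stack = [], [root]
    while stack:  # pre-order DFS; reversing it visits children before parents
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    best, kept = {}, {}
    for v in reversed(order):
        kept[v] = [c for c in children.get(v, []) if best[c] > 0]
        best[v] = weight[v] + sum(best[c] for c in kept[v])
    top = max(best, key=best.get)
    nodes, stack = set(), [top]
    while stack:  # collect the chosen subtree below the best top node
        v = stack.pop()
        nodes.add(v)
        stack.extend(kept[v])
    return nodes


def scan(children, root, pvalues, alphas=(0.01, 0.05, 0.1), lambdas=(0.05, 0.1, 0.2, 0.5)):
    """pvalues[v] holds one empirical p-value per attribute at node v (assumed input)."""
    best_score, best_nodes = 0.0, set()
    for alpha in alphas:
        for lam in lambdas:
            # Hypothetical linearized weights: significant count minus lam * node size.
            weight = {v: sum(p <= alpha for p in ps) - lam * len(ps) for v, ps in pvalues.items()}
            nodes = best_subtree(children, root, weight)
            n = sum(len(pvalues[v]) for v in nodes)
            n_alpha = sum(p <= alpha for v in nodes for p in pvalues[v])
            score = berk_jones(n_alpha, n, alpha)  # rescore with the true statistic
            if score > best_score:
                best_score, best_nodes = score, nodes
    return best_score, best_nodes


if __name__ == "__main__":
    children = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
    pvalues = {"a": [0.5, 0.6], "b": [0.01, 0.02], "c": [0.9, 0.8], "d": [0.03, 0.2]}
    print(scan(children, "a", pvalues))
```

In this toy tree, nodes `b` and `d` carry most of the small p-values, so the search should return that connected pair. The design choice being illustrated is that, once λ fixes a linear tradeoff between significance and size, the connectivity constraint becomes easy on a tree, and the nonlinear statistic is only evaluated on the few candidates the dynamic program returns.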
An Efficient Approach to Event Detection and Forecasting in Dynamic Multivariate Social Media Networks