Abstract
Given a large-scale and high-order tensor, how can we detect dense subtensors in it? Can we spot them in near-linear time but with quality guarantees? Extensive previous work has shown that dense subtensors, as well as dense subgraphs, indicate anomalous or fraudulent behavior (e.g., lockstep behavior in social networks). However, available algorithms for detecting dense subtensors are not satisfactory in terms of speed, accuracy, and flexibility. In this work, we propose two algorithms, called M-Zoom and M-Biz, for fast and accurate dense-subtensor detection with various density measures. M-Zoom gives a lower bound on the density of detected subtensors, while M-Biz guarantees the local optimality of detected subtensors. M-Zoom and M-Biz can be combined, giving the following advantages: (1) Scalable: scale near-linearly with all aspects of tensors and are up to 114× faster than state-of-the-art methods with similar accuracy, (2) Provably accurate: provide a guarantee on the lowest density and local optimality of the subtensors they find, (3) Flexible: support multi-subtensor detection and size bounds as well as diverse density measures, and (4) Effective: successfully detected edit wars and bot activities in Wikipedia, and spotted network attacks from a TCP dump with near-perfect accuracy (AUC = 0.98).
- Leman Akoglu, Mary McGlohon, and Christos Faloutsos. 2010. Oddball: Spotting anomalies in weighted graphs. In Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 410--421. Google ScholarDigital Library
- Reid Andersen and Kumar Chellapilla. 2009. Finding dense subgraphs with size bounds. In Proceedings of International Workshop on Algorithms and Models for the Web-Graph. Springer, 25--37. Google ScholarDigital Library
- Brett W. Bader and Tamara G. Kolda. 2007. Efficient MATLAB computations with sparse and factored tensors. SIAM Journal on Scientific Computing 30, 1 (2007), 205--231. Google ScholarDigital Library
- Bahman Bahmani, Ashish Goel, and Kamesh Munagala. 2014. Efficient primal-dual graph algorithms for mapreduce. In Proceedings of International Workshop on Algorithms and Models for the Web-Graph. Springer, 59--78.Google ScholarCross Ref
- Bahman Bahmani, Ravi Kumar, and Sergei Vassilvitskii. 2012. Densest subgraph in streaming and MapReduce. Proceedings of the VLDB Endowment 5, 5 (2012), 454--465. Google ScholarDigital Library
- Oana Denisa Balalau, Francesco Bonchi, T. H. Chan, Francesco Gullo, and Mauro Sozio. 2015. Finding subgraphs with maximum total density and limited overlap. In Proceedings of the 8th ACM International Conference on Web Search and Data Mining. ACM, 379--388. Google ScholarDigital Library
- James Bennett and Stan Lanning. 2007. The Netflix prize. In Proceedings of KDD Cup and Workshop, Vol. 2007. ACM, 35.Google Scholar
- Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, and Christos Faloutsos. 2013. Copycatch: Stopping group attacks by spotting lockstep behavior in social networks. In Proceedings of the 22nd International Conference on World Wide Web. ACM, 119--130. Google ScholarDigital Library
- Sayan Bhattacharya, Monika Henzinger, Danupon Nanongkai, and Charalampos Tsourakakis. 2015. Space-and time-efficient algorithm for maintaining dense subgraphs on one-pass dynamic streams. In Proceedings of the 47th Annual ACM Symposium on Theory of Computing. ACM, 173--182. Google ScholarDigital Library
- Loïc Cerf, Jérémy Besson, Céline Robardet, and Jean-François Boulicaut. 2008. Data peeler: Contraint-based closed pattern mining in n-ary relations. In Proceedings of the 2008 SIAM International Conference on Data Mining. SIAM, 37--48.Google ScholarCross Ref
- Moses Charikar. 2000. Greedy approximation algorithms for finding dense components in a graph. In Proceedings of International Workshop on Approximation Algorithms for Combinatorial Optimization. Springer, 84--95. Google ScholarDigital Library
- Gideon Dror, Noam Koenigstein, Yehuda Koren, and Markus Weimer. 2012. The yahoo! music dataset and KDD-cup’11. In Proceedings of KDD Cup 2011. ACM, 3--18. Google ScholarDigital Library
- Alessandro Epasto, Silvio Lattanzi, and Mauro Sozio. 2015. Efficient densest subgraph computation in evolving graphs. In Proceedings of the 24th International Conference on World Wide Web. ACM, 300--310. Google ScholarDigital Library
- Esther Galbrun, Aristides Gionis, and Nikolaj Tatti. 2016. Top-k overlapping densest subgraphs. Data Mining and Knowledge Discovery 30, 5 (2016), 1134--1165. Google ScholarDigital Library
- David Gibson, Ravi Kumar, and Andrew Tomkins. 2005. Discovering large dense subgraphs in massive graphs. In Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 721--732. Google ScholarDigital Library
- Andrew V. Goldberg. 1984. Finding a Maximum Density Subgraph. University of California Berkeley, CA.Google ScholarDigital Library
- Bryan Hooi, Kijung Shin, Hyun Ah Song, Alex Beutel, Neil Shah, and Christos Faloutsos. 2017. Graph-based fraud detection in the face of camouflage. ACM Transactions on Knowledge Discovery from Data 11, 4 (2017), 44. Google ScholarDigital Library
- Dmitry I. Ignatov, Sergei O. Kuznetsov, Jonas Poelmans, and Leonid E. Zhukov. 2013. Can triconcepts become triclusters?International Journal of General Systems 42, 6 (2013), 572--593.Google Scholar
- Meng Jiang, Alex Beutel, Peng Cui, Bryan Hooi, Shiqiang Yang, and Christos Faloutsos. 2015. A general suspiciousness metric for dense blocks in multimodal data. In Proceedings of the 2015 IEEE International Conference on Data Mining (ICDM’15). IEEE, 781--786. Google ScholarDigital Library
- Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. 2014a. Catchsync: Catching synchronized behavior in large directed graphs. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 941--950. Google ScholarDigital Library
- Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. 2014b. Inferring strange behavior from connectivity pattern in social networks. In Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 126--138.Google ScholarCross Ref
- Ravi Kannan and V. Vinay. 1999. Analyzing the Structure of Large Graphs. Rheinische Friedrich-Wilhelms-Universität Bonn.Google Scholar
- Samir Khuller and Barna Saha. 2009. On finding dense subgraphs. In International Colloquium on Automata, Languages, and Programming. Springer, 597--608. Google ScholarDigital Library
- Tamara G. Kolda and Brett W. Bader. 2009. Tensor decompositions and applications. SIAM Review 51, 3 (2009), 455--500. Google ScholarDigital Library
- Jérôme Kunegis. 2013. Konect: The Koblenz network collection. In Proceedings of the 22nd International Conference on World Wide Web. ACM, 1343--1350. Google ScholarDigital Library
- Victor E. Lee, Ning Ruan, Ruoming Jin, and Charu Aggarwal. 2010. A survey of algorithms for dense subgraph discovery. In Proceedings of Managing and Mining Graph Data. Springer, 303--336.Google ScholarCross Ref
- Koji Maruhashi, Fan Guo, and Christos Faloutsos. 2011. Multiaspectforensics: Pattern mining on large-scale heterogeneous networks with tensor analysis. In Proceedings of 2011 International Conference on Advances in Social Networks Analysis and Mining (ASONAM’11). IEEE, 203--210. Google ScholarDigital Library
- Julian McAuley, Rahul Pandey, and Jure Leskovec. 2015. Inferring networks of substitutable and complementary products. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 785--794. Google ScholarDigital Library
- Andrew McGregor, David Tench, Sofya Vorotnikova, and Hoa T. Vu. 2015. Densest subgraph in dynamic graph streams. In Proceedings of International Symposium on Mathematical Foundations of Computer Science. Springer, 472--482.Google Scholar
- Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. 2007. Measurement and analysis of online social networks. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement. ACM, 29--42. Google ScholarDigital Library
- Shashank Pandit, Duen Horng Chau, Samuel Wang, and Christos Faloutsos. 2007. Netprobe: A fast and scalable system for fraud detection in online auction networks. In Proceedings of the 16th International Conference on World Wide Web. ACM, 201--210. Google ScholarDigital Library
- B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju, and Christos Faloutsos. 2010. Eigenspokes: Surprising patterns and scalable community chipping in large graphs. In Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 435--448. Google ScholarDigital Library
- Jouni K. Seppänen and Heikki Mannila. 2004. Dense itemsets. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 683--688. Google ScholarDigital Library
- Neil Shah, Alex Beutel, Brian Gallagher, and Christos Faloutsos. 2014. Spotting suspicious link behavior with fbox: An adversarial perspective. In Proceedings of 2014 IEEE International Conference on Data Mining (ICDM’14). IEEE, 959--964. Google ScholarDigital Library
- Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos. 2016a. CoreScope: Graph mining using k-core analysis - patterns, anomalies and algorithms. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM’16). IEEE, 469--478.Google ScholarCross Ref
- Kijung Shin, Bryan Hooi, and Christos Faloutsos. 2016b. M-Zoom: Fast dense-block detection in tensors with quality guarantees. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 264--280. Google ScholarDigital Library
- Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos. 2017a. D-cube: Dense-block detection in terabyte-scale tensors. In Proceedings of the 10th ACM International Conference on Web Search and Data Mining. ACM, 681--689. Google ScholarDigital Library
- Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos. 2017b. DenseAlert: Incremental dense-subtensor detection in tensor streams. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1057--1066. Google ScholarDigital Library
- Charalampos Tsourakakis, Francesco Bonchi, Aristides Gionis, Francesco Gullo, and Maria Tsiarli. 2013. Denser than the densest subgraph: Extracting optimal quasi-cliques with quality guarantees. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 104--112. Google ScholarDigital Library
Index Terms
- Fast, Accurate, and Flexible Algorithms for Dense Subtensor Mining
Recommendations
DenseAlert: Incremental Dense-Subtensor Detection in Tensor Streams
KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data MiningConsider a stream of retweet events - how can we spot fraudulent lock-step behavior in such multi-aspect data (i.e., tensors) evolving over time? Can we detect it in real time, with an accuracy guarantee? Past studies have shown that dense subtensors ...
D-Cube: Dense-Block Detection in Terabyte-Scale Tensors
WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data MiningHow can we detect fraudulent lockstep behavior in large-scale multi-aspect data (i.e., tensors)? Can we detect it when data are too large to fit in memory or even on a disk? Past studies have shown that dense blocks in real-world tensors (e.g., social ...
Hierarchical Dense Pattern Detection in Tensors
Dense subtensor detection gains remarkable success in spotting anomalies and fraudulent behaviors for multi-aspect data (i.e., tensors), like in social media and event streams. Existing methods detect the densest subtensors flatly and separately, with the ...
Comments