Fast, Accurate, and Flexible Algorithms for Dense Subtensor Mining

Authors:
Kijung Shin

Carnegie Mellon University, Forbes Avenue, Pittsburgh, PA

Carnegie Mellon University, Forbes Avenue, Pittsburgh, PA

0000-0002-2872-1526
View Profile

,
Bryan Hooi

Carnegie Mellon University, Forbes Avenue, Pittsburgh, PA

Carnegie Mellon University, Forbes Avenue, Pittsburgh, PA

0000-0002-5645-1754
View Profile

,
Christos Faloutsos

Carnegie Mellon University, Forbes Avenue, Pittsburgh, PA

Carnegie Mellon University, Forbes Avenue, Pittsburgh, PA
View Profile

ACM Transactions on Knowledge Discovery from Data Volume 12 Issue 3Article No.: 28pp 1–30https://doi.org/10.1145/3154414

Published:23 January 2018Publication History

ACM Transactions on Knowledge Discovery from Data

Abstract

Given a large-scale and high-order tensor, how can we detect dense subtensors in it? Can we spot them in near-linear time but with quality guarantees? Extensive previous work has shown that dense subtensors, as well as dense subgraphs, indicate anomalous or fraudulent behavior (e.g., lockstep behavior in social networks). However, available algorithms for detecting dense subtensors are not satisfactory in terms of speed, accuracy, and flexibility. In this work, we propose two algorithms, called M-Zoom and M-Biz, for fast and accurate dense-subtensor detection with various density measures. M-Zoom gives a lower bound on the density of detected subtensors, while M-Biz guarantees the local optimality of detected subtensors. M-Zoom and M-Biz can be combined, giving the following advantages: (1) Scalable: scale near-linearly with all aspects of tensors and are up to 114× faster than state-of-the-art methods with similar accuracy, (2) Provably accurate: provide a guarantee on the lowest density and local optimality of the subtensors they find, (3) Flexible: support multi-subtensor detection and size bounds as well as diverse density measures, and (4) Effective: successfully detected edit wars and bot activities in Wikipedia, and spotted network attacks from a TCP dump with near-perfect accuracy (AUC = 0.98).

References

Leman Akoglu, Mary McGlohon, and Christos Faloutsos. 2010. Oddball: Spotting anomalies in weighted graphs. In Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 410--421. Google ScholarDigital Library
Reid Andersen and Kumar Chellapilla. 2009. Finding dense subgraphs with size bounds. In Proceedings of International Workshop on Algorithms and Models for the Web-Graph. Springer, 25--37. Google ScholarDigital Library
Brett W. Bader and Tamara G. Kolda. 2007. Efficient MATLAB computations with sparse and factored tensors. SIAM Journal on Scientific Computing 30, 1 (2007), 205--231. Google ScholarDigital Library
Bahman Bahmani, Ashish Goel, and Kamesh Munagala. 2014. Efficient primal-dual graph algorithms for mapreduce. In Proceedings of International Workshop on Algorithms and Models for the Web-Graph. Springer, 59--78.Google ScholarCross Ref
Bahman Bahmani, Ravi Kumar, and Sergei Vassilvitskii. 2012. Densest subgraph in streaming and MapReduce. Proceedings of the VLDB Endowment 5, 5 (2012), 454--465. Google ScholarDigital Library
Oana Denisa Balalau, Francesco Bonchi, T. H. Chan, Francesco Gullo, and Mauro Sozio. 2015. Finding subgraphs with maximum total density and limited overlap. In Proceedings of the 8th ACM International Conference on Web Search and Data Mining. ACM, 379--388. Google ScholarDigital Library
James Bennett and Stan Lanning. 2007. The Netflix prize. In Proceedings of KDD Cup and Workshop, Vol. 2007. ACM, 35.Google Scholar
Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, and Christos Faloutsos. 2013. Copycatch: Stopping group attacks by spotting lockstep behavior in social networks. In Proceedings of the 22nd International Conference on World Wide Web. ACM, 119--130. Google ScholarDigital Library
Sayan Bhattacharya, Monika Henzinger, Danupon Nanongkai, and Charalampos Tsourakakis. 2015. Space-and time-efficient algorithm for maintaining dense subgraphs on one-pass dynamic streams. In Proceedings of the 47th Annual ACM Symposium on Theory of Computing. ACM, 173--182. Google ScholarDigital Library
Loïc Cerf, Jérémy Besson, Céline Robardet, and Jean-François Boulicaut. 2008. Data peeler: Contraint-based closed pattern mining in n-ary relations. In Proceedings of the 2008 SIAM International Conference on Data Mining. SIAM, 37--48.Google ScholarCross Ref
Moses Charikar. 2000. Greedy approximation algorithms for finding dense components in a graph. In Proceedings of International Workshop on Approximation Algorithms for Combinatorial Optimization. Springer, 84--95. Google ScholarDigital Library
Gideon Dror, Noam Koenigstein, Yehuda Koren, and Markus Weimer. 2012. The yahoo&excl; music dataset and KDD-cup’11. In Proceedings of KDD Cup 2011. ACM, 3--18. Google ScholarDigital Library
Alessandro Epasto, Silvio Lattanzi, and Mauro Sozio. 2015. Efficient densest subgraph computation in evolving graphs. In Proceedings of the 24th International Conference on World Wide Web. ACM, 300--310. Google ScholarDigital Library
Esther Galbrun, Aristides Gionis, and Nikolaj Tatti. 2016. Top-k overlapping densest subgraphs. Data Mining and Knowledge Discovery 30, 5 (2016), 1134--1165. Google ScholarDigital Library
David Gibson, Ravi Kumar, and Andrew Tomkins. 2005. Discovering large dense subgraphs in massive graphs. In Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 721--732. Google ScholarDigital Library
Andrew V. Goldberg. 1984. Finding a Maximum Density Subgraph. University of California Berkeley, CA.Google ScholarDigital Library
Bryan Hooi, Kijung Shin, Hyun Ah Song, Alex Beutel, Neil Shah, and Christos Faloutsos. 2017. Graph-based fraud detection in the face of camouflage. ACM Transactions on Knowledge Discovery from Data 11, 4 (2017), 44. Google ScholarDigital Library
Dmitry I. Ignatov, Sergei O. Kuznetsov, Jonas Poelmans, and Leonid E. Zhukov. 2013. Can triconcepts become triclusters?International Journal of General Systems 42, 6 (2013), 572--593.Google Scholar
Meng Jiang, Alex Beutel, Peng Cui, Bryan Hooi, Shiqiang Yang, and Christos Faloutsos. 2015. A general suspiciousness metric for dense blocks in multimodal data. In Proceedings of the 2015 IEEE International Conference on Data Mining (ICDM’15). IEEE, 781--786. Google ScholarDigital Library
Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. 2014a. Catchsync: Catching synchronized behavior in large directed graphs. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 941--950. Google ScholarDigital Library
Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. 2014b. Inferring strange behavior from connectivity pattern in social networks. In Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 126--138.Google ScholarCross Ref
Ravi Kannan and V. Vinay. 1999. Analyzing the Structure of Large Graphs. Rheinische Friedrich-Wilhelms-Universität Bonn.Google Scholar
Samir Khuller and Barna Saha. 2009. On finding dense subgraphs. In International Colloquium on Automata, Languages, and Programming. Springer, 597--608. Google ScholarDigital Library
Tamara G. Kolda and Brett W. Bader. 2009. Tensor decompositions and applications. SIAM Review 51, 3 (2009), 455--500. Google ScholarDigital Library
Jérôme Kunegis. 2013. Konect: The Koblenz network collection. In Proceedings of the 22nd International Conference on World Wide Web. ACM, 1343--1350. Google ScholarDigital Library
Victor E. Lee, Ning Ruan, Ruoming Jin, and Charu Aggarwal. 2010. A survey of algorithms for dense subgraph discovery. In Proceedings of Managing and Mining Graph Data. Springer, 303--336.Google ScholarCross Ref
Koji Maruhashi, Fan Guo, and Christos Faloutsos. 2011. Multiaspectforensics: Pattern mining on large-scale heterogeneous networks with tensor analysis. In Proceedings of 2011 International Conference on Advances in Social Networks Analysis and Mining (ASONAM’11). IEEE, 203--210. Google ScholarDigital Library
Julian McAuley, Rahul Pandey, and Jure Leskovec. 2015. Inferring networks of substitutable and complementary products. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 785--794. Google ScholarDigital Library
Andrew McGregor, David Tench, Sofya Vorotnikova, and Hoa T. Vu. 2015. Densest subgraph in dynamic graph streams. In Proceedings of International Symposium on Mathematical Foundations of Computer Science. Springer, 472--482.Google Scholar
Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. 2007. Measurement and analysis of online social networks. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement. ACM, 29--42. Google ScholarDigital Library
Shashank Pandit, Duen Horng Chau, Samuel Wang, and Christos Faloutsos. 2007. Netprobe: A fast and scalable system for fraud detection in online auction networks. In Proceedings of the 16th International Conference on World Wide Web. ACM, 201--210. Google ScholarDigital Library
B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju, and Christos Faloutsos. 2010. Eigenspokes: Surprising patterns and scalable community chipping in large graphs. In Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 435--448. Google ScholarDigital Library
Jouni K. Seppänen and Heikki Mannila. 2004. Dense itemsets. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 683--688. Google ScholarDigital Library
Neil Shah, Alex Beutel, Brian Gallagher, and Christos Faloutsos. 2014. Spotting suspicious link behavior with fbox: An adversarial perspective. In Proceedings of 2014 IEEE International Conference on Data Mining (ICDM’14). IEEE, 959--964. Google ScholarDigital Library
Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos. 2016a. CoreScope: Graph mining using k-core analysis - patterns, anomalies and algorithms. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM’16). IEEE, 469--478.Google ScholarCross Ref
Kijung Shin, Bryan Hooi, and Christos Faloutsos. 2016b. M-Zoom: Fast dense-block detection in tensors with quality guarantees. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 264--280. Google ScholarDigital Library
Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos. 2017a. D-cube: Dense-block detection in terabyte-scale tensors. In Proceedings of the 10th ACM International Conference on Web Search and Data Mining. ACM, 681--689. Google ScholarDigital Library
Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos. 2017b. DenseAlert: Incremental dense-subtensor detection in tensor streams. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1057--1066. Google ScholarDigital Library
Charalampos Tsourakakis, Francesco Bonchi, Aristides Gionis, Francesco Gullo, and Maria Tsiarli. 2013. Denser than the densest subgraph: Extracting optimal quasi-cliques with quality guarantees. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 104--112. Google ScholarDigital Library

Index Terms

Fast, Accurate, and Flexible Algorithms for Dense Subtensor Mining
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Anomaly detection
2. Information systems
  1. Information systems applications
    1. Data mining

Recommendations

DenseAlert: Incremental Dense-Subtensor Detection in Tensor Streams
KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Consider a stream of retweet events - how can we spot fraudulent lock-step behavior in such multi-aspect data (i.e., tensors) evolving over time? Can we detect it in real time, with an accuracy guarantee? Past studies have shown that dense subtensors ...
Read More
D-Cube: Dense-Block Detection in Terabyte-Scale Tensors
WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining

How can we detect fraudulent lockstep behavior in large-scale multi-aspect data (i.e., tensors)? Can we detect it when data are too large to fit in memory or even on a disk? Past studies have shown that dense blocks in real-world tensors (e.g., social ...
Read More
Hierarchical Dense Pattern Detection in Tensors
Dense subtensor detection gains remarkable success in spotting anomalies and fraudulent behaviors for multi-aspect data (i.e., tensors), like in social media and event streams. Existing methods detect the densest subtensors flatly and separately, with the ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM Transactions on Knowledge Discovery from Data Volume 12, Issue 3
June 2018
360 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3178546
Editors:
Charu Aggarwal
IBM T. J. Watson Research, USA
,
Xindong Wu
University of Louisiana at Lafayette, USA
Issue’s Table of Contents
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 23 January 2018
- Accepted: 1 October 2017
- Revised: 1 June 2017
- Received: 1 December 2016
Published in tkdd Volume 12, Issue 3

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Tensor
anomaly detection
dense subtensor
fraud detection
Qualifiers
- research-article
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 18
  Total Citations
  View Citations
- 471
  Total Downloads
- Downloads (Last 12 months)34
- Downloads (Last 6 weeks)4
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Fast, Accurate, and Flexible Algorithms for Dense Subtensor Mining

ACM Transactions on Knowledge Discovery from Data

Abstract

References

Cited By

Index Terms

Recommendations

DenseAlert: Incremental Dense-Subtensor Detection in Tensor Streams

D-Cube: Dense-Block Detection in Terabyte-Scale Tensors

Hierarchical Dense Pattern Detection in Tensors