skip to main content
research-article
Public Access

Fast, Accurate, and Flexible Algorithms for Dense Subtensor Mining

Published:23 January 2018Publication History
Skip Abstract Section

Abstract

Given a large-scale and high-order tensor, how can we detect dense subtensors in it? Can we spot them in near-linear time but with quality guarantees? Extensive previous work has shown that dense subtensors, as well as dense subgraphs, indicate anomalous or fraudulent behavior (e.g., lockstep behavior in social networks). However, available algorithms for detecting dense subtensors are not satisfactory in terms of speed, accuracy, and flexibility. In this work, we propose two algorithms, called M-Zoom and M-Biz, for fast and accurate dense-subtensor detection with various density measures. M-Zoom gives a lower bound on the density of detected subtensors, while M-Biz guarantees the local optimality of detected subtensors. M-Zoom and M-Biz can be combined, giving the following advantages: (1) Scalable: scale near-linearly with all aspects of tensors and are up to 114× faster than state-of-the-art methods with similar accuracy, (2) Provably accurate: provide a guarantee on the lowest density and local optimality of the subtensors they find, (3) Flexible: support multi-subtensor detection and size bounds as well as diverse density measures, and (4) Effective: successfully detected edit wars and bot activities in Wikipedia, and spotted network attacks from a TCP dump with near-perfect accuracy (AUC = 0.98).

References

  1. Leman Akoglu, Mary McGlohon, and Christos Faloutsos. 2010. Oddball: Spotting anomalies in weighted graphs. In Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 410--421. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Reid Andersen and Kumar Chellapilla. 2009. Finding dense subgraphs with size bounds. In Proceedings of International Workshop on Algorithms and Models for the Web-Graph. Springer, 25--37. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Brett W. Bader and Tamara G. Kolda. 2007. Efficient MATLAB computations with sparse and factored tensors. SIAM Journal on Scientific Computing 30, 1 (2007), 205--231. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Bahman Bahmani, Ashish Goel, and Kamesh Munagala. 2014. Efficient primal-dual graph algorithms for mapreduce. In Proceedings of International Workshop on Algorithms and Models for the Web-Graph. Springer, 59--78.Google ScholarGoogle ScholarCross RefCross Ref
  5. Bahman Bahmani, Ravi Kumar, and Sergei Vassilvitskii. 2012. Densest subgraph in streaming and MapReduce. Proceedings of the VLDB Endowment 5, 5 (2012), 454--465. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Oana Denisa Balalau, Francesco Bonchi, T. H. Chan, Francesco Gullo, and Mauro Sozio. 2015. Finding subgraphs with maximum total density and limited overlap. In Proceedings of the 8th ACM International Conference on Web Search and Data Mining. ACM, 379--388. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. James Bennett and Stan Lanning. 2007. The Netflix prize. In Proceedings of KDD Cup and Workshop, Vol. 2007. ACM, 35.Google ScholarGoogle Scholar
  8. Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, and Christos Faloutsos. 2013. Copycatch: Stopping group attacks by spotting lockstep behavior in social networks. In Proceedings of the 22nd International Conference on World Wide Web. ACM, 119--130. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Sayan Bhattacharya, Monika Henzinger, Danupon Nanongkai, and Charalampos Tsourakakis. 2015. Space-and time-efficient algorithm for maintaining dense subgraphs on one-pass dynamic streams. In Proceedings of the 47th Annual ACM Symposium on Theory of Computing. ACM, 173--182. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Loïc Cerf, Jérémy Besson, Céline Robardet, and Jean-François Boulicaut. 2008. Data peeler: Contraint-based closed pattern mining in n-ary relations. In Proceedings of the 2008 SIAM International Conference on Data Mining. SIAM, 37--48.Google ScholarGoogle ScholarCross RefCross Ref
  11. Moses Charikar. 2000. Greedy approximation algorithms for finding dense components in a graph. In Proceedings of International Workshop on Approximation Algorithms for Combinatorial Optimization. Springer, 84--95. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Gideon Dror, Noam Koenigstein, Yehuda Koren, and Markus Weimer. 2012. The yahoo! music dataset and KDD-cup’11. In Proceedings of KDD Cup 2011. ACM, 3--18. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Alessandro Epasto, Silvio Lattanzi, and Mauro Sozio. 2015. Efficient densest subgraph computation in evolving graphs. In Proceedings of the 24th International Conference on World Wide Web. ACM, 300--310. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Esther Galbrun, Aristides Gionis, and Nikolaj Tatti. 2016. Top-k overlapping densest subgraphs. Data Mining and Knowledge Discovery 30, 5 (2016), 1134--1165. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. David Gibson, Ravi Kumar, and Andrew Tomkins. 2005. Discovering large dense subgraphs in massive graphs. In Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 721--732. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Andrew V. Goldberg. 1984. Finding a Maximum Density Subgraph. University of California Berkeley, CA.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Bryan Hooi, Kijung Shin, Hyun Ah Song, Alex Beutel, Neil Shah, and Christos Faloutsos. 2017. Graph-based fraud detection in the face of camouflage. ACM Transactions on Knowledge Discovery from Data 11, 4 (2017), 44. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Dmitry I. Ignatov, Sergei O. Kuznetsov, Jonas Poelmans, and Leonid E. Zhukov. 2013. Can triconcepts become triclusters?International Journal of General Systems 42, 6 (2013), 572--593.Google ScholarGoogle Scholar
  19. Meng Jiang, Alex Beutel, Peng Cui, Bryan Hooi, Shiqiang Yang, and Christos Faloutsos. 2015. A general suspiciousness metric for dense blocks in multimodal data. In Proceedings of the 2015 IEEE International Conference on Data Mining (ICDM’15). IEEE, 781--786. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. 2014a. Catchsync: Catching synchronized behavior in large directed graphs. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 941--950. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. 2014b. Inferring strange behavior from connectivity pattern in social networks. In Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 126--138.Google ScholarGoogle ScholarCross RefCross Ref
  22. Ravi Kannan and V. Vinay. 1999. Analyzing the Structure of Large Graphs. Rheinische Friedrich-Wilhelms-Universität Bonn.Google ScholarGoogle Scholar
  23. Samir Khuller and Barna Saha. 2009. On finding dense subgraphs. In International Colloquium on Automata, Languages, and Programming. Springer, 597--608. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Tamara G. Kolda and Brett W. Bader. 2009. Tensor decompositions and applications. SIAM Review 51, 3 (2009), 455--500. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Jérôme Kunegis. 2013. Konect: The Koblenz network collection. In Proceedings of the 22nd International Conference on World Wide Web. ACM, 1343--1350. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Victor E. Lee, Ning Ruan, Ruoming Jin, and Charu Aggarwal. 2010. A survey of algorithms for dense subgraph discovery. In Proceedings of Managing and Mining Graph Data. Springer, 303--336.Google ScholarGoogle ScholarCross RefCross Ref
  27. Koji Maruhashi, Fan Guo, and Christos Faloutsos. 2011. Multiaspectforensics: Pattern mining on large-scale heterogeneous networks with tensor analysis. In Proceedings of 2011 International Conference on Advances in Social Networks Analysis and Mining (ASONAM’11). IEEE, 203--210. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Julian McAuley, Rahul Pandey, and Jure Leskovec. 2015. Inferring networks of substitutable and complementary products. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 785--794. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Andrew McGregor, David Tench, Sofya Vorotnikova, and Hoa T. Vu. 2015. Densest subgraph in dynamic graph streams. In Proceedings of International Symposium on Mathematical Foundations of Computer Science. Springer, 472--482.Google ScholarGoogle Scholar
  30. Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. 2007. Measurement and analysis of online social networks. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement. ACM, 29--42. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Shashank Pandit, Duen Horng Chau, Samuel Wang, and Christos Faloutsos. 2007. Netprobe: A fast and scalable system for fraud detection in online auction networks. In Proceedings of the 16th International Conference on World Wide Web. ACM, 201--210. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju, and Christos Faloutsos. 2010. Eigenspokes: Surprising patterns and scalable community chipping in large graphs. In Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 435--448. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Jouni K. Seppänen and Heikki Mannila. 2004. Dense itemsets. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 683--688. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Neil Shah, Alex Beutel, Brian Gallagher, and Christos Faloutsos. 2014. Spotting suspicious link behavior with fbox: An adversarial perspective. In Proceedings of 2014 IEEE International Conference on Data Mining (ICDM’14). IEEE, 959--964. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos. 2016a. CoreScope: Graph mining using k-core analysis - patterns, anomalies and algorithms. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM’16). IEEE, 469--478.Google ScholarGoogle ScholarCross RefCross Ref
  36. Kijung Shin, Bryan Hooi, and Christos Faloutsos. 2016b. M-Zoom: Fast dense-block detection in tensors with quality guarantees. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 264--280. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos. 2017a. D-cube: Dense-block detection in terabyte-scale tensors. In Proceedings of the 10th ACM International Conference on Web Search and Data Mining. ACM, 681--689. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos. 2017b. DenseAlert: Incremental dense-subtensor detection in tensor streams. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1057--1066. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Charalampos Tsourakakis, Francesco Bonchi, Aristides Gionis, Francesco Gullo, and Maria Tsiarli. 2013. Denser than the densest subgraph: Extracting optimal quasi-cliques with quality guarantees. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 104--112. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Fast, Accurate, and Flexible Algorithms for Dense Subtensor Mining

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Transactions on Knowledge Discovery from Data
        ACM Transactions on Knowledge Discovery from Data  Volume 12, Issue 3
        June 2018
        360 pages
        ISSN:1556-4681
        EISSN:1556-472X
        DOI:10.1145/3178546
        Issue’s Table of Contents

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 23 January 2018
        • Accepted: 1 October 2017
        • Revised: 1 June 2017
        • Received: 1 December 2016
        Published in tkdd Volume 12, Issue 3

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader