skip to main content
research-article
Public Access

Catching Synchronized Behaviors in Large Networks: A Graph Mining Approach

Published:29 June 2016Publication History
Skip Abstract Section

Abstract

Given a directed graph of millions of nodes, how can we automatically spot anomalous, suspicious nodes judging only from their connectivity patterns? Suspicious graph patterns show up in many applications, from Twitter users who buy fake followers, manipulating the social network, to botnet members performing distributed denial of service attacks, disturbing the network traffic graph. We propose a fast and effective method, CatchSync, which exploits two of the tell-tale signs left in graphs by fraudsters: (a) synchronized behavior: suspicious nodes have extremely similar behavior patterns because they are often required to perform some task together (such as follow the same user); and (b) rare behavior: their connectivity patterns are very different from the majority. We introduce novel measures to quantify both concepts (“synchronicity” and “normality”) and we propose a parameter-free algorithm that works on the resulting synchronicity-normality plots. Thanks to careful design, CatchSync has the following desirable properties: (a) it is scalable to large datasets, being linear in the graph size; (b) it is parameter free; and (c) it is side-information-oblivious: it can operate using only the topology, without needing labeled data, nor timing information, and the like., while still capable of using side information if available. We applied CatchSync on three large, real datasets, 1-billion-edge Twitter social graph, 3-billion-edge, and 12-billion-edge Tencent Weibo social graphs, and several synthetic ones; CatchSync consistently outperforms existing competitors, both in detection accuracy by 36% on Twitter and 20% on Tencent Weibo, as well as in speed.

References

  1. Charu C. Aggarwal. 2011. An Introduction to Social Network Data Analytics. Springer. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Leman Akoglu, Mary McGlohon, and Christos Faloutsos. 2010. Oddball: Spotting anomalies in weighted graphs. In Advances in Knowledge Discovery and Data Mining. Springer, 410--421. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, and Christos Faloutsos. 2013. CopyCatch: Stopping group attacks by spotting lockstep behavior in social networks. In Proceedings of the 22nd International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, IW3C2, 119--130 Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, and Janet Wiener. 2000. Graph structure in the web. Comput. Netw. 33, 1 (2000), 309--320. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Qiang Cao, Michael Sirivianos, Xiaowei Yang, and Tiago Pregueiro. 2012. Aiding the detection of fake accounts in large scale social online services. In NSDI. USENIX, 197--210. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Deepayan Chakrabarti. 2004. Autopart: Parameter-free graph partitioning and outlier detection. In Knowledge Discovery in Databases: PKDD 2004. Springer, 112--124. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Varun Chandola, Arindam Banerjee, and Vipin Kumar. 2009. Anomaly detection: A survey. ACM Comput. Surv. (CSUR) 41, 3 (2009), 15. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Duen Horng Chau, Shashank Pandit, and Christos Faloutsos. 2006. Detecting fraudulent personalities in networks of online auctioneers. In Knowledge Discovery in Databases: PKDD 2006. Springer, 103--114.Google ScholarGoogle Scholar
  9. Chen Chen, Xifeng Yan, Feida Zhu, Jiawei Han, and S. Yu Philip. 2009. Graph OLAP: A multi-dimensional framework for graph data analysis. Knowl. Inf. Syst. 21, 1 (2009), 41--63. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Jie Chen and Yousef Saad. 2012. Dense subgraph extraction with application to community detection. IEEE Trans. Knowl. Data Eng. 24, 7 (2012), 1216--1230. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Fan Chung and Linyuan Lu. 2002. The average distances in random graphs with given expected degrees. Proc. National Acad. Sci. 99, 25 (2002), 15879--15882.Google ScholarGoogle ScholarCross RefCross Ref
  12. Diane J. Cook and Lawrence B. Holder. 2006. Mining Graph Data. John Wiley & Sons. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Emiliano De Cristofaro, Arik Friedman, Guillaume Jourjon, Mohamed Ali Kaafar, and M. Zubair Shafiq. 2014. Paying for likes? Understanding facebook like fraud using honeypots. In Proceedings of the 2014 Conference on Internet Measurement Conference. ACM, 129--136. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. William Eberle and Lawrence Holder. 2007. Discovering structural anomalies in graph-based data. In 7th IEEE International Conference on Data Mining Workshops, 2007 (ICDM Workshops 2007). IEEE, 393--398. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Michalis Faloutsos, Petros Faloutsos, and Christos Faloutsos. 1999. On power-law relationships of the internet topology. In ACM SIGCOMM Computer Communication Review, Vol. 29. ACM, 251--262. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Tom Fawcett. 2006. An introduction to ROC analysis. Pattern Recognit. Lett. 27, 8 (2006), 861--874. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Santo Fortunato. 2010. Community detection in graphs. Phys. Rep. 486, 3 (2010), 75--174.Google ScholarGoogle ScholarCross RefCross Ref
  18. Christos Giatsidis, Dimitrios M. Thilikos, and Michalis Vazirgiannis. 2011. D-cores: Measuring collaboration of directed graphs based on degeneracy. In IEEE 11th International Conference on Data Mining (ICDM 2011). IEEE, 201--210. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Xia Hu, Jiliang Tang, Yanchao Zhang, and Huan Liu. 2013. Social spammer detection in microblogging. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence. AAAI Press, 2633--2639. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. 2014a. CatchSync: Catching synchronized behavior in large directed graphs. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 941--950. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. 2014b. Detecting suspicious following behavior in multimillion-node social networks. In Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web Companion. International World Wide Web Conferences Steering Committee, 305--306. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. 2014c. Inferring strange behavior from connectivity pattern in social networks. In Advances in Knowledge Discovery and Data Mining. Springer, 126--138.Google ScholarGoogle Scholar
  23. George Karypis and Vipin Kumar. 1995. Metis-unstructured graph partitioning and sparse matrix ordering system, version 2.0. (1995).Google ScholarGoogle Scholar
  24. Jon M. Kleinberg. 1999. Authoritative sources in a hyperlinked environment. J. ACM (JACM) 46, 5 (1999), 604--632. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Ravi Kumar, Jasmine Novak, and Andrew Tomkins. 2010. Structure and evolution of online social networks. In Link Mining: Models, Algorithms, and Applications. Springer, 337--357.Google ScholarGoogle Scholar
  26. Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web. ACM, 591--600. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Kyumin Lee, James Caverlee, and Steve Webb. 2010. Uncovering social spammers: Social honeypots+ machine learning. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 435--442. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Chao Liu, Xifeng Yan, Hwanjo Yu, Jiawei Han, and S. Yu Philip. 2005. Mining behavior graphs for” backtrace” of noncrashing bugs. In SDM. SIAM, 286--297.Google ScholarGoogle Scholar
  29. H. D. K. Moonesinghe and Pang-Ning Tan. 2008. Outrank: A graph-based outlier detection framework using random walk. Int. J. Artif. Intell. Tools 17, 01 (2008), 19--36.Google ScholarGoogle ScholarCross RefCross Ref
  30. Caleb C. Noble and Diane J. Cook. 2003. Graph-based anomaly detection. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 631--636. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Shashank Pandit, Duen Horng Chau, Samuel Wang, and Christos Faloutsos. 2007. Netprobe: A fast and scalable system for fraud detection in online auction networks. In Proceedings of the 16th International Conference on World Wide Web. ACM, 201--210. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Jian Pei, Daxin Jiang, and Aidong Zhang. 2005. On mining cross-graph quasi-cliques. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. ACM, 228--238. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Charles Perez, Marc Lemercier, Babiga Birregah, and Alain Corpel. 2011. Spot 1.0: Scoring suspicious profiles on twitter. In International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2011). IEEE, 377--381. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju, and Christos Faloutsos. 2010. Eigenspokes: Surprising patterns and scalable community chipping in large graphs. In Advances in Knowledge Discovery and Data Mining. Springer, 435--448. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Shashi Shekhar, Chang-Tien Lu, and Pusheng Zhang. 2001. Detecting graph-based spatial outliers: Algorithms and applications (a summary of results). In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 371--376. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Gianluca Stringhini, Christopher Kruegel, and Giovanni Vigna. 2010. Detecting spammers on social networks. In Proceedings of the 26th Annual Computer Security Applications Conference. ACM, 1--9. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Jimeng Sun, Huiming Qu, Deepayan Chakrabarti, and Christos Faloutsos. 2005. Neighborhood formation and anomaly detection in bipartite graphs. In 5th IEEE International Conference on Data Mining. IEEE, 418--425. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. David M. J. Tax and Robert P. W. Duin. 1998. Outlier detection using classifier instability. In Advances in Pattern Recognition. Springer, 593--601. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Charalampos Tsourakakis, Francesco Bonchi, Aristides Gionis, Francesco Gullo, and Maria Tsiarli. 2013. Denser than the densest subgraph: Extracting optimal quasi-cliques with quality guarantees. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 104--112. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Xifeng Yan and Jiawei Han. 2003. CloseGraph: Mining closed frequent graph patterns. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 286--295. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Zhaonian Zou, Jianzhong Li, Hong Gao, and Shuo Zhang. 2010. Mining frequent subgraph patterns from uncertain graph data. IEEE Trans. Knowl. Data Eng. 22, 9 (2010), 1203--1218. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Catching Synchronized Behaviors in Large Networks: A Graph Mining Approach

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image ACM Transactions on Knowledge Discovery from Data
          ACM Transactions on Knowledge Discovery from Data  Volume 10, Issue 4
          Special Issue on SIGKDD 2014, Special Issue on BIGCHAT and Regular Papers
          July 2016
          417 pages
          ISSN:1556-4681
          EISSN:1556-472X
          DOI:10.1145/2936311
          Issue’s Table of Contents

          Copyright © 2016 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 29 June 2016
          • Revised: 1 March 2015
          • Accepted: 1 March 2015
          • Received: 1 September 2014
          Published in tkdd Volume 10, Issue 4

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article
          • Research
          • Refereed

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader