Abstract
Given a directed graph of millions of nodes, how can we automatically spot anomalous, suspicious nodes judging only from their connectivity patterns? Suspicious graph patterns show up in many applications, from Twitter users who buy fake followers, manipulating the social network, to botnet members performing distributed denial of service attacks, disturbing the network traffic graph. We propose a fast and effective method, CatchSync, which exploits two of the tell-tale signs left in graphs by fraudsters: (a) synchronized behavior: suspicious nodes have extremely similar behavior patterns because they are often required to perform some task together (such as follow the same user); and (b) rare behavior: their connectivity patterns are very different from the majority. We introduce novel measures to quantify both concepts (“synchronicity” and “normality”) and we propose a parameter-free algorithm that works on the resulting synchronicity-normality plots. Thanks to careful design, CatchSync has the following desirable properties: (a) it is scalable to large datasets, being linear in the graph size; (b) it is parameter free; and (c) it is side-information-oblivious: it can operate using only the topology, without needing labeled data, nor timing information, and the like., while still capable of using side information if available. We applied CatchSync on three large, real datasets, 1-billion-edge Twitter social graph, 3-billion-edge, and 12-billion-edge Tencent Weibo social graphs, and several synthetic ones; CatchSync consistently outperforms existing competitors, both in detection accuracy by 36% on Twitter and 20% on Tencent Weibo, as well as in speed.
- Charu C. Aggarwal. 2011. An Introduction to Social Network Data Analytics. Springer. Google ScholarDigital Library
- Leman Akoglu, Mary McGlohon, and Christos Faloutsos. 2010. Oddball: Spotting anomalies in weighted graphs. In Advances in Knowledge Discovery and Data Mining. Springer, 410--421. Google ScholarDigital Library
- Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, and Christos Faloutsos. 2013. CopyCatch: Stopping group attacks by spotting lockstep behavior in social networks. In Proceedings of the 22nd International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, IW3C2, 119--130 Google ScholarDigital Library
- Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, and Janet Wiener. 2000. Graph structure in the web. Comput. Netw. 33, 1 (2000), 309--320. Google ScholarDigital Library
- Qiang Cao, Michael Sirivianos, Xiaowei Yang, and Tiago Pregueiro. 2012. Aiding the detection of fake accounts in large scale social online services. In NSDI. USENIX, 197--210. Google ScholarDigital Library
- Deepayan Chakrabarti. 2004. Autopart: Parameter-free graph partitioning and outlier detection. In Knowledge Discovery in Databases: PKDD 2004. Springer, 112--124. Google ScholarDigital Library
- Varun Chandola, Arindam Banerjee, and Vipin Kumar. 2009. Anomaly detection: A survey. ACM Comput. Surv. (CSUR) 41, 3 (2009), 15. Google ScholarDigital Library
- Duen Horng Chau, Shashank Pandit, and Christos Faloutsos. 2006. Detecting fraudulent personalities in networks of online auctioneers. In Knowledge Discovery in Databases: PKDD 2006. Springer, 103--114.Google Scholar
- Chen Chen, Xifeng Yan, Feida Zhu, Jiawei Han, and S. Yu Philip. 2009. Graph OLAP: A multi-dimensional framework for graph data analysis. Knowl. Inf. Syst. 21, 1 (2009), 41--63. Google ScholarDigital Library
- Jie Chen and Yousef Saad. 2012. Dense subgraph extraction with application to community detection. IEEE Trans. Knowl. Data Eng. 24, 7 (2012), 1216--1230. Google ScholarDigital Library
- Fan Chung and Linyuan Lu. 2002. The average distances in random graphs with given expected degrees. Proc. National Acad. Sci. 99, 25 (2002), 15879--15882.Google ScholarCross Ref
- Diane J. Cook and Lawrence B. Holder. 2006. Mining Graph Data. John Wiley & Sons. Google ScholarDigital Library
- Emiliano De Cristofaro, Arik Friedman, Guillaume Jourjon, Mohamed Ali Kaafar, and M. Zubair Shafiq. 2014. Paying for likes? Understanding facebook like fraud using honeypots. In Proceedings of the 2014 Conference on Internet Measurement Conference. ACM, 129--136. Google ScholarDigital Library
- William Eberle and Lawrence Holder. 2007. Discovering structural anomalies in graph-based data. In 7th IEEE International Conference on Data Mining Workshops, 2007 (ICDM Workshops 2007). IEEE, 393--398. Google ScholarDigital Library
- Michalis Faloutsos, Petros Faloutsos, and Christos Faloutsos. 1999. On power-law relationships of the internet topology. In ACM SIGCOMM Computer Communication Review, Vol. 29. ACM, 251--262. Google ScholarDigital Library
- Tom Fawcett. 2006. An introduction to ROC analysis. Pattern Recognit. Lett. 27, 8 (2006), 861--874. Google ScholarDigital Library
- Santo Fortunato. 2010. Community detection in graphs. Phys. Rep. 486, 3 (2010), 75--174.Google ScholarCross Ref
- Christos Giatsidis, Dimitrios M. Thilikos, and Michalis Vazirgiannis. 2011. D-cores: Measuring collaboration of directed graphs based on degeneracy. In IEEE 11th International Conference on Data Mining (ICDM 2011). IEEE, 201--210. Google ScholarDigital Library
- Xia Hu, Jiliang Tang, Yanchao Zhang, and Huan Liu. 2013. Social spammer detection in microblogging. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence. AAAI Press, 2633--2639. Google ScholarDigital Library
- Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. 2014a. CatchSync: Catching synchronized behavior in large directed graphs. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 941--950. Google ScholarDigital Library
- Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. 2014b. Detecting suspicious following behavior in multimillion-node social networks. In Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web Companion. International World Wide Web Conferences Steering Committee, 305--306. Google ScholarDigital Library
- Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. 2014c. Inferring strange behavior from connectivity pattern in social networks. In Advances in Knowledge Discovery and Data Mining. Springer, 126--138.Google Scholar
- George Karypis and Vipin Kumar. 1995. Metis-unstructured graph partitioning and sparse matrix ordering system, version 2.0. (1995).Google Scholar
- Jon M. Kleinberg. 1999. Authoritative sources in a hyperlinked environment. J. ACM (JACM) 46, 5 (1999), 604--632. Google ScholarDigital Library
- Ravi Kumar, Jasmine Novak, and Andrew Tomkins. 2010. Structure and evolution of online social networks. In Link Mining: Models, Algorithms, and Applications. Springer, 337--357.Google Scholar
- Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web. ACM, 591--600. Google ScholarDigital Library
- Kyumin Lee, James Caverlee, and Steve Webb. 2010. Uncovering social spammers: Social honeypots+ machine learning. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 435--442. Google ScholarDigital Library
- Chao Liu, Xifeng Yan, Hwanjo Yu, Jiawei Han, and S. Yu Philip. 2005. Mining behavior graphs for” backtrace” of noncrashing bugs. In SDM. SIAM, 286--297.Google Scholar
- H. D. K. Moonesinghe and Pang-Ning Tan. 2008. Outrank: A graph-based outlier detection framework using random walk. Int. J. Artif. Intell. Tools 17, 01 (2008), 19--36.Google ScholarCross Ref
- Caleb C. Noble and Diane J. Cook. 2003. Graph-based anomaly detection. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 631--636. Google ScholarDigital Library
- Shashank Pandit, Duen Horng Chau, Samuel Wang, and Christos Faloutsos. 2007. Netprobe: A fast and scalable system for fraud detection in online auction networks. In Proceedings of the 16th International Conference on World Wide Web. ACM, 201--210. Google ScholarDigital Library
- Jian Pei, Daxin Jiang, and Aidong Zhang. 2005. On mining cross-graph quasi-cliques. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. ACM, 228--238. Google ScholarDigital Library
- Charles Perez, Marc Lemercier, Babiga Birregah, and Alain Corpel. 2011. Spot 1.0: Scoring suspicious profiles on twitter. In International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2011). IEEE, 377--381. Google ScholarDigital Library
- B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju, and Christos Faloutsos. 2010. Eigenspokes: Surprising patterns and scalable community chipping in large graphs. In Advances in Knowledge Discovery and Data Mining. Springer, 435--448. Google ScholarDigital Library
- Shashi Shekhar, Chang-Tien Lu, and Pusheng Zhang. 2001. Detecting graph-based spatial outliers: Algorithms and applications (a summary of results). In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 371--376. Google ScholarDigital Library
- Gianluca Stringhini, Christopher Kruegel, and Giovanni Vigna. 2010. Detecting spammers on social networks. In Proceedings of the 26th Annual Computer Security Applications Conference. ACM, 1--9. Google ScholarDigital Library
- Jimeng Sun, Huiming Qu, Deepayan Chakrabarti, and Christos Faloutsos. 2005. Neighborhood formation and anomaly detection in bipartite graphs. In 5th IEEE International Conference on Data Mining. IEEE, 418--425. Google ScholarDigital Library
- David M. J. Tax and Robert P. W. Duin. 1998. Outlier detection using classifier instability. In Advances in Pattern Recognition. Springer, 593--601. Google ScholarDigital Library
- Charalampos Tsourakakis, Francesco Bonchi, Aristides Gionis, Francesco Gullo, and Maria Tsiarli. 2013. Denser than the densest subgraph: Extracting optimal quasi-cliques with quality guarantees. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 104--112. Google ScholarDigital Library
- Xifeng Yan and Jiawei Han. 2003. CloseGraph: Mining closed frequent graph patterns. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 286--295. Google ScholarDigital Library
- Zhaonian Zou, Jianzhong Li, Hong Gao, and Shuo Zhang. 2010. Mining frequent subgraph patterns from uncertain graph data. IEEE Trans. Knowl. Data Eng. 22, 9 (2010), 1203--1218. Google ScholarDigital Library
Index Terms
- Catching Synchronized Behaviors in Large Networks: A Graph Mining Approach
Recommendations
Catching a viral video
The sharing and re-sharing of videos on social sites, blogs e-mail, and other means has given rise to the phenomenon of viral videos--videos that become popular through internet sharing. In this paper we seek to better understand viral videos on YouTube ...
Inferring lockstep behavior from connectivity pattern in large graphs
Given multimillion-node graphs such as "who-follows-whom", "patent-cites-patent", "user-likes-page" and "actor/director-makes-movie" networks, how can we find unexpected behaviors? When companies operate on the graphs with monetary incentives to sell ...
Efficient and Accurate Behavior-Based Tracking of Malware-Control Domains in Large ISP Networks
In this article, we propose Segugio, a novel defense system that allows for efficiently tracking the occurrence of new malware-control domain names in very large ISP networks. Segugio passively monitors the DNS traffic to build a machine-domain ...
Comments