research-article

Public Access

Catching Synchronized Behaviors in Large Networks: A Graph Mining Approach

Authors:
Meng Jiang

Tsinghua University, Beijing, China

Tsinghua University, Beijing, China
View Profile

,
Peng Cui

Tsinghua University, Beijing, China

Tsinghua University, Beijing, China
View Profile

,
Alex Beutel

Carnegie Mellon University, Pittsburgh, PA

Carnegie Mellon University, Pittsburgh, PA
View Profile

,
Christos Faloutsos

Carnegie Mellon University, Pittsburgh, PA

Carnegie Mellon University, Pittsburgh, PA
View Profile

,
Shiqiang Yang

Tsinghua University, Beijing, China

Tsinghua University, Beijing, China
View Profile

ACM Transactions on Knowledge Discovery from Data Volume 10 Issue 4Article No.: 35pp 1–27https://doi.org/10.1145/2746403

Published:29 June 2016Publication History

ACM Transactions on Knowledge Discovery from Data

Abstract

Given a directed graph of millions of nodes, how can we automatically spot anomalous, suspicious nodes judging only from their connectivity patterns? Suspicious graph patterns show up in many applications, from Twitter users who buy fake followers, manipulating the social network, to botnet members performing distributed denial of service attacks, disturbing the network traffic graph. We propose a fast and effective method, CatchSync, which exploits two of the tell-tale signs left in graphs by fraudsters: (a) synchronized behavior: suspicious nodes have extremely similar behavior patterns because they are often required to perform some task together (such as follow the same user); and (b) rare behavior: their connectivity patterns are very different from the majority. We introduce novel measures to quantify both concepts (“synchronicity” and “normality”) and we propose a parameter-free algorithm that works on the resulting synchronicity-normality plots. Thanks to careful design, CatchSync has the following desirable properties: (a) it is scalable to large datasets, being linear in the graph size; (b) it is parameter free; and (c) it is side-information-oblivious: it can operate using only the topology, without needing labeled data, nor timing information, and the like., while still capable of using side information if available. We applied CatchSync on three large, real datasets, 1-billion-edge Twitter social graph, 3-billion-edge, and 12-billion-edge Tencent Weibo social graphs, and several synthetic ones; CatchSync consistently outperforms existing competitors, both in detection accuracy by 36% on Twitter and 20% on Tencent Weibo, as well as in speed.

References

Charu C. Aggarwal. 2011. An Introduction to Social Network Data Analytics. Springer. Google ScholarDigital Library
Leman Akoglu, Mary McGlohon, and Christos Faloutsos. 2010. Oddball: Spotting anomalies in weighted graphs. In Advances in Knowledge Discovery and Data Mining. Springer, 410--421. Google ScholarDigital Library
Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, and Christos Faloutsos. 2013. CopyCatch: Stopping group attacks by spotting lockstep behavior in social networks. In Proceedings of the 22nd International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, IW3C2, 119--130 Google ScholarDigital Library
Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, and Janet Wiener. 2000. Graph structure in the web. Comput. Netw. 33, 1 (2000), 309--320. Google ScholarDigital Library
Qiang Cao, Michael Sirivianos, Xiaowei Yang, and Tiago Pregueiro. 2012. Aiding the detection of fake accounts in large scale social online services. In NSDI. USENIX, 197--210. Google ScholarDigital Library
Deepayan Chakrabarti. 2004. Autopart: Parameter-free graph partitioning and outlier detection. In Knowledge Discovery in Databases: PKDD 2004. Springer, 112--124. Google ScholarDigital Library
Varun Chandola, Arindam Banerjee, and Vipin Kumar. 2009. Anomaly detection: A survey. ACM Comput. Surv. (CSUR) 41, 3 (2009), 15. Google ScholarDigital Library
Duen Horng Chau, Shashank Pandit, and Christos Faloutsos. 2006. Detecting fraudulent personalities in networks of online auctioneers. In Knowledge Discovery in Databases: PKDD 2006. Springer, 103--114.Google Scholar
Chen Chen, Xifeng Yan, Feida Zhu, Jiawei Han, and S. Yu Philip. 2009. Graph OLAP: A multi-dimensional framework for graph data analysis. Knowl. Inf. Syst. 21, 1 (2009), 41--63. Google ScholarDigital Library
Jie Chen and Yousef Saad. 2012. Dense subgraph extraction with application to community detection. IEEE Trans. Knowl. Data Eng. 24, 7 (2012), 1216--1230. Google ScholarDigital Library
Fan Chung and Linyuan Lu. 2002. The average distances in random graphs with given expected degrees. Proc. National Acad. Sci. 99, 25 (2002), 15879--15882.Google ScholarCross Ref
Diane J. Cook and Lawrence B. Holder. 2006. Mining Graph Data. John Wiley & Sons. Google ScholarDigital Library
Emiliano De Cristofaro, Arik Friedman, Guillaume Jourjon, Mohamed Ali Kaafar, and M. Zubair Shafiq. 2014. Paying for likes? Understanding facebook like fraud using honeypots. In Proceedings of the 2014 Conference on Internet Measurement Conference. ACM, 129--136. Google ScholarDigital Library
William Eberle and Lawrence Holder. 2007. Discovering structural anomalies in graph-based data. In 7th IEEE International Conference on Data Mining Workshops, 2007 (ICDM Workshops 2007). IEEE, 393--398. Google ScholarDigital Library
Michalis Faloutsos, Petros Faloutsos, and Christos Faloutsos. 1999. On power-law relationships of the internet topology. In ACM SIGCOMM Computer Communication Review, Vol. 29. ACM, 251--262. Google ScholarDigital Library
Tom Fawcett. 2006. An introduction to ROC analysis. Pattern Recognit. Lett. 27, 8 (2006), 861--874. Google ScholarDigital Library
Santo Fortunato. 2010. Community detection in graphs. Phys. Rep. 486, 3 (2010), 75--174.Google ScholarCross Ref
Christos Giatsidis, Dimitrios M. Thilikos, and Michalis Vazirgiannis. 2011. D-cores: Measuring collaboration of directed graphs based on degeneracy. In IEEE 11th International Conference on Data Mining (ICDM 2011). IEEE, 201--210. Google ScholarDigital Library
Xia Hu, Jiliang Tang, Yanchao Zhang, and Huan Liu. 2013. Social spammer detection in microblogging. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence. AAAI Press, 2633--2639. Google ScholarDigital Library
Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. 2014a. CatchSync: Catching synchronized behavior in large directed graphs. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 941--950. Google ScholarDigital Library
Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. 2014b. Detecting suspicious following behavior in multimillion-node social networks. In Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web Companion. International World Wide Web Conferences Steering Committee, 305--306. Google ScholarDigital Library
Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. 2014c. Inferring strange behavior from connectivity pattern in social networks. In Advances in Knowledge Discovery and Data Mining. Springer, 126--138.Google Scholar
George Karypis and Vipin Kumar. 1995. Metis-unstructured graph partitioning and sparse matrix ordering system, version 2.0. (1995).Google Scholar
Jon M. Kleinberg. 1999. Authoritative sources in a hyperlinked environment. J. ACM (JACM) 46, 5 (1999), 604--632. Google ScholarDigital Library
Ravi Kumar, Jasmine Novak, and Andrew Tomkins. 2010. Structure and evolution of online social networks. In Link Mining: Models, Algorithms, and Applications. Springer, 337--357.Google Scholar
Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web. ACM, 591--600. Google ScholarDigital Library
Kyumin Lee, James Caverlee, and Steve Webb. 2010. Uncovering social spammers: Social honeypots+ machine learning. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 435--442. Google ScholarDigital Library
Chao Liu, Xifeng Yan, Hwanjo Yu, Jiawei Han, and S. Yu Philip. 2005. Mining behavior graphs for” backtrace” of noncrashing bugs. In SDM. SIAM, 286--297.Google Scholar
H. D. K. Moonesinghe and Pang-Ning Tan. 2008. Outrank: A graph-based outlier detection framework using random walk. Int. J. Artif. Intell. Tools 17, 01 (2008), 19--36.Google ScholarCross Ref
Caleb C. Noble and Diane J. Cook. 2003. Graph-based anomaly detection. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 631--636. Google ScholarDigital Library
Shashank Pandit, Duen Horng Chau, Samuel Wang, and Christos Faloutsos. 2007. Netprobe: A fast and scalable system for fraud detection in online auction networks. In Proceedings of the 16th International Conference on World Wide Web. ACM, 201--210. Google ScholarDigital Library
Jian Pei, Daxin Jiang, and Aidong Zhang. 2005. On mining cross-graph quasi-cliques. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. ACM, 228--238. Google ScholarDigital Library
Charles Perez, Marc Lemercier, Babiga Birregah, and Alain Corpel. 2011. Spot 1.0: Scoring suspicious profiles on twitter. In International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2011). IEEE, 377--381. Google ScholarDigital Library
B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju, and Christos Faloutsos. 2010. Eigenspokes: Surprising patterns and scalable community chipping in large graphs. In Advances in Knowledge Discovery and Data Mining. Springer, 435--448. Google ScholarDigital Library
Shashi Shekhar, Chang-Tien Lu, and Pusheng Zhang. 2001. Detecting graph-based spatial outliers: Algorithms and applications (a summary of results). In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 371--376. Google ScholarDigital Library
Gianluca Stringhini, Christopher Kruegel, and Giovanni Vigna. 2010. Detecting spammers on social networks. In Proceedings of the 26th Annual Computer Security Applications Conference. ACM, 1--9. Google ScholarDigital Library
Jimeng Sun, Huiming Qu, Deepayan Chakrabarti, and Christos Faloutsos. 2005. Neighborhood formation and anomaly detection in bipartite graphs. In 5th IEEE International Conference on Data Mining. IEEE, 418--425. Google ScholarDigital Library
David M. J. Tax and Robert P. W. Duin. 1998. Outlier detection using classifier instability. In Advances in Pattern Recognition. Springer, 593--601. Google ScholarDigital Library
Charalampos Tsourakakis, Francesco Bonchi, Aristides Gionis, Francesco Gullo, and Maria Tsiarli. 2013. Denser than the densest subgraph: Extracting optimal quasi-cliques with quality guarantees. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 104--112. Google ScholarDigital Library
Xifeng Yan and Jiawei Han. 2003. CloseGraph: Mining closed frequent graph patterns. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 286--295. Google ScholarDigital Library
Zhaonian Zou, Jianzhong Li, Hong Gao, and Shuo Zhang. 2010. Mining frequent subgraph patterns from uncertain graph data. IEEE Trans. Knowl. Data Eng. 22, 9 (2010), 1203--1218. Google ScholarDigital Library

Index Terms

Catching Synchronized Behaviors in Large Networks: A Graph Mining Approach

Recommendations

Catching a viral video

The sharing and re-sharing of videos on social sites, blogs e-mail, and other means has given rise to the phenomenon of viral videos--videos that become popular through internet sharing. In this paper we seek to better understand viral videos on YouTube ...
Read More
Inferring lockstep behavior from connectivity pattern in large graphs

Given multimillion-node graphs such as "who-follows-whom", "patent-cites-patent", "user-likes-page" and "actor/director-makes-movie" networks, how can we find unexpected behaviors? When companies operate on the graphs with monetary incentives to sell ...
Read More
Efficient and Accurate Behavior-Based Tracking of Malware-Control Domains in Large ISP Networks

In this article, we propose Segugio, a novel defense system that allows for efficiently tracking the occurrence of new malware-control domain names in very large ISP networks. Segugio passively monitors the DNS traffic to build a machine-domain ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM Transactions on Knowledge Discovery from Data Volume 10, Issue 4
Special Issue on SIGKDD 2014, Special Issue on BIGCHAT and Regular Papers
July 2016
417 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/2936311
Editor:
Philip S. Yu
University of Illinois at Chicago, USA
Issue’s Table of Contents
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 29 June 2016
- Revised: 1 March 2015
- Accepted: 1 March 2015
- Received: 1 September 2014
Published in tkdd Volume 10, Issue 4

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Anomaly detection
connectivity pattern
graph mining
suspicious behavior
Qualifiers
- research-article
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 55
  Total Citations
  View Citations
- 838
  Total Downloads
- Downloads (Last 12 months)80
- Downloads (Last 6 weeks)6
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Catching Synchronized Behaviors in Large Networks: A Graph Mining Approach

ACM Transactions on Knowledge Discovery from Data

Abstract

References

Cited By

Index Terms

Recommendations

Catching a viral video

Inferring lockstep behavior from connectivity pattern in large graphs

Efficient and Accurate Behavior-Based Tracking of Malware-Control Domains in Large ISP Networks

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Catching Synchronized Behaviors in Large Networks: A Graph Mining Approach

ACM Transactions on Knowledge Discovery from Data

Abstract

References

Cited By

Index Terms

Recommendations

Catching a viral video

Inferring lockstep behavior from connectivity pattern in large graphs

Efficient and Accurate Behavior-Based Tracking of Malware-Control Domains in Large ISP Networks

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media