Abstract
How much has a network changed since yesterday? How different is the wiring of Bob’s brain (a left-handed male) and Alice’s brain (a right-handed female), and how is it different? Graph similarity with given node correspondence, i.e., the detection of changes in the connectivity of graphs, arises in numerous settings. In this work, we formally state the axioms and desired properties of the graph similarity functions, and evaluate when state-of-the-art methods fail to detect crucial connectivity changes in graphs. We propose DeltaCon, a principled, intuitive, and scalable algorithm that assesses the similarity between two graphs on the same nodes (e.g., employees of a company, customers of a mobile carrier). In conjunction, we propose DeltaCon-Attr, a related approach that enables attribution of change or dissimilarity to responsible nodes and edges. Experiments on various synthetic and real graphs showcase the advantages of our method over existing similarity measures. Finally, we employ DeltaCon and DeltaCon-Attr on real applications: (a) we classify people to groups of high and low creativity based on their brain connectivity graphs, (b) do temporal anomaly detection in the who-emails-whom Enron graph and find the top culprits for the changes in the temporal corporate email graph, and (c) recover pairs of test-retest large brain scans ( ∼17M edges, up to 90M edges) for 21 subjects.
- Leman Akoglu*, Duen Horng Chau*, U. Kang*, Danai Koutra*, and Christos Faloutsos. 2012. OPAvion: mining and visualization in large graphs. In Proceedings of the 2012 ACM International Conference on Management of Data (SIGMOD). ACM, 717--720. Google ScholarDigital Library
- Leman Akoglu and Christos Faloutsos. 2010. Event detection in time series of mobile communication graphs. In 27th Army Science Conference. 77--79.Google Scholar
- Leman Akoglu, Hanghang Tong, and Danai Koutra. 2014. Graph-based anomaly detection and description: a survey. Data Min. Knowl. Disc. 29, 3, 626--688. Google ScholarDigital Library
- David Aldous and James Allen Fill. 2002. Reversible Markov Chains and Random Walks on Graphs. (2002). Unfinished monograph, recompiled 2014. Retrieved from http://www.stat.berkeley.edu/∼aldous/ RWG/book.html.Google Scholar
- Basak Alper, Benjamin Bach, Nathalie Henry Riche, Tobias Isenberg, and Jean-Daniel Fekete. 2013. Weighted graph comparison techniques for brain connectivity analysis. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI’13). ACM, New York, NY, USA, 483--492. Google ScholarDigital Library
- Keith Andrews, Martin Wohlfahrt, and Gerhard Wurzinger. 2009. Visual graph comparison. In Proceedings of the 13th International Conference on Information Visualization - Showcase (IV). IEEE, 62--67. Google ScholarDigital Library
- Mohsen Bayati, David F. Gleich, Amin Saberi, and Ying Wang. 2013. Message-passing algorithms for sparse network alignment. 7, 1 (2013), Article 3, 31 pages. Google ScholarDigital Library
- Michele Berlingerio, Danai Koutra, Tina Eliassi-Rad, and Christos Faloutsos. 2013. Network similarity via multiple social theories. In Proceedings of the IEEE/ACM Conference on Advances in Social Networks Analysis and Mining (ASONAM’13). ACM, 1439--1440. Google ScholarDigital Library
- Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, and Christos Faloutsos. 2013. CopyCatch: Stopping group attacks by spotting lockstep behavior in social networks. In Proceedings of the 22nd International Conference on World Wide Web (WWW). ACM, 119--130. Google ScholarDigital Library
- Karsten M. Borgwardt and Hans-Peter Kriegel. 2005. Shortest-path kernels on graphs. In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM’05). IEEE Computer Society, Washington, DC, USA, 74--81. Google ScholarDigital Library
- S. Bradde, Alfredo Braunstein, H. Mahmoudi, F. Tria, Martin Weigt, and Riccardo Zecchina. 2010. Aligning graphs and finding substructures by a cavity approach. Europhys. Lett. 89, 3 (2010).Google ScholarCross Ref
- Sergey Brin and Lawrence Page. 1998. The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst. 30, 1--7 (1998), 107--117. Google ScholarDigital Library
- Horst Bunke, Peter J. Dickinson, Miro Kraetzl, and Walter D. Wallis. 2006. A Graph-Theoretic Approach to Enterprise Network Dynamics (PCS). Birkhauser. Google ScholarDigital Library
- Rajmonda Sulo Caceres, Tanya Y. Berger-Wolf, and Robert Grossman. 2011. Temporal scale of processes in dynamic networks. In Proceedings of the Data Mining Workshops (ICDMW) at the 11th IEEE International Conference on Data Mining (ICDM). 925--932. Google ScholarDigital Library
- Duen Horng Chau, Carey Nachenberg, Jeffrey Wilhelm, Adam Wright, and Christos Faloutsos. 2011. Large scale graph mining and inference for malware detection. In Proceedings of the 11th SIAM International Conference on Data Mining (SDM), Mesa, AZ, 131--142.Google Scholar
- Yihua Chen, Eric K. Garcia, Maya R. Gupta, Ali Rahimi, and Luca Cazzanti. 2009. Similarity-based classification: concepts and algorithms. J. Mach. Learn. Res. 10, (June 2009), 747--776. Google ScholarDigital Library
- Donatello Conte, Pasquale Foggia, Carlo Sansone, and Mario Vento. 2004. Thirty years of graph matching in pattern recognition. Int. J. Pattern Recognit. Artificial Intell. 18, 3 (2004), 265--298.Google ScholarCross Ref
- Fabrizio Costa and Kurt De Grave. 2010. Fast neighborhood subgraph pairwise distance kernel. In Proceedings of the 26th International Conference on Machine Learning. Omnipress, 255--262.Google Scholar
- Peter Doyle and James Laurie Snell. 1984. Random Walks and Electric Networks. Vol. 22. Mathematical Association America, New York.Google Scholar
- Hewayda Elghawalby and Edwin R. Hancock. 2008. Measuring graph similarity using spectral geometry. In Proceedings of the 5th International Conference on Image Analysis and Recognition (ICIAR). Springer, 517--526. Google ScholarDigital Library
- Cesim Erten, Philip J. Harding, Stephen G. Kobourov, Kevin Wampler, and Gary Yee. 2003. GraphAEL: graph animations with evolving layouts. In Proceedings of the 11th International Symposium in Graph Drawing (GD), Perugia, Italy, Vol. 2912. 98--110.Google Scholar
- Miroslav Fiedler. 1973. Algebraic connectivity of graphs. Czechoslovak Math. J. 23, 98 (1973), 298--305.Google ScholarCross Ref
- Linton C. Freeman. 1977. A set of measures of centrality based on betweenness. Sociometry 40, 1 (1977), 35--41.Google ScholarCross Ref
- Thomas Gärtner, Peter A. Flach, and Stefan Wrobel. 2003. On graph kernels: hardness results and efficient alternatives. In Proceedings of the 16th Annual Conference on Computational Learning Theory and the 7th Kernel Workshop. Springer, 129--143.Google ScholarCross Ref
- Wolfgang Gatterbauer, Stephan Günnemann, Danai Koutra, and Christos Faloutsos. 2015. Linearized and single-pass belief propagation. Proc. VLDB Endow. 8, 5 (2015), 581--592. Google ScholarDigital Library
- Michael Gleicher, Danielle Albers Szafir, Rick Walker, Ilir Jusufi, Charles D. Hansen, and Jonathan C. Roberts. 2011. Visual comparison for information visualization. Inf. Visualization 10, 4 (October 2011), 289--309. Google ScholarDigital Library
- William R. Gray, John A. Bogovic, Joshua T. Vogelstein, Bennett A. Landman, Jerry L. Prince, and R. Jacob Vogelstein. 2012. Magnetic resonance connectome automated pipeline: an overview. IEEE Pulse 3, 2 (2012), 42--48.Google ScholarCross Ref
- R. Guha, Ravi Kumar, Prabhakar Raghavan, and Andrew Tomkins. 2004. Propagation of trust and distrust. In Proceedings of the 13th International Conference on World Wide Web (WWW), New York, NY. ACM, 403--412. Google ScholarDigital Library
- Mountaz Hascoët and Pierre Dragicevic. 2012. Interactive graph matching and visual comparison of graphs and clustered graphs. In Proceedings of the International Working Conference on Advanced Visual Interfaces (AVI’12). ACM, New York, NY, USA, 522--529. Google ScholarDigital Library
- Taher H. Haveliwala. 2003. Topic-sensitive pagerank: a context-sensitive ranking algorithm for web search. IEEE Trans. Knowl. Data Eng. 15, 4 (2003), 784--796. Google ScholarDigital Library
- Keith Henderson, Tina Eliassi-Rad, Christos Faloutsos, Leman Akoglu, Lei Li, Koji Maruhashi, B. Aditya Prakash, and Hanghang Tong. 2010. Metric forensics: a multi-level approach for mining volatile graphs. In Proceedings of the 16th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD). ACM, 163--172. Google ScholarDigital Library
- Tamás Horváth, Thomas Gärtner, and Stefan Wrobel. 2004. Cyclic pattern kernels for predictive graph mining. In Proceedings of the 10th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD). ACM, 158--167. Google ScholarDigital Library
- Tsuyoshi Ide and Hisashi Kashima. 2004. Eigenspace-based anomaly detection in computer systems. In Proceedings of the 10th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD). 440--449. Google ScholarDigital Library
- Glen Jeh and Jennifer Widom. 2002. SimRank: a measure of structural-context similarity. In Proceedings of the 8th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Edmonton, Alberta. ACM, 538--543. Google ScholarDigital Library
- U. Kang, Jay-Yoon Lee, Danai Koutra, and Christos Faloutsos. 2014. Net-ray: visualizing and mining web-scale graphs. In Proceedings of the 18th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). 348--361.Google ScholarCross Ref
- U. Kang, Hanghang Tong, and Jimeng Sun. 2012. Fast random walk graph kernel. In Proceedings of the 12th SIAM International Conference on Data Mining (SDM). 828--838.Google ScholarCross Ref
- G. Karypis and V. Kumar. 1995. METIS: unstructured graph partitioning and sparse matrix ordering system. The University of Minnesota (1995).Google Scholar
- Hisashi Kashima, Koji Tsuda, and Akihiro Inokuchi. 2003. Marginalized kernels between labeled graphs. In Proceedings of the 20th International Conference on Machine Learning. AAAI Press, 321--328.Google Scholar
- Alexander K. Kelmans. 1976. Comparison of graphs by their number of spanning trees. Discrete Math. 16, 3 (1976), 241--261.Google ScholarCross Ref
- Nguyen Lu Dang Khoa and Sanjay Chawla. 2010. Robust outlier detection using commute time and eigenspace embedding. In Proceedings of the 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Hyderabad, India (Lecture Notes in Computer Science), Vol. 6119. Springer, 422--434. Google ScholarDigital Library
- Heung-Nam Kim and Abdulmotaleb El Saddik. 2011. Personalized pagerank vectors for tag recommendations: inside folkrank. In Proceedings of the5th ACM Conference on Recommender Systems. ACM, 45--52. Google ScholarDigital Library
- Jon M. Kleinberg. 1999. Authoritative sources in a hyperlinked environment. J. ACM 46, 5 (1999), 604--632. Google ScholarDigital Library
- Bryan Klimt and Yiming Yang. 2004. Introducing the enron corpus. In Proceedings of the 1st Conference on Email and Anti-Spam.Google Scholar
- Danai Koutra, U. Kang, Jilles Vreeken, and Christos Faloutsos. 2014. VoG: summarizing and understanding large graphs. In Proceedings of the 14th SIAM International Conference on Data Mining (SDM). SIAM, Philadelphia, PA, 91--99.Google ScholarCross Ref
- Danai Koutra, U. Kang, Jilles Vreeken, and Christos Faloutsos. 2015. Summarizing and understanding large graphs. In Statistical Analysis and Data Mining. John Wiley & Sons, Inc.Google Scholar
- Danai Koutra, Tai-You Ke, U. Kang, Duen Horng Chau, Hsing-Kuo Kenneth Pao, and Christos Faloutsos. 2011. Unifying guilt-by-association approaches: theorems and fast algorithms. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD). 245--260. Google ScholarDigital Library
- Danai Koutra, Evangelos E. Papalexakis, and Christos Faloutsos. 2012. TensorSplat: spotting latent anomalies in time. In 16th Panhellenic Conference on Informatics (PCI). IEEE, 144--149. Google ScholarDigital Library
- Danai Koutra, Vasileios Koutras, B. Aditya Prakash, and Christos Faloutsos. 2013c. Patterns amongst competing task frequencies: super-linearities, and the Almond-DG model. In Proceedings of the 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). Springer Science + Business Media, 201--212.Google ScholarCross Ref
- Danai Koutra, Hanghang Tong, and David Lubensky. 2013a. Big-align: fast bipartite graph alignment. In Proceedings of the 14th IEEE International Conference on Data Mining (ICDM). 389--398.Google ScholarCross Ref
- Danai Koutra, Joshua Vogelstein, and Christos Faloutsos. 2013b. DeltaCon: a principled massive-graph similarity function. In Proceedings of the 13th SIAM International Conference on Data Mining (SDM). 162--170.Google ScholarCross Ref
- Jay Yoon Lee, U. Kang, Danai Koutra, and Christos Faloutsos. 2013. Fast anomaly detection despite the duplicates. In Proceedings of the 22nd International Conference on World Wide Web (WWW Companion Volume). International World Wide Web Conferences Steering Committee, 195--196. Google ScholarDigital Library
- Jure Leskovec, Deepayan Chakrabarti, Jon M. Kleinberg, and Christos Faloutsos. 2005. Realistic, mathematically tractable graph generation and evolution, using kronecker multiplication. In Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD). 133--145. Google ScholarDigital Library
- Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. 2007. Graph evolution: densification and shrinking diameters. IEEE Trans. Knowl. Data Eng. 1, 1 (March 2007), Article No. 2. Google ScholarDigital Library
- Cuiping Li, Jiawei Han, Guoming He, Xin Jin, Yizhou Sun, Yintao Yu, and Tianyi Wu. 2010. Fast computation of simrank for static and dynamic information networks. In Proceedings of the 13th International Conference on Extending Database Technology (EDBT’10). ACM, New York, NY, 465--476. Google ScholarDigital Library
- Geng Li, Murat Semerci, Bulent Yener, and Mohammed J. Zaki. 2011. Graph classification via topological and label attributes. In Proceedings of the 9th International Workshop on Mining and Learning with Graphs (MLG).Google Scholar
- Owen Macindoe and Whitman Richards. 2010. Graph comparison using fine structure analysis. In Proceedings of the International Conference on Privacy, Security, Risk and Trust (SocialCom/PASSAT). IEEE, 193--200. Google ScholarDigital Library
- Pierre Mahé and Jean-Philippe Vert. 2009. Graph kernels based on tree patterns for molecules. Mach. Learn. 75, 1 (April 2009), 3--35. Google ScholarDigital Library
- Ching-Hao Mao, Chung-Jung Wu, Evangelos E Papalexakis, Christos Faloutsos, and Tien-Cheu Kao. 2014. MalSpot: multi2 malicious network behavior patterns analysis. In PAKDD. 1--14.Google Scholar
- Koji Maruhashi, Fan Guo, and Christos Faloutsos. 2011. Multiaspectforensics: pattern mining on large-scale heterogeneous networks with tensor analysis. In Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). IEEE, 203--210. Google ScholarDigital Library
- Mary McGlohon, Stephen Bay, Markus G. Anderle, David M. Steier, and Christos Faloutsos. 2009. SNARE: a link analytic system for graph labeling and risk detection. In Proceedings of the 15th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD). 1265--1274. Google ScholarDigital Library
- Mark E. J. Newman. 2005. A measure of betweenness centrality based on random walks. Soc. Netw. 27, 1 (2005), 39--54.Google ScholarCross Ref
- Caleb C. Noble and Diane J. Cook. 2003. Graph-based anomaly detection. In Proceedings of the 9th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD). ACM, 631--636. Google ScholarDigital Library
- OCP. 2014. Open Connectome Project. Retrieved from http://www.openconnectomeproject.org.Google Scholar
- Panagiotis Papadimitriou, Ali Dasdan, and Hector Garcia-Molina. 2008. Web graph similarity for anomaly detection. J. Internet Services Applications 1, 1 (2008), 19--30.Google ScholarCross Ref
- Mitchell Peabody. 2003. Finding Groups of Graphs in Databases. Master’s thesis. Drexel University, Philadelphia, PA.Google Scholar
- Jan Ramon and Thomas Gärtner. 2003. Expressivity versus efficiency of graph kernels. In Proceedings of the 1st International Workshop on Mining Graphs, Trees and Sequences. 65--74.Google Scholar
- Stephen Ranshous, Shitian Shen, Danai Koutra, Steven Harenberg, Christos Faloutsos, and Nagiza F. Samatova. 2015. Anomaly detection in dynamic networks: a survey. WIREs Comp Stat 7, 223--247. DOI:10.1002/wics.1347Google ScholarDigital Library
- William Gray Roncal, Zachary H. Koterba, Disa Mhembere, Dean Kleissas, Joshua T. Vogelstein, Randal C. Burns, Anita R. Bowles, Dimitrios K. Donavos, Sephira Ryman, Rex E. Jung, Lei Wu, Vince D. Calhoun, and R. Jacob Vogelstein. 2013. MIGRAINE: MRI graph reliability analysis and inference for connectomics. In Proceedings of the IEEE Global Conference on Signal and Information Processing (GlobalSIP).Google Scholar
- Neil Shah, Danai Koutra, Tianmin Zou, Brian Gallagher, and Christos Faloutsos. 2015. TimeCrunch: interpretable dynamic graph summarization. In Proceedings of the 21st ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD). 1055--1064. Google ScholarDigital Library
- Nino Shervashidze and Karsten Borgwardt. 2009. Fast subtree kernels on graphs. In Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS). Vancouver, BC. 1660--1668.Google Scholar
- Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, and Karsten M. Borgwardt. 2011. Weisfeiler-Lehman graph kernels. J. Mach. Learn. Res. 12, (November 2011), 2539--2561. Google ScholarDigital Library
- Nino Shervashidze, S. V. N. Vishwanathan, Tobias Petri, Kurt Mehlhorn, and Karsten Borgwardt. 2009. Efficient graphlet kernels for large graph comparison. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS 2009), Vol. 5. Journal of Machine Learning Research, 488--495.Google Scholar
- SNAP. 2016. Retrieved http://snap.stanford.edu/data/index.html#web. Last accessed: 2016.Google Scholar
- Kumar Sricharan and Kamalika Das. 2014. Localizing anomalous changes in time-evolving graphs. In Proceedings of the 2014 ACM International Conference on Management of Data (SIGMOD). ACM, 1347--1358. Google ScholarDigital Library
- Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad, Michalis Faloutsos, and Christos Faloutsos. 2012. Gelling, and melting, large graphs by edge manipulation. In Proceedings of the 21st ACM Conference on Information and Knowledge Management (CIKM). ACM, 245--254. Google ScholarDigital Library
- Frank van Ham, Hans-Jörg Schulz, and Joan M. Dimicco. 2009. Honeycomb: visual analysis of large scale social networks. In Human-Computer Interaction - INTERACT 2009. Lecture Notes in Computer Science, Vol. 5727. Springer, Berlin, 429--442. Google ScholarDigital Library
- S. V. N. Vishwanathan, Nicol N. Schraudolph, Risi Imre Kondor, and Karsten M. Borgwardt. 2010. Graph kernels. Journal of Machine Learning Research 11 (2010), 1201--1242. Google ScholarCross Ref
- Bimal Viswanath, Alan Mislove, Meeyoung Cha, and Krishna P. Gummadi. 2009. On the evolution of user interaction in facebook. In Proceedings of the 2nd ACM SIGCOMM Workshop on Social Networks (WOSN). 37--42. Google ScholarDigital Library
- Heng Wang, Minh Tang, Y. Park, and C. E. Priebe. 2014. Locality statistics for anomaly detection in time series of graphs. IEEE Trans. Signal Process. 62, 3 (February 2014), 703--717. Google ScholarDigital Library
- Ye Wang, Srinivasan Parthasarathy, and Shirish Tatikonda. 2011. Locality sensitive outlier detection: a ranking driven approach. In Proceedings of the 27th International Conference on Data Engineering (ICDE). 410--421. Google ScholarDigital Library
- Richard C. Wilson and Ping Zhu. 2008. A study of graph spectra for comparing graphs and trees. J. Pattern Recognit. 41, 9 (2008), 2833--2841. Google ScholarDigital Library
- Ömer Nebil Yaveroğlu, Noël Malod-Dognin, Darren Davis, Zoran Levnajić, Vuk Janjic, Rasa Karapandza, Aleksandar Stojmirovic, and Nataša Pržulj. 2014. Revealing the hidden language of complex networks. Scientific Reports 4, Article number: 4547.Google Scholar
- Jonathan S. Yedidia, William T. Freeman, and Yair Weiss. 2003. Understanding belief propagation and its generalizations. Exploring Artificial Intelligence in the New Millennium 239--269. Google ScholarDigital Library
- Weiren Yu, Xuemin Lin, Wenjie Zhang, Lijun Chang, and Jian Pei. 2013. More is simpler: effectively and efficiently assessing node-pair similarities based on hyperlinks. Proc. VLDB Endow. 7, 1 (2013), 13--24. Google ScholarDigital Library
Index Terms
- DeltaCon: Principled Massive-Graph Similarity Function with Attribution
Recommendations
NetLSD: Hearing the Shape of a Graph
KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data MiningComparison among graphs is ubiquitous in graph analytics. However, it is a hard task in terms of the expressiveness of the employed similarity measure and the efficiency of its computation. Ideally, graph comparison should be invariant to the order of ...
Web graph similarity for anomaly detection (poster)
WWW '08: Proceedings of the 17th international conference on World Wide WebWeb graphs are approximate snapshots of the web, created by search engines. Their creation is an error-prone procedure that relies on the availability of Internet nodes and the faultless operation of multiple software and hardware units. Checking the ...
Structural similarity of directed universal hierarchical graphs: A low computational complexity approach
In the present paper we mainly introduce an efficient approach to measure the structural similarity of so called directed universal hierarchical graphs. We want to underline that directed universal hierarchical graphs can be obtained from generalized ...
Comments