Skip to main content
Erschienen in: Knowledge and Information Systems 2/2016

01.08.2016 | Regular Paper

Inferring lockstep behavior from connectivity pattern in large graphs

verfasst von: Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, Shiqiang Yang

Erschienen in: Knowledge and Information Systems | Ausgabe 2/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Given multimillion-node graphs such as “who-follows-whom”, “patent-cites-patent”, “user-likes-page” and “actor/director-makes-movie” networks, how can we find unexpected behaviors? When companies operate on the graphs with monetary incentives to sell Twitter “Followers” and Facebook page “Likes”, the graphs show strange connectivity patterns. In this paper, we study a complete graph from a large Twitter-style social network, spanning up to 3.33 billion edges. We report strange deviations from typical patterns like smooth degree distributions. We find that such deviations are often due to “lockstep behavior” that large groups of followers connect to the same groups of followees. Our first contribution is that we study strange patterns on the adjacency matrix and in the spectral subspaces with respect to several flavors of lockstep. We discover that (a) the lockstep behaviors on the graph shape dense “block” in its adjacency matrix and creates “rays” in spectral subspaces, and (b) partially overlapping of the behaviors shape “staircase” in its adjacency matrix and creates “pearls” in spectral subspaces. The second contribution is that we provide a fast algorithm, using the discovery as a guide for practitioners, to detect users who offer the lockstep behaviors in undirected/directed/bipartite graphs. We carry out extensive experiments on both synthetic and real datasets, as well as public datasets from IMDb and US Patent. The results demonstrate the scalability and effectiveness of our proposed algorithm.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Becker RA, Volinsky C, Wilks AR (2010) Fraud detection in telecommunications: history and lessons learned. Technometrics 52(1):20–33MathSciNetCrossRef Becker RA, Volinsky C, Wilks AR (2010) Fraud detection in telecommunications: history and lessons learned. Technometrics 52(1):20–33MathSciNetCrossRef
2.
Zurück zum Zitat Chau DH, Pandit S, Faloutsos C (2006) Detecting fraudulent personalities in networks of online auctioneers. In: Fürnkranz J, Scheffer T, Spiliopoulou M (eds) Knowledge discovery in databases: PKDD 2006. Springer, Berlin Heidelberg, pp 103–114CrossRef Chau DH, Pandit S, Faloutsos C (2006) Detecting fraudulent personalities in networks of online auctioneers. In: Fürnkranz J, Scheffer T, Spiliopoulou M (eds) Knowledge discovery in databases: PKDD 2006. Springer, Berlin Heidelberg, pp 103–114CrossRef
3.
Zurück zum Zitat Beutel A, Xu W, Guruswami V, Palow C, Faloutsos C (2013) CopyCatch: stopping group attacks by spotting lockstep behavior in social networks. In: Proceedings of the 22nd international conference on World Wide Web, pp 119–130. International World Wide Web Conferences Steering Committee Beutel A, Xu W, Guruswami V, Palow C, Faloutsos C (2013) CopyCatch: stopping group attacks by spotting lockstep behavior in social networks. In: Proceedings of the 22nd international conference on World Wide Web, pp 119–130. International World Wide Web Conferences Steering Committee
4.
Zurück zum Zitat Leskovec J, Lang KJ, Dasgupta A, Mahoney MW (2008) Statistical properties of community structure in large social and information networks. In: Proceedings of the 17th international conference on World Wide Web, pp 695-704. ACM Leskovec J, Lang KJ, Dasgupta A, Mahoney MW (2008) Statistical properties of community structure in large social and information networks. In: Proceedings of the 17th international conference on World Wide Web, pp 695-704. ACM
6.
Zurück zum Zitat Chen Jie, Saad Yousef (2012) Dense subgraph extraction with application to community detection. Knowl Data Eng IEEE Trans 24(7):1216–1230CrossRef Chen Jie, Saad Yousef (2012) Dense subgraph extraction with application to community detection. Knowl Data Eng IEEE Trans 24(7):1216–1230CrossRef
7.
Zurück zum Zitat Zha H, He X, Ding C, Simon H, Gu M (2001) Bipartite graph partitioning and data clustering. In: Proceedings of the tenth international conference on Information and knowledge management, pp 25–32. ACM Zha H, He X, Ding C, Simon H, Gu M (2001) Bipartite graph partitioning and data clustering. In: Proceedings of the tenth international conference on Information and knowledge management, pp 25–32. ACM
8.
Zurück zum Zitat Gnnemann S, Boden B, Frber I, Seidl T (2013) Efficient Mining of Combined Subspace and Subgraph Clusters in Graphs with Feature Vectors. In: Pei J, Tseng VS, Cao L, Motoda H, Xu G (eds) Advances in knowledge discovery and data mining. Springer, Berlin, Heidelberg, pp 261–275CrossRef Gnnemann S, Boden B, Frber I, Seidl T (2013) Efficient Mining of Combined Subspace and Subgraph Clusters in Graphs with Feature Vectors. In: Pei J, Tseng VS, Cao L, Motoda H, Xu G (eds) Advances in knowledge discovery and data mining. Springer, Berlin, Heidelberg, pp 261–275CrossRef
9.
Zurück zum Zitat Chung F, Linyuan L (2002) The average distances in random graphs with given expected degrees. Proc Natl Acad Sci 99(25):15879–15882MathSciNetCrossRefMATH Chung F, Linyuan L (2002) The average distances in random graphs with given expected degrees. Proc Natl Acad Sci 99(25):15879–15882MathSciNetCrossRefMATH
10.
Zurück zum Zitat Jiang M, Cui P, Beutel A, Faloutsos C, Yang S (2014) Inferring strange behavior from connectivity pattern in social networks. In: Tseng VS, Ho TB, Zhou Z-H, Chen ALP, Kao H-Y (eds) Advances in knowledge discovery and data mining. Springer, pp 126–138 Jiang M, Cui P, Beutel A, Faloutsos C, Yang S (2014) Inferring strange behavior from connectivity pattern in social networks. In: Tseng VS, Ho TB, Zhou Z-H, Chen ALP, Kao H-Y (eds) Advances in knowledge discovery and data mining. Springer, pp 126–138
11.
Zurück zum Zitat Chakrabarti S (2002) Mining the web: discovering knowledge from hypertext data. Elsevier, San Francisco Chakrabarti S (2002) Mining the web: discovering knowledge from hypertext data. Elsevier, San Francisco
12.
Zurück zum Zitat Aggarwal CC, Wang H (2010) Managing and mining graph data, vol 40. Springer, New York Aggarwal CC, Wang H (2010) Managing and mining graph data, vol 40. Springer, New York
13.
Zurück zum Zitat Pei J, Jiang D, Zhang A (2005) On mining cross-graph quasi-cliques. In: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pp 228–238. ACM Pei J, Jiang D, Zhang A (2005) On mining cross-graph quasi-cliques. In: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pp 228–238. ACM
14.
Zurück zum Zitat Jiang D, Pei J (2009) Mining frequent cross-graph quasi-cliques. ACM Trans Knowl Discov Data (TKDD) 2(4):16MathSciNet Jiang D, Pei J (2009) Mining frequent cross-graph quasi-cliques. ACM Trans Knowl Discov Data (TKDD) 2(4):16MathSciNet
15.
Zurück zum Zitat Yan X, Han J (2003) CloseGraph: mining closed frequent graph patterns. In: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp 286–295. ACM Yan X, Han J (2003) CloseGraph: mining closed frequent graph patterns. In: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp 286–295. ACM
16.
Zurück zum Zitat Lahiri M, Berger-Wolf TY (2010) Periodic subgraph mining in dynamic networks. Knowl Inf Syst 24(3):467–497CrossRef Lahiri M, Berger-Wolf TY (2010) Periodic subgraph mining in dynamic networks. Knowl Inf Syst 24(3):467–497CrossRef
17.
Zurück zum Zitat Bahmani B, Kumar R, Vassilvitskii Sergei (2012) Densest subgraph in streaming and mapreduce. Proc VLDB Endow 5(5):454–465CrossRef Bahmani B, Kumar R, Vassilvitskii Sergei (2012) Densest subgraph in streaming and mapreduce. Proc VLDB Endow 5(5):454–465CrossRef
18.
Zurück zum Zitat Jiang M, Cui P, Beutel A, Faloutsos C, Yang S (2014) CatchSync: catching synchronized behavior in large directed graphs. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 941–950. ACM Jiang M, Cui P, Beutel A, Faloutsos C, Yang S (2014) CatchSync: catching synchronized behavior in large directed graphs. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 941–950. ACM
19.
Zurück zum Zitat Moonesinghe HDK, Tan P-N (2008) Outrank: a graph-based outlier detection framework using random walk. Int J Artif Intell Tools 17(1):19–36CrossRef Moonesinghe HDK, Tan P-N (2008) Outrank: a graph-based outlier detection framework using random walk. Int J Artif Intell Tools 17(1):19–36CrossRef
20.
Zurück zum Zitat Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392MathSciNetCrossRefMATH Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392MathSciNetCrossRefMATH
21.
Zurück zum Zitat Dhillon IS, Guan Y, Kulis B (2007) Weighted graph cuts without eigenvectors a multilevel approach. IEEE Trans Pattern Anal Mach Intell 29(11):1944–1957CrossRef Dhillon IS, Guan Y, Kulis B (2007) Weighted graph cuts without eigenvectors a multilevel approach. IEEE Trans Pattern Anal Mach Intell 29(11):1944–1957CrossRef
22.
Zurück zum Zitat Chakrabarti D (2004) Autopart: Parameter-free graph partitioning and outlier detection. In: Boulicaut J-F, Esposito F, Giannotti F, Pedreschi D (eds) Knowledge discovery in databases: PKDD 2004, vol 3202. Springer, Berlin, Heidelberg, pp 112–124 Chakrabarti D (2004) Autopart: Parameter-free graph partitioning and outlier detection. In: Boulicaut J-F, Esposito F, Giannotti F, Pedreschi D (eds) Knowledge discovery in databases: PKDD 2004, vol 3202. Springer, Berlin, Heidelberg, pp 112–124
23.
Zurück zum Zitat Akoglu L, McGlohon M, Faloutsos C (2010) Oddball: spotting anomalies in weighted graphs. AKDDM 17(1):410–421 Akoglu L, McGlohon M, Faloutsos C (2010) Oddball: spotting anomalies in weighted graphs. AKDDM 17(1):410–421
24.
Zurück zum Zitat Feng J, He X, Konte B, Bhm C, Plant C (2012) Summarization-based mining bipartite graphs. In: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 1249–1257. ACM Feng J, He X, Konte B, Bhm C, Plant C (2012) Summarization-based mining bipartite graphs. In: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 1249–1257. ACM
25.
Zurück zum Zitat Jiang M, Cui P, Beutel A, Faloutsos C, Yang S (2014) Detecting suspicious following behavior in multimillion-node social networks. In: Proceedings of the companion publication of the 23rd international conference on World wide web companion, pp 305–306. International World Wide Web Conferences Steering Committee Jiang M, Cui P, Beutel A, Faloutsos C, Yang S (2014) Detecting suspicious following behavior in multimillion-node social networks. In: Proceedings of the companion publication of the 23rd international conference on World wide web companion, pp 305–306. International World Wide Web Conferences Steering Committee
26.
Zurück zum Zitat Jiang M, Hooi B, Beutel A, Yang S, Cui P, Faloutsos C (2015) A general suspiciousness metric for dense blocks in multimodal data. In: Proceedings of IEEE international conference on data mining. IEEE Jiang M, Hooi B, Beutel A, Yang S, Cui P, Faloutsos C (2015) A general suspiciousness metric for dense blocks in multimodal data. In: Proceedings of IEEE international conference on data mining. IEEE
27.
Zurück zum Zitat Yan D, Huang L, Jordan MI (2009) Fast approximate spectral clustering. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 907–916. ACM Yan D, Huang L, Jordan MI (2009) Fast approximate spectral clustering. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 907–916. ACM
28.
Zurück zum Zitat Ng AY, Jordan MI, Weiss Y (2001) On spectral clustering analysis and an algorithm. In: Proceedings of advances in neural information processing systems. Cambridge, MIT Press 14: 849–856 Ng AY, Jordan MI, Weiss Y (2001) On spectral clustering analysis and an algorithm. In: Proceedings of advances in neural information processing systems. Cambridge, MIT Press 14: 849–856
29.
Zurück zum Zitat Huang L, Yan D, Jordan MI, Taft N (2008) Spectral clustering with perturbed data. In: NIPS, vol 21 Huang L, Yan D, Jordan MI, Taft N (2008) Spectral clustering with perturbed data. In: NIPS, vol 21
30.
Zurück zum Zitat Prakash BA, Sridharan A, Seshadri M, Machiraju S, Faloutsos C (2010) Eigenspokes: surprising patterns and scalable community chipping in large graphs. In: Advances in knowledge discovery and data mining, pp 435–448. Springer, Berlin, Heidelberg Prakash BA, Sridharan A, Seshadri M, Machiraju S, Faloutsos C (2010) Eigenspokes: surprising patterns and scalable community chipping in large graphs. In: Advances in knowledge discovery and data mining, pp 435–448. Springer, Berlin, Heidelberg
31.
Zurück zum Zitat Ying X, Xintao W (2009) On randomness measures for social networks. SDM 9:709–720 Ying X, Xintao W (2009) On randomness measures for social networks. SDM 9:709–720
32.
Zurück zum Zitat Wu L, Ying X, Wu X, Zhou Z-H (2011) Line orthogonality in adjacency eigenspace with application to community partition. In: Proceedings of the twenty-second international joint conference on artificial intelligence, Vol 3, pp 2349–2354. AAAI Press Wu L, Ying X, Wu X, Zhou Z-H (2011) Line orthogonality in adjacency eigenspace with application to community partition. In: Proceedings of the twenty-second international joint conference on artificial intelligence, Vol 3, pp 2349–2354. AAAI Press
33.
Zurück zum Zitat Newman Mark EJ (2006) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E 74(3):036104MathSciNetCrossRef Newman Mark EJ (2006) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E 74(3):036104MathSciNetCrossRef
34.
Zurück zum Zitat Satuluri V, Parthasarathy S (2009) Scalable graph clustering using stochastic flows: applications to community discovery. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 737–746. ACM Satuluri V, Parthasarathy S (2009) Scalable graph clustering using stochastic flows: applications to community discovery. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 737–746. ACM
35.
Zurück zum Zitat Aaron C, Newman MEJ, Cristopher M (2004) Finding community structure in very large networks. Phys Rev E 70(6):066111CrossRef Aaron C, Newman MEJ, Cristopher M (2004) Finding community structure in very large networks. Phys Rev E 70(6):066111CrossRef
36.
Zurück zum Zitat Wakita K, Tsurumi T (2007) Finding community structure in mega-scale social networks:[extended abstract]. In: Proceedings of the 16th international conference on World Wide Web, pp 1275–1276. ACM Wakita K, Tsurumi T (2007) Finding community structure in mega-scale social networks:[extended abstract]. In: Proceedings of the 16th international conference on World Wide Web, pp 1275–1276. ACM
38.
Zurück zum Zitat Brownrigg DRK (1984) The weighted median filter. Commun ACM 27(8):807–818CrossRef Brownrigg DRK (1984) The weighted median filter. Commun ACM 27(8):807–818CrossRef
39.
Zurück zum Zitat Kang U, Meeder B, Papalexakis EE, Faloutsos C (2014) Heigen: spectral analysis for billion-scale graphs. Knowl Data Eng IEEE Trans 26(2):350–362CrossRef Kang U, Meeder B, Papalexakis EE, Faloutsos C (2014) Heigen: spectral analysis for billion-scale graphs. Knowl Data Eng IEEE Trans 26(2):350–362CrossRef
40.
Zurück zum Zitat Broder A, Kumar R, Maghoul F, Raghavan P, Rajagopalan S, Stata R, Tomkins A, Wiener Janet (2000) Graph structure in the web. Comput Netw 33(1):309–320CrossRef Broder A, Kumar R, Maghoul F, Raghavan P, Rajagopalan S, Stata R, Tomkins A, Wiener Janet (2000) Graph structure in the web. Comput Netw 33(1):309–320CrossRef
41.
Zurück zum Zitat Faloutsos M, Faloutsos P, Faloutsos C (1999) On power-law relationships of the internet topology. In: ACM SIGCOMM computer communication review, vol 29, no 4, pp 251–262. ACM Faloutsos M, Faloutsos P, Faloutsos C (1999) On power-law relationships of the internet topology. In: ACM SIGCOMM computer communication review, vol 29, no 4, pp 251–262. ACM
42.
Zurück zum Zitat Hall BH, Jaffe AB, Trajtenberg M (2001) The NBER patent citations data file: lessons, insights and methodological tools. In: NBER working papers 8498, National Bureau of Economic Research, Inc Hall BH, Jaffe AB, Trajtenberg M (2001) The NBER patent citations data file: lessons, insights and methodological tools. In: NBER working papers 8498, National Bureau of Economic Research, Inc
43.
Zurück zum Zitat Trappey CV, Trappey AJC, Wu C-Y (2001) Clustering patents using non-exhaustive overlaps. J Syst Sci Syst Eng 19(2):162–181CrossRef Trappey CV, Trappey AJC, Wu C-Y (2001) Clustering patents using non-exhaustive overlaps. J Syst Sci Syst Eng 19(2):162–181CrossRef
Metadaten
Titel
Inferring lockstep behavior from connectivity pattern in large graphs
verfasst von
Meng Jiang
Peng Cui
Alex Beutel
Christos Faloutsos
Shiqiang Yang
Publikationsdatum
01.08.2016
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 2/2016
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-015-0883-y

Weitere Artikel der Ausgabe 2/2016

Knowledge and Information Systems 2/2016 Zur Ausgabe