skip to main content
article

Graph mining: Laws, generators, and algorithms

Published:29 June 2006Publication History
Skip Abstract Section

Abstract

How does the Web look? How could we tell an abnormal social network from a normal one? These and similar questions are important in many fields where the data can intuitively be cast as a graph; examples range from computer networks to sociology to biology and many more. Indeed, any M : N relation in database terminology can be represented as a graph. A lot of these questions boil down to the following: “How can we generate synthetic but realistic graphs?” To answer this, we must first understand what patterns are common in real-world graphs and can thus be considered a mark of normality/realism. This survey give an overview of the incredible variety of work that has been done on these problems. One of our main contributions is the integration of points of view from physics, mathematics, sociology, and computer science. Further, we briefly describe recent advances on some related and interesting graph problems.

References

  1. Adamic, L. A. and Huberman, B. A. 2000. Power-law distribution of the World Wide Web. Science 287, 2115.]]Google ScholarGoogle ScholarCross RefCross Ref
  2. Adamic, L. A. and Huberman, B. A. 2001. The Web's hidden order. Comm. ACM 44, 9, 55--60.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Adamic, L. A., Lukose, R. M., Puniyani, A. R., and Huberman, B. A. 2001. Search in power-law networks. Physical Rev. E 64, 4, 046135 1--8.]]Google ScholarGoogle ScholarCross RefCross Ref
  4. Agrawal, R. and Srikant, R. 1994. Fast algorithms for mining association rules. In International Conference on Very Large Data Bases. Morgan Kaufmann, San Francisco, CA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Aiello, W., Chung, F., and Lu, L. 2000. A random graph model for massive graphs. In ACM Symposium on Theory of Computing. ACM Press, New York, NY, 171--180.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Aiello, W., Chung, F., and Lu, L. 2001. Random evolution in massive graphs. In IEEE Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, CA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Albert, R. and Barabási, A.-L. 2000. Topology of evolving networks: local events and universality. Physical Rev. Lett. 85, 24, 5234--5237.]]Google ScholarGoogle ScholarCross RefCross Ref
  8. Albert, R. and Barabási, A.-L. 2002. Statistical mechanics of complex networks. Rev. Modern Physics 74, 1, 47--97.]]Google ScholarGoogle ScholarCross RefCross Ref
  9. Albert, R., Jeong, H., and Barabási, A.-L. 1999. Diameter of the World-Wide Web. Nature 401, 130--131.]]Google ScholarGoogle ScholarCross RefCross Ref
  10. Albert, R., Jeong, H., and Barabási, A.-L. 2000. Error and attack tolerance of complex networks. Nature 406, 378--381.]]Google ScholarGoogle ScholarCross RefCross Ref
  11. Alon, N. 1998. Spectral techniques in graph algorithms. In Lecture Notes in Computer Science 1380, C. L. Lucchesi and A. V. Moura, Eds. Springer-Verlag, Berlin, 206--215.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Alon, N., Yuster, R., and Zwick, U. 1997. Finding and counting given length cycles. Algorithmica 17, 3, 209--223.]]Google ScholarGoogle ScholarCross RefCross Ref
  13. Amaral, L. A. N., Scala, A., Barthélémy, M., and Stanley, H. E. 2000. Classes of small-world networks. Proceedings of the National Academy of Sciences 97, 21, 11149--11152.]]Google ScholarGoogle ScholarCross RefCross Ref
  14. Baeza-Yates, R. and Poblete, B. 2003. Evolution of the Chilean Web structure composition. In Latin American Web Congress. IEEE Computer Society Press, Los Alamitos, CA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Bailey, N. T. J. 1974. The Mathematical Theory of Infectious Diseases and its Applications 2nd Ed. Charles Griffin, London, UK.]]Google ScholarGoogle Scholar
  16. Baker, W. E. and Faulkner, R. R. 1993. The social organization of conspiracy: Illegal networks in the Heavy Electrical Equipment industry. Amer. Sociolog. Rev. 58, 6, 837--860.]]Google ScholarGoogle ScholarCross RefCross Ref
  17. Bar-Yossef, Z., Kumar, R., and Sivakumar, D. 2002. Reductions in streaming algorithms, with an application to counting triangles in graphs. In ACM-SIAM Symposium on Discrete Algorithms. SIAM, Philadelphia, PA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Barabási, A.-L. 2002. Linked: The New Science of Networks 1st Ed. Perseus Books Group, New York, NY.]]Google ScholarGoogle Scholar
  19. Barabási, A.-L. and Albert, R. 1999. Emergence of scaling in random networks. Science 286, 509--512.]]Google ScholarGoogle ScholarCross RefCross Ref
  20. Barabási, A.-L., Jeong, H., Néda, Z., Ravasz, E., Schubert, A., and Vicsek, T. 2002. Evolution of the social network of scientific collaborations. Physica A 311, 590--614.]]Google ScholarGoogle ScholarCross RefCross Ref
  21. Beirlant, J., de Wet, T., and Goegebeur, Y. 2005. A goodness-of-fit statistic for Pareto-type behaviour. J. Computat. Appl. Math. 186, 1, 99--116.]]Google ScholarGoogle ScholarCross RefCross Ref
  22. Ben-Hur, A. and Guyon, I. 2003. Detecting stable clusters using principal component analysis. In Methods in Molecular Biology. M. J. Brownstein and A. Khudorsky, Eds. Humana Press, Totowa, NJ, 159--182.]]Google ScholarGoogle Scholar
  23. Berger, N., Borgs, C., Chayes, J. T., D'Souza, R. M., and Kleinberg, B. D. 2005. Competition-induced preferential attachment. Combinat. Probabil. Comput. 14, 697--721.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Berry, M. W. 1992. Large scale singular value computations. Inte. J. Supercomput. Applic. 6, 1, 13--49.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Bi, Z., Faloutsos, C., and Korn, F. 2001. The DGX distribution for mining massive, skewed data. In Conference of the ACM Special Interest Group on Knowledge Discovery and Data Mining. ACM Press, New York, NY, 17--26.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Bianconi, G. and Barabási, A.-L. 2001. Competition and multiscaling in evolving networks. Europhysics Letters 54, 4, 436--442.]]Google ScholarGoogle ScholarCross RefCross Ref
  27. Boguñá, M. and Pastor-Satorras, R. 2002. Epidemic spreading in correlated complex networks. Physical Rev. E 66, 4, 047104.]]Google ScholarGoogle ScholarCross RefCross Ref
  28. Boldi, P., Codenotti, B., Santini, M., and Vigna, S. 2002. Structural properties of the African Web. In International World Wide Web Conference. ACM Press, New York, NY.]]Google ScholarGoogle Scholar
  29. Bollobás, B. 1985. Random Graphs. Academic Press, London, UK.]]Google ScholarGoogle Scholar
  30. Bollobás, B., Borgs, C., Chayes, J. T., and Riordan, O. 2003. Directed scale-free graphs. In ACM-SIAM Symposium on Discrete Algorithms. SIAM, Philadelphia, PA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Bollobás, B. and Riordan, O. 2002. The diameter of a scale-free random graph. Combinatorica.]]Google ScholarGoogle Scholar
  32. Bonacich, P. 1987. Power and centrality: A family of measures. Amer. J. Sociol. 92, 5 (Mar.), 1170--1182.]]Google ScholarGoogle ScholarCross RefCross Ref
  33. Borgatti, S. 2002. The key player problem. In Proceedings of the National Academy of Sciences Workshop on Terrorism. National Academy of Sciences, Washington DC.]]Google ScholarGoogle Scholar
  34. Borgatti, S. and Everett, M. G. 1989. The class of all regular equivalences: Algebraic structure and computation. Social Networks 11, 65--88.]]Google ScholarGoogle ScholarCross RefCross Ref
  35. Borgatti, S. and Everett, M. G. 1999. Models of core/periphery structures. Social Networks 21, 275--295.]]Google ScholarGoogle Scholar
  36. Borgatti, S., Everett, M. G., and Freeman, L. C. 1999. UCINET V User's Guide. Analytic Technologies.]]Google ScholarGoogle Scholar
  37. Borgs, C., Chayes, J. T., Mahdian, M., and Saberi, A. 2004. Exploring the community structure of newsgroups (Extended Abstract). In Conference of the ACM Special Interest Group on Knowledge Discovery and Data Mining. ACM Press, New York, NY.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Brandes, U., Gaertler, M., and Wagner, D. 2003. Experiments on graph clustering algorithms. In European Symposium on Algorithms. Springer Verlag, Berlin, Germany.]]Google ScholarGoogle Scholar
  39. Brin, S. and Page, L. 1998. The anatomy of a large-scale hypertextual Web search engine. Comput. Netw. ISDN Syst. 30, 1--7, 107--117.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Broder, A. Z., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A., and Wiener, J. 2000. Graph structure in the web: experiments and models. In International World Wide Web Conference. ACM Press, New York, NY.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Bu, T. and Towsley, D. 2002. On distinguishing between Internet power law topology generators. In IEEE INFOCOM. IEEE Computer Society Press, Los Alamitos, CA.]]Google ScholarGoogle Scholar
  42. Burt, R. S. 1992. Structural Holes. Harvard University Press, Cambridge, MA.]]Google ScholarGoogle Scholar
  43. Burt, R. S. 2001. Structural holes versus network closure as social capital. In Social Capital: Theory and Research. N. Lin, K. S. Cook, and R. S. Burt, Eds. Aldine de Gruyter, Hawthorne, NY.]]Google ScholarGoogle Scholar
  44. Callaway, D. S., Newman, M. E. J., Strogatz, S. H., and Watts, D. J. 2000. Network robustness and fragility: Percolation on random graphs. Physical Rev. Lett. 85, 25, 5468--5471.]]Google ScholarGoogle ScholarCross RefCross Ref
  45. Calvert, K. L., Doar, M. B., and Zegura, E. W. 1997. Modeling Internet topology. IEEE Comm. 35, 6, 160--163.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Carlson, J. M. and Doyle, J. 1999. Highly optimized tolerance: A mechanism for power laws in designed systems. Physical Rev. E 60, 2, 1412--1427.]]Google ScholarGoogle ScholarCross RefCross Ref
  47. Chakrabarti, D. 2004. AutoPart: Parameter-free graph partitioning and outlier detection. In Conference on Principles and Practice of Knowledge Discovery in Databases. Springer, Berlin, Germany.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Chakrabarti, D., Papadimitriou, S., Modha, D. S., and Faloutsos, C. 2004. Fully automatic Cross-associations. In Conference of the ACM Special Interest Group on Knowledge Discovery and Data Mining. ACM Press, New York, NY.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Chakrabarti, D., Zhan, Y., and Faloutsos, C. 2004. R-MAT: A recursive model for graph mining. In SIAM Data Mining Conference. SIAM, Philadelphia, PA.]]Google ScholarGoogle Scholar
  50. Chakrabarti, S. 1999. Recent results in automatic Web resource discovery. ACM Comput. Surv. 31, 4.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Chakrabarti, S., Punera, K., and Subramanyam, M. 2002. Accelerated focused crawling through online relevance feedback. In International World Wide Web Conference. ACM Press, New York, NY, 148--159.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Chakrabarti, S., van den Berg, M., and Dom, B. E. 1999. Focused crawling: A new approach to topic-specific Web resource discovery. Comput. Netw. 31, 11--16, 1623--1640.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Chen, Q., Chang, H., Govindan, R., Jamin, S., Shenker, S., and Willinger, W. 2001. The origin of power laws in Internet topologies revisited. In IEEE INFOCOM. IEEE Computer Society Press, Los Alamitos, CA.]]Google ScholarGoogle Scholar
  54. Clauset, A., Newman, M. E. J., and Moore, C. 2004. Finding community structure of very large networks. Physical Rev. E 70, 066111.]]Google ScholarGoogle ScholarCross RefCross Ref
  55. Coleman, J. S. 1988. Social capital in the creation of human capital. Amer. J. Sociol. 94, S95--S121.]]Google ScholarGoogle ScholarCross RefCross Ref
  56. Cooper, C. and Frieze, A. 2003. A general model of web graphs. Rand. Struct. Algor. 22, 3, 311--335.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Cooper, C. and Frieze, A. 2004. The size of the largest strongly connected component of a random digraph with a given degree sequence. Combinat. Probabil. Comput. 13, 3, 319--337.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Cormen, T. H., Leiserson, C. E., and Rivest, R. L. 1992. Introduction to Algorithms 6th Ed. MIT Press and McGraw-Hill Book Company, Cambridge, MA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Cross, R., Borgatti, S., and Parker, A. 2002. Making invisible work visible: Using social network analysis to support strategic collaboration. CA. Manag. Rev. 44, 2, 25--46.]]Google ScholarGoogle ScholarCross RefCross Ref
  60. Crovella, M. and Taqqu, M. S. 1999. Estimating the heavy tail index from scaling properties. Metho. Comput. Appl. Probabil. 1, 1, 55--79.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. de Solla Price, D. J. 1976. A general theory of bibliometric and other cumulative advantage processes. J. Amer. Soci. Inform. Science 27, 292--306.]]Google ScholarGoogle ScholarCross RefCross Ref
  62. Dehaspe, L. and Toivonen, H. 1999. Discovery of frequent datalog patterns. Data Mining Knowl. Discov. 3, 1, 7--36.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. Dehaspe, L., Toivonen, H., and King, R. D. 1998. Finding frequent substructures in chemical compounds. In Conference of the ACM Special Interest Group on Knowledge Discovery and Data Mining. ACM Press, New York, NY.]]Google ScholarGoogle Scholar
  64. Dill, S., Kumar, R., McCurley, K. S., Rajagopalan, S., Sivakumar, D., and Tomkins, A. 2001. Self-similarity in the Web. In International Conference on Very Large Data Bases. Morgan Kaufmann, San Francisco, CA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. Dombroski, M., Fischbeck, P., and Carley, K. M. 2003. Estimating the shape of covert networks. In Proceedings of the 8th International Command and Control Research and Technology Symposium.]]Google ScholarGoogle Scholar
  66. Domingos, P. and Richardson, M. 2001. Mining the network value of customers. In Conference of the ACM Special Interest Group on Knowledge Discovery and Data Mining. ACM Press, New York, NY.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. Dorogovtsev, S. N., Goltsev, A. V., and Mendes, J. F. 2002. Pseudofractal scale-free web. Physical Rev. E 65, 6, 066122.]]Google ScholarGoogle ScholarCross RefCross Ref
  68. Dorogovtsev, S. N. and Mendes, J. F. 2003. Evolution of Networks: From Biological Nets to the Internet and WWW. Oxford University Press, Oxford, UK.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  69. Dorogovtsev, S. N., Mendes, J. F., and Samukhin, A. N. 2000. Structure of growing networks with preferential linking. Physical Rev. Lett. 85, 21, 4633--4636.]]Google ScholarGoogle ScholarCross RefCross Ref
  70. Dorogovtsev, S. N., Mendes, J. F., and Samukhin, A. N. 2001. Giant strongly connected component of directed networks. Physical Rev. E 64, 025101 1--4.]]Google ScholarGoogle ScholarCross RefCross Ref
  71. Doyle, J. and Carlson, J. M. 2000. Power laws, highly optimized tolerance, and generalized source coding. Physical Rev. Lett. 84, 24 (June), 5656--5659.]]Google ScholarGoogle ScholarCross RefCross Ref
  72. Drineas, P., Frieze, A., Kannan, R., Vempala, S., and Vinay, V. 1999. Clustering in large graphs and matrices. In ACM-SIAM Symposium on Discrete Algorithms. SIAM, Philadelphia, PA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  73. Džeroski, S. and Lavrač, N. 2001. Relational Data Mining. Springer Verlag, Berlin, Germany.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  74. Erdős, P. and Rényi, A. 1960. On the evolution of random graphs. Publication of the Mathematical Institute of the Hungarian Acadamy of Science 5, 17--61.]]Google ScholarGoogle Scholar
  75. Erdős, P. and Rényi, A. 1961. On the strength of connectedness of random graphs. Acta Mathematica Scientia Hungary 12, 261--267.]]Google ScholarGoogle ScholarCross RefCross Ref
  76. Everitt, B. S. 1974. Cluster Analysis. John Wiley, New York, NY.]]Google ScholarGoogle Scholar
  77. Fabrikant, A., Koutsoupias, E., and Papadimitriou, C. H. 2002. Heuristically optimized trade-offs: A new paradigm for power laws in the Internet. In Proceedings of the International Colloquium on Automata, Languages and Programming. Springer Verlag, Berlin, Germany, 110--122.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  78. Faloutsos, M., Faloutsos, P., and Faloutsos, C. 1999. On power-law relationships of the Internet topology. In Conference of the ACM Special Interest Group on Data Communications (SIGCOMM). ACM Press, New York, NY, 251--262.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  79. Feuerverger, A. and Hall, P. 1999. Estimating a tail exponent by modelling departure from a Pareto distribution. Annals Statist. 27, 2, 760--781.]]Google ScholarGoogle ScholarCross RefCross Ref
  80. Flake, G. W., Lawrence, S., and Giles, C. L. 2000. Efficient identification of Web communities. In Conference of the ACM Special Interest Group on Knowledge Discovery and Data Mining. ACM Press, New York, NY.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  81. Freeman, L. C. 1977. A set of measures of centrality based on betweenness. Sociometry 40, 1, 35--41.]]Google ScholarGoogle ScholarCross RefCross Ref
  82. Gibson, D., Kleinberg, J., and Raghavan, P. 1998. Inferring web communities from link topology. In ACM Conference on Hypertext and Hypermedia. ACM Press, New York, NY, 225--234.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  83. Giles, C. L., Bollacker, K., and Lawrence, S. 1998. Citeseer: An automatic citation indexing system. In ACM Conference on Digital Libraries. ACM Press, New York, NY.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  84. Girvan, M. and Newman, M. E. J. 2002. Community structure in social and biological networks. In Proceedings of the National Academy of Sciences. Vol. 99. National Academy of Sciences, Washington DC.]]Google ScholarGoogle Scholar
  85. Glover, F. 1989. Tabu search---part 1. ORSA J. Comput. 1, 3, 190--206.]]Google ScholarGoogle ScholarCross RefCross Ref
  86. Goh, K.-I., Oh, E., Jeong, H., Kahng, B., and Kim, D. 2002. Classificaton of scale-free networks. In Proceedings of the National Academy of Sciences. Vol. 99. National Academy of Sciences, Washington DC, 12583--12588.]]Google ScholarGoogle Scholar
  87. Goldstein, M. L., Morris, S. A., and Yen, G. G. 2004. Problems with fitting to the power-law distribution. European Physics J. B 41, 255--258.]]Google ScholarGoogle ScholarCross RefCross Ref
  88. Govindan, R. and Tangmunarunkit, H. 2000. Heuristics for Internet map discovery. In IEEE INFOCOM. IEEE Computer Society Press, Los Alamitos, CA, 1371--1380.]]Google ScholarGoogle Scholar
  89. Granovetter, M. S. 1973. The strength of weak ties. Amer. J. Sociol. 78, 6 (May), 1360--1380.]]Google ScholarGoogle ScholarCross RefCross Ref
  90. Hanneman, R. A. and Riddle, M. 2005. Introduction to social network methods. http://faculty.ucr.edu/hanneman/nettext/.]]Google ScholarGoogle Scholar
  91. Hill, B. M. 1975. A simple approach to inference about the tail of a distribution. Annals Statist. 3, 5, 1163--1174.]]Google ScholarGoogle ScholarCross RefCross Ref
  92. Holder, L., Cook, D., and Djoko, S. 1994. Substructure discovery in the SUBDUE system. In National Conference on Artificial Intelligence Workshop on Knowledge Discovery in Databases. AAAI Press, Menlo Park, CA, 169--180.]]Google ScholarGoogle Scholar
  93. Inokuchi, A., Washio, T., and Motoda, H. 2000. An apriori-based algorithm for mining frequent substructures from graph data. In Conference on Principles and Practice of Knowledge Discovery in Databases. Springer, Berlin, Germany.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  94. Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N., and Barabási, A.-L. 2000. The large-scale organization of metabolic networks. Nature 407, 6804, 651--654.]]Google ScholarGoogle Scholar
  95. Kannan, R., Vempala, S., and Vetta, A. 2000. On clusterings -- good, bad and spectral. In IEEE Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, CA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  96. Karypis, G. and Kumar, V. 1998. Multilevel algorithms for multi-constraint graph partitioning. Tech. Rep. 98-019, University of Minnesota.]]Google ScholarGoogle Scholar
  97. Kautz, H., Selman, B., and Shah, M. 1997. Referralweb: Combining social networks and collaborative filtering. Comm. ACM 40, 3, 63--65.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  98. Kempe, D., Kleinberg, J., and Demers, A. J. 2001. Spatial gossip and resource location protocols. In ACM Symposium on Theory of Computing. ACM Press, New York, NY.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  99. Kephart, J. O. and White, S. R. 1991. Directed-graph epidemiological models of computer viruses. In IEEE Symposium on Research in Security and Privacy. IEEE Computer Society Press, Los Alamitos, CA.]]Google ScholarGoogle Scholar
  100. Kephart, J. O. and White, S. R. 1993. Measuring and modeling computer virus prevalence. In IEEE Symposium on Research in Security and Privacy. IEEE Computer Society Press, Los Alamitos, CA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  101. Killworth, P. D. and Bernard, H. R. 1978. Reverse small world experiment. Social Networks 1, 2, 103--210.]]Google ScholarGoogle ScholarCross RefCross Ref
  102. Kleinberg, J. 1999a. Authoritative sources in a hyperlinked environment. J. ACM 46, 5, 604--632.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  103. Kleinberg, J. 1999b. The small-world phenomenon: An algorithmic perspective. Tech. Rep. 99-1776, Computer Science Department, Cornell.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  104. Kleinberg, J. 2001. Small world phenomena and the dynamics of information. In Neural Information Processing Systems Conference. MIT Press, Cambridge, MA.]]Google ScholarGoogle Scholar
  105. Kleinberg, J., Kumar, R., Raghavan, P., Rajagopalan, S., and Tomkins, A. 1999. The web as a graph: Measurements, models and methods. In International Computing and Combinatorics Conference. Springer, Berlin, Germany.]]Google ScholarGoogle Scholar
  106. Krapivsky, P. L. and Redner, S. 2001. Organization of growing random networks. Physical Rev. E 63, 6, 066123 1--14.]]Google ScholarGoogle ScholarCross RefCross Ref
  107. Krebs, V. E. 2001. Mapping networks of terrorist cells. Connections 24, 3, 43--52.]]Google ScholarGoogle Scholar
  108. Kumar, R., Raghavan, P., Rajagopalan, S., Sivakumar, D., Tomkins, A., and Upfal, E. 2000. Stochastic models for the Web graph. In IEEE Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, CA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  109. Kumar, R., Raghavan, P., Rajagopalan, S., and Tomkins, A. 1999. Extracting large-scale knowledge bases from the web. In International Conference on Very Large Data Bases. Morgan Kaufmann, San Francisco, CA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  110. Kuramochi, M. and Karypis, G. 2001. Frequent subgraph discovery. In IEEE International Conference on Data Mining. IEEE Computer Society Press, Los Alamitos, CA, 313--320.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  111. Kuramochi, M. and Karypis, G. 2002. Discovering frequent geometric subgraphs. In IEEE International Conference on Data Mining. IEEE Computer Society Press, Los Alamitos, CA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  112. Leskovec, J., Chakrabarti, D., Kleinberg, J., and Faloutsos, C. 2005. Realistic, mathematically tractable graph generation and evolution, using Kronecker Multiplication. In Conference on Principles and Practice of Knowledge Discovery in Databases. Springer, Berlin, Germany.]]Google ScholarGoogle Scholar
  113. Leskovec, J., Kleinberg, J., and Faloutsos, C. 2005. Graphs over time: Densification laws, shrinking diameters and possible explanations. In Conference of the ACM Special Interest Group on Knowledge Discovery and Data Mining. ACM Press, New York, NY.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  114. Madeira, S. C. and Oliveira, A. L. 2004. Biclustering algorithms for biological data analysis: A survey. IEEE Trans. Computat. Biology Bioinform. 1, 1, 24--45.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  115. McGovern, A. and Jensen, D. 2003. Identifying predictive structures in relational data using multiple instance learning. In International Conference on Machine Learning. AAAI Press, Menlo Park, CA.]]Google ScholarGoogle Scholar
  116. Medina, A., Matta, I., and Byers, J. 2000. On the origin of power laws in Internet topologies. In Conference of the ACM Special Interest Group on Data Communications (SIGCOMM). ACM Press, New York, NY, 18--34.]]Google ScholarGoogle Scholar
  117. Mihail, M. and Papadimitriou, C. H. 2002. On the eigenvalue power law. In International Workshop on Randomization and Approximation Techniques in Computer Science. Springer Verlag, Berlin, Germany.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  118. Milgram, S. 1967. The small-world problem. Psychology Today 2, 60--67.]]Google ScholarGoogle Scholar
  119. Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovshii, D., and Alon, U. 2002. Network motifs: Simple building blocks of complex networks. Science 298, 824--827.]]Google ScholarGoogle ScholarCross RefCross Ref
  120. Mitzenmacher, M. 2001. A brief history of generative models for power law and lognormal distributions. In Proceedings of the 39th Annual Allerton Conference on Communication, Control, and Computing. UIUC Press, Urbana-Champaign, IL.]]Google ScholarGoogle Scholar
  121. Montgomery, A. L. and Faloutsos, C. 2001. Identifying Web browsing trends and patterns. IEEE Comput. 34, 7, 94--95.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  122. Moody, J. 2001. Race, school integration, and friendship segregation in America. Amer. J. Sociol. 107, 3, 679--716.]]Google ScholarGoogle ScholarCross RefCross Ref
  123. Newman, M. E. J. 2003. The structure and function of complex networks. SIAM Rev. 45, 167--256.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  124. Newman, M. E. J. 2005. Power laws, pareto distributions and Zipf's law. Contemp. Physics 46, 323--351.]]Google ScholarGoogle ScholarCross RefCross Ref
  125. Newman, M. E. J., Forrest, S., and Balthrop, J. 2002. Email networks and the spread of computer viruses. Physical Rev. E 66, 3, 035101 1--4.]]Google ScholarGoogle ScholarCross RefCross Ref
  126. Newman, M. E. J., Girvan, M., and Farmer, J. D. 2002. Optimal design, robustness and risk aversion. Physical Rev. Lett. 89, 2, 028301 1--4.]]Google ScholarGoogle ScholarCross RefCross Ref
  127. Newman, M. E. J., Strogatz, S. H., and Watts, D. J. 2001. Random graphs with arbitrary degree distributions and their applications. Physical Rev. E 64, 2, 026118 1--17.]]Google ScholarGoogle ScholarCross RefCross Ref
  128. Nijssen, S. and Kok, J. 2001. Faster association rules for multiple relations. In International Joint Conference on Artificial Intelligence. Morgan Kaufmann, San Francisco, CA.]]Google ScholarGoogle Scholar
  129. Palmer, C., Gibbons, P. B., and Faloutsos, C. 2002. ANF: A fast and scalable tool for data mining in massive graphs. In Conference of the ACM Special Interest Group on Knowledge Discovery and Data Mining. ACM Press, New York, NY.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  130. Palmer, C. and Steffan, J. G. 2000. Generating network topologies that obey power laws. In IEEE Global Telecommunications Conference. IEEE Computer Society Press, Los Alamitos, CA.]]Google ScholarGoogle Scholar
  131. Pandurangan, G., Raghavan, P., and Upfal, E. 2002. Using PageRank to characterize Web structure. In International Computing and Combinatorics Conference. Springer, Berlin, Germany.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  132. Pastor-Satorras, R., Vásquez, A., and Vespignani, A. 2001. Dynamical and correlation properties of the Internet. Physical Rev. Lett. 87, 25, 258701 1--4.]]Google ScholarGoogle ScholarCross RefCross Ref
  133. Pastor-Satorras, R. and Vespignani, A. 2001b. Epidemic dynamics and endemic states in complex networks. Physical Rev. E 63, 6, 066117 1--8.]]Google ScholarGoogle ScholarCross RefCross Ref
  134. Pastor-Satorras, R. and Vespignani, A. 2001a. Epidemic spreading in scale-free networks. Physical Rev. Lett. 86, 14, 3200--3203.]]Google ScholarGoogle ScholarCross RefCross Ref
  135. Pastor-Satorras, R. and Vespignani, A. 2002a. Epidemic dynamics in finite size scale-free networks. Physical Rev. E 65, 3, 035108 1--4.]]Google ScholarGoogle ScholarCross RefCross Ref
  136. Pastor-Satorras, R. and Vespignani, A. 2002b. Immunization of complex networks. Physical Rev. E 65, 3, 036104 1--8.]]Google ScholarGoogle ScholarCross RefCross Ref
  137. Pennock, D. M., Flake, G. W., Lawrence, S., Glover, E. J., and Giles, C. L. 2002. Winners don't take all: Characterizing the competition for links on the Web. Proceedings of the National Academy of Sciences 99, 8, 5207--5211.]]Google ScholarGoogle ScholarCross RefCross Ref
  138. Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. 1992. Numerical Recipes in C 2nd Ed. Cambridge University Press, Cambridge, UK.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  139. Ravasz, E. and Barabási, A.-L. 2002. Hierarchical organization in complex networks. Physical Rev. E 65, 026112 1--7.]]Google ScholarGoogle Scholar
  140. Redner, S. 1998. How popular is your paper? an empirical study of the citation distribution. European Physics J. B 4, 131--134.]]Google ScholarGoogle ScholarCross RefCross Ref
  141. Richardson, M. and Domingos, P. 2002. Mining knowledge-sharing sites for viral marketing. In Conference of the ACM Special Interest Group on Knowledge Discovery and Data Mining. ACM Press, New York, NY, 61--70.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  142. Schwartz, M. F. and Wood, D. C. M. 1993. Discovering shared interests using graph analysis. Comm. ACM 36, 8, 78--89.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  143. Simon, H. 1955. On a class of skew distribution functions. Biometrika 42, 3/4, 425--440.]]Google ScholarGoogle Scholar
  144. Solé, R. V. and Montoya, J. M. 2001. Complexity and fragility in ecological networks. In Proceedings of the Royal Society of London B. Vol. 268. The Royal Society, London, UK, 2039--2045.]]Google ScholarGoogle Scholar
  145. Sparrow, M. K. 1991. The application of network analysis to criminal intelligence: An assessment of the prospects. Social Networks 13, 3, 251--274.]]Google ScholarGoogle ScholarCross RefCross Ref
  146. Spielman, D. A. and Teng, S.-H. 1996. Spectral partitioning works: Planar graphs and finite element meshes. In IEEE Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, CA, 96--105.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  147. Tangmunarunkit, H., Govindan, R., Jamin, S., Shenker, S., and Willinger, W. 2001. Network topologies, power laws, and hierarchy. Tech. Rep. 01-746, University of Southern California.]]Google ScholarGoogle Scholar
  148. Tangmunarunkit, H., Govindan, R., Jamin, S., Shenker, S., and Willinger, W. 2002. Network topology generators: Degree-based vs. structural. In Conference of the ACM Special Interest Group on Data Communications (SIGCOMM). ACM Press, New York, NY.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  149. Tauro, S. L., Palmer, C., Siganos, G., and Faloutsos, M. 2001. A simple conceptual model for the Internet topology. In Global Internet. IEEE Computer Society Press, Los Alamitos, CA.]]Google ScholarGoogle Scholar
  150. Tibshirani, R., Walther, G., and Hastie, T. 2001. Estimating the number of clusters in a dataset via the Gap statistic. J. Royal Statist. Soc., B 63, 411--423.]]Google ScholarGoogle ScholarCross RefCross Ref
  151. Travers, J. and Milgram, S. 1969. An experimental study of the Small World problem. Sociometry 32, 4, 425--443.]]Google ScholarGoogle ScholarCross RefCross Ref
  152. Tyler, J. R., Wilkinson, D. M., and Huberman, B. A. 2003. Email as Spectroscopy: Automated Discovery of Community Structure Within Organizations. Kluwer, B. V., The Netherlands.]]Google ScholarGoogle Scholar
  153. van Dongen, S. M. 2000. Graph clustering by flow simulation. Ph.D. thesis, Univesity of Utrecht.]]Google ScholarGoogle Scholar
  154. Virtanen, S. 2003. Clustering the Chilean Web. In Latin American Web Congress. IEEE Computer Society Press, Los Alamitos, CA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  155. Wang, Y., Chakrabarti, D., Wang, C., and Faloutsos, C. 2003. Epidemic spreading in real networks: An eigenvalue viewpoint. In Symposium on Reliable Distributed Systems. IEEE Computer Society Press, Los Alamitos, CA, 25--34.]]Google ScholarGoogle Scholar
  156. Wasserman, S. and Faust, K. 1994. Social Network Analysis: Methods and Applications. Cambridge University Press, Cambridge, UK.]]Google ScholarGoogle Scholar
  157. Watts, D. J. 2003. Six Degrees: The Science of a Connected Age 1st Ed. W. W. Norton and Company, New York, NY.]]Google ScholarGoogle Scholar
  158. Watts, D. J., Dodds, P. S., and Newman, M. E. J. 2002. Identity and search in social networks. Science 296, 1302--1305.]]Google ScholarGoogle ScholarCross RefCross Ref
  159. Watts, D. J. and Strogatz, S. H. 1998. Collective dynamics of ‘small-world’ networks. Nature 393, 440--442.]]Google ScholarGoogle ScholarCross RefCross Ref
  160. Waxman, B. M. 1988. Routing of multipoint connections. IEEE J. Select. Areas Comm. 6, 9 (Dec.), 1617--1622.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  161. Weeks, M. R., Clair, S., Borgatti, S., Radda, K., and Schensul, J. J. 2002. Social networks of drug users in high-risk sites: Finding the connections. AIDS Behav. 6, 2, 193--206.]]Google ScholarGoogle ScholarCross RefCross Ref
  162. Winick, J. and Jamin, S. 2002. Inet-3.0: Internet Topology Generator. Tech. Rep. CSE-TR-456-02, University of Michigan, Ann Arbor, MS.]]Google ScholarGoogle Scholar
  163. Wu, F. and Huberman, B. A. 2004. Finding communities in linear time: A physics approach. European Physics J. B 38, 2, 331--338.]]Google ScholarGoogle ScholarCross RefCross Ref
  164. Yan, X. and Han, J. 2002. gSpan: Graph-based substructure pattern mining. In IEEE International Conference on Data Mining. IEEE Computer Society Press, Los Alamitos, CA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  165. Yook, S.-H., Jeong, H., and Barabási, A.-L. 2002. Modeling the Internet's large-scale topology. Proceedings of the National Academy of Sciences 99, 21, 13382--13386.]]Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Graph mining: Laws, generators, and algorithms

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            • Published in

              cover image ACM Computing Surveys
              ACM Computing Surveys  Volume 38, Issue 1
              2006
              173 pages
              ISSN:0360-0300
              EISSN:1557-7341
              DOI:10.1145/1132952
              Issue’s Table of Contents

              Copyright © 2006 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 29 June 2006
              Published in csur Volume 38, Issue 1

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • article

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader