ABSTRACT
Information diffusion and virus propagation are fundamental processes talking place in networks. While it is often possible to directly observe when nodes become infected, observing individual transmissions (i.e., who infects whom or who influences whom) is typically very difficult. Furthermore, in many applications, the underlying network over which the diffusions and propagations spread is actually unobserved. We tackle these challenges by developing a method for tracing paths of diffusion and influence through networks and inferring the networks over which contagions propagate. Given the times when nodes adopt pieces of information or become infected, we identify the optimal network that best explains the observed infection times. Since the optimization problem is NP-hard to solve exactly, we develop an efficient approximation algorithm that scales to large datasets and in practice gives provably near-optimal performance. We demonstrate the effectiveness of our approach by tracing information cascades in a set of 170 million blogs and news articles over a one year period to infer how information flows through the online media space. We find that the diffusion network of news tends to have a core-periphery structure with a small set of core media sites that diffuse information to the rest of the Web. These sites tend to have stable circles of influence with more general news media sites acting as connectors between them.
Supplemental Material
- Supporting website. http://snap.stanford.edu/netinf.Google Scholar
- . Adar and L. A. Adamic. Tracking information epidemics in blogspace. In Web Intelligence, pages 207--214, 2005. Google ScholarDigital Library
- .-L. Barabási. The origin of bursts and heavy tails in human dynamics. Nature, 435:207, 2005.Google ScholarCross Ref
- .-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509--512, 1999.Google ScholarCross Ref
- . Clauset, C. Moore, and M. E. J. Newman. Hierarchical structure and the prediction of missing links in networks. Nature, 453(7191):98--101, 2008.Google ScholarCross Ref
- . Crane and D. Sornette. Robust dynamic classes revealed by measuring the response function of a social system. Proc. of the National Academy of Sciences, 105(41):15649--15653, October 2008.Google ScholarCross Ref
- . Edmonds. Optimum branchings. Journal of Resesearch of the National Bureau of Standards, (71B):233--240, 1967.Google Scholar
- . Erd\Hos and A. Rényi. On the evolution of random graphs. Publication of the Mathematical Institute of the Hungarian Academy of Science, 5:17--67, 1960.Google Scholar
- . Friedman and D. Koller. Being Bayesian about network structure. A Bayesian approach to structure discovery in Bayesian networks. Machine Learning, 50(1):95--125, 2003.Google Scholar
- . Friedman, I. Nachman, and D. Pe'er. Learning Bayesian network structure from massive datasets: The "Sparse Candidate" algorithm. In Proc. UAI, 1999. Google ScholarDigital Library
- . Gruhl, R. Guha, D. Liben-Nowell, and A. Tomkins. Information diffusion through blogspace. In WWW '04: Proc. of the 13th international conference on World Wide Web, pages 491--501, 2004. Google ScholarDigital Library
- . Katz and P. Lazarsfeld. Personal influence: The part played by people in the flow of mass communications. Free Press, 1955.Google Scholar
- . Kempe, J. M. Kleinberg, and E. Tardos. Maximizing the spread of influence through a social network. In KDD '03: Proc. of the 9th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 137--146, 2003. Google ScholarDigital Library
- . Khuller, A. Moss, and J. Naor. The budgeted maximum coverage problem. Information Processing Letters, 70(1):39--45, 1999. Google ScholarDigital Library
- . Knuth. The art of computer programming. Addison-Wesley, 1968.Google Scholar
- . Kumar, J. Novak, P. Raghavan, and A. Tomkins. Structure and evolution of blogspace. CACM, 47(12):35--39, 2004. Google ScholarDigital Library
- . Leskovec, L. A. Adamic, and B. A. Huberman. The dynamics of viral marketing. In EC '06: Proc. of the 7th ACM conference on Electronic commerce, pages 228--237, 2006. Google ScholarDigital Library
- . Leskovec, L. Backstrom, and J. Kleinberg. Meme-tracking and the dynamics of the news cycle. In KDD '09: Proc. of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 497--506, New York, NY, USA, 2009. ACM. Google ScholarDigital Library
- . Leskovec and C. Faloutsos. Scalable modeling of real graphs using kronecker multiplication. In ICML '07: Proc. of the 24th International Conference on Machine Learning, page 504, 2007. Google ScholarDigital Library
- . Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In KDD '05: Proc. of the 11th ACM SIGKDD international conference on Knowledge discovery in data mining, page 187, 2005. Google ScholarDigital Library
- . Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance. Cost-effective outbreak detection in networks. In KDD '07: Proc. of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 420--429, 2007. Google ScholarDigital Library
- . Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Statistical properties of community structure in large social and information networks. In WWW '08: Proc. of the 17th International Conference on World Wide Web, 2008. Google ScholarDigital Library
- . Leskovec, M. McGlohon, C. Faloutsos, N. Glance, and M. Hurst. Cascading behavior in large blog graphs. In SDM '07: Proc. of the SIAM Conference on Data Mining, 2007.Google Scholar
- . Leskovec, A. Singh, and J. M. Kleinberg. Patterns of influence in a recommendation network. In PAKDD '06: Proc. of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 380--389, 2006. Google ScholarDigital Library
- . Liben-Nowell and J. Kleinberg. Tracing the flow of information on a global scale using Internet chain-letter data. Proc. of the National Academy of Sciences, 105(12):4633--4638, 25 Mar. 2008.Google ScholarCross Ref
- . D. Malmgren, D. B. Stouffer, A. E. Motter, and L. A. A. N. Amaral. A poissonian explanation for heavy tails in e-mail communication. Proc. of the National Academy of Sciences, 105(47):18153--18158, November 2008.Google ScholarCross Ref
- . Nemhauser, L. Wolsey, and M. Fisher. An analysis of approximations for maximizing submodular set functions. Mathematical Programming, 14(1):265--294, 1978.Google ScholarDigital Library
- . M. Rogers. Diffusion of Innovations. Free Press, New York, fourth edition, 1995.Google Scholar
- . Wallinga and P. Teunis. Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures. American Journal of Epidemiology, 160(6):509--516, 2004.Google ScholarCross Ref
- . J. Watts and P. S. Dodds. Influentials, networks, and public opinion formation. Journal of Consumer Research, 34(4):441--458, December 2007.Google ScholarCross Ref
Index Terms
- Inferring networks of diffusion and influence
Recommendations
Structure and dynamics of information pathways in online media
WSDM '13: Proceedings of the sixth ACM international conference on Web search and data miningDiffusion of information, spread of rumors and infectious diseases are all instances of stochastic processes that occur over the edges of an underlying network. Many times networks over which contagions spread are unobserved, and such networks are often ...
Inferring Networks of Diffusion and Influence
Information diffusion and virus propagation are fundamental processes taking place in networks. While it is often possible to directly observe when nodes become infected with a virus or publish the information, observing individual transmissions (who ...
Information diffusion and external influence in networks
KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data miningSocial networks play a fundamental role in the diffusion of information. However, there are two different ways of how information reaches a person in a network. Information reaches us through connections in our social networks, as well as through the ...
Comments