ABSTRACT
Real-world networks, such as the World Wide Web and online social networks, are very large and are evolving rapidly. Thus tracking personalized PageRank in such evolving networks is an important challenge in network analysis and graph mining.
In this paper, we propose an efficient online algorithm for tracking personalized PageRank in an evolving network. The proposed algorithm tracks personalized PageRank accurately (i.e., within a given accuracy ε > 0). Moreover it can update the personalized PageRank scores in amortized O(1/ε) iterations for each graph modification. In addition, when m edges are randomly and sequentially inserted, the total number of iterations is expected to be O(log m/ε).
We evaluated our algorithm in real-world networks. In average case, for each edge insertion and deletion, our algorithm updated the personalized PageRank in 3us in a web graph with 105M vertices and 3.7B edges, and 20ms in a social network with 42M vertices and 1.5B edges. By comparing existing state-of-the-arts algorithms, our algorithm is 2--290 times faster with an equal accuracy.
Supplemental Material
- R. Andersen, F. Chung, and K. Lang. Local graph partitioning using pagerank vectors. In FOCS, pages 475--486, 2006. Google ScholarDigital Library
- B. Bahmani, A. Chowdhury, and A. Goel. Fast incremental and personalized pagerank. VLDB, 4(3):173--184, 2010. Google ScholarDigital Library
- B. Bahmani, R. Kumar, M. Mahdian, and E. Upfal. Pagerank on an evolving graph. In KDD, 2012. Google ScholarDigital Library
- A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509--512, 1999.Google ScholarCross Ref
- A. A. Benczúr, K. Csalogány, T. Sarlós, and M. Uher. SpamRank--fully automatic link spam detection work in progress. In AIRWeb, 2005.Google Scholar
- P. Berkhin. Bookmark-coloring approach to personalized pagerank computing. Internet Math., 3(1):41--62, 2006.Google ScholarCross Ref
- S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Comput. Networks ISDN Syst., 30(1):107--117, 1998. Google ScholarDigital Library
- M. Cha, H. Haddadi, F. Benevenuto, and K. P. Gummadi. Measuring user influence in twitter: The million follower fallacy. In ICWSM, pages 10--17, 2010.Google Scholar
- S. Chien, C. Dwork, R. Kumar, D. R. Simon, and D. Sivakumar. Link evolution: Analysis and algorithms. Internet Math., 1(3):277--304, 2004.Google ScholarCross Ref
- Y. Chung, M. Toyoda, and M. Kitsuregawa. Detecting link hijacking by web spammers. In PAKDD, 2009. Google ScholarDigital Library
- Y. Chung, M. Toyoda, and M. Kitsuregawa. A study of link farm distribution and evolution using a time series of web snapshots. In AIRWeb, pages 9--16, 2009. Google ScholarDigital Library
- Y. Chung, M. Toyoda, and M. Kitsuregawa. Identifying spam link generators for monitoring emerging web spam. In WICOW, pages 51--58, 2010. Google ScholarDigital Library
- G. M. Del Corso, A. Gullí, and F. Romani. Fast pagerank computation via a sparse linear system. Internet Math., 2(3):251--273, 2005.Google ScholarCross Ref
- Y. Fujiwara, M. Nakatsuji, M. Onizuka, and M. Kitsuregawa. Fast and exact top-k search for random walk with restart. VLDB, 5(5):442--453, 2012. Google ScholarDigital Library
- D. Gleich, L. Zhukov, and P. Berkhin. Fast parallel pagerank: A linear system approach. Yahoo! Research Technical Report YRL-2004-038, 13:22, 2004.Google Scholar
- Z. Gyöngyi and H. Garcia-Molina. Link spam alliances. In VLDB, pages 517--528, 2005. Google ScholarDigital Library
- Z. Gyöngyi, H. Garcia-Molina, and J. Pedersen. Combating web spam with trustrank. In VLDB, pages 576--587, 2004. Google ScholarDigital Library
- W. Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58(301):13--30, 1963.Google ScholarCross Ref
- G. Jeh and J. Widom. Scaling personalized web search. In WWW, pages 271--279, 2003. Google ScholarDigital Library
- S. D. Kamvar, T. H. Haveliwala, C. D. Manning, and G. H. Golub. Extrapolation methods for accelerating pagerank computations. In WWW, pages 261--270, 2003. Google ScholarDigital Library
- A. N. Langville and C. D. Meyer. Deeper inside pagerank. Internet Math., 1(3):335--380, 2004.Google ScholarCross Ref
- A. N. Langville and C. D. Meyer. Updating markov chains with an eye on google's pagerank. SIAM J. Matrix Anal. Appl., 27(4):968--987, 2006. Google ScholarDigital Library
- P. Lofgren, S. Banerjee, A. Goel, and C. Seshadhri. FAST-PPR: Scaling personalized pagerank estimation for large graphs. In KDD, pages 1436--1445, 2014. Google ScholarDigital Library
- T. Maehara, T. Akiba, Y. Iwata, and K. Kawarabayashi. Computing personalized pagerank quickly by exploiting graph structures. VLDB, 7(12), 2014. Google ScholarDigital Library
- F. McSherry. A uniform approach to accelerated pagerank computation. In WWW, pages 575--582, 2005. Google ScholarDigital Library
- L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.Google Scholar
- A. D. Sarma, S. Gollapudi, and R. Panigrahy. Estimating pagerank on graph streams. J. ACM, 58(3):13, 2011. Google ScholarDigital Library
- G. Shen, B. Gao, T.-Y. Liu, G. Feng, S. Song, and H. Li. Detecting link spam using temporal information. In ICDM, pages 1049--1053, 2006. Google ScholarDigital Library
- H. A. Simon and A. Ando. Aggregation of variables in dynamic systems. Econometrica, 29:114--138, 1961.Google ScholarCross Ref
- R. V. Southwell. Relaxation Methods in Engineering Science. Oxford University Press, 1940.Google Scholar
- R. V. Southwell. Relaxation Methods in Theoretical Physics. Oxford University Press, 1946.Google Scholar
- D. A. Spielman and S.-H. Teng. A local clustering algorithm for massive graphs and its application to nearly linear time graph partitioning. SIAM J. Comput., 42(1):1--26, 2013.Google ScholarDigital Library
- J. Weng, E. P. Lim, J. Jiang, and Q. He. TwitterRank: finding topic-sensitive influential twitterers. In WSDM, pages 261--270, 2010. Google ScholarDigital Library
Index Terms
- Efficient PageRank Tracking in Evolving Networks
Recommendations
Approximate Personalized PageRank on Dynamic Graphs
KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data MiningWe propose and analyze two algorithms for maintaining approximate Personalized PageRank (PPR) vectors on a dynamic graph, where edges are added or deleted. Our algorithms are natural dynamic versions of two known local variations of power iteration. One,...
Agenda: Robust Personalized PageRanks in Evolving Graphs
CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge ManagementGiven a source node s and a target node t in a graph G, the Personalized PageRank (PPR) from s to t is the probability of a random walk starting from s terminates at t. PPR is a classic measure of the relevance among different nodes in a graph, and has ...
Parallel personalized pagerank on dynamic graphs
Proceedings of the 44th International Conference on Very Large Data Bases, Rio de Janeiro, BrazilPersonalized PageRank (PPR) is a well-known proximity measure in graphs. To meet the need for dynamic PPR maintenance, recent works have proposed a local update scheme to support incremental computation. Nevertheless, sequential execution of the scheme ...
Comments