ABSTRACT
PageRank is defined as the stationary state of a Markov chain. The chain is obtained by perturbing the transition matrix induced by a web graph with a damping factor α that spreads uniformly part of the rank. The choice of α is eminently empirical, and in most cases the original suggestion α = 0.85 by Brin and Page is still used. Recently, however, the behaviour of PageRank with respect to changes in α was discovered to be useful in link-spam detection[21]. Moreover, an analytical justification of the value chosen for α is still missing. In this paper, we give the first mathematical analysis of PageRank when α changes. In particular, we show that, contrarily to popular belief, for real-world graphs values of α close to 1 do not give a more meaningful ranking. Then, we give closed-form formulae for PageRank derivatives of any order, and an extension of the Power Method that approximates them with convergence O (tk αt) for the k-th derivative. Finally, we show a tight connection between iterated computation and analytical behaviour by proving that the k-th iteration of the Power Method gives exactly the PageRank value obtained using a Maclaurin polynomial of degree k. The latter result paves the way towards the application of analytical methods to the study of PageRank.
- Paolo Boldi, Bruno Codenotti, Massimo Santini, and Sebastiano Vigna. Ubicrawler: A scalable fully distributed web crawler. Software: Practice & Experience, 34(8):711--726, 2004. Google ScholarDigital Library
- Paolo Boldi and Sebastiano Vigna. The WebGraph framework I: Compression techniques. In Proc. WWW 13, pages 595--601, Manhattan, USA, 2004. ACM Press. Google ScholarDigital Library
- Soumen Chakrabarti, Byron Dom, David Gibson, Jon Kleinberg, S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, and Andrew Tomkins. Hypersearching the web. Scientific American, June 1999.Google Scholar
- Scott C. Deerwester, Susan T. Dumais, Thomas K. Landauer, George W. Furnas, and Richard A. Harshman. Indexing by latent semantic analysis. Journal of the American Society of Information Science, 41(6):391--407, 1990.Google ScholarCross Ref
- Nadav Eiron, Kevin S. McCurley, and John A. Tomlin. Ranking the web frontier. In Proceedings of the thirteenth international conference on World--Wide Web, pages 309--318. ACM Press, 2004. Google ScholarDigital Library
- Gene H. Golub and Chen Greif. Arnoldi-type algorithms for computing stationary distribution vectors, with application to PageRank. Technical Report SCCM-04-15, Stanford University Technical Report, 2004.Google Scholar
- Taher Haveliwala. Efficient computation of PageRank. Technical report, Stanford University Technical Report, October 1999.Google Scholar
- Taher Haveliwala and Sepandar Kamvar. The condition number of the PageRank problem. Technical Report~36, Stanford University Technical Report, June 2003.Google Scholar
- Taher Haveliwala and Sepandar Kamvar. The second eigenvalue of the Google matrix. Technical Report~20, Stanford University Technical Report, March 2003.Google Scholar
- Sepandar Kamvar, Taher Haveliwala, Christopher Manning, and Gene Golub. Exploiting the block structure of the web for computing PageRank. Technical Report 17, Stanford University Technical Report, March 2003.Google Scholar
- Sepandar D. Kamvar, Taher H. Haveliwala, Christopher D. Manning, and Gene H. Golub. Extrapolation methods for accelerating PageRank computations. In Proceedings of the twelfth international conference on World Wide Web, pages 261--270. ACM Press, 2003. Google ScholarDigital Library
- Jon M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604--632, September 1999. Google ScholarDigital Library
- Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, D. Sivakumar, Andrew Tomkins, and Eli Upfal. The Web as a graph. In Proc. 19th ACM SIGACT-SIGMOD-AIGART Symp. Principles of Database Systems, PODS, pages 1--10. ACM Press, 2000. Google ScholarDigital Library
- Amy N. Langville and Carl D. Meyer. Deeper inside PageRank. Internet Mathematics, 1(3):355--400, 2004.Google ScholarCross Ref
- Chris Pan-Chi Lee, Gene H. Golub, and Stefanos A. Zenios. A fast two-stage algorithm for computing PageRank and its extensions. Technical report, Stanford University Technical Report, 2004.Google Scholar
- Ronny Lempel and Shlomo Moran. SALSA: the stochastic approach for link-structure analysis. ACM Trans. Inf. Syst., 19(2):131--160, 2001. Google ScholarDigital Library
- Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, Stanford University, Stanford, CA, USA, 1998.Google Scholar
- Luca Pretto. A theoretical analysis of google's PageRank. In Proceedings of the Ninth Symposium on String Processing and Information Retrieval, pages 131--144, 2002. Google ScholarDigital Library
- Luca Pretto. A theoretical approach to link analysis algorithms, 2002. PhD Thesis.Google Scholar
- Eugene Seneta. Non-Negative Matrices and Markov Chains. Springer Series in Statistics. Springer-Verlag, 1981.Google Scholar
- Hui Zhang, Ashish Goel, Ramesh Govindan, Kahn Mason, and Benjamin~Van Roy. Making eigenvector-based reputation systems robust to collusion. In Stefano Leonardi, editor, Proceedings WAW 2004, number 3243 in LNCS, pages 92--104. Springer-Verlag, 2004.Google Scholar
Index Terms
- PageRank as a function of the damping factor
Recommendations
Topic-sensitive PageRank
WWW '02: Proceedings of the 11th international conference on World Wide WebIn the original PageRank algorithm for improving the ranking of search-query results, a single PageRank vector is computed, using the link structure of the Web, to capture the relative "importance" of Web pages, independent of any particular search ...
Beyond PageRank: machine learning for static ranking
WWW '06: Proceedings of the 15th international conference on World Wide WebSince the publication of Brin and Page's paper on PageRank, many in the Web community have depended on PageRank for the static (query-independent) ordering of Web pages. We show that we can significantly outperform PageRank using features that are ...
PageRank revisited
PageRank, one part of the search engine Google, is one of the most prominent link-based rankings of documents in the World Wide Web. Usually it is described as a Markov chain modeling a specific random surfer. In this article, an alternative ...
Comments