ABSTRACT
Personalized PageRank expresses link-based page quality around user selected pages. The only previous personalized PageRank algorithm that can serve on-line queries for an unrestricted choice of pages on large graphs is our Monte Carlo algorithm [WAW 2004]. In this paper we achieve unrestricted personalization by combining rounding and randomized sketching techniques in the dynamic programming algorithm of Jeh and Widom [WWW 2003]. We evaluate the precision of approximation experimentally on large scale real-world data and find significant improvement over previous results. As a key theoretical contribution we show that our algorithms use an optimal amount of space by also improving earlier asymptotic worst-case lower bounds. Our lower bounds and algorithms apply to the SimRank as well; of independent interest is the reduction of the SimRank computation to personalized PageRank.
- Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Sampling algorithms: Lower bounds and applications. Proc of 33rd STOC, 2001. Google ScholarDigital Library
- B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7):422--426, 1970. Google ScholarDigital Library
- P. Boldi and S. Vigna. The webgraph framework I: Compression techniques. Proc of 13th WWW, pp. 595--602, 2004. Google ScholarDigital Library
- A. Z. Broder and M. Mitzenmacher. Network applications of Bloom filters: A survey. Internet Mathematics, 1(4):485--509, 2005.Google ScholarCross Ref
- M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. Proc of 29th ICALP, pp. 693--703, 2002. Google ScholarDigital Library
- Y.-Y. Chen, Q. Gan, and T. Suel. Local methods for estimating PageRank values. Proc of 12th CIKM, pp. 381--389, 2004. Google ScholarDigital Library
- G. Cormode and S. Muthukrishnan. An improved data stream summary: The Count-Min sketch and its applications. Journal of Algorithms, 55(1):58--75, 2005. Google ScholarDigital Library
- G. Cormode and S. Muthukrishnan. Summarizing and mining skewed data streams. Proc of 5th SIAM Intl. Conf. on Data Mining, 2005.Google ScholarCross Ref
- R. Fagin, R. Kumar, M. Mahdian, D. Sivakumar, and E. Vee. Comparing and aggregating rankings with ties. Proc of 23rd PODS, 2004. Google ScholarDigital Library
- D. Fogaras. Where to start browsing the web? Proc of 3rd I2CS, Springer LNCS vol. 2877, pp. 65--79, 2003.Google Scholar
- D. Fogaras and B. Rácz. Towards scaling fully personalized PageRank. Proc of 3rd WAW, pp. 105--117, 2004. Full version to appear in Internet Mathematics.Google ScholarCross Ref
- D. Fogaras and B. Rácz. Scaling link-based similarity search. Proc of 14th WWW, pp. 641--650, 2005. Full version available at www.ilab.sztaki.hu/websearch/Publications/. Google ScholarDigital Library
- T. H. Haveliwala. Topic-sensitive PageRank: A context-sensitive ranking algorithm for web search. IEEE Transactions on Knowledge and Data Engineering, 15(4):784--796, 2003. Google ScholarDigital Library
- M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. In External Memory Algorithms, DIMACS Book Series vol. 50., pp. 107--118. American Mathematical Society, 1999. Google ScholarDigital Library
- J. Hirai, S. Raghavan, H. Garcia-Molina, and A. Paepcke. WebBase: A repository of web pages. Proc of 9th WWW, pp. 277--293, 2000. Google ScholarDigital Library
- G. Jeh and J. Widom. SimRank: A measure of structural-context similarity. Pro of 8th SIGKDD, pp. 538--543, 2002. Google ScholarDigital Library
- G. Jeh and J. Widom. Scaling personalized web search. Proc of 12th WWW, pp. 271--279, 2003. Google ScholarDigital Library
- S. Kamvar, T. H. Haveliwala, C. Manning, and G. Golub. Exploiting the block structure of the web for computing PageRank. Technical Report 2003-17, Stanford University, 2003.Google Scholar
- M. G. Kendall. Rank Correlation Methods. Hafner Publishing Co., New York, 1955.Google Scholar
- J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604--632, 1999. Google ScholarDigital Library
- E. Kushilevitz and N. Nisan. Communication Complexity. Cambridge University Press, 1997. Google ScholarDigital Library
- F. McSherry. A uniform approach to accelerated PageRank computation. Proc of 14th WWW, pp. 575--582, 2005. Google ScholarDigital Library
- S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Comp. Sci., 1(2), 2005. Google ScholarDigital Library
- L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford University, 1998.Google Scholar
- C. R. Palmer, P. B. Gibbons, and C. Faloutsos. ANF: A fast and scalable tool for data mining in massive graphs. Proc of 8th SIGKDD, pp. 81--90, 2002. Google ScholarDigital Library
- P. K. C. Singitham, M. S. Mahabhashyam, and P. Raghavan. Efficiency-quality tradeoffs for vector score aggregation. Proc of 30th VLDB, pp. 624--635, 2004. Google ScholarDigital Library
- J. S. Vitter. External memory algorithms and data structures: Dealing with massive data. ACM Computing Surveys, 33(2):209--271, 2001. Google ScholarDigital Library
Index Terms
- To randomize or not to randomize: space optimal summaries for hyperlink analysis
Recommendations
A near-optimal algorithm for estimating the entropy of a stream
We describe a simple algorithm for approximating the empirical entropy of a stream of m values up to a multiplicative factor of (1+ϵ) using a single pass, O(ϵ−2 log (δ−1) log m) words of space, and O(log ϵ−1 + log log δ−1 + log log m) processing time ...
A recommender system based on collaborative filtering using ontology and dimensionality reduction techniques
A new method is developed for recommender systems.The recommender system is developed based on collaborative filtering.Scalability and sparsity issues in recommender systems are solved.MovieLens and Yahoo! Webscope R4 datasets are used for method ...
PageRank revisited
PageRank, one part of the search engine Google, is one of the most prominent link-based rankings of documents in the World Wide Web. Usually it is described as a Markov chain modeling a specific random surfer. In this article, an alternative ...
Comments