ABSTRACT
While the PageRank algorithm has proven very effective for ranking Web pages, the rank scores of Web pages can be manipulated. To address the manipulation problem and to offer new insight into the Web structure, we propose a ranking algorithm called DiffusionRank. DiffusionRank is motivated by the heat diffusion phenomenon, which can be connected to Web ranking because the flow of activity on the Web can be imagined as heat flow, a link from one page to another can be treated as the pipe of an air conditioner, and heat flow can embody the structure of the underlying Web graph. Theoretically, we show that DiffusionRank can serve as a generalization of PageRank: it reduces to PageRank when the heat diffusion coefficient γ tends to infinity. In that case, 1/γ = 0, and DiffusionRank (PageRank) has low resistance to manipulation. When γ = 0, DiffusionRank has the highest resistance to manipulation, but the Web structure is then completely ignored. Consequently, γ is an interesting factor that controls the balance between preserving the original Web structure and reducing the effect of manipulation. We find empirically that, when γ = 1, DiffusionRank has a penicillin-like effect on link manipulation. Moreover, DiffusionRank can be employed to find group-to-group relations on the Web, to divide the Web graph into several parts, and to find link communities. Experimental results show that DiffusionRank achieves the above-mentioned advantages as expected.
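The abstract does not state the diffusion operator, so the following is only an illustrative sketch of the two limiting cases of γ, not the paper's implementation. It assumes that heat evolves as f(γ) = exp(γ(P − I)) f(0), where P is a damped, column-stochastic link matrix; this form, the damping value 0.85, the four-page toy graph, the single-seed initial heat, and the helper names `google_matrix`, `diffuse`, and `pagerank` are all assumptions made for illustration.

```python
import numpy as np

def google_matrix(adj, alpha=0.85):
    """Damped column-stochastic matrix; adj[i, j] = 1 means page i links to page j.

    Column i holds the distribution of heat leaving page i (assumed model).
    """
    n = adj.shape[0]
    out_deg = adj.sum(axis=1)
    P = np.zeros((n, n))
    for i in range(n):
        if out_deg[i] > 0:
            P[:, i] = adj[i] / out_deg[i]
        else:
            P[:, i] = 1.0 / n  # dangling page: spread heat uniformly
    return alpha * P + (1.0 - alpha) / n * np.ones((n, n))

def diffuse(P, f0, gamma, steps=1000):
    """Approximate exp(gamma * (P - I)) @ f0 with `steps` explicit Euler steps.

    Requires gamma / steps <= 1 so that each step stays a stochastic map.
    """
    f = np.asarray(f0, dtype=float).copy()
    dt = gamma / steps
    for _ in range(steps):
        f = f + dt * (P @ f - f)
    return f

def pagerank(P, iters=200):
    """Plain power iteration on P, used here only as the reference limit."""
    r = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        r = P @ r
    return r

# Hypothetical toy graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 3 -> 2.
adj = np.array([[0, 1, 1, 0],
                [0, 0, 1, 0],
                [1, 0, 0, 0],
                [0, 0, 1, 0]], dtype=float)
P = google_matrix(adj)
f0 = np.array([1.0, 0.0, 0.0, 0.0])  # all initial heat on one seed page (assumption)

for gamma in (0.0, 1.0, 200.0):
    print(gamma, np.round(diffuse(P, f0, gamma, steps=4000), 4))
print("PageRank", np.round(pagerank(P), 4))
```

Under these assumptions, γ = 0 returns the initial heat unchanged, so the link structure plays no role, while a large γ drives the heat vector toward the stationary vector of P, which is the PageRank vector. Intermediate values such as γ = 1 sit between the two extremes.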