DOI: 10.1145/1277741.1277815
Article

DiffusionRank: a possible penicillin for web spamming

Published: 23 July 2007

ABSTRACT

While the PageRank algorithm has proven very effective for ranking Web pages, the rank scores of Web pages can be manipulated. To handle the manipulation problem and to offer new insight into the Web structure, we propose a ranking algorithm called DiffusionRank. DiffusionRank is motivated by the heat diffusion phenomenon, which can be connected to Web ranking because the flow of activity on the Web can be imagined as heat flow, a link from one page to another can be treated as the pipe of an air conditioner, and heat flow can embody the structure of the underlying Web graph. Theoretically, we show that DiffusionRank generalizes PageRank: it reduces to PageRank when the heat diffusion coefficient γ tends to infinity. In that case (1/γ = 0), DiffusionRank (PageRank) has low resistance to manipulation. When γ = 0, DiffusionRank has the highest resistance to manipulation, but the Web structure is completely ignored. Consequently, γ is a factor that controls the balance between preserving the original Web structure and reducing the effect of manipulation. Empirically, we find that when γ = 1, DiffusionRank has a penicillin-like effect on link manipulation. Moreover, DiffusionRank can be employed to find group-to-group relations on the Web, to divide the Web graph into several parts, and to find link communities. Experimental results show that DiffusionRank achieves the above-mentioned advantages as expected.
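The two limiting cases of γ described in the abstract can be illustrated with a small sketch. The abstract does not state the diffusion formula, so the code below assumes a generic heat-kernel form, f(1) = exp(γ(P − I)) f(0), where P is the damped, column-stochastic PageRank matrix and f(0) is an initial heat vector. The function names, the damping value 0.85, the four-page graph, the seed vector, and the requirement that every page has at least one out-link are all illustrative assumptions, not details taken from the paper.

```python
def pagerank_matrix(links, n, alpha=0.85):
    """Damped column-stochastic matrix P = alpha*A + (1 - alpha)/n.

    `links` maps each page to its list of out-links. This sketch
    assumes every page has at least one out-link (no dangling pages).
    """
    P = [[(1.0 - alpha) / n for _ in range(n)] for _ in range(n)]
    for src, targets in links.items():
        for dst in targets:
            P[dst][src] += alpha / len(targets)
    return P


def diffuse(P, f0, gamma, steps=2000):
    """Approximate exp(gamma * (P - I)) f0 with `steps` Euler steps.

    Each step applies f <- f + h * (P f - f) with h = gamma / steps,
    so gamma = 0 leaves f0 untouched.
    """
    n = len(f0)
    f = list(f0)
    h = gamma / steps
    for _ in range(steps):
        Pf = [sum(P[i][j] * f[j] for j in range(n)) for i in range(n)]
        f = [f[i] + h * (Pf[i] - f[i]) for i in range(n)]
    return f


def pagerank(P, iters=200):
    """Plain power iteration on P, starting from the uniform vector."""
    n = len(P)
    f = [1.0 / n] * n
    for _ in range(iters):
        f = [sum(P[i][j] * f[j] for j in range(n)) for i in range(n)]
    return f


# Hypothetical four-page graph; all initial heat sits on page 0.
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
P = pagerank_matrix(links, n=4)
seed = [1.0, 0.0, 0.0, 0.0]

for gamma in (0.0, 1.0, 200.0):
    print(gamma, [round(x, 4) for x in diffuse(P, seed, gamma)])
print("PageRank", [round(x, 4) for x in pagerank(P)])
```

Under these assumptions, γ = 0 returns the seed vector unchanged, so the link structure plays no role, while a large γ drives the heat vector toward the PageRank vector regardless of the seed. Intermediate values such as γ = 1 keep part of the seed's influence, which is the trade-off the abstract attributes to γ.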


Published in

SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
July 2007, 946 pages
ISBN: 9781595935977
DOI: 10.1145/1277741

Copyright © 2007 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 792 of 3,983 submissions, 20%
