Published in: Journal of Intelligent Information Systems 1/2014

1 August 2014

Link-based web spam detection using weight properties

Authors: Kwang Leng Goh, Ravi Kumar Patchmuthu, Ashutosh Kumar Singh


Abstract

Link spam is created with the intention of boosting a target's rank in exchange for business profit. This unethical way of deceiving Web search engines is known as Web spam. Many anti-link-spam detection techniques have been proposed in response. Web spam detection is a crucial task because of the damage spam does to Web search engines and its global cost of billions of dollars annually. In this paper, we propose a novel technique that incorporates weight properties to enhance Web spam detection algorithms. Weight properties can be defined as the influence of one Web node on another Web node. We modified existing Web spam detection algorithms with our technique and evaluated their performance on a large public Web spam dataset, WEBSPAM-UK2007. Overall, the modified algorithms outperform the benchmark algorithms, with improvements of up to 30.5 % at host level and 6.11 % at page level.
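
The abstract does not state how the weight properties are computed, so the following is only a minimal sketch of the general idea of letting per-link weights shape a trust-propagation score. It assumes a TrustRank-style iteration in which each node passes on its score in proportion to edge weights; the function name `weighted_trust_propagation`, the edge weights, and the parameters `damping` and `iterations` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def weighted_trust_propagation(edges, seeds, damping=0.85, iterations=50):
    """Illustrative weighted trust propagation (not the paper's algorithm).

    edges: dict mapping (source, target) -> positive weight, standing in
           for the influence of one Web node on another (assumed form).
    seeds: non-empty set of nodes assumed to be trustworthy.
    """
    nodes = set(seeds)
    out_weight = defaultdict(float)
    for (src, dst), weight in edges.items():
        nodes.update((src, dst))
        out_weight[src] += weight

    # Static distribution concentrated on the trusted seed nodes.
    seed_score = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(seed_score)

    for _ in range(iterations):
        # Each node keeps a share of the seed distribution ...
        nxt = {n: (1.0 - damping) * seed_score[n] for n in nodes}
        # ... and receives trust from in-neighbours, split by edge weight.
        for (src, dst), weight in edges.items():
            nxt[dst] += damping * score[src] * weight / out_weight[src]
        score = nxt
    return score


# Toy graph: "a" is trusted and links more strongly to "b" than to "c".
toy_edges = {("a", "b"): 3.0, ("a", "c"): 1.0, ("c", "d"): 1.0}
toy_scores = weighted_trust_propagation(toy_edges, seeds={"a"})
```

In this toy graph, node "b" receives three times the trust of node "c" because its incoming link carries three times the weight; with uniform weights the two would score the same. Nodes without outgoing links simply do not pass their score on, so the scores need not sum to one.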

Metadata
Title
Link-based web spam detection using weight properties
Authors
Kwang Leng Goh
Ravi Kumar Patchmuthu
Ashutosh Kumar Singh
Publication date
1 August 2014
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 1/2014
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-014-0310-y
