Published in: Journal of Intelligent Information Systems 2/2017

10 January 2017

Distrust seed set propagation algorithm to detect web spam

Authors: Kwang Leng Goh, Ravi Kumar Patchmuthu, Ashutosh Kumar Singh

Abstract

Web spam uses numerous techniques to mislead Web search engines for financial gain. A myriad of semi-automatic propagation models have been proposed to combat Web spam. In this paper, distrust propagation is used to detect Web spam. An automatic distrust seed set propagation algorithm (DSP) is proposed, which acts as an extension to the seed set so that distrust propagates further and more Web spam is detected. Experiments are conducted on the WEBSPAM-UK2006 and WEBSPAM-UK2007 datasets; the results show that DSP enhanced the baseline algorithms, detecting 17.73% more spam hosts in the former dataset and 8.59% more spam hosts in the latter.
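
The abstract does not spell out how DSP works, so the following is not the authors' algorithm. It is only a minimal sketch of generic distrust propagation from a spam seed set, in the style of a biased PageRank run on the reversed link graph (the idea behind Anti-TrustRank), under the assumption that a host linking to a distrusted host inherits part of that distrust. The function name `propagate_distrust`, the `damping` and `iterations` parameters, and the toy host names are all hypothetical.

```python
from collections import defaultdict


def propagate_distrust(links, spam_seeds, damping=0.85, iterations=50):
    """Propagate distrust backwards along links from a spam seed set.

    links: dict mapping a host to the list of hosts it links to.
    spam_seeds: iterable of hosts already known to be spam.
    Returns a dict mapping every host to a distrust score.
    """
    # Collect every host that appears as a source or a target.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)

    # Reverse graph: for each host, the hosts that link to it.
    inlinks = defaultdict(list)
    for src, targets in links.items():
        for dst in targets:
            inlinks[dst].append(src)

    seeds = set(spam_seeds) & nodes
    if not seeds:
        raise ValueError("spam seed set must contain at least one known host")

    # The random jump is biased entirely towards the spam seeds.
    jump = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(jump)

    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * jump[n] for n in nodes}
        for node in nodes:
            sources = inlinks.get(node, [])
            if not sources:
                continue  # nobody links here, so its distrust is not passed on
            share = damping * score[node] / len(sources)
            for src in sources:
                nxt[src] += share  # hosts linking to a distrusted host inherit distrust
        score = nxt
    return score


# Toy graph (hypothetical hosts): b -> a -> spam, and an unrelated pair good -> c.
scores = propagate_distrust({"a": ["spam"], "b": ["a"], "good": ["c"]}, {"spam"})
for host in sorted(scores, key=scores.get, reverse=True):
    print(host, round(scores[host], 4))
```

In this sketch, hosts that reach the seed through outgoing links receive decaying distrust, while hosts with no path to a seed stay at zero. A seed set extension such as the one the abstract describes would change which hosts receive the initial distrust; how DSP selects them is not stated here.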

Metadata
Title
Distrust seed set propagation algorithm to detect web spam
Authors
Kwang Leng Goh
Ravi Kumar Patchmuthu
Ashutosh Kumar Singh
Publication date
10 January 2017
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 2/2017
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-016-0439-y
