2017 | Original Paper | Book Chapter

Spammer Classification Using Ensemble Methods over Content-Based Features

Authors: Aaisha Makkar, Shivani Goel

Published in: Proceedings of Sixth International Conference on Soft Computing for Problem Solving

Publisher: Springer Singapore

Abstract

As the number of web documents grows rapidly, finding useful information becomes increasingly difficult. Search engines play a major role in retrieving relevant information and knowledge, managing large amounts of information with efficient page-ranking algorithms. Even so, web spammers try to manipulate search engine results through various web spamming techniques for personal benefit. According to a March 2016 report from Internetlivestats, an Internet survey company, there are currently 3.4 billion Internet users in the world, which underlines the vital role search engines play in information retrieval. In this research, we investigate fifteen machine learning classification algorithms over content-based features to classify web pages as spam or non-spam. An ensemble is then built from the three algorithms that perform best across various parameters. Ten-fold cross-validation is also used.
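The abstract does not say how the three selected classifiers are combined or how the folds are drawn, so the following is only a minimal sketch under stated assumptions: it assumes simple majority voting over predicted labels and contiguous, unshuffled folds. The function names `majority_vote` and `k_fold_indices` and the label strings are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine several classifiers' label lists into one list by majority vote.

    `predictions` holds one list of labels per classifier, all of equal length.
    With three classifiers and two labels a tie cannot occur.
    """
    combined = []
    for labels in zip(*predictions):
        # most_common(1) returns [(label, count)] for the most frequent label
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs for k contiguous folds.

    Assumption: no shuffling or stratification; the first n_samples % k folds
    receive one extra sample so that every sample is tested exactly once.
    """
    start = 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Hypothetical predictions from three classifiers on four web pages
votes = [
    ["spam", "non-spam", "spam", "non-spam"],
    ["spam", "spam", "non-spam", "non-spam"],
    ["non-spam", "spam", "spam", "non-spam"],
]
print(majority_vote(votes))  # ['spam', 'spam', 'spam', 'non-spam']
```

In practice each fold's training indices would be used to fit the three classifiers and the test indices to score the combined vote; a library implementation with shuffling or stratified folds may be preferable for imbalanced spam data.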

Metadata
Title
Spammer Classification Using Ensemble Methods over Content-Based Features
Authors
Aaisha Makkar
Shivani Goel
Copyright Year
2017
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-10-3325-4_1