DOI: 10.1145/1367497.1367685
poster

Improving web spam detection with re-extracted features

Published: 21 April 2008

ABSTRACT

Web spam detection has become one of the top challenges for the Internet search industry. Rather than relying on heuristic rules, we propose a feature re-extraction strategy to optimize the detection result. Starting from the predicted spamicity produced by a preliminary detection step, three types of features are extracted through the host-level web graph. Experiments on the WEBSPAM-UK2006 benchmark show that this strategy clearly improves web spam detection performance.
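
The re-extraction step can be illustrated with a minimal sketch. The abstract does not say which three feature types are extracted, so everything concrete here is an assumption made for illustration: the function name `reextract_features`, the feature names `own`, `in_mean`, and `out_mean`, and the choice of neighbor averages. The sketch takes a host-level link graph and the spamicity predicted by a preliminary classifier, and derives new per-host features from the spamicity of neighboring hosts.

```python
from statistics import mean


def reextract_features(edges, spamicity):
    """Derive per-host features from neighbors' predicted spamicity.

    edges: iterable of (source_host, target_host) links in the host graph.
    spamicity: dict mapping host -> spam probability from a preliminary
        classifier, assumed to lie in [0, 1].

    The three features below are illustrative assumptions, not the
    feature types used in the paper.
    """
    in_nbrs = {host: [] for host in spamicity}
    out_nbrs = {host: [] for host in spamicity}
    for src, dst in edges:
        # Ignore self-links and hosts without a preliminary prediction.
        if src in spamicity and dst in spamicity and src != dst:
            out_nbrs[src].append(dst)
            in_nbrs[dst].append(src)

    def avg(hosts, default):
        # Fall back to the host's own score when it has no neighbors.
        return mean(spamicity[h] for h in hosts) if hosts else default

    features = {}
    for host, own in spamicity.items():
        features[host] = {
            "own": own,                              # preliminary prediction
            "in_mean": avg(in_nbrs[host], own),      # hosts linking to this host
            "out_mean": avg(out_nbrs[host], own),    # hosts this host links to
        }
    return features
```

Under these assumptions, a second classifier could be trained on the original features together with the re-extracted ones. Whether the paper combines them in this way is not stated in the abstract.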

References

  1. A. Ntoulas, M. Najork, M. Manasse, and D. Fetterly. Detecting Spam Web Pages through Content Analysis. In Proc. of WWW'06, May 2006.
  2. C. Castillo, D. Donato, A. Gionis, V. Murdock, and F. Silvestri. Know Your Neighbors: Web Spam Detection Using the Web Topology. SIGIR'07, July 2007.
  3. G. G. Geng, C. H. Wang, Q. D. Li, L. Xu, and X. B. Jin. Boosting the Performance of Web Spam Detection with Ensemble Under-Sampling Classification. FSKD'07, China, August 2007.
  4. Yahoo! Research. "Web Collection UK-2006". http://research.yahoo.com/. Crawled by the Laboratory of Web Algorithmics, University of Milan, 2007.
  5. Q. Q. Gan and T. Suel. Improving Web Spam Classifiers Using Link Structure. AIRWeb'07, Banff, Canada, May 2007.

          • Published in

            WWW '08: Proceedings of the 17th international conference on World Wide Web
            April 2008
            1326 pages
            ISBN: 9781605580852
            DOI: 10.1145/1367497

            Copyright © 2008 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
