DOI: 10.1145/1244408
AIRWeb '07: Proceedings of the 3rd international workshop on Adversarial information retrieval on the web
ACM 2007 Proceeding
Publisher:
  • Association for Computing Machinery
  • New York
  • NY
  • United States
Conference:
AIRWeb '07: Third International Workshop on Adversarial Information Retrieval on the Web, Banff, Alberta, Canada, 8 May 2007
ISBN:
978-1-59593-732-2
Published:
08 May 2007

Abstract

The ubiquitous use of search engines to discover and access internet content clearly shows the success of information retrieval algorithms. However, unlike controlled collections, the vast majority of Web pages lack an authority asserting their quality. This openness of the Web has been the key to its rapid growth and success, but it is also a major source of new adversarial challenges for information retrieval methods.

Adversarial Information Retrieval addresses tasks such as gathering, indexing, filtering, retrieving and ranking information from collections wherein a subset has been manipulated maliciously. On the Web, the predominant form of such manipulation is "search engine spamming" or spamdexing, i.e., malicious attempts to influence the outcome of ranking algorithms, aimed at getting an undeserved high ranking for some items in the collection. There is an economic incentive to rank higher in search engines, considering that a good ranking on them is strongly correlated with more traffic, which often translates to more revenue.

The third AIRWeb workshop in Banff, Canada as part of WWW2007 builds on the previous successful meetings at Chiba, Japan as part of WWW2005, and at Seattle, USA as part of SIGIR2006. The papers in this workshop proceedings provide a combination of mature and early-stage work in web-based adversarial IR.

SESSION: Temporal and topological factors
Article
Splog detection using self-similarity analysis on blog temporal dynamics

This paper focuses on spam blog (splog) detection. Blogs are highly popular, new media social communication mechanisms. The presence of splogs degrades blog search results and wastes network resources. In our approach we exploit unique blog ...

Article
Improving web spam classification using rank-time features

In this paper, we study the classification of web spam. Web spam refers to pages that use techniques to mislead search engines into assigning them higher rank, thus increasing their site traffic. Our contributions are twofold. First, we find that the ...

Article
Improving web spam classifiers using link structure

Web spam has been recognized as one of the top challenges in the search engine industry [14]. A lot of recent work has addressed the problem of detecting or demoting web spam, including both content spam [16, 12] and link spam [22, 13]. However, any ...

Article
Transductive link spam detection

Web spam can significantly deteriorate the quality of search engines. Early web spamming techniques mainly manipulate page content. Since linkage information is widely used in web search, link-based spamming has also developed. So far, many techniques ...

SESSION: Link farms
Article
Using spam farm to boost PageRank

Web spamming has emerged to exploit the economic advantage of high search rankings, and it threatens the accuracy and fairness of those rankings. Understanding spamming techniques is essential for evaluating the strengths and weaknesses of a ranking ...

Article
Extracting link spam using biased random walks from spam seed sets

Link spam deliberately manipulates hyperlinks between web pages in order to unduly boost the search engine ranking of one or more target pages. Link based ranking algorithms such as PageRank, HITS, and other derivatives are especially vulnerable to link ...

Article
A large-scale study of link spam detection by graph algorithms

Link spam refers to attempts to promote the ranking of spammers' web sites by deceiving link-based ranking algorithms in search engines. Spammers often create densely connected link structures of sites, so-called "link farms". In this paper, we study the ...

Article
Measuring similarity to detect qualified links

The early success of link-based ranking algorithms was predicated on the assumption that links imply merit of the target pages. However, today many links exist for purposes other than to confer authority. Such links bring noise into link analysis and ...

SESSION: Tagging, P2P, cloaking, and commercial intent
Article
Combating spam in tagging systems

Tagging systems allow users to interactively annotate a pool of shared resources using descriptive tags. As tagging systems are gaining in popularity, they become more susceptible to tag spam: misleading tags that are generated in order to increase the ...

Article
New metrics for reputation management in P2P networks

In this work we study the effectiveness of mechanisms for decentralized reputation management in P2P networks. We depart from EigenTrust, an algorithm designed for reputation management in file sharing applications over P2P networks. EigenTrust has ...

Article
Computing trusted authority scores in peer-to-peer web search networks

Peer-to-peer (P2P) networks have received great attention for sharing and searching information in large user communities. The open and anonymous nature of P2P networks is one of their main strengths, but it also opens the door to manipulation of the ...

Article
A taxonomy of JavaScript redirection spam

Redirection spam presents a web page with false content to a crawler for indexing, but automatically redirects the browser to a different web page. Redirection is usually immediate (on page load) but may also be triggered by a timer or a harmless user ...

Article
Web spam detection via commercial intent analysis

We propose a number of features for Web spam filtering based on the occurrence of keywords that are either of high advertisement value or highly spammed. Our features include popular words from search engine query logs as well as high cost or volume ...

Contributors
  • Pompeu Fabra University Barcelona
  • Microsoft Research
  • Lehigh University