The ubiquitous use of search engines to discover and access internet content shows clearly the success of information retrieval algorithms. However, unlike controlled collections, the vast majority of the Web pages lack an authority asserting their quality. This openness of the Web has been the key to its rapid growth and success, but this openness is also a major source of new adversarial challenges for information retrieval methods.
Adversarial Information Retrieval addresses tasks such as gathering, indexing, filtering, retrieving and ranking information from collections wherein a subset has been manipulated maliciously. On the Web, the predominant form of such manipulation is "search engine spamming" or spamdexing, i.e., malicious attempts to influence the outcome of ranking algorithms, aimed at getting an undeserved high ranking for some items in the collection. There is an economic incentive to rank higher in search engines, considering that a good ranking on them is strongly correlated with more traffic, which often translates to more revenue.
The third AIRWeb workshop in Banff, Canada as part of WWW2007 builds on the previous successful meetings at Chiba, Japan as part of WWW2005, and at Seattle, USA as part of SIGIR2006. The papers in this workshop proceedings provide a combination of mature and early-stage work in web-based adversarial IR.
Proceeding Downloads
Splog detection using self-similarity analysis on blog temporal dynamics
This paper focuses on spam blog (splog) detection. Blogs are highly popular, new media social communication mechanisms. The presence of splogs degrades blog search results as well as wastes network resources. In our approach we exploit unique blog ...
Improving web spam classification using rank-time features
In this paper, we study the classification of web spam. Web spam refers to pages that use techniques to mislead search engines into assigning them higher rank, thus increasing their site traffic. Our contributions are two fold. First, we find that the ...
Improving web spam classifiers using link structure
Web spam has been recognized as one of the top challenges in the search engine industry [14]. A lot of recent work has addressed the problem of detecting or demoting web spam, including both content spam [16, 12] and link spam [22, 13]. However, any ...
Transductive link spam detection
Web spam can significantly deteriorate the quality of search engines. Early web spamming techniques mainly manipulate page content. Since linkage information is widely used in web search, link-based spamming has also developed. So far, many techniques ...
Using spam farm to boost PageRank
Nowadays web spamming has emerged to take the economic advantage of high search rankings and threatened the accuracy and fairness of those rankings. Understanding spamming techniques is essential for evaluating the strength and weakness of a ranking ...
Extracting link spam using biased random walks from spam seed sets
Link spam deliberately manipulates hyperlinks between web pages in order to unduly boost the search engine ranking of one or more target pages. Link based ranking algorithms such as PageRank, HITS, and other derivatives are especially vulnerable to link ...
A large-scale study of link spam detection by graph algorithms
Link spam refers to attempts to promote the ranking of spammers' web sites by deceiving link-based ranking algorithms in search engines. Spammers often create densely connected link structure of sites so called "link farm". In this paper, we study the ...
Measuring similarity to detect qualified links
The early success of link-based ranking algorithms was predicated on the assumption that links imply merit of the target pages. However, today many links exist for purposes other than to confer authority. Such links bring noise into link analysis and ...
Combating spam in tagging systems
Tagging systems allow users to interactively annotate a pool of shared resources using descriptive tags. As tagging systems are gaining in popularity, they become more susceptible to tag spam: misleading tags that are generated in order to increase the ...
New metrics for reputation management in P2P networks
In this work we study the effectiveness of mechanisms for decentralized reputation management in P2P networks. We depart from Eigen Trust, an algorithm designed for reputation management in file sharing applications over p2p networks. EigenTrust has ...
Computing trusted authority scores in peer-to-peer web search networks
Peer-to-peer (P2P) networks have received great attention for sharing and searching information in large user communities. The open and anonymous nature of P2P networks is one of its main strengths, but it also opens doors to manipulation of the ...
A taxonomy of JavaScript redirection spam
Redirection spam presents a web page with false content to a crawler for indexing, but automatically redirects the browser to a different web page. Redirection is usually immediate (on page load) but may also be triggered by a timer or a harmless user ...
Web spam detection via commercial intent analysis
We propose a number of features for Web spam filtering based on the occurrence of keywords that are either of high advertisement value or highly spammed. Our features include popular words from search engine query logs as well as high cost or volume ...
- Proceedings of the 3rd international workshop on Adversarial information retrieval on the web