Proceedings of the 3rd international workshop on Adversarial information retrieval on the web

AIRWeb '07: Proceedings of the 3rd international workshop on Adversarial information retrieval on the web

May 2007

2007 Proceeding

Conference Chairs:
Carlos Castillo, Yahoo! Research
Kumar Chellapilla, Microsoft Live Labs
Brian D. Davison, Lehigh University

Publisher:

Association for Computing Machinery, New York, NY, United States

Conference:

AIRWeb '07: Third International Workshop on Adversarial Information Retrieval on the Web, Banff, Alberta, Canada, 8 May 2007

ISBN:

978-1-59593-732-2

Published:

08 May 2007

Bibliometrics

Citation count: 372

Downloads (cumulative): 6,822

Abstract

The ubiquitous use of search engines to discover and access internet content clearly shows the success of information retrieval algorithms. However, unlike controlled collections, the vast majority of Web pages lack an authority asserting their quality. This openness of the Web has been the key to its rapid growth and success, but it is also a major source of new adversarial challenges for information retrieval methods.

Adversarial Information Retrieval addresses tasks such as gathering, indexing, filtering, retrieving and ranking information from collections wherein a subset has been manipulated maliciously. On the Web, the predominant form of such manipulation is "search engine spamming" or spamdexing, i.e., malicious attempts to influence the outcome of ranking algorithms, aimed at getting an undeserved high ranking for some items in the collection. There is an economic incentive to rank higher in search engines, considering that a good ranking on them is strongly correlated with more traffic, which often translates to more revenue.

The third AIRWeb workshop, held in Banff, Canada, as part of WWW2007, builds on the previous successful meetings in Chiba, Japan, as part of WWW2005, and in Seattle, USA, as part of SIGIR2006. The papers in these workshop proceedings provide a combination of mature and early-stage work in web-based adversarial IR.

Proceeding Downloads

PDF: Front matter (Contents, Committees, Introduction)

SESSION: Temporal and topological factors

Article

Splog detection using self-similarity analysis on blog temporal dynamics

Yu-Ru Lin, Hari Sundaram, Yun Chi, Junichi Tatemura, Belle L. Tseng

pp 1–8, https://doi.org/10.1145/1244408.1244410

This paper focuses on spam blog (splog) detection. Blogs are highly popular, new media social communication mechanisms. The presence of splogs degrades blog search results as well as wastes network resources. In our approach we exploit unique blog ...

Metrics
Total Citations: 32
Total Downloads: 711
Last 12 Months: 6
Last 6 weeks: 2

Article

Improving web spam classification using rank-time features

Krysta M. Svore, Qiang Wu, Chris J. C. Burges, Aaswath Raman

pp 9–16, https://doi.org/10.1145/1244408.1244411

In this paper, we study the classification of web spam. Web spam refers to pages that use techniques to mislead search engines into assigning them higher rank, thus increasing their site traffic. Our contributions are twofold. First, we find that the ...

Metrics
Total Citations: 21
Total Downloads: 559
Last 12 Months: 1
Last 6 weeks: 0

Article

Improving web spam classifiers using link structure

Qingqing Gan, Torsten Suel

pp 17–20, https://doi.org/10.1145/1244408.1244412

Web spam has been recognized as one of the top challenges in the search engine industry [14]. A lot of recent work has addressed the problem of detecting or demoting web spam, including both content spam [16, 12] and link spam [22, 13]. However, any ...

Metrics
Total Citations: 31
Total Downloads: 602
Last 12 Months: 1
Last 6 weeks: 0

Article

Transductive link spam detection

Dengyong Zhou, Christopher J. C. Burges, Tao Tao

pp 21–28, https://doi.org/10.1145/1244408.1244413

Web spam can significantly deteriorate the quality of search engines. Early web spamming techniques mainly manipulate page content. Since linkage information is widely used in web search, link-based spamming has also developed. So far, many techniques ...

Metrics
Total Citations: 23
Total Downloads: 397
Last 12 Months: 2
Last 6 weeks: 0

SESSION: Link farms

Article

Using spam farm to boost PageRank

Ye Du, Yaoyun Shi, Xin Zhao

pp 29–36, https://doi.org/10.1145/1244408.1244415

Web spamming has emerged to exploit the economic advantage of high search rankings, and it threatens the accuracy and fairness of those rankings. Understanding spamming techniques is essential for evaluating the strengths and weaknesses of a ranking ...

Metrics
Total Citations: 21
Total Downloads: 515
Last 12 Months: 4
Last 6 weeks: 0

Article

Extracting link spam using biased random walks from spam seed sets

Baoning Wu, Kumar Chellapilla

pp 37–44, https://doi.org/10.1145/1244408.1244416

Link spam deliberately manipulates hyperlinks between web pages in order to unduly boost the search engine ranking of one or more target pages. Link based ranking algorithms such as PageRank, HITS, and other derivatives are especially vulnerable to link ...

Metrics
Total Citations: 16
Total Downloads: 380
Last 12 Months: 4
Last 6 weeks: 1

Article

A large-scale study of link spam detection by graph algorithms

Hiroo Saito, Masashi Toyoda, Masaru Kitsuregawa, Kazuyuki Aihara

pp 45–48, https://doi.org/10.1145/1244408.1244417

Link spam refers to attempts to promote the ranking of spammers' web sites by deceiving link-based ranking algorithms in search engines. Spammers often create a densely connected link structure of sites, a so-called "link farm". In this paper, we study the ...

Metrics
Total Citations: 40
Total Downloads: 502
Last 12 Months: 14
Last 6 weeks: 5

Article

Measuring similarity to detect qualified links

Xiaoguang Qi, Lan Nie, Brian D. Davison

pp 49–56, https://doi.org/10.1145/1244408.1244418

The early success of link-based ranking algorithms was predicated on the assumption that links imply merit of the target pages. However, today many links exist for purposes other than to confer authority. Such links bring noise into link analysis and ...

Metrics
Total Citations: 13
Total Downloads: 311
Last 12 Months: 1
Last 6 weeks: 0

SESSION: Tagging, P2P, cloaking, and commercial intent

Article

Combating spam in tagging systems

Georgia Koutrika, Frans Adjie Effendi, Zoltán Gyöngyi, Paul Heymann, Hector Garcia-Molina

pp 57–64, https://doi.org/10.1145/1244408.1244420

Tagging systems allow users to interactively annotate a pool of shared resources using descriptive tags. As tagging systems are gaining in popularity, they become more susceptible to tag spam: misleading tags that are generated in order to increase the ...

Metrics
Total Citations: 83
Total Downloads: 765
Last 12 Months: 0
Last 6 weeks: 0

Article

New metrics for reputation management in P2P networks

Debora Donato, Mario Paniccia, Maddalena Selis, Carlos Castillo, Giovanni Cortese, Stefano Leonardi

pp 65–72, https://doi.org/10.1145/1244408.1244421

In this work we study the effectiveness of mechanisms for decentralized reputation management in P2P networks. We start from EigenTrust, an algorithm designed for reputation management in file-sharing applications over P2P networks. EigenTrust has ...

Metrics
Total Citations: 26
Total Downloads: 594
Last 12 Months: 10
Last 6 weeks: 0

Article

Computing trusted authority scores in peer-to-peer web search networks

Josiane Xavier Parreira, Debora Donato, Carlos Castillo, Gerhard Weikum

pp 73–80, https://doi.org/10.1145/1244408.1244422

Peer-to-peer (P2P) networks have received great attention for sharing and searching information in large user communities. The open and anonymous nature of P2P networks is one of their main strengths, but it also opens doors to manipulation of the ...

Metrics
Total Citations: 12
Total Downloads: 426
Last 12 Months: 0
Last 6 weeks: 0

Article

A taxonomy of JavaScript redirection spam

Kumar Chellapilla, Alexey Maykov

pp 81–88, https://doi.org/10.1145/1244408.1244423

Redirection spam presents a web page with false content to a crawler for indexing, but automatically redirects the browser to a different web page. Redirection is usually immediate (on page load) but may also be triggered by a timer or a harmless user ...

Metrics
Total Citations: 34
Total Downloads: 660
Last 12 Months: 17
Last 6 weeks: 2

Article

Web spam detection via commercial intent analysis

András Benczúr, István Bíró, Károly Csalogány, Tamás Sarlós

pp 89–92, https://doi.org/10.1145/1244408.1244424

We propose a number of features for Web spam filtering based on the occurrence of keywords that are either of high advertisement value or highly spammed. Our features include popular words from search engine query logs as well as high cost or volume ...

Metrics
Total Citations: 19
Total Downloads: 386
Last 12 Months: 3
Last 6 weeks: 0

Cited By

Castillo C, Chellapilla K and Davison B (2008). Adversarial Information Retrieval on the Web (AIRWeb 2007), ACM SIGIR Forum, 10.1145/1394251.1394267, 42:1, (68-72), Online publication date: 1-Jun-2008.

Contributors

Carlos Castillo
Pompeu Fabra University Barcelona
- Publication Years: 2002 - 2024
- Publication counts: 139
- Citation count: 8,654
- Available for Download: 114
- Downloads (cumulative): 135,951
- Downloads (12 months): 9,678
- Downloads (6 weeks): 1,673
- Average Downloads per Article: 1,193
- Average Citation per Article: 62
Kumar H Chellapilla
Microsoft Research
- Publication Years: 1996 - 2010
- Publication counts: 37
- Citation count: 757
- Available for Download: 12
- Downloads (cumulative): 6,245
- Downloads (12 months): 168
- Downloads (6 weeks): 35
- Average Downloads per Article: 520
- Average Citation per Article: 20
Brian D. Davison
Lehigh University
- Publication Years: 1997 - 2023
- Publication counts: 98
- Citation count: 2,969
- Available for Download: 71
- Downloads (cumulative): 63,866
- Downloads (12 months): 6,289
- Downloads (6 weeks): 859
- Average Downloads per Article: 900
- Average Citation per Article: 30

Proceedings of the 3rd international workshop on Adversarial information retrieval on the web
1. Information systems
  1. Information retrieval
