WSDM '08: Proceedings of the 2008 International Conference on Web Search and Data Mining

February 2008

General Chair:
Marc Najork, Microsoft, USA

Program Chairs:
Andrei Broder, Yahoo!, USA
Soumen Chakrabarti, IIT Bombay, India

Publisher:

Association for Computing Machinery, New York, NY, United States

Conference:

Palo Alto, California, USA, February 11-12, 2008

ISBN:

978-1-59593-927-2

Published:

11 February 2008

Sponsors:

SIGMOD, SIGWEB, SIGKDD, ACM, SIGIR

Bibliometrics

Citation count

4,589

Downloads (6 weeks)

249

Downloads (12 months)

1,670

Downloads (cumulative)

63,642

Abstract

WSDM was announced at WWW 2007 (Banff, May 2007) and thereafter on several electronic bulletin boards. Abstracts were due by 30 July and full papers by 6 August.

Despite the rather short notice and tight deadlines, we received 151 submissions from around the world. With the help of the steering committee, we adopted a novel reviewing system and formed a two-tier technical program committee. There were 52 regular program committee (PC) members, and each paper was first reviewed by at least three of them. After this phase was completed, we retained about 60 papers with the highest scores for a second round of evaluation by a senior program committee with 11 members. Each retained paper was reviewed by two senior PC members. These senior members strove to ensure that all regular PC members had a consistent view of the paper's contributions (although their opinions could, of course, differ quantitatively). They also checked that the regular members had written clear, well-justified, and useful reviews for the authors. In many cases, the senior PC members effectively made the accept/reject decisions. The final decisions were made by the PC chairs, who took into account all scores and comments as well as novelty, technical depth, elegance, practical application, impact, and presentation. Notifications of acceptance for 24 full papers were sent out on 20 October.

Overall, we are pleased with the quality and mix of the papers we accepted. Most are solidly practical papers with extensive experimental evaluation, while a few are more theoretical in nature. We believe all of them have the potential to significantly influence the practice of Web search and mining in the coming years. The acceptance ratio of 24/151, about 16 percent, is in line with that of the leading ACM and IEEE conferences in similar or related areas. For this first WSDM conference, we decided to have a single track of full-length papers, with no short papers, posters, or demos, although this might change over time.

Proceeding Downloads

PDF (title page, copyright, welcome from the conference chair, welcome from the technical program co-chairs, contents, organization, sponsors)

PDF (author index)


SESSION: Crawling

research-article

Crawl ordering by search impact

Sandeep Pandey,
Christopher Olston

pp 3–14, https://doi.org/10.1145/1341531.1341535

We study how to prioritize the fetching of new pages under the objective of maximizing the quality of search results. In particular, our objective is to fetch new pages that have the most impact, where the impact of a page is equal to the number of ...

Metrics: total citations 20; total downloads 865; last 12 months 7; last 6 weeks 0

SESSION: Indexing and search

research-article

On placing skips optimally in expectation

Flavio Chierichetti,
Silvio Lattanzi,
Federico Mari,
Alessandro Panconesi

pp 15–24, https://doi.org/10.1145/1341531.1341537

We study the problem of optimal skip placement in an inverted list. Assuming the query distribution to be known in advance, we formally prove that an optimal skip placement can be computed quite efficiently. Our best algorithm runs in time O(n log n), ...

Metrics: total citations 14; total downloads 411; last 12 months 0; last 6 weeks 0

research-article

Disorder inequality: a combinatorial approach to nearest neighbor search

Navin Goyal,
Yury Lifshits,
Hinrich Schütze

pp 25–32, https://doi.org/10.1145/1341531.1341538

We say that an algorithm for nearest neighbor search is combinatorial if only direct comparisons between two pairwise similarity values are allowed. Combinatorial algorithms for nearest neighbor search have two important advantages: (1) they do not map ...

Metrics: total citations 24; total downloads 448; last 12 months 3; last 6 weeks 0

research-article

Beyond basic faceted search

Ori Ben-Yitzhak,
Nadav Golbandi,
Nadav Har'El,
Ronny Lempel,
Andreas Neumann,
Shila Ofek-Koifman,
Dafna Sheinwald,
Eugene Shekita,
Benjamin Sznajder,
Sivan Yogev

pp 33–44, https://doi.org/10.1145/1341531.1341539

This paper extends traditional faceted search to support richer information discovery tasks over more complex data models. Our first extension adds flexible, dynamic business intelligence aggregations to the faceted application, enabling users to gain ...

Metrics: total citations 85; total downloads 2,087; last 12 months 29; last 6 weeks 7

research-article

Entropy of search logs: how hard is search? with personalization? with backoff?

Qiaozhu Mei,
Kenneth Church

pp 45–54, https://doi.org/10.1145/1341531.1341540

How many pages are there on the Web? 5B? 20B? More? Less? Big bets on clusters in the clouds could be wiped out if a small cache of a few million urls could capture much of the value. Language modeling techniques are applied to MSN's search logs to ...

Metrics: total citations 43; total downloads 956; last 12 months 8; last 6 weeks 1

SESSION: Ranking

research-article

Fast learning of document ranking functions with the committee perceptron

Jonathan L. Elsas,
Vitor R. Carvalho,
Jaime G. Carbonell

pp 55–64, https://doi.org/10.1145/1341531.1341542

This paper presents a new variant of the perceptron algorithm using selective committee averaging (or voting). We apply this algorithm to the problem of learning ranking functions for document retrieval, known as the "Learning to Rank" problem. Most ...

Metrics: total citations 17; total downloads 609; last 12 months 0; last 6 weeks 0

research-article

Ranking web sites with real user traffic

Mark R. Meiss,
Filippo Menczer,
Santo Fortunato,
Alessandro Flammini,
Alessandro Vespignani

pp 65–76, https://doi.org/10.1145/1341531.1341543

We analyze the traffic-weighted Web host graph obtained from a large sample of real Web users over about seven months. A number of interesting structural properties are revealed by this complex dynamic network, some in line with the well-studied boolean ...

Metrics: total citations 54; total downloads 1,128; last 12 months 13; last 6 weeks 2

research-article

SoftRank: optimizing non-smooth rank metrics

Michael Taylor,
John Guiver,
Stephen Robertson,
Tom Minka

pp 77–86, https://doi.org/10.1145/1341531.1341544

We address the problem of learning large complex ranking functions. Most IR applications use evaluation metrics that depend only upon the ranks of documents. However, most ranking functions generate document scores, which are sorted to produce a ...

Metrics: total citations 221; total downloads 1,322; last 12 months 59; last 6 weeks 6

research-article

An experimental comparison of click position-bias models

Nick Craswell,
Onno Zoeter,
Michael Taylor,
Bill Ramsey

pp 87–94, https://doi.org/10.1145/1341531.1341545

Search engine click logs provide an invaluable source of relevance information, but this information is biased. A key source of bias is presentation order: the probability of click is influenced by a document's position in the results page. This paper ...

Metrics: total citations 576; total downloads 3,531; last 12 months 238; last 6 weeks 31

SESSION: Graph mining

research-article

A scalable pattern mining approach to web graph compression with communities

Gregory Buehrer,
Kumar Chellapilla

pp 95–106, https://doi.org/10.1145/1341531.1341547

A link server is a system designed to support efficient implementations of graph computations on the web graph. In this work, we present a compression scheme for the web graph specifically designed to accommodate community queries and other random ...

Metrics: total citations 164; total downloads 1,564; last 12 months 63; last 6 weeks 15

research-article

Collaboration over time: characterizing and modeling network evolution

Jian Huang,
Ziming Zhuang,
Jia Li,
C. Lee Giles

pp 107–116, https://doi.org/10.1145/1341531.1341548

A formal type of scientific and academic collaboration is coauthorship which can be represented by a coauthorship network. Coauthorship networks are among some of the largest social networks and offer us the opportunity to study the mechanisms ...

Metrics: total citations 76; total downloads 1,179; last 12 months 41; last 6 weeks 8

research-article

Preferential behavior in online groups

Lars Backstrom,
Ravi Kumar,
Cameron Marlow,
Jasmine Novak,
Andrew Tomkins

pp 117–128, https://doi.org/10.1145/1341531.1341549

Online communities in the form of message boards, listservs, and newsgroups continue to represent a considerable amount of the social activity on the Internet. Every year thousands of groups flourish while others decline into relative obscurity; likewise, ...

Metrics: total citations 62; total downloads 1,230; last 12 months 14; last 6 weeks 1

research-article

Connectivity structure of bipartite graphs via the KNC-plot

Ravi Kumar,
Andrew Tomkins,
Erik Vee

pp 129–138, https://doi.org/10.1145/1341531.1341550

In this paper we introduce the k-neighbor connectivity plot, or KNC-plot, as a tool to study the macroscopic connectivity structure of sparse bipartite graphs. Given a bipartite graph G = (U, V, E), we say that two nodes in U are k-neighbors if there ...

Metrics: total citations 17; total downloads 520; last 12 months 2; last 6 weeks 0

SESSION: Classification

research-article

Deep classifier: automatically categorizing search results into large-scale hierarchies

Dikan Xing,
Gui-Rong Xue,
Qiang Yang,
Yong Yu

pp 139–148, https://doi.org/10.1145/1341531.1341552

Organizing Web search results into hierarchical categories facilitates users' browsing through Web search results, especially for ambiguous queries where the potential results are mixed together. Previous methods on search result classification are ...

Metrics: total citations 19; total downloads 664; last 12 months 3; last 6 weeks 0

research-article

Personal name classification in web queries

Dou Shen,
Toby Walker,
Zijian Zheng,
Qiang Yang,
Ying Li

pp 149–158, https://doi.org/10.1145/1341531.1341553

Personal names are an important kind of Web queries in Web search, and yet they are special in many ways. Strategies for retrieving information on personal names should therefore be different from the strategies for other types of queries. To improve ...

Metrics: total citations 9; total downloads 692; last 12 months 2; last 6 weeks 0

research-article

Understanding temporal aspects in document classification

Fernando Mourão,
Leonardo Rocha,
Renata Araújo,
Thierson Couto,
Marcos Gonçalves,
Wagner Meira

pp 159–170, https://doi.org/10.1145/1341531.1341554

Due to the increasing amount of information present on the Web, Automatic Document Classification (ADC) has become an important research topic. ADC usually follows a standard supervised learning strategy, where we first build a model using preclassified ...

Metrics: total citations 20; total downloads 694; last 12 months 9; last 6 weeks 1

SESSION: Social search

research-article

On ranking controversies in wikipedia: models and evaluation

Ba-Quy Vuong,
Ee-Peng Lim,
Aixin Sun,
Minh-Tam Le,
Hady Wirawan Lauw,
Kuiyu Chang

pp 171–182, https://doi.org/10.1145/1341531.1341556

Wikipedia is a very large and successful Web 2.0 example. As the number of Wikipedia articles and contributors grows at a very fast pace, there are also increasing disputes occurring among the contributors. Disputes often happen in articles with ...

Metrics: total citations 57; total downloads 1,118; last 12 months 19; last 6 weeks 4

research-article

Finding high-quality content in social media

Eugene Agichtein,
Carlos Castillo,
Debora Donato,
Aristides Gionis,
Gilad Mishne

pp 183–194, https://doi.org/10.1145/1341531.1341557

The quality of user-generated content varies drastically from excellent to abuse and spam. As the availability of such content increases, the task of identifying high-quality content sites based on user contributions -- social media sites -- becomes ...

Metrics: total citations 763; total downloads 21,979; last 12 months 605; last 6 weeks 99

research-article

Can social bookmarking improve web search?

Paul Heymann,
Georgia Koutrika,
Hector Garcia-Molina

pp 195–206, https://doi.org/10.1145/1341531.1341558

Social bookmarking is a recent phenomenon which has the potential to give us a great deal of data about pages on the web. One major question is whether that data can be used to augment systems like web search. To answer this question, over the past year ...

Metrics: total citations 306; total downloads 2,990; last 12 months 19; last 6 weeks 3

research-article

Identifying the influential bloggers in a community

Nitin Agarwal,
Huan Liu,
Lei Tang,
Philip S. Yu

pp 207–218, https://doi.org/10.1145/1341531.1341559

Blogging becomes a popular way for a Web user to publish information on the Web. Bloggers write blog posts, share their likes and dislikes, voice their opinions, provide suggestions, report news, and form groups in Blogosphere. Bloggers form their ...

Metrics: total citations 331; total downloads 4,138; last 12 months 86; last 6 weeks 7

research-article

Opinion spam and analysis

Nitin Jindal,
Bing Liu

pp 219–230, https://doi.org/10.1145/1341531.1341560

Evaluative texts on the Web have become a valuable source of opinions on products, services, events, individuals, etc. Recently, many researchers have studied such opinion sources as product reviews, forum posts, and blogs. However, existing research ...

Metrics: total citations 841; total downloads 5,466; last 12 months 253; last 6 weeks 33

research-article

A holistic lexicon-based approach to opinion mining

Xiaowen Ding,
Bing Liu,
Philip S. Yu

pp 231–240, https://doi.org/10.1145/1341531.1341561

One of the important types of information on the Web is the opinions expressed in the user generated content, e.g., customer reviews of products, forum posts, and blogs. In this paper, we focus on customer reviews of products. In particular, we study ...

Metrics: total citations 767; total downloads 5,847; last 12 months 136; last 6 weeks 21

SESSION: Advertising

research-article

An empirical analysis of sponsored search performance in search engine advertising

Anindya Ghose,
Sha Yang

pp 241–250, https://doi.org/10.1145/1341531.1341563

The phenomenon of sponsored search advertising - where advertisers pay a fee to Internet search engines to be displayed alongside organic (non-sponsored) web search results - is gaining ground as the largest source of revenues for search engines. ...

Metrics: total citations 25; total downloads 1,641; last 12 months 45; last 6 weeks 8

research-article

Advertising keyword suggestion based on concept hierarchy

Yifan Chen,
Gui-Rong Xue,
Yong Yu

pp 251–260, https://doi.org/10.1145/1341531.1341564

The increasing growth of the World Wide Web constantly enlarges the revenue generated by search engine advertising. Advertisers bid on keywords associated with their products to display their ads on the search result pages. Keyword suggestion methods ...

Metrics: total citations 70; total downloads 1,550; last 12 months 15; last 6 weeks 2

invited-talk

Web information management: past, present and future

Hector Garcia-Molina

p 1, https://doi.org/10.1145/1341531.1341532

In this talk I will give a brief retrospective on Web Information Management, and will discuss some of the key challenges for the future. I will not give a survey of all work in the area; instead I will give my personal perspective based on work in the ...

Metrics: total citations 2; total downloads 551; last 12 months 1; last 6 weeks 0

invited-talk

Machine reading at web scale

Oren Etzioni

p 2, https://doi.org/10.1145/1341531.1341533

Metrics: total citations 1; total downloads 368; last 12 months 0; last 6 weeks 0

Acceptance Rates

Overall Acceptance Rate: 498 of 2,863 submissions, 17%

Year	Submitted	Accepted	Rate
WSDM '19	511	84	16%
WSDM '18	514	81	16%
WSDM '17	505	80	16%
WSDM '16	368	67	18%
WSDM '15	238	39	16%
WSDM '14	355	64	18%
WSDM '11	372	83	22%
Overall	2,863	498	17%
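The Overall row is the column sum of the yearly rows. Each rate is accepted divided by submitted, rounded to a whole percent. As a quick consistency check, here is a minimal Python sketch that recomputes them. The counts are transcribed from the table above, and the names `ROWS`, `rate_percent`, and `overall` are illustrative only; they do not come from any ACM tool.

```python
# Submitted/accepted counts transcribed from the acceptance-rate table above.
ROWS = {
    "WSDM '19": (511, 84),
    "WSDM '18": (514, 81),
    "WSDM '17": (505, 80),
    "WSDM '16": (368, 67),
    "WSDM '15": (238, 39),
    "WSDM '14": (355, 64),
    "WSDM '11": (372, 83),
}


def rate_percent(submitted: int, accepted: int) -> int:
    """Acceptance rate as a percentage, rounded to the nearest whole number."""
    return round(100 * accepted / submitted)


def overall(rows: dict) -> tuple:
    """Sum submissions and acceptances across all years and compute the overall rate."""
    submitted = sum(s for s, _ in rows.values())
    accepted = sum(a for _, a in rows.values())
    return submitted, accepted, rate_percent(submitted, accepted)


if __name__ == "__main__":
    # Reprint the table in the same tab-separated layout.
    for year, (s, a) in ROWS.items():
        print(f"{year}\t{s}\t{a}\t{rate_percent(s, a)}%")
    total_s, total_a, total_rate = overall(ROWS)
    print(f"Overall\t{total_s}\t{total_a}\t{total_rate}%")
```

Running it reproduces the Rate column and the overall figure of 498 of 2,863 (17%). The same rounding gives 16% for the WSDM '08 figure of 24 accepted out of 151 submissions.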
