DOI: 10.1145/1341531
WSDM '08: Proceedings of the 2008 International Conference on Web Search and Data Mining
ACM 2008 Proceeding
Publisher:
  • Association for Computing Machinery
  • New York
  • NY
  • United States
Conference:
Palo Alto, California, USA, February 11-12, 2008
ISBN:
978-1-59593-927-2
Published:
11 February 2008
Abstract

WSDM was announced at WWW 2007 in Banff in May 2007 and thereafter on several electronic bulletin boards. Abstracts were sought by 30th July and full paper submissions by the 6th August.

Despite the rather short notice and tight deadlines, we received 151 submissions from around the world. With the help of the steering committee, we decided on a novel reviewing system and formed a two-tier technical program committee. There were 52 regular program committee members. Each paper was first reviewed by at least three regular PC members. After this phase was completed, we retained about 60 papers with the highest scores for a second round of evaluation by a senior program committee with 11 members. Each retained paper was reviewed by two senior PC members, who strove to ensure that all regular PC members had a consistent view of the contributions of the paper (although their opinions could, of course, differ quantitatively) and had written clear, well-justified and useful reviews for the authors. In many cases, the senior PC members effectively made the accept/reject decisions. The final decision was made by the PC chairs, who took into account all the scores and comments, novelty, technical depth, elegance, practical application, impact, and presentation. Notifications of acceptance of 24 full papers were sent out on 20th October.

Overall, we are pleased with the quality and mix of the papers we accepted. Most are solidly practical papers with extensive experimental evaluation while a few are of a more theoretical nature, but we believe all of them have the potential to significantly influence the practice of Web search and mining in coming years. The acceptance ratio of 24/151 = 16 percent is consistent with the leading ACM and IEEE conferences in similar or related areas. For the first ever WSDM conference, we decided to have only a single track of full-length papers and not have short papers, poster papers, or demos, although this might change over time.

SESSION: Crawling
research-article
Crawl ordering by search impact

We study how to prioritize the fetching of new pages under the objective of maximizing the quality of search results. In particular, our objective is to fetch new pages that have the most impact, where the impact of a page is equal to the number of ...

SESSION: Indexing and search
research-article
On placing skips optimally in expectation

We study the problem of optimal skip placement in an inverted list. Assuming the query distribution to be known in advance, we formally prove that an optimal skip placement can be computed quite efficiently. Our best algorithm runs in time O(n log n), ...

research-article
Disorder inequality: a combinatorial approach to nearest neighbor search

We say that an algorithm for nearest neighbor search is combinatorial if only direct comparisons between two pairwise similarity values are allowed. Combinatorial algorithms for nearest neighbor search have two important advantages: (1) they do not map ...

research-article
Beyond basic faceted search

This paper extends traditional faceted search to support richer information discovery tasks over more complex data models. Our first extension adds flexible, dynamic business intelligence aggregations to the faceted application, enabling users to gain ...

research-article
Entropy of search logs: how hard is search? with personalization? with backoff?

How many pages are there on the Web? 5B? 20B? More? Less? Big bets on clusters in the clouds could be wiped out if a small cache of a few million urls could capture much of the value. Language modeling techniques are applied to MSN's search logs to ...

SESSION: Ranking
research-article
Fast learning of document ranking functions with the committee perceptron

This paper presents a new variant of the perceptron algorithm using selective committee averaging (or voting). We apply this algorithm to the problem of learning ranking functions for document retrieval, known as the "Learning to Rank" problem. Most ...

research-article
Ranking web sites with real user traffic

We analyze the traffic-weighted Web host graph obtained from a large sample of real Web users over about seven months. A number of interesting structural properties are revealed by this complex dynamic network, some in line with the well-studied boolean ...

research-article
SoftRank: optimizing non-smooth rank metrics

We address the problem of learning large complex ranking functions. Most IR applications use evaluation metrics that depend only upon the ranks of documents. However, most ranking functions generate document scores, which are sorted to produce a ...

research-article
An experimental comparison of click position-bias models

Search engine click logs provide an invaluable source of relevance information, but this information is biased. A key source of bias is presentation order: the probability of click is influenced by a document's position in the results page. This paper ...

SESSION: Graph mining
research-article
A scalable pattern mining approach to web graph compression with communities

A link server is a system designed to support efficient implementations of graph computations on the web graph. In this work, we present a compression scheme for the web graph specifically designed to accommodate community queries and other random ...

research-article
Collaboration over time: characterizing and modeling network evolution

A formal type of scientific and academic collaboration is coauthorship which can be represented by a coauthorship network. Coauthorship networks are among some of the largest social networks and offer us the opportunity to study the mechanisms ...

research-article
Preferential behavior in online groups

Online communities in the form of message boards, listservs, and newsgroups continue to represent a considerable amount of the social activity on the Internet. Every year thousands of groups flourish while others decline into relative obscurity; likewise, ...

research-article
Connectivity structure of bipartite graphs via the KNC-plot

In this paper we introduce the k-neighbor connectivity plot, or KNC-plot, as a tool to study the macroscopic connectivity structure of sparse bipartite graphs. Given a bipartite graph G = (U, V, E), we say that two nodes in U are k-neighbors if there ...

SESSION: Classification
research-article
Deep classifier: automatically categorizing search results into large-scale hierarchies

Organizing Web search results into hierarchical categories facilitates users' browsing through Web search results, especially for ambiguous queries where the potential results are mixed together. Previous methods on search result classification are ...

research-article
Personal name classification in web queries

Personal names are an important kind of Web queries in Web search, and yet they are special in many ways. Strategies for retrieving information on personal names should therefore be different from the strategies for other types of queries. To improve ...

research-article
Understanding temporal aspects in document classification

Due to the increasing amount of information present on the Web, Automatic Document Classification (ADC) has become an important research topic. ADC usually follows a standard supervised learning strategy, where we first build a model using preclassified ...

SESSION: Social search
research-article
On ranking controversies in wikipedia: models and evaluation

Wikipedia is a very large and successful Web 2.0 example. As the number of Wikipedia articles and contributors grows at a very fast pace, there are also increasing disputes occurring among the contributors. Disputes often happen in articles with ...

research-article
Finding high-quality content in social media

The quality of user-generated content varies drastically from excellent to abuse and spam. As the availability of such content increases, the task of identifying high-quality content sites based on user contributions -- social media sites -- becomes ...

research-article
Can social bookmarking improve web search?

Social bookmarking is a recent phenomenon which has the potential to give us a great deal of data about pages on the web. One major question is whether that data can be used to augment systems like web search. To answer this question, over the past year ...

research-article
Identifying the influential bloggers in a community

Blogging becomes a popular way for a Web user to publish information on the Web. Bloggers write blog posts, share their likes and dislikes, voice their opinions, provide suggestions, report news, and form groups in Blogosphere. Bloggers form their ...

research-article
Opinion spam and analysis

Evaluative texts on the Web have become a valuable source of opinions on products, services, events, individuals, etc. Recently, many researchers have studied such opinion sources as product reviews, forum posts, and blogs. However, existing research ...

research-article
A holistic lexicon-based approach to opinion mining

One of the important types of information on the Web is the opinions expressed in the user generated content, e.g., customer reviews of products, forum posts, and blogs. In this paper, we focus on customer reviews of products. In particular, we study ...

SESSION: Advertising
research-article
An empirical analysis of sponsored search performance in search engine advertising

The phenomenon of sponsored search advertising - where advertisers pay a fee to Internet search engines to be displayed alongside organic (non-sponsored) web search results - is gaining ground as the largest source of revenues for search engines. ...

research-article
Advertising keyword suggestion based on concept hierarchy

The increasing growth of the World Wide Web constantly enlarges the revenue generated by search engine advertising. Advertisers bid on keywords associated with their products to display their ads on the search result pages. Keyword suggestion methods ...

invited-talk
Web information management: past, present and future

In this talk I will give a brief retrospective on Web Information Management, and will discuss some of the key challenges for the future. I will not give a survey of all work in the area; instead I will give my personal perspective based on work in the ...

invited-talk
Machine reading at web scale
Contributors
  • Google LLC


Acceptance Rates

Overall Acceptance Rate: 498 of 2,863 submissions, 17%

Year      Submitted  Accepted  Rate
WSDM '19  511        84        16%
WSDM '18  514        81        16%
WSDM '17  505        80        16%
WSDM '16  368        67        18%
WSDM '15  238        39        16%
WSDM '14  355        64        18%
WSDM '11  372        83        22%
Overall   2,863      498       17%