WSDM was announced at WWW 2007 in Banff in May 2007 and thereafter on several electronic bulletin boards. Abstracts were sought by 30th July and full paper submissions by 6th August.
Despite the rather short notice and tight deadlines, we received 151 submissions from around the world. With the help of the steering committee, we decided on a novel reviewing system and formed a two-tier technical program committee. There were 52 regular program committee members. Each paper was first reviewed by at least three regular PC members. After this phase was completed, we retained about 60 papers with the highest scores for a second round of evaluation by a senior program committee with 11 members. Each retained paper was reviewed by two senior PC members, who strove to ensure that all regular PC members had a consistent view of the contributions of the paper (although their opinions could, of course, differ quantitatively) and had written clear, well-justified and useful reviews for the authors. In many cases, the senior PC members effectively made accept/reject decisions. The final decision was made by the PC chairs, who took into account all the scores and comments, novelty, technical depth, elegance, practical application, impact, and presentation. Notifications of acceptance of 24 full papers were sent out on 20th October.
Overall, we are pleased with the quality and mix of the papers we accepted. Most are solidly practical papers with extensive experimental evaluation, while a few are of a more theoretical nature, but we believe all of them have the potential to significantly influence the practice of Web search and mining in coming years. The acceptance ratio of 24/151, or about 16 percent, is consistent with the leading ACM and IEEE conferences in similar or related areas. For the first ever WSDM conference, we decided to have only a single track of full-length papers and not have short papers, poster papers, or demos, although this might change over time.
Crawl ordering by search impact
We study how to prioritize the fetching of new pages under the objective of maximizing the quality of search results. In particular, our objective is to fetch new pages that have the most impact, where the impact of a page is equal to the number of ...
On placing skips optimally in expectation
We study the problem of optimal skip placement in an inverted list. Assuming the query distribution to be known in advance, we formally prove that an optimal skip placement can be computed quite efficiently. Our best algorithm runs in time O(n log n), ...
Disorder inequality: a combinatorial approach to nearest neighbor search
We say that an algorithm for nearest neighbor search is combinatorial if only direct comparisons between two pairwise similarity values are allowed. Combinatorial algorithms for nearest neighbor search have two important advantages: (1) they do not map ...
Beyond basic faceted search
- Ori Ben-Yitzhak,
- Nadav Golbandi,
- Nadav Har'El,
- Ronny Lempel,
- Andreas Neumann,
- Shila Ofek-Koifman,
- Dafna Sheinwald,
- Eugene Shekita,
- Benjamin Sznajder,
- Sivan Yogev
This paper extends traditional faceted search to support richer information discovery tasks over more complex data models. Our first extension adds flexible, dynamic business intelligence aggregations to the faceted application, enabling users to gain ...
Entropy of search logs: how hard is search? with personalization? with backoff?
How many pages are there on the Web? 5B? 20B? More? Less? Big bets on clusters in the clouds could be wiped out if a small cache of a few million urls could capture much of the value. Language modeling techniques are applied to MSN's search logs to ...
Fast learning of document ranking functions with the committee perceptron
This paper presents a new variant of the perceptron algorithm using selective committee averaging (or voting). We apply this algorithm to the problem of learning ranking functions for document retrieval, known as the "Learning to Rank" problem. Most ...
Ranking web sites with real user traffic
We analyze the traffic-weighted Web host graph obtained from a large sample of real Web users over about seven months. A number of interesting structural properties are revealed by this complex dynamic network, some in line with the well-studied boolean ...
SoftRank: optimizing non-smooth rank metrics
We address the problem of learning large complex ranking functions. Most IR applications use evaluation metrics that depend only upon the ranks of documents. However, most ranking functions generate document scores, which are sorted to produce a ...
An experimental comparison of click position-bias models
Search engine click logs provide an invaluable source of relevance information, but this information is biased. A key source of bias is presentation order: the probability of click is influenced by a document's position in the results page. This paper ...
A scalable pattern mining approach to web graph compression with communities
A link server is a system designed to support efficient implementations of graph computations on the web graph. In this work, we present a compression scheme for the web graph specifically designed to accommodate community queries and other random ...
Collaboration over time: characterizing and modeling network evolution
A formal type of scientific and academic collaboration is coauthorship which can be represented by a coauthorship network. Coauthorship networks are among some of the largest social networks and offer us the opportunity to study the mechanisms ...
Preferential behavior in online groups
Online communities in the form of message boards, listservs, and newsgroups continue to represent a considerable amount of the social activity on the Internet. Every year thousands of groups flourish while others decline into relative obscurity; likewise, ...
Connectivity structure of bipartite graphs via the KNC-plot
In this paper we introduce the k-neighbor connectivity plot, or KNC-plot, as a tool to study the macroscopic connectivity structure of sparse bipartite graphs. Given a bipartite graph G = (U, V, E), we say that two nodes in U are k-neighbors if there ...
Deep classifier: automatically categorizing search results into large-scale hierarchies
Organizing Web search results into hierarchical categories facilitates users' browsing through Web search results, especially for ambiguous queries where the potential results are mixed together. Previous methods on search result classification are ...
Personal name classification in web queries
Personal names are an important kind of Web queries in Web search, and yet they are special in many ways. Strategies for retrieving information on personal names should therefore be different from the strategies for other types of queries. To improve ...
Understanding temporal aspects in document classification
Due to the increasing amount of information present on the Web, Automatic Document Classification (ADC) has become an important research topic. ADC usually follows a standard supervised learning strategy, where we first build a model using preclassified ...
On ranking controversies in wikipedia: models and evaluation
Wikipedia is a very large and successful Web 2.0 example. As the number of Wikipedia articles and contributors grows at a very fast pace, there are also increasing disputes occurring among the contributors. Disputes often happen in articles with ...
Finding high-quality content in social media
The quality of user-generated content varies drastically from excellent to abuse and spam. As the availability of such content increases, the task of identifying high-quality content sites based on user contributions -- social media sites -- becomes ...
Can social bookmarking improve web search?
Social bookmarking is a recent phenomenon which has the potential to give us a great deal of data about pages on the web. One major question is whether that data can be used to augment systems like web search. To answer this question, over the past year ...
Identifying the influential bloggers in a community
Blogging becomes a popular way for a Web user to publish information on the Web. Bloggers write blog posts, share their likes and dislikes, voice their opinions, provide suggestions, report news, and form groups in Blogosphere. Bloggers form their ...
Opinion spam and analysis
Evaluative texts on the Web have become a valuable source of opinions on products, services, events, individuals, etc. Recently, many researchers have studied such opinion sources as product reviews, forum posts, and blogs. However, existing research ...
A holistic lexicon-based approach to opinion mining
One of the important types of information on the Web is the opinions expressed in the user generated content, e.g., customer reviews of products, forum posts, and blogs. In this paper, we focus on customer reviews of products. In particular, we study ...
An empirical analysis of sponsored search performance in search engine advertising
The phenomenon of sponsored search advertising - where advertisers pay a fee to Internet search engines to be displayed alongside organic (non-sponsored) web search results - is gaining ground as the largest source of revenues for search engines. ...
Advertising keyword suggestion based on concept hierarchy
The increasing growth of the World Wide Web constantly enlarges the revenue generated by search engine advertising. Advertisers bid on keywords associated with their products to display their ads on the search result pages. Keyword suggestion methods ...
Web information management: past, present and future
In this talk I will give a brief retrospective on Web Information Management, and will discuss some of the key challenges for the future. I will not give a survey of all work in the area; instead I will give my personal perspective based on work in the ...