
2015 | Book

Advances in Information Retrieval

37th European Conference on IR Research, ECIR 2015, Vienna, Austria, March 29 - April 2, 2015. Proceedings

Editors: Allan Hanbury, Gabriella Kazai, Andreas Rauber, Norbert Fuhr

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science


About this book

This book constitutes the proceedings of the 37th European Conference on IR Research, ECIR 2015, held in Vienna, Austria, in March/April 2015. The 44 full papers, 41 poster papers and 7 demonstrations presented together with 3 keynotes in this volume were carefully reviewed and selected from 305 submissions. The papers focus on the following topics: aggregated search and diversity, classification, cross-lingual and discourse, efficiency, evaluation, event mining and summarisation, information extraction, recommender systems, semantic and graph-based models, sentiment and opinion, social media, specific search tasks, temporal models and features, topic and document models, user behavior and reproducible IR.

Table of Contents

Frontmatter

Aggregated Search and Diversity

Towards Query Level Resource Weighting for Diversified Query Expansion

Diversified query expansion that leverages multiple resources has demonstrated promising results in the task of search result diversification (SRD) on several benchmark datasets. In existing studies, however, the weight of a resource, or the degree of the contribution of that resource to SRD, is largely ignored. In this work, we present a query level resource weighting method based on a set of features which are integrated into a regression model. Accordingly, we develop an SRD system which generates for each resource a number of expansion candidates that is proportional to the weight of that resource. We thoroughly evaluate our approach on the TREC 2009, 2010 and 2011 Web tracks, and show that: 1) our system outperforms existing methods without resource weighting; and 2) query level resource weighting is superior to non-query level resource weighting.

Arbi Bouchoucha, Xiaohua Liu, Jian-Yun Nie
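
As a toy illustration of the allocation idea in the abstract above (a resource contributes expansion candidates in proportion to its learned weight), the sketch below assumes the per-query weights have already been predicted by some regression model; the resource names and numbers are made up.

```python
def allocate_expansion_slots(resource_weights, total_candidates=20):
    """Distribute expansion-candidate slots across resources in proportion
    to their (non-negative) query-level weights."""
    total_weight = sum(resource_weights.values())
    if total_weight == 0:
        # No useful weights predicted: fall back to a uniform split.
        return {r: total_candidates // len(resource_weights) for r in resource_weights}
    return {r: round(total_candidates * w / total_weight)
            for r, w in resource_weights.items()}

# Weights a regression model might predict for one query (illustrative values).
weights = {"Wikipedia": 0.55, "query logs": 0.30, "ConceptNet": 0.15}
print(allocate_expansion_slots(weights))  # {'Wikipedia': 11, 'query logs': 6, 'ConceptNet': 3}
```
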
Exploring Composite Retrieval from the Users’ Perspective

Aggregating results from heterogeneous sources and presenting them in a blended interface – aggregated search – has become standard practice for most commercial Web search engines. Composite retrieval is emerging as a new search paradigm, where users are presented with semantically aggregated information objects, called bundles, containing results originating from different verticals. In this paper we study composite retrieval from the user perspective. We conducted an exploratory user study where 40 participants were required to manually generate bundles that satisfy various information needs, using heterogeneous results retrieved by modern search engines. Our main objective was to analyse the contents and characteristics of user-generated bundles. Our results show that users generate bundles on common subtopics, centred around pivot documents, and that they favour bundles that are relevant, diverse and cohesive.

Horaţiu Bota, Ke Zhou, Joemon J. Jose
Improving Aggregated Search Coherence

Aggregated search is the task of blending results from different search services, or verticals, into the core web results. Aggregated search coherence is the extent to which results from different sources focus on similar senses of an ambiguous or underspecified query. Prior research studied the effect of aggregated search coherence on search behavior and found that the query-senses in the vertical results can affect user interaction with the web results. In this work, we develop and evaluate algorithms for vertical results selection—deciding which results from a particular vertical to display. Results from a large-scale user study suggest that algorithms that improve the level of coherence between the vertical and web results influence users to make more productive decisions with respect to the web results—to engage with the web results when at least one of them is relevant and, to a lesser extent, to avoid engaging with the web results otherwise.

Jaime Arguello
On-topic Cover Stories from News Archives

While Web or newspaper archives store large numbers of articles, they also contain a lot of near-duplicate information. Examples include articles about the same event published by multiple news agencies or articles about evolving events that lead to copies of paragraphs to provide background information. To support journalists, who attempt to read all information on a given topic at once, we propose an approach that, given a topic and a text collection, extracts a set of articles with broad coverage of the topic and a minimum amount of duplication.

We start by extracting articles related to the input topic and detecting duplicate paragraphs. We keep only one instance from each group of duplicates by solving a weighted quadratic optimization problem, which finds the best position for all paragraphs, such that some articles consist mainly of distinct paragraphs and others consist mainly of duplicates. Finally, we present to the reader the articles with more distinct paragraphs. Our experiments show the high precision and recall of our approach.

Christian Schulte, Bilyana Taneva, Gerhard Weikum

Classification

Multi-emotion Detection in User-Generated Reviews

Expressions of emotion abound in user-generated content, whether it be in blogs, reviews, or on social media. Much work has been devoted to detecting and classifying these emotions, but little of it has acknowledged the fact that emotionally charged text may express multiple emotions at the same time. We describe a new dataset of user-generated movie reviews annotated for emotional expressions, and experimentally validate two algorithms that can detect multiple emotions in each sentence of these reviews.

Lars Buitinck, Jesse van Amerongen, Ed Tan, Maarten de Rijke
Classification of Historical Notary Acts with Noisy Labels

This paper approaches the problem of automatic classification of real-world historical notary acts from the 14th to the 20th century. We deal with category ambiguity, noisy labels and imbalanced data. Our goal is to assign an appropriate category for each notary act from the archive collection. We investigate a variety of existing techniques and describe a framework for dealing with noisy labels which includes category resolution, evaluation of inter-annotator agreement and the application of a two level classification. The maximum accuracy we achieve is 88%, which is comparable to the agreement between human annotators.

Julia Efremova, Alejandro Montes García, Toon Calders
ConceptFusion: A Flexible Scene Classification Framework

We introduce ConceptFusion, a method that aims at high accuracy in categorizing a large number of scenes while keeping the model relatively simple and efficient for scalability. The proposed method combines the advantages of both low-level representations and high-level semantic categories, and eliminates the distinctions between different levels through the definition of concepts. The proposed framework encodes the perspectives brought through different concepts by considering them in concept groups that are ensembled for the final decision. Experiments carried out on benchmark datasets show the effectiveness of incorporating concepts in different levels with different perspectives.

Mustafa Ilker Sarac, Ahmet Iscen, Eren Golge, Pinar Duygulu
An Audio-Visual Approach to Music Genre Classification through Affective Color Features

This paper presents a study on classifying music by affective visual information extracted from music videos. The proposed audio-visual approach analyzes genre specific utilization of color. A comprehensive set of color specific image processing features used for affect and emotion recognition derived from psychological experiments or art-theory is evaluated in the visual and multi-modal domain against contemporary audio content descriptors. The evaluation of the presented color features is based on comparative classification experiments on the newly introduced ‘Music Video Dataset’. Results show that a combination of the modalities can improve non-timbral and rhythmic features but show insignificant effects on high performing audio features.

Alexander Schindler, Andreas Rauber

Cross-Lingual and Discourse

Multi-modal Correlated Centroid Space for Multi-lingual Cross-Modal Retrieval

We present a novel cross-modal retrieval approach where the textual modality is present in different languages. We retrieve semantically similar documents across modalities in different languages using a correlated centroid space unsupervised retrieval (C²SUR) approach. C²SUR consists of two phases. In the first phase, we extract heterogeneous features from a multi-modal document and project them into a correlated space using kernel canonical correlation analysis (KCCA). In the second phase, correlated space centroids are obtained using clustering to retrieve cross-modal documents with different similarity measures. Experimental results show that C²SUR outperforms existing state-of-the-art English cross-modal retrieval approaches and achieves similar results for other languages.

Aditya Mogadala, Achim Rettinger
A Discourse Search Engine Based on Rhetorical Structure Theory

Representing a document as a bag-of-words and using keywords to retrieve relevant documents have seen a great success in large scale information retrieval systems such as Web search engines. The bag-of-words representation is computationally efficient and, with proper term weighting and document ranking methods, can perform surprisingly well for a simple document representation method. However, such a representation ignores the rich discourse structure in a document, which could provide useful clues when determining the relevancy of a document to a given user query. We develop the first-ever Discourse Search Engine (DSE) that exploits the discourse structure in documents to overcome the limitations associated with the bag-of-words document representations in information retrieval. We use Rhetorical Structure Theory (RST) to represent a document as a discourse tree connecting numerous elementary discourse units (EDUs) via discourse relations. Given a query, our discourse search engine can retrieve not only relevant documents to the query, but also individual statements from those relevant documents that describe some discourse relations to the query. We propose several ranking scores that consider the discourse structure in the documents to measure the relevance of a pair of EDUs to a query. Moreover, we combine those individual relevance scores using a random decision forest (RDF) model to create a single relevance score. Despite the numerous challenges of constructing a rich document representation using the discourse relations in a document, our experimental results show that it improves the F-score in an information retrieval task. We publicly release our manually annotated test collection to expedite future research in discourse-based information retrieval.

Pascal Kuyten, Danushka Bollegala, Bernd Hollerit, Helmut Prendinger, Kiyoharu Aizawa
Knowledge-Based Representation for Transductive Multilingual Document Classification

Multilingual document classification is often addressed by approaches that rely on language-specific resources (e.g., bilingual dictionaries and machine translation tools) to evaluate cross-lingual document similarities. However, the required transformations may alter the original document semantics, raising additional issues to the known difficulty of obtaining high-quality labeled datasets. To overcome such issues we propose a new framework for multilingual document classification under a transductive learning setting. We exploit a large-scale multilingual knowledge base, BabelNet, to support the modeling of documents written in different languages into a common conceptual space, without requiring any language translation process. We resort to a state-of-the-art transductive learner to produce the document classification. Results on two real-world multilingual corpora have highlighted the effectiveness of the proposed document model w.r.t. document representations usually involved in multilingual and cross-lingual analysis, and the robustness of the transductive setting for multilingual document classification.

Salvatore Romeo, Dino Ienco, Andrea Tagarelli
Distributional Correspondence Indexing for Cross-Language Text Categorization

Cross-Language Text Categorization (CLTC) aims at producing a classifier for a target language when the only available training examples belong to a different source language. Existing CLTC methods are usually affected by high computational costs, require external linguistic resources, or demand a considerable human annotation effort. This paper presents a simple, yet effective, CLTC method based on projecting features from both source and target languages into a common vector space, by using a computationally lightweight distributional correspondence profile with respect to a small set of pivot terms. Experiments on a popular sentiment classification dataset show that our method compares favorably to state-of-the-art methods, requiring a significantly reduced computational cost and minimal human intervention.

Andrea Esuli, Alejandro Moreo Fernández

Efficiency

Adaptive Caching of Fresh Web Search Results

In this paper, we study the problem of caching search results with a rapid rate of their degradation. We suggest a new caching algorithm, which is based on queries’ frequencies and the predicted staleness of cached results. We also introduce a new performance metric of caching algorithms called staleness degree, which measures the level of degradation of a cached result. In the case of frequently changing search results, this metric is more sensitive to those changes than the previously used stale traffic ratio.

Liudmila Ostroumova Prokhorenkova, Yury Ustinovskiy, Egor Samosvat, Damien Lefortier, Pavel Serdyukov
Approximating Weighted Hamming Distance by Probabilistic Selection for Multiple Hash Tables

With the large growth of photos on the Internet, the need for large-scale, real-time image retrieval systems is emerging. Current state-of-the-art approaches in these systems leverage binary features (e.g., hashed codes) for indexing and matching. They usually (1) index data with multiple hash tables to maximize recall, and (2) utilize weighted Hamming distance (WHD) to accurately measure the Hamming distance between data points. However, these methods pose several challenges. The first is in determining suitable index keys for multiple hash tables. The second is that the advantage of bitwise operations for binary features is offset by the use of floating point operations in calculating WHD. To address these challenges, we propose a probabilistic selection model that considers the weights of hash bits in constructing hash tables, and that can be used to approximate WHD (AWHD). Moreover, it is a general method that can be applied to any binary features with predefined (learned) weights. Experiments show a time savings of up to 95% when calculating AWHD compared to WHD while still achieving high retrieval accuracy.

Chiang-Yu Tsai, Yin-Hsi Kuo, Winston H. Hsu
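
To make the cost contrast mentioned above concrete, here is a minimal comparison of a plain bitwise Hamming distance with a weighted Hamming distance that must touch floating-point weights bit by bit; the codes and weights are illustrative, and this is not the paper's AWHD approximation itself.

```python
import numpy as np

def hamming(a: int, b: int) -> int:
    """Plain Hamming distance: cheap bitwise XOR plus a popcount."""
    return bin(a ^ b).count("1")

def weighted_hamming(a: int, b: int, weights: np.ndarray) -> float:
    """Weighted Hamming distance: sum the weights of the differing bits.
    Needs per-bit floating-point work, which is the cost approximation schemes try to avoid."""
    diff = a ^ b
    return float(sum(w for i, w in enumerate(weights) if (diff >> i) & 1))

# Illustrative 8-bit codes and learned per-bit weights.
weights = np.array([0.9, 0.1, 0.5, 0.7, 0.2, 0.8, 0.3, 0.6])
print(hamming(0b10110010, 0b10011010))                    # 2
print(weighted_hamming(0b10110010, 0b10011010, weights))  # 0.7 + 0.8 = 1.5
```
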
Graph Regularised Hashing

Hashing has witnessed an increase in popularity over the past few years due to the promise of compact encoding and fast query time. In order to be effective, hashing methods must maximally preserve the similarity between the data points in the underlying binary representation. The current best performing hashing techniques have utilised supervision. In this paper we propose a two-step iterative scheme, Graph Regularised Hashing (GRH), for incrementally adjusting the positioning of the hashing hypersurfaces to better conform to the supervisory signal: in the first step the binary bits are regularised using a data similarity graph so that similar data points receive similar bits. In the second step the regularised hashcodes form targets for a set of binary classifiers which shift the position of each hypersurface so as to separate opposite bits with maximum margin. GRH exhibits superior retrieval accuracy to competing hashing methods.

Sean Moran, Victor Lavrenko
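
A toy sketch of the first step described above, under simplifying assumptions: each point's bits are blended with the average bits of its neighbours in the similarity graph and then re-binarised. The interpolation weight and the tiny example graph are invented for illustration, not the paper's exact update rule.

```python
import numpy as np

def regularise_bits(B, S, alpha=0.8):
    """Smooth hashcode bits over a similarity graph, then re-binarise.
    B: (n, d) matrix of bits in {-1, +1}; S: (n, n) adjacency matrix of the similarity graph."""
    degrees = S.sum(axis=1, keepdims=True)
    degrees[degrees == 0] = 1                     # isolated nodes keep their own bits
    neighbour_avg = (S @ B) / degrees
    blended = alpha * neighbour_avg + (1 - alpha) * B
    return np.where(blended >= 0, 1, -1)

B = np.array([[1, -1], [1, 1], [-1, 1]])          # initial bits of three points
S = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])   # points 1-2 and 2-3 are similar
print(regularise_bits(B, S))
```
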
Approximate Nearest-Neighbour Search with Inverted Signature Slice Lists

In this paper we present an original approach for finding approximate nearest neighbours in collections of locality-sensitive hashes. The paper demonstrates that this approach makes high-performance nearest-neighbour searching feasible on Web-scale collections and commodity hardware with minimal degradation in search quality.

Timothy Chappell, Shlomo Geva, Guido Zuccon

Evaluation

A Discriminative Approach to Predicting Assessor Accuracy

Modeling changes in individual relevance assessor performance over time offers new ways to improve the quality of relevance judgments, such as by dynamically routing judging tasks to assessors more likely to produce reliable judgments. Whereas prior assessor models have typically adopted a single generative approach, we formulate a discriminative, flexible feature-based model. This allows us to combine multiple generative models and integrate additional behavioral evidence, enabling better adaptation to temporal variance in assessor accuracy. Experiments using crowd assessor data from the NIST TREC 2011 Crowdsourcing Track show our model improves prediction accuracy by 26-36% across assessors, enabling 29-47% improved quality of relevance judgments to be collected at 17-45% lower cost.

Hyun Joon Jung, Matthew Lease
WHOSE – A Tool for Whole-Session Analysis in IIR

One of the main challenges in Interactive Information Retrieval (IIR) evaluation is the development and application of re-usable tools that allow researchers to analyze search behavior of real users in different environments and different domains, but with comparable results. Furthermore, IIR research has recently focused more on the analysis of whole sessions, which include all user interactions that are carried out within a session but also across several sessions by the same user. Some frameworks have already been proposed for the evaluation of controlled experiments in IIR, but no framework is yet available for interactive evaluation of search behavior from real-world information retrieval (IR) systems with real users. In this paper we present a framework for whole-session evaluation that can also utilize these uncontrolled data sets. The logging component can easily be integrated into real-world IR systems for generating and analyzing new log data. Furthermore, due to a supplementary mapping it is also possible to analyze existing log data. For every IR system different actions and filters can be defined. This allows system operators and researchers to use the framework for the analysis of user search behavior in their IR systems and to compare it with others. Using a graphical user interface they have the possibility to interactively explore the data set from a broad overview down to individual sessions.

Daniel Hienert, Wilko van Hoek, Alina Weber, Dagmar Kern
Looking for Books in Social Media: An Analysis of Complex Search Requests

Real-world information needs are generally complex, yet almost all research focuses on either relatively simple search based on queries or recommendation based on profiles. It is difficult to gain insight into complex information needs from observational studies with existing systems; potentially complex needs are obscured by the systems’ limitations. In this paper we study explicit information requests in social media, focusing on the rich area of social book search. We analyse a large set of annotated book requests from the LibraryThing discussion forums. We investigate 1) the comprehensiveness of book requests on the forums, 2) what relevance aspects are expressed in real-world book search requests, and 3) how different types of search topics are related to types of users, human recommendations, and results returned by retrieval and recommender systems. We find that book search requests combine search and recommendation aspects in intricate ways that require more than only traditional search or (hybrid) recommendation approaches.

Marijn Koolen, Toine Bogers, Antal van den Bosch, Jaap Kamps
How Do Gain and Discount Functions Affect the Correlation between DCG and User Satisfaction?

We present an empirical analysis of the effect that the gain and discount functions have on the correlation between DCG and user satisfaction. Through a large user study we estimate the relationship between satisfaction and the effectiveness computed with a test collection. In particular, we estimate the probabilities that users find a system satisfactory given a DCG score, and that they agree with a difference in DCG as to which of two systems is more satisfactory. We study this relationship for 36 combinations of gain and discount, and find that a linear gain and a constant discount are best correlated with user satisfaction.

Julián Urbano, Mónica Marrero
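
For reference, a tiny DCG implementation where the gain and discount are pluggable functions, so combinations such as linear gain with a constant discount (the best-correlated pair reported above) can be written directly; the relevance grades in the example are made up.

```python
import math

def dcg(relevances, gain, discount):
    """Discounted cumulative gain with configurable gain and discount functions.
    `gain` maps a relevance grade to a gain value; `discount` maps a 1-based rank to a weight."""
    return sum(gain(rel) * discount(rank)
               for rank, rel in enumerate(relevances, start=1))

# Gain functions
linear_gain = lambda rel: rel
exp_gain = lambda rel: 2 ** rel - 1
# Discount functions
constant_discount = lambda rank: 1.0
log_discount = lambda rank: 1.0 / math.log2(rank + 1)

grades = [3, 2, 3, 0, 1]  # illustrative relevance grades for one ranking
print(dcg(grades, linear_gain, constant_discount))  # 9.0
print(dcg(grades, exp_gain, log_discount))
```
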
Different Rankers on Different Subcollections

Recent work has shown that when documents in a TREC ad hoc collection are partitioned, different rankers will perform optimally on different partitions. This result suggests that choosing different highly effective rankers for each partition and merging the results, should be able to improve overall effectiveness. Analyzing results from a novel oracle merge process, we demonstrate that this is not the case: selecting the best performing ranker on each subcollection is very unlikely to outperform just using a single best ranker across the whole collection.

Timothy Jones, Falk Scholer, Andrew Turpin, Stefano Mizzaro, Mark Sanderson
Retrievability and Retrieval Bias: A Comparison of Inequality Measures

The disposition of a retrieval system to favour certain documents over others can be quantified using retrievability. Typically, the Gini Coefficient has been used to quantify, with a single value, the level of bias a system imposes across the collection. However, numerous inequality measures have been proposed that may provide different insights into retrievability bias. In this paper, we examine 8 inequality measures, and see the changes in the estimation of bias on 3 standard retrieval models across their respective parameter spaces. We find that most of the measures agree with each other, and that the parameter settings that minimise the inequality according to each measure are similar. This work suggests that the standard inequality measure, the Gini Coefficient, provides similar information regarding the bias. However, we find that the Palma index and the 20:20 Ratio show the greatest differences and may be useful to provide a different perspective when ranking systems according to bias.

Colin Wilkie, Leif Azzopardi
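
As a reference point for the inequality measures compared above, a minimal Gini coefficient over per-document retrievability scores (the scores below are invented).

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 = perfect equality, values near 1 = maximal inequality."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0
    # Standard formula over the sorted values.
    cum = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return cum / (n * total)

retrievability = [0, 1, 1, 2, 5, 9, 30]  # illustrative per-document retrievability scores
print(round(gini(retrievability), 3))    # about 0.655: most retrievability mass sits on a few documents
```
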
Judging Relevance Using Magnitude Estimation

Magnitude estimation is a psychophysical scaling technique whereby numbers are assigned to stimuli to reflect the ratios of their perceived intensity. We report on a crowdsourcing experiment aimed at understanding if magnitude estimation can be used to gather reliable relevance judgements for documents, as is commonly required for test collection-based evaluation of information retrieval systems. Results on a small dataset show that: (i) magnitude estimation can produce relevance rankings that are consistent with more classical ordinal judgements; (ii) both an upper-bounded and an unbounded scale can be used effectively, though with some differences; (iii) the presentation order of the documents being judged has a limited effect, if any; and (iv) only a small number of repeat judgements are required to obtain reliable magnitude estimation scores.

Eddy Maddalena, Stefano Mizzaro, Falk Scholer, Andrew Turpin

Event Mining and Summarisation

Retrieving Time from Scanned Books

While millions of scanned books have become available in recent years, this vast collection of data remains under-utilized. Book search is often limited to summaries or metadata, and connecting information to primary sources can be a challenge.

Even though digital books provide rich historical information on all subjects, leveraging this data is difficult. To explore how we can access this historical information, we study the problem of identifying relevant times for a given query. That is, given a user query or a description of an event, we attempt to use historical sources to locate that event in time.

We use state-of-the-art NLP tools to identify and extract mentions of times present in our corpus, and then propose a number of models for organizing this historical information.

Since no truth data is readily available for our task, we automatically derive dated event descriptions from Wikipedia, leveraging both the wisdom of the crowd and the wisdom of experts. Using 15,000 events from between the years 1000 and 1925 as queries, we evaluate our approach on a collection of 50,000 books from the Internet Archive. We discuss the tradeoffs between context, retrieval performance, and efficiency.

John Foley, James Allan
A Noise-Filtering Approach for Spatio-temporal Event Detection in Social Media

We propose an iterative spatial-temporal mining algorithm for identifying and extracting events from social media. One of the key aspects of the proposed algorithm is a signal processing-inspired approach for viewing spatial-temporal term occurrences as signals, analyzing the noise contained in the signals, and applying noise filters to improve the quality of event extraction from these signals. The iterative event mining algorithm alternately clusters terms and then generates new filters based on the results of clustering. Through experiments on ten Twitter data sets, we find improved event retrieval compared to two baselines.

Yuan Liang, James Caverlee, Cheng Cao
Timeline Summarization from Relevant Headlines

Timeline summaries are an effective way for helping newspaper readers to keep track of long-lasting news stories, such as the Egypt revolution. A good timeline summary provides a concise description of only the main events, while maintaining good understandability. As manual construction of timelines is very time-consuming, there is a need for automatic approaches. However, automatic selection of relevant events is challenging due to the large number of news articles published every day. Furthermore, current state-of-the-art systems produce summaries that are suboptimal in terms of relevance and understandability. We present a new approach that exploits the headlines of online news articles instead of the articles’ full text. The quantitative and qualitative results from our user studies confirm that our method outperforms state-of-the-art systems in these aspects.

Giang Tran, Mohammad Alrifai, Eelco Herder

Information Extraction

A Self-training CRF Method for Recognizing Product Model Mentions in Web Forums

Important applications in product opinion mining such as opinion summarization and aspect extraction require the recognition of product mentions as a basic task. In the case of consumer electronic products, Web forums are important and popular sources of valuable opinions. Forum users often refer to products by means of their model numbers. In a post a user would employ model numbers, e.g., “BDP-93” and “BDP-103”, to compare Blu-ray players. To properly handle opinions in such a scenario, applications need to correctly recognize products by their model numbers. Forums, however, are informal and many challenges for undertaking automatic product model recognition arise, since users mention model numbers in many different ways. In this paper we propose the use of a self-training strategy to learn a suitable CRF model for this task. Our method requires only a set of seed model numbers. Experiments in four different settings demonstrate that our method, by leveraging unlabeled sentences from the target forum, yielded an improvement of 19% in recall and 12% in F-measure over a supervised CRF model.

Henry S. Vieira, Altigran S. da Silva, Marco Cristo, Edleno S. de Moura
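
The abstract above describes a self-training strategy seeded with a few known model numbers. The loop below is only a generic self-training sketch with a stand-in scikit-learn classifier instead of a CRF, and an assumed confidence threshold; it illustrates the label-then-retrain iteration, not the authors' exact method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_labeled, y_labeled, X_unlabeled, threshold=0.9, max_iter=5):
    """Iteratively move high-confidence predictions on unlabeled data into the training set."""
    clf = LogisticRegression(max_iter=1000)
    X_train, y_train = X_labeled.copy(), y_labeled.copy()
    pool = X_unlabeled.copy()
    for _ in range(max_iter):
        if len(pool) == 0:
            break
        clf.fit(X_train, y_train)
        proba = clf.predict_proba(pool)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break
        pseudo_labels = clf.classes_[proba.argmax(axis=1)[confident]]
        X_train = np.vstack([X_train, pool[confident]])
        y_train = np.concatenate([y_train, pseudo_labels])
        pool = pool[~confident]
    return clf
```
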
Information Extraction Grammars

Formal grammars are extensively used to represent patterns in Information Extraction, but they do not permit the use of several types of features. Finite-state transducers, which are based on regular grammars, solve this issue, but they have other disadvantages such as the lack of expressiveness and the rigid matching priority. As an alternative, we propose Information Extraction Grammars. This model, grounded in language theory, does permit the use of several features, solves some of the problems of finite-state transducers, and has the same computational complexity in recognition as formal grammars, whether they describe regular or context-free languages.

Mónica Marrero, Julián Urbano
Target-Based Topic Model for Problem Phrase Extraction

Discovering problems from reviews can give a company a precise view of the strong and weak points of its products. In this paper we present a probabilistic graphical model which aims to extract problem words and product targets from online reviews. The model extends standard LDA to discover both problem words and targets. The proposed model has two conditionally independent variables and learns two distributions over targets and over text indicators, associated with both problem labels and topics. The algorithm achieves a better performance in comparison to standard LDA in terms of the likelihood of a held-out test set.

Elena Tutubalina
On Identifying Phrases Using Collection Statistics

The use of phrases as part of similarity computations can enhance search effectiveness. But the gain comes at a cost, either in terms of index size, if all word-tuples are treated as queryable objects; or in terms of processing time, if postings lists for phrases are constructed at query time. There is also a lack of clarity as to which phrases are “interesting”, in the sense of capturing useful information. Here we explore several techniques for recognizing phrases using statistics of large-scale collections, and evaluate their quality.

Simon Gog, Alistair Moffat, Matthias Petri
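
One simple collection statistic for deciding whether adjacent words form an "interesting" phrase is pointwise mutual information; the sketch below illustrates that general idea over a toy collection and is not necessarily one of the specific techniques evaluated in the paper.

```python
import math
from collections import Counter

def pmi_scores(documents):
    """Score adjacent word pairs by pointwise mutual information computed from collection counts."""
    unigrams, bigrams = Counter(), Counter()
    total = 0
    for doc in documents:
        tokens = doc.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
        total += len(tokens)
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / total
        p1, p2 = unigrams[w1] / total, unigrams[w2] / total
        scores[(w1, w2)] = math.log2(p_pair / (p1 * p2))
    return scores

docs = ["new york is a large city", "the city of new york", "a new idea"]
print(sorted(pmi_scores(docs).items(), key=lambda kv: -kv[1])[:3])
```
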
MIST: Top-k Approximate Sub-string Mining Using Triplet Statistical Significance

Efficient extraction of strings or sub-strings similar to an input query string forms a necessity in applications like instant search, record linkage, etc., where the similarity between two strings is usually quantified by edit distance. This paper proposes a novel top-k approximate sub-string matching algorithm, MIST, for a given query, based on Chi-squared statistical significance of string triplets, thereby avoiding expensive edit distance computation. Experiments with real-life data validate the run-time effectiveness and accuracy of our algorithm.

Sourav Dutta

Recommender Systems

Active Learning Applied to Rating Elicitation for Incentive Purposes

Active Learning (AL) has been applied to Recommender Systems so as to elicit ratings from new users, namely Rating Elicitation for Cold Start Purposes. In most e-commerce systems, it is common to have the purchase information, but not the preference information, i.e., users rarely evaluate the items they purchased. In order to acquire these ratings, the e-commerce system usually sends annoying notifications asking users to evaluate their purchases. The system assumes that every rating has the same impact on its overall performance and, therefore, every rating is worth the same effort to acquire. However, this might not be true and, in that case, some ratings are worth more effort than others. For instance, if the e-commerce system knew beforehand which ratings would result in the greatest improvement of the overall system’s performance, it would probably be willing to reward users in exchange for these ratings. In other words, rating elicitation can go together with incentive mechanisms, namely Rating Elicitation for Incentive Purposes. Like in cold start cases, AL strategies could be easily applied to Rating Elicitation for Incentive Purposes in order to select items for evaluation. Therefore, in this work, we conduct an extensive benchmark, concerning incentives, with the main AL strategies in the literature, comparing them with respect to the overall system’s performance (MAE). Furthermore, we propose a novel AL strategy that creates a k-dimensional vector space, called item space, and selects items according to the density in this space. The density-based strategy has outperformed all others while making weak assumptions about the data set, which indicates that it can be an efficient default strategy for real applications.

Marden B. Pasinato, Carlos E. Mello, Geraldo Zimbrão
Entity-Centric Stream Filtering and Ranking: Filtering and Unfilterable Documents

Cumulative Citation Recommendation (CCR) is defined as: given a stream of documents on one hand and Knowledge Base (KB) entities on the other, filter, rank and recommend citation-worthy documents. The pipeline encountered in systems that approach this problem involves four stages: filtering, classification, ranking (or scoring), and evaluation. Filtering is only an initial step that reduces the web-scale corpus into a working set of documents more manageable for the subsequent stages. Nevertheless, this step has a large impact on the recall that can be attained maximally. This study analyzes in-depth the main factors that affect recall in the filtering stage. We investigate the impact of choices for corpus cleansing, entity profile construction, entity type, document type, and relevance grade. Because failing on recall in this first step of the pipeline cannot be repaired later on, we identify and characterize the citation-worthy documents that do not pass the filtering stage by examining their contents.

Gebrekirstos G. Gebremeskel, Arjen P. de Vries
Generating Music Playlists with Hierarchical Clustering and Q-Learning

Automatically generating playlists of music is an interesting area of research at present, with many online services now offering “radio channels” which attempt to play through sets of tracks a user is likely to enjoy. However, these tend to act as recommendation services, introducing a user to new music they might wish to listen to. Far less effort has gone into researching tools which learn an individual user’s tastes across their existing library of music and attempt to produce playlists fitting to their current mood. This paper describes a system that uses reinforcement learning over hierarchically-clustered sets of songs to learn a user’s listening preferences. Features extracted from the audio are also used as part of this process, allowing the software to create cohesive lists of tracks on demand or to simply play continuously from a given starting track. This new system is shown to perform well in a small user study, greatly reducing the relative number of songs that a user skips.

James King, Vaiva Imbrasaitė
Time-Sensitive Collaborative Filtering through Adaptive Matrix Completion

Real-world Recommender Systems are often facing drifts in users’ preferences and shifts in items’ perception or use. Traditional state-of-the-art methods based on matrix factorization are not originally designed to cope with these dynamic and time-varying effects and, indeed, could perform rather poorly if there is no “reactive”, on-line model update. In this paper, we propose a new incremental matrix completion method, that automatically allows the factors related to both users and items to adapt “on-line” to such drifts. Model updates are based on a temporal regularization, ensuring smoothness and consistency over time, while leading to very efficient, easily scalable algebraic computations. Several experiments on real-world data sets show that these adaptation mechanisms significantly improve the quality of recommendations compared to the static setting and other standard on-line adaptive algorithms.

Julien Gaillard, Jean-Michel Renders
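
A rough sketch, under simplifying assumptions, of what a single on-line factor update with temporal regularization can look like: the usual SGD step plus a term pulling each factor towards its value at the previous time step. The learning rate and regularization weights are placeholders, not the paper's formulation.

```python
import numpy as np

def online_update(u, v, u_prev, v_prev, rating, lr=0.01, reg=0.05, temporal_reg=0.1):
    """One SGD step for a single (user, item, rating) event with temporal smoothing.
    u, v: current latent factors; u_prev, v_prev: factors at the previous time step."""
    err = rating - u @ v
    u_new = u + lr * (err * v - reg * u - temporal_reg * (u - u_prev))
    v_new = v + lr * (err * u - reg * v - temporal_reg * (v - v_prev))
    return u_new, v_new

# Illustrative 3-dimensional factors.
u = np.array([0.1, 0.3, -0.2]); v = np.array([0.4, 0.0, 0.1])
print(online_update(u, v, u.copy(), v.copy(), rating=4.0))
```
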
Toward the New Item Problem: Context-Enhanced Event Recommendation in Event-Based Social Networks

The increasing popularity of event-based social networks (EBSNs) calls for developments in event recommendation techniques. However, events are uniquely different from conventional recommended items because every event to be recommended is a new item. Traditional recommendation methods such as collaborative filtering techniques, which rely on users’ rating histories, are not suitable for this problem. In this paper, we propose a novel context-enhanced event recommendation method, which exploits the rich context in EBSNs by unifying content, social and geographical information. Experiments on a real-world dataset show promising results of the proposed method.

Zhenhua Wang, Ping He, Lidan Shou, Ke Chen, Sai Wu, Gang Chen
On the Influence of User Characteristics on Music Recommendation Algorithms

We investigate a range of music recommendation algorithm combinations, score aggregation functions, normalization techniques, and late fusion techniques on approximately 200 million listening events collected through Last.fm. The overall goal is to identify superior combinations for the task of artist recommendation. Hypothesizing that user characteristics influence performance on these algorithmic combinations, we consider specific user groups determined by age, gender, country, and preferred genre. Overall, we find that the performance of music recommendation algorithms highly depends on user characteristics.

Markus Schedl, David Hauger, Katayoun Farrahi, Marko Tkalčič
A Study of Smoothing Methods for Relevance-Based Language Modelling of Recommender Systems

Language Models have been traditionally used in several fields like speech recognition or document retrieval. It was only recently that their use was extended to collaborative Recommender Systems. In this field, a Language Model is estimated for each user based on the probabilities of the items. A central issue in the estimation of such Language Models is smoothing, i.e., how to adjust the maximum likelihood estimator to compensate for rating sparsity. This work is devoted to exploring how the classical smoothing approaches (Absolute Discounting, Jelinek-Mercer and Dirichlet priors) perform in the recommender task. We tested the different methods under the recently presented Relevance-Based Language Models for collaborative filtering, and compared how the smoothing techniques behave in terms of precision and stability. We found that Absolute Discounting is practically insensitive to the parameter value, making it an almost parameter-free method, while its performance is similar to that of Jelinek-Mercer and Dirichlet priors.

Daniel Valcarce, Javier Parapar, Álvaro Barreiro
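
For reference, the three classical smoothing estimators named above, written for a single term's probability under a (user or document) language model; the counts and collection probability in the example are illustrative.

```python
def jelinek_mercer(tf, dlen, p_collection, lam=0.5):
    """Linear interpolation of the maximum-likelihood model with the collection model."""
    return (1 - lam) * (tf / dlen) + lam * p_collection

def dirichlet(tf, dlen, p_collection, mu=2000):
    """Bayesian smoothing with a Dirichlet prior of mass mu."""
    return (tf + mu * p_collection) / (dlen + mu)

def absolute_discounting(tf, dlen, p_collection, n_unique, delta=0.7):
    """Subtract a constant delta from each seen count and redistribute the mass via the collection model."""
    return max(tf - delta, 0) / dlen + (delta * n_unique / dlen) * p_collection

# A term seen 3 times in a 100-token profile with 40 unique terms; collection probability 0.001.
print(jelinek_mercer(3, 100, 0.001),
      dirichlet(3, 100, 0.001),
      absolute_discounting(3, 100, 0.001, 40))
```
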
The Power of Contextual Suggestion

The evaluation process for the TREC Contextual Suggestion Track consumes substantial time and resources, taking place over several weeks and costing thousands of dollars in assessor remuneration. The track evaluates a point-of-interest recommendation task, using crowdsourced workers as a source of user profiles and judgments. Given the cost of assessment, we examine track data to provide guidance for future experiments on this task, particularly with respect to the number of assessors required. To provide insight, we first consider the potential impact of fewer assessors on the TREC 2013 experiments. We then provide recommendations for future experiments. Our goal is to minimize costs, while still meeting the requirements of those experiments.

Adriel Dean-Hall, Charles L. A. Clarke

Semantic and Graph-Based Models

Exploiting Semantic Annotations for Domain-Specific Entity Search

Searches on the Web of Data go beyond the retrieval of textual Web sites and shift the focus of search engines towards domain-specific entity data, for which the units of retrieval are domain-specific entities instead of textual documents. We study the effect of using semantic annotation in combination with a knowledge graph for domain-specific entity search. Different reasoning, indexing and query-expansion strategies are compared to study their effect in improving the effectiveness of entity search. The results show that the use of semantic annotation and background knowledge can significantly improve the retrieval effectiveness, but requires graph structures to be exploited beyond standard reasoning. Our findings can help to develop more effective information and data retrieval methods that can enhance the performance of semantic search engines that operate with structured domain-specific Web data.

Tuukka Ruotsalo, Eero Hyvönen
Reachability Analysis of Graph Modelled Collections

This paper is concerned with potential recall in multimodal information retrieval in graph-based models. We provide a framework to leverage individuality and combination of features of different modalities through our formulation of faceted search. We employ a potential recall analysis on a test collection to gain insight into the corpus and further highlight the role of multiple facets, relations between the objects, and semantic links in recall improvement. We conduct the experiments on a multimodal dataset containing approximately 400,000 documents and images. We demonstrate that leveraging multiple facets most notably increases the recall for very hard topics, by up to 316%.

Serwah Sabetghadam, Mihai Lupu, Ralf Bierig, Andreas Rauber
Main Core Retention on Graph-of-Words for Single-Document Keyword Extraction

In this paper, we apply the concept of k-core on the graph-of-words representation of text for single-document keyword extraction, retaining only the nodes from the main core as representative terms. This approach better takes into account proximity between keywords and variability in the number of extracted keywords through the selection of more cohesive subsets of nodes than with existing graph-based approaches solely based on centrality. Experiments on two standard datasets show statistically significant improvements in F1-score and AUC of precision/recall curve compared to baseline results, in particular when weighting the edges of the graph with the number of co-occurrences. To the best of our knowledge, this is the first application of graph degeneracy to natural language processing and information retrieval.

François Rousseau, Michalis Vazirgiannis
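
A minimal sketch of the main-core idea with networkx: build a graph-of-words from co-occurrences within a small sliding window and keep only the terms in the maximal k-core. The window size and the naive tokenization are assumptions for illustration.

```python
import networkx as nx

def main_core_keywords(text, window=4):
    """Build an unweighted graph-of-words and return the terms of its main core."""
    tokens = text.lower().split()
    g = nx.Graph()
    for i, w in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != w:                      # avoid self-loops, which k_core rejects
                g.add_edge(w, other)
    main_core = nx.k_core(g)                    # k omitted: networkx returns the maximal core
    return sorted(main_core.nodes())

print(main_core_keywords(
    "graph of words representation for keyword extraction keeps the main core of the graph of words"))
```
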
Entity Linking for Web Search Queries

We consider the problem of linking web search queries to entities from a knowledge base such as Wikipedia. Such linking enables converting a user’s web search session to a footprint in the knowledge base that could be used to enrich the user profile. Traditional methods for entity linking have been directed towards finding entity mentions in text documents such as news reports, each of which is possibly linked to multiple entities, enabling the usage of measures like entity set coherence. Since web search queries are very small text fragments, such criteria that rely on the existence of a multitude of mentions do not work well on them. We propose a three-phase method for linking web search queries to Wikipedia entities. The first phase does IR-style scoring of entities against the search query to narrow down to a subset of entities, which are expanded in the second phase to a larger set using hyperlink information. Lastly, we use a graph traversal approach to identify the top entities to link the query to. Through an empirical evaluation on real-world web search queries, we illustrate that our methods significantly enhance the linking accuracy over state-of-the-art methods.

Deepak P., Sayan Ranu, Prithu Banerjee, Sameep Mehta

Sentiment and Opinion

Beyond Sentiment Analysis: Mining Defects and Improvements from Customer Feedback

Customer satisfaction is considered one of the key performance indicators within businesses. In the current competitive marketplace where businesses compete for customers, managing customer satisfaction is essential. One of the important sources of customer feedback is product reviews. Sentiment analysis on customer reviews has been a very hot topic in the last decade. While early works mainly focused on identifying the positiveness and negativeness of reviews, later research tries to extract more detailed information by estimating the sentiment score of each product aspect/feature. In this work, we go beyond sentiment analysis by extracting actionable information from customer feedback. We call a piece of information actionable (in the sense of customer satisfaction) if the business can use it to improve its product. We propose a technique to automatically extract defects (problem/issue/bug reports) and improvements (modification/upgrade/enhancement requests) from customer feedback. We also propose a method for summarizing extracted defects and improvements. Experimental results showed that, without any manual annotation cost, the proposed semi-supervised technique can achieve comparable accuracy to a fully supervised model in identifying defects and improvements.

Samaneh Moghaddam
Measuring User Influence, Susceptibility and Cynicalness in Sentiment Diffusion

Diffusion in social networks is an important research topic lately due to the massive amount of information shared on social media and the Web. As information diffuses, users express sentiments which can affect the sentiments of others. In this paper, we analyze how users reinforce or modify the sentiment of one another based on a set of inter-dependent latent user factors as they are engaged in the diffusion of event information. We introduce these sentiment-based latent user factors, namely influence, susceptibility and cynicalness. We also propose the ISC model to relate the three factors together and develop an iterative computation approach to derive them simultaneously. We evaluate the ISC model by conducting experiments on two separate sets of Twitter data collected from two real world events. The experiments show that the top influential users tend to stay consistently influential, while the susceptibility and cynicalness of users could change significantly across events.

Roy Ka-Wei Lee, Ee-Peng Lim
Automated Controversy Detection on the Web

Alerting users about controversial search results can encourage critical literacy, promote healthy civic discourse and counteract the “filter bubble” effect, and therefore would be a useful feature in a search engine or browser extension. In order to implement such a feature, however, the binary classification task of determining which topics or webpages are controversial must be solved. Earlier work described a proof of concept using a supervised nearest neighbor classifier with access to an oracle of manually annotated Wikipedia articles. This paper generalizes and extends that concept by taking the human out of the loop, leveraging the rich metadata available in Wikipedia articles in a weakly-supervised classification approach. The new technique we present allows the nearest neighbor approach to be extended on a much larger scale and to other datasets. The results improve substantially over naive baselines and are nearly identical to the oracle-reliant approach by standard measures of F1, F0.5, and accuracy. Finally, we discuss implications of solving this problem as part of a broader subject of interest to the IR community, and suggest several avenues for further exploration in this exciting new space.

Shiri Dori-Hacohen, James Allan
Learning Sentiment Based Ranked-Lexicons for Opinion Retrieval

In contrast to classic search where users look for factual information, opinion retrieval aims at finding and ranking subjective information. A major challenge of opinion retrieval is the informal nature of user reviews and the domain specific jargon used to describe the targeted item. In this paper, we present an automatic method to learn a space model for opinion retrieval. Our approach is a generative model that learns sentiment word distributions by embedding multi-level relevance judgments in the estimation of the model parameters. In addition to sentiment word distributions, we also infer domain specific named entities that due to their popularity become a sentiment reference in their domain (e.g. the name of a movie, “Batman”, or specific hotel items, “carpet”). This contrasts with previous approaches that learn a word’s polarity or aspect-based polarity. Opinion retrieval experiments were done on two large datasets with over 703,000 movie reviews and 189,000 hotel reviews. The proposed method achieved performance better than, or equal to, the benchmark baselines.

Filipa Peleja, João Magalhães
Topic-Dependent Sentiment Classification on Twitter

In this paper, we investigate how discovering the topic discussed in a tweet can be used to improve its sentiment classification. In particular, a classifier is introduced consisting of a topic-specific classifier, which is only trained on tweets of the same topic as the given tweet, and a generic classifier, which is trained on all the tweets in the training set. The set of considered topics is obtained by clustering the hashtags that occur in the training set. A classifier is then used to estimate the topic of a previously unseen tweet. Experimental results based on a public Twitter dataset show that considering topic-specific sentiment classifiers indeed leads to an improvement.

Steven Van Canneyt, Nathan Claeys, Bart Dhoedt
Learning Higher-Level Features with Convolutional Restricted Boltzmann Machines for Sentiment Analysis

In recent years, learning word vector representations has attracted much interest in Natural Language Processing. Word representations or embeddings learned using unsupervised methods help address the problem of traditional bag-of-word approaches which fail to capture contextual semantics. In this paper we go beyond the vector representations at the word level and propose a novel framework that learns higher-level feature representations of n-grams, phrases and sentences using a deep neural network built from stacked Convolutional Restricted Boltzmann Machines (CRBMs). These representations have been shown to map syntactically and semantically related n-grams to nearby locations in the hidden feature space. We experimented with additionally incorporating these higher-level features into supervised classifier training for two sentiment analysis tasks: subjectivity classification and sentiment classification. Our results demonstrate the success of the proposed framework, with a 4% improvement in accuracy for subjectivity classification and improved results for sentiment classification over models trained without our higher-level features.

Trung Huynh, Yulan He, Stefan Rüger

Social Media

Towards Deep Semantic Analysis of Hashtags

Hashtags are semantico-syntactic constructs used across various social networking and microblogging platforms to enable users to start a topic specific discussion or classify a post into a desired category. Segmenting and linking the entities present within the hashtags could therefore help in better understanding and extraction of information shared across the social media. However, due to the lack of space delimiters in hashtags (e.g., #nsavssnowden), the segmentation of hashtags into constituent entities (“NSA” and “Edward Snowden” in this case) is not a trivial task. Most of the current state-of-the-art social media analytics systems like Sentiment Analysis and Entity Linking tend to either ignore hashtags, or treat them as a single word. In this paper, we present a context aware approach to segment and link entities in the hashtags to a knowledge base (KB) entry, based on the context within the tweet. Our approach segments and links the entities in hashtags such that the coherence between hashtag semantics and the tweet is maximized. To the best of our knowledge, no existing study addresses the issue of linking entities in hashtags for extracting semantic information. We evaluate our method on two different datasets, and demonstrate the effectiveness of our technique in improving the overall entity linking in tweets via additional semantic information provided by segmenting and linking entities in a hashtag.

Piyush Bansal, Romil Bansal, Vasudeva Varma
Chalk and Cheese in Twitter: Discriminating Personal and Organization Accounts

Social media have been popular not only for individuals to share content, but also for organizations to engage users and spread information. Given the trait differences between personal and organization accounts, the ability to distinguish between the two account types is important for developing better search/recommendation engines, marketing strategies, and information dissemination platforms. However, such a task is non-trivial and has not been well studied thus far. In this paper, we present a new generic framework for classifying personal and organization accounts, based upon which comprehensive and systematic investigation on a rich variety of content, social, and temporal features can be carried out. In addition to generic feature transformation pipelines, the framework features a gradient boosting classifier that is accurate/robust and facilitates good data understanding, such as the importance of different features. We demonstrate the efficacy of our approach through extensive experiments on Twitter data from Singapore, by which we discover several discriminative content, social, and temporal features.

Richard Jayadi Oentaryo, Jia-Wei Low, Ee-Peng Lim
Handling Topic Drift for Topic Tracking in Microblogs

Microblogs such as Twitter have become an increasingly popular source of real-time information, where users may demand tracking the development of the topics they are interested in. We approach the problem by adapting an effective classifier based on Binomial Logistic Regression, which has been shown to be state-of-the-art in traditional news filtering. In our adaptation, we utilize the link information to enrich tweets’ content and the social symbols to help estimate tweets’ quality. Moreover, we find that topics are very likely to drift in microblogs as a result of the information redundancy and topic divergence of tweets. To handle topic drift over time, we adopt a cluster-based subtopic detection algorithm to help identify whether drift occurs; the detected subtopic is regarded as the current focus of the general topic and is used to adjust for the drift. Experimental results on the corpus of the TREC 2012 Microblog Track show that our approach achieves remarkable performance in both T11SU and F-0.5 metrics.

Yue Fei, Yihong Hong, Jianwu Yang
Detecting Location-Centric Communities Using Social-Spatial Links with Temporal Constraints

Community detection on social networks typically aims to cluster users into different communities based on their social links. The increasing popularity of Location-based Social Networks offers the opportunity to augment these social links with spatial information, for detecting location-centric communities that frequently visit similar places. Such location-centric communities are important to companies for their location-based and mobile advertising efforts. We propose an approach to detect location-centric communities by augmenting social links with both spatial and temporal information, and demonstrate its effectiveness using two Foursquare datasets. In addition, we study the effects of social, spatial and temporal information on communities and observe the following: (i) augmenting social links with spatial and temporal information results in location-centric communities with high levels of check-in and locality similarity; (ii) using spatial and temporal information without social links however leads to communities that are less location-centric.

Kwan Hui Lim, Jeffrey Chan, Christopher Leckie, Shanika Karunasekera
Using Subjectivity Analysis to Improve Thread Retrieval in Online Forums

Finding relevant threads in online forums is challenging for internet users due to a large number of threads discussing lexically similar topics but differing in the type of information they contain (e.g., opinions, facts, emotions). Search facilities need to take into account the match between users’ intent and the type of information contained in threads in addition to the lexical match between user queries and threads. We use intent match by incorporating subjectivity match between user queries and threads into a state-of-the-art forum thread retrieval model. Experimental results show that subjectivity match improves retrieval performance by over 10% as measured by different metrics.

Prakhar Biyani, Sumit Bhatia, Cornelia Caragea, Prasenjit Mitra
Selecting Training Data for Learning-Based Twitter Search

Learning to rank is widely applied as an effective weighting scheme for Twitter search. As most learning to rank approaches are based on supervised learning, their effectiveness can be affected by the inclusion of low-quality training data. In this paper, we propose a simple and effective approach that learns a query quality classifier, which automatically selects the training data on a per-query basis. Experimental results on the TREC Tweets13 collection show that our proposed approach outperforms the conventional application of learning to rank that learns the ranking model on all training queries available.

Dongxing Li, Ben He, Tiejian Luo, Xin Zhang
Content-Based Similarity of Twitter Users

We propose a method for computing user similarity based on a network representing the semantic relationships between the words occurring in the same tweet and the related topics. We use this specially crafted network to define several user profiles, which are compared using cosine similarity. We also describe an initial experimental study of its effectiveness on a limited dataset.
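For instance, once per-user profiles have been extracted as weighted term vectors (however the underlying word/topic network is built), the comparison step reduces to plain cosine similarity; a small self-contained sketch with made-up profiles:

import math
from collections import Counter

def cosine(profile_a: Counter, profile_b: Counter) -> float:
    common = set(profile_a) & set(profile_b)
    dot = sum(profile_a[t] * profile_b[t] for t in common)
    na = math.sqrt(sum(v * v for v in profile_a.values()))
    nb = math.sqrt(sum(v * v for v in profile_b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy profiles (term -> weight); real weights would come from the network.
user_a = Counter({"running": 3, "marathon": 2, "coffee": 1})
user_b = Counter({"marathon": 1, "training": 2, "coffee": 2})
print(round(cosine(user_a, user_b), 3))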

Stefano Mizzaro, Marco Pavan, Ivan Scagnetto

Specific Search Tasks

A Corpus of Realistic Known-Item Topics with Associated Web Pages in the ClueWeb09

Known-item finding is the task of finding a previously seen item. Such items may range from visited websites and received emails to read books or watched movies. Most research on known-item finding focuses on web or email retrieval and is done on proprietary corpora that are not publicly available. Public corpora are usually rather artificial, as they contain automatically generated known-item queries or queries formulated by humans while actually seeing the known item.

In this paper, we study original known-item information needs mined from questions at the popular Yahoo! Answers Q&A service. By carefully sampling only questions with a related known-item web page in the ClueWeb09 corpus, we provide an environment for repeatable, realistic studies of known-item information needs and of how a retrieval system could react to them. In particular, our own study sheds some first light on false memories within the known-item questions articulated by users. Our main finding is that false memories often relate to mixed-up names. This indicates that a search engine that retrieves nothing for a known-item query could try to avoid returning a zero-result list by ignoring or replacing names in such queries.

Our publicly available corpus of 2,755 known-item questions mapped to web pages in the ClueWeb09 includes 240 questions with annotated and corrected false memories.

Matthias Hagen, Daniel Wägner, Benno Stein
Designing States, Actions, and Rewards for Using POMDP in Session Search

Session search is an information retrieval task that involves a sequence of queries for a complex information need. It is characterized by rich user-system interactions and by temporal dependency between queries and between consecutive user behaviors. Recent efforts have been made in modeling session search using the Partially Observable Markov Decision Process (POMDP). To best utilize the POMDP model, it is crucial to find suitable definitions for its fundamental elements: states, actions and rewards. This paper investigates the best ways to design the states, actions, and rewards within a POMDP framework. We lay out the available design options for these major components based on a variety of related work and experiment with combinations of these options over the TREC 2012 and 2013 Session datasets. We report our findings based on two evaluation aspects, retrieval accuracy and efficiency, and recommend practical design choices for using POMDP in session search.
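Purely as an illustration of the design space (the concrete definitions are exactly what the paper compares, so everything below is an assumption), the three POMDP ingredients for a session-search agent might be laid out like this:

from dataclasses import dataclass
from enum import Enum, auto

class Action(Enum):          # example agent actions over the current query
    ADD_TERMS = auto()       # expand the query
    REMOVE_TERMS = auto()    # narrow the query
    REWEIGHT_TERMS = auto()  # keep terms, change their weights

@dataclass
class BeliefState:
    # The true user state (e.g. exploring vs. exploiting) is hidden;
    # the agent maintains a probability distribution over it.
    p_exploring: float
    p_exploiting: float

def reward(ndcg_gain: float, effort_cost: float) -> float:
    # Assumed reward: retrieval-quality gain minus an interaction cost.
    return ndcg_gain - effort_cost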

Jiyun Luo, Sicong Zhang, Xuchu Dong, Hui Yang
Retrieving Medical Literature for Clinical Decision Support

Keeping current with the vast volume of medical literature published yearly poses a serious challenge for medical professionals. Thus, interest in systems that aid physicians in making clinical decisions is intensifying. A task of Clinical Decision Support (CDS) systems is retrieving highly relevant medical literature that could help healthcare professionals in formulating diagnoses or determining treatments. This search task is atypical, as the queries are medical case reports, which differ in size and structure from queries in other, more common search tasks. We apply query reformulation techniques to address literature search based on case reports. The proposed system achieves a statistically significant improvement over the baseline (29%–32%) and the state-of-the-art (12%–59%).

Luca Soldaini, Arman Cohan, Andrew Yates, Nazli Goharian, Ophir Frieder
PatNet: A Lexical Database for the Patent Domain

In the patent domain, Boolean retrieval is particularly common. Despite its importance, however, there is little current research on assisting patent experts in formulating such queries. Existing approaches are mostly limited to using standard dictionaries, such as WordNet, to provide synonymous expansion terms. In this paper we present a new approach to support patent searchers in the query generation process. We extract a lexical database, which we call PatNet, from real query sessions of patent examiners of the United States Patent and Trademark Office (USPTO). PatNet provides several types of synonym relations. Further, we apply several query term expansion strategies to improve the precision of PatNet in suggesting expansion terms. Experiments based on real query sessions of patent examiners show a drastic increase in precision when taking into account the support of the synonym relations, US patent classes, and word senses.

Wolfgang Tannebaum, Andreas Rauber
Learning to Rank Aggregated Answers for Crossword Puzzles

In this paper, we study methods for improving the quality of automatic extraction of answer candidates for the automatic resolution of crossword puzzles (CPs), which we set up as a new IR task. Since automatic systems use databases containing previously solved CPs, we define a new, effective approach that consists of querying the database (DB) with a search engine for clues that are similar to the target one. We rerank the obtained clue list using state-of-the-art methods and go beyond them by defining new learning to rank approaches for aggregating similar clues associated with the same answer.

Massimo Nicosia, Gianni Barlacchi, Alessandro Moschitti
Diagnose This If You Can
On the Effectiveness of Search Engines in Finding Medical Self-diagnosis Information

An increasing number of people seek health advice on the web using search engines; this poses challenging problems for current search technologies. In this paper we report an initial study of the effectiveness of current search engines in retrieving relevant information for diagnostic medical circumlocutory queries, i.e., queries issued by people seeking information about their health condition using a description of the symptoms they observe (e.g. hives all over body) rather than the medical term (e.g. urticaria). Such queries are frequent when people are unfamiliar with a domain or its language, and they are common among health information seekers attempting to self-diagnose or self-treat. Our analysis reveals that current search engines are not equipped to effectively satisfy such information needs; this can have potentially harmful outcomes for people's health. Our results advocate for more research in developing information retrieval methods to support such complex information needs.

Guido Zuccon, Bevan Koopman, João Palotti
Sources of Evidence for Automatic Indexing of Political Texts

Political texts on the Web, documenting laws and policies and the processes leading to them, are of key importance to government, industry, and every individual citizen. Yet access to such texts is difficult due to the ever-increasing volume and complexity of the content, prompting the need to index or annotate them with a common controlled vocabulary or ontology. In this paper, we investigate the effectiveness of different sources of evidence, such as labeled training data, textual glosses of descriptor terms, and the thesaurus structure, for automatically indexing political texts. Our main findings are as follows. First, using a learning to rank (LTR) approach integrating all features, we observe significantly better performance than previous systems. Second, an analysis of feature weights reveals the relative importance of the various sources of evidence, also giving insight into the underlying classification problem. Third, a lean-and-mean system using only four features (text, title, descriptor glosses, descriptor term popularity) performs at 97% of the large LTR model.
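As a hedged illustration of the lean-and-mean idea, a pointwise ranker over just the four named feature scores (text, title, descriptor glosses, descriptor term popularity) could look like this; the feature values, model choice, and ranking rule are assumptions, not the paper's LTR setup.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [text_score, title_score, gloss_score, descriptor_popularity]
X_train = np.array([[0.8, 0.6, 0.4, 0.9],
                    [0.1, 0.0, 0.2, 0.3],
                    [0.7, 0.9, 0.5, 0.1],
                    [0.2, 0.1, 0.0, 0.8]])
y_train = np.array([1, 0, 1, 0])   # descriptor assigned to the document or not

ranker = LogisticRegression().fit(X_train, y_train)

def rank_descriptors(candidates):
    # candidates: list of (descriptor_id, feature_vector)
    scored = [(d, ranker.predict_proba([f])[0, 1]) for d, f in candidates]
    return sorted(scored, key=lambda x: x[1], reverse=True)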

Mostafa Dehghani, Hosein Azarbonyad, Maarten Marx, Jaap Kamps
Automatically Assessing Wikipedia Article Quality by Exploiting Article–Editor Networks

We consider the problem of automatically assessing Wikipedia article quality. We develop several models to rank articles by using the editing relations between articles and editors. First, we create a basic model by modeling the article-editor network. Then we design measures of an editor’s contribution and build weighted models that improve the ranking performance. Finally, we use a combination of featured article information and the weighted models to obtain the best performance. We find that using manual evaluation to assist automatic evaluation is a viable solution for the article quality assessment task on Wikipedia.

Xinyi Li, Jintao Tang, Ting Wang, Zhunchen Luo, Maarten de Rijke

Temporal Models and Features

Long Time, No Tweets! Time-aware Personalised Hashtag Suggestion

Microblogging systems, such as the popular service Twitter, are an important real-time source of information; however, due to the amount of new information constantly appearing on such services, it is difficult for users to organise, search and re-find posts. Hashtags, short keywords prefixed by a # symbol, can assist users in performing these tasks, yet despite their utility they are used quite infrequently. This work considers the problem of hashtag recommendation, where we wish to suggest appropriate tags that the user could assign to a new post. By identifying temporal patterns in the use of hashtags and employing personalisation techniques, we construct novel prediction models that build on the best features of existing methods. Using a large sample of data from the Twitter API, we test our novel approaches against a number of competitive baselines and demonstrate significant performance improvements, particularly for hashtags that have large amounts of historical data available.
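One plausible, purely illustrative way to combine the two signals the abstract mentions, recency-decayed global usage and the user's own history, is a weighted score per candidate hashtag; the decay rate and mixing weight below are assumptions.

import math
from collections import Counter

DECAY_PER_DAY = 0.1   # assumed exponential decay rate
ALPHA = 0.7           # assumed mix between personal and global evidence

def score_hashtag(tag, now, global_uses, user_counts: Counter):
    # global_uses: dict tag -> list of timestamps (days, as floats)
    recency = sum(math.exp(-DECAY_PER_DAY * (now - t))
                  for t in global_uses.get(tag, []))
    personal = user_counts.get(tag, 0)
    return ALPHA * personal + (1 - ALPHA) * recency

def suggest(candidates, now, global_uses, user_counts, k=5):
    return sorted(candidates,
                  key=lambda t: score_hashtag(t, now, global_uses, user_counts),
                  reverse=True)[:k]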

Morgan Harvey, Fabio Crestani
Temporal Multinomial Mixture for Instance-Oriented Evolutionary Clustering

Evolutionary clustering aims at capturing the temporal evolution of clusters. This issue is particularly important in the context of social media data, which are naturally temporally driven. In this paper, we propose a new probabilistic model-based evolutionary clustering technique. The Temporal Multinomial Mixture (TMM) is an extension of the classical mixture model that optimizes feature co-occurrences in a trade-off with temporal smoothness. Our model is evaluated on two recent case studies on opinion aggregation over time. We compare four different probabilistic clustering models and show the superiority of our proposal on the task of instance-oriented clustering.

Young-Min Kim, Julien Velcin, Stéphane Bonnevay, Marian-Andrei Rizoiu
Temporal Latent Topic User Profiles for Search Personalisation

The performance of search personalisation largely depends on how effectively user profiles are built. Many approaches build user profiles using topics discussed in relevant documents, where the topics are usually obtained from a human-generated online ontology such as the Open Directory Project. The limitation of these approaches is that many documents may not contain the topics covered in the ontology. Moreover, the human-generated topics require expensive manual effort to determine the correct categories for each document. This paper addresses these problems by using Latent Dirichlet Allocation for unsupervised extraction of topics from documents. With the learned topics, we observe that search intent and user interests are dynamic, i.e., they change from time to time. In order to evaluate the effectiveness of temporal aspects in personalisation, we apply three typical time scales for building a long-term profile, a daily profile and a session profile. In the experiments, we utilise the profiles to re-rank search results returned by a commercial web search engine. Our experimental results demonstrate that our temporal profiles can significantly improve ranking quality. The results further show a promising effect of temporal features in correlation with click entropy and query position in a search session.

Thanh Vu, Alistair Willis, Son N. Tran, Dawei Song
Document Priors Based On Time-Sensitive Social Signals

Relevance estimation of a Web resource (document) can benefit from using social signals. In this paper, we propose a language model document prior exploiting the temporal characteristics of social signals. We assume that the a priori significance of a document depends on the dates of users' actions (social signals) and on the publication date (first occurrence) of the document. In particular, rather than estimating the priors by simply counting the signals related to the document, we bias this counting by taking into account the dates of the resource and of the actions. We evaluate our approach on an IMDb dataset containing 167,438 resources and their social data collected from several social networks. The experiments show the benefit of temporally aware signals in capturing relevant resources.
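A hedged sketch of what such a time-sensitive prior could look like in code; the decay form, the smoothing, and the way the prior combines with query likelihood are assumptions, not the authors' exact model.

import math

def document_prior(signal_dates, publication_date, now, decay=0.01, mu=1.0):
    # signal_dates: times (days, as floats) at which likes/shares/etc. occurred
    weighted = sum(math.exp(-decay * (now - t)) for t in signal_dates)
    freshness = math.exp(-decay * (now - publication_date))
    return math.log(mu + weighted * freshness)   # log-prior for the document

# In a language-modelling ranker, log P(q|d) + document_prior(...) would then
# be used to order documents.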

Ismail Badache, Mohand Boughanem

Topic and Document Models

Prediction of Venues in Foursquare Using Flipped Topic Models

Foursquare is a highly popular location-based social platform, where users indicate their presence at venues via check-ins and/or provide venue-related tips. On Foursquare, we explore Latent Dirichlet Allocation (LDA) topic models for venue prediction: predicting venues that a user is likely to visit, given his history of other visited venues. However, we depart from prior works, which regard the users as documents and their visited venues as terms. Instead, we ‘flip’ LDA models such that we regard venues as documents that attract users, which are now the terms. Flipping is simple and requires no changes to the LDA mechanism, yet it improves prediction accuracy significantly, as shown in our experiments. Furthermore, flipped models are superior when we model tips and check-ins as separate modes. This enables us to use tips to improve prediction accuracy, which was previously unexplored. Lastly, we observe the largest accuracy improvement for venues with fewer visitors, implying that the flipped models cope with sparse venue data more effectively.
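The flip itself is easy to picture: venues become the "documents" and user IDs the "tokens". A minimal sketch with scikit-learn's LDA implementation on toy check-in data; the number of topics and the prediction rule hinted at in the final comment are assumptions.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# One "document" per venue: the space-separated IDs of users who checked in there.
venue_docs = {
    "venue_1": "u1 u2 u2 u7",
    "venue_2": "u2 u3 u3 u9",
    "venue_3": "u1 u7 u7 u8",
}

vectorizer = CountVectorizer(token_pattern=r"\S+")
X = vectorizer.fit_transform(venue_docs.values())

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
venue_topics = lda.transform(X)   # per-venue topic mixtures

# A user could then be scored against each venue via the probability mass that
# the venue's topics assign to that user's token, and unseen venues ranked by it.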

Wen-Haw Chong, Bing-Tian Dai, Ee-Peng Lim
Geographical Latent Variable Models for Microblog Retrieval

Although topic models designed for textual collections annotated with geographical meta-data have previously been shown to be effective at capturing vocabulary preferences of people living in different geographical regions, little is known about their utility for information retrieval in general or microblog retrieval in particular. In this work, we propose simple and scalable geographical latent variable generative models and a method to improve the accuracy of retrieval from collections of geo-tagged documents through document expansion based on the topics identified by the proposed models. In particular, we experimentally compare the retrieval effectiveness of four geographical latent variable models: two geographical variants of post-hoc LDA, a latent variable model without hidden topics, and a topic model that can separate background topics from geographically specific ones. The experiments conducted on TREC microblog datasets demonstrate a significant improvement in the search accuracy of the proposed method over both the traditional probabilistic retrieval model and retrieval models utilizing geographical post-hoc variants of LDA.

Alexander Kotov, Vineeth Rakesh, Eugene Agichtein, Chandan K. Reddy
Nonparametric Topic Modeling Using Chinese Restaurant Franchise with Buddy Customers

Many popular latent topic models for text documents generally make two assumptions. The first assumption relates to a finite-dimensional parameter space. The second is the bag-of-words assumption, which prevents such models from capturing the interdependence between words. While existing nonparametric admixture models relax the first assumption, they still impose the bag-of-words assumption. We investigate a nonparametric admixture model that relaxes both assumptions in one unified model. One challenge is that state-of-the-art posterior inference cannot be applied directly. To tackle this problem, we propose a new metaphor in Bayesian nonparametrics known as the "Chinese Restaurant Franchise with Buddy Customers". We conduct experiments on different datasets and show an improvement over existing comparative models.

Shoaib Jameel, Wai Lam, Lidong Bing
A Hierarchical Tree Model for Update Summarization

Update summarization is a new challenge which combines salience ranking with novelty detection. This paper presents a generative hierarchical tree model (HTM for short) based on Hierarchical Latent Dirichlet Allocation (hLDA) to discover the topic structure within history dataset and update dataset. From the tree structure, we can clearly identify the diversity and commonality between history dataset and update dataset. A summary ranking approach is proposed based on such structure by considering different aspects such as focus, novelty and non-redundancy. Experimental results show the effectiveness of our model.

Rumeng Li, Hiroyuki Shindo
Document Boltzmann Machines for Information Retrieval

Probabilistic language modelling has been widely used in information retrieval. It estimates document models under the multinomial distribution assumption and uses query likelihood to rank documents. In this paper, we aim to generalize this distribution assumption by exploring the use of fully observable Boltzmann Machines (BMs) for document modelling. A BM is a stochastic recurrent network able to model the distribution of multi-dimensional variables. It yields a Boltzmann distribution, which is more general than the multinomial distribution. We propose a Document Boltzmann Machine (DBM) that can naturally capture the intrinsic connections among terms and estimate query likelihood efficiently. We formally prove that under certain conditions (with only first-order parameters learnt), the DBM subsumes the traditional document language model. Its relations to other graphical models in IR, e.g., the MRF model, are also discussed. Our experiments on document reranking demonstrate the potential of the proposed DBM.
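For reference, the general form of a fully observable Boltzmann Machine over a binary term-occurrence vector x is standard; a hedged rendering (the paper's exact parameterisation may differ) is:

P(\mathbf{x}) = \frac{1}{Z} \exp\Big(\sum_i b_i x_i + \sum_{i<j} w_{ij} x_i x_j\Big),
\qquad
Z = \sum_{\mathbf{x}'} \exp\Big(\sum_i b_i x'_i + \sum_{i<j} w_{ij} x'_i x'_j\Big).

With all pairwise weights w_{ij} set to zero, the distribution factorises over individual terms, which gives an intuition for the abstract's claim that a first-order DBM reduces to a traditional document language model.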

Qian Yu, Peng Zhang, Yuexian Hou, Dawei Song, Jun Wang
Effective Healthcare Advertising Using Latent Dirichlet Allocation and Inference Engine

The growing access to healthcare websites has aroused interest in designing advertising systems specifically for healthcare products. In this paper, we develop an advertising method that analyzes the messages posted by users on a healthcare website. The method integrates semantic analysis with an inference engine for effective healthcare advertising. Based on our experimental results, healthcare advertising systems can be enhanced by using domain-specific knowledge to augment the content of user messages and ads.

Yen-Chiu Li, Chien Chin Chen

User Behavior

User Simulations for Interactive Search: Evaluating Personalized Query Suggestion

In this paper, we address the question “what is the influence of user search behaviour on the effectiveness of personalized query suggestion?”. We implemented a method for query suggestion that generates candidate follow-up queries from the documents clicked by the user. This is a potentially effective method for query suggestion, but it heavily depends on user behaviour. We set up a series of experiments in which we simulate a large range of user session behaviour to investigate its influence. We found that query suggestion is not profitable for all user types. We identified a number of significant effects of user behaviour on session effectiveness. In general, it appears that there is extensive interplay between the examination behaviour, the term selection behaviour, the clicking behaviour and the query modification strategy. The results suggest that query suggestion strategies need to be adapted to specific user behaviours.

Suzan Verberne, Maya Sappelli, Kalervo Järvelin, Wessel Kraaij
The Impact of Query Interface Design on Stress, Workload and Performance

We investigated how the design of the query interface impacts stress, workload and performance during information search. Two query interfaces were used: a standard interface which looks similar to contemporary, general purpose search engines with a standard query box, and an experimental (structured) interface that was designed to slow people down when querying by presenting a series of boxes for query terms. We conducted a between subjects laboratory experiment where participants were randomly assigned to use one of the query interfaces to complete two assigned search tasks. Stress was measured by recording physiological signals and with the Short Stress State Questionnaire. Workload was measured with the NASA-TLX and log data was used to characterize search behavior. The differences in stress and search behaviors were not significant, but participants who used the structured interface rated their success significantly higher than those who used the standard interface, and reported significantly less workload.

Ashlee Edwards, Diane Kelly, Leif Azzopardi
Detecting Spam URLs in Social Media via Behavioral Analysis

This paper addresses the challenge of detecting spam URLs in social media, which is an important task for shielding users from links associated with phishing, malware, and other low-quality, suspicious content. Rather than rely on traditional blacklist-based filters or content analysis of the landing page for Web URLs, we examine the behavioral factors of both who is posting the URL and who is clicking on the URL. The core intuition is that these behavioral signals may be more difficult to manipulate than traditional signals. Concretely, we propose and evaluate fifteen click and posting-based features. Through extensive experimental evaluation, we find that this purely behavioral approach can achieve high precision (0.86), recall (0.86), and area-under-the-curve (0.92), suggesting the potential for robust behavior-based spam detection.

Cheng Cao, James Caverlee
Predicting Re-finding Activity and Difficulty

In this study, we address the problem of identifying if users are attempting to re-find information and estimating the level of difficulty of the re-finding task. We propose to consider the task information (e.g. multiple queries and click information) rather than only queries. Our resultant prediction models are shown to be significantly more accurate (by 2%) than the current state of the art. While past research assumes that previous search history of the user is available to the prediction model, we examine if re-finding detection is possible without access to this information. Our evaluation indicates that such detection is possible, but more challenging. We further describe the first predictive model in detecting re-finding difficulty, showing it to be significantly better than existing approaches for detecting general search difficulty.

Sargol Sadeghi, Roi Blanco, Peter Mika, Mark Sanderson, Falk Scholer, David Vallet
User Behavior in Location Search on Mobile Devices

Location search engines are an important part of GPS-enabled devices such as mobile phones and tablet computers. In this paper, we study how users behave when they interact with a location search engine by analyzing logs from a popular GPS-navigation service to find out whether mobile users’ location search characteristics differ from those of regular web search. In particular, we analyze query- and session-based characteristics and the temporal distribution of location searches performed on smart phones and tablet computers. Our findings may be used to improve the design of search interfaces in order to help users perform location search more effectively and improve the overall experience on GPS-enabled mobile devices.

Yaser Norouzzadeh Ravari, Ilya Markov, Artem Grotov, Maarten Clements, Maarten de Rijke
Detecting the Eureka Effect in Complex Search

In highly complex search tasks, users with zero or little background knowledge usually need to go through a learning curve to accomplish the tasks. In the context of patent prior art finding, we introduce a novel notion of the Eureka effect in complex search tasks that leverages the sudden change in a user's perceived relevance observable in the log data. The Eureka effect refers to the common experience of suddenly understanding a previously incomprehensible problem or concept. We employ non-parametric regression to model the learning curve that exists in learning-intensive search tasks and report our preliminary findings on observing the Eureka effect in patent prior art finding.

Hui Yang, Jiyun Luo, Christopher Wing

Reproducible IR

Twitter Sentiment Detection via Ensemble Classification Using Averaged Confidence Scores

We reproduce three classification approaches with diverse feature sets for the task of classifying the sentiment expressed in a given tweet as either positive, neutral, or negative. The reproduced approaches are also combined in an ensemble, averaging the individual classifiers' confidence scores for the three classes and deciding sentiment polarity based on these averages. Our experimental evaluation on SemEval data shows our re-implementations to slightly outperform their respective originals. Moreover, in the SemEval Twitter sentiment detection tasks of 2013 and 2014, the ensemble of reproduced approaches would have been ranked in the top-5 among 50 participants. An error analysis shows that the ensemble classifier makes few severe misclassifications, such as identifying a positive sentiment in a negative tweet or vice versa. Instead, it tends to misclassify tweets as neutral when they are not, which can be viewed as the safest option.
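The ensemble step itself is straightforward; a small sketch (with stand-in classifiers) of averaging per-class confidence scores and taking the maximum:

import numpy as np

CLASSES = ["positive", "neutral", "negative"]

def ensemble_predict(tweet, classifiers):
    # Each classifier returns a dict: class -> confidence in [0, 1].
    scores = np.zeros(len(CLASSES))
    for clf in classifiers:
        conf = clf(tweet)
        scores += np.array([conf[c] for c in CLASSES])
    scores /= len(classifiers)
    return CLASSES[int(np.argmax(scores))]

# Example with two toy classifiers:
clf_a = lambda t: {"positive": 0.2, "neutral": 0.5, "negative": 0.3}
clf_b = lambda t: {"positive": 0.1, "neutral": 0.3, "negative": 0.6}
print(ensemble_predict("some tweet", [clf_a, clf_b]))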

Matthias Hagen, Martin Potthast, Michel Büchner, Benno Stein
Reproducible Experiments on Lexical and Temporal Feedback for Tweet Search

“Evaluation as a service” (EaaS) is a new methodology for community-wide evaluations where an API provides the only point of access to the collection for completing the evaluation task. Two important advantages of this model are that it enables reproducible IR experiments and encourages sharing of pluggable open-source components. In this paper, we illustrate both advantages by providing open-source implementations of lexical and temporal feedback techniques for tweet search built on the TREC Microblog API. For the most part, we are able to reproduce results reported in previous papers and confirm their general findings. However, experiments on new test collections and additional analyses provide a more nuanced look at the results and highlight issues not discussed in previous studies, particularly the large variances in effectiveness associated with training/test splits.

Jinfeng Rao, Jimmy Lin, Miles Efron
Rank-Biased Precision Reloaded: Reproducibility and Generalization

In this work we reproduce the experiments presented in the paper entitled "Rank-Biased Precision for Measurement of Retrieval Effectiveness". This paper introduced a new effectiveness measure, Rank-Biased Precision (RBP), which has become a reference point in IR experimental evaluation.

We show that the experiments presented in the original RBP paper are repeatable and we discuss the strengths and limitations of the approach taken by the authors. We also present a generalization of the results by adopting four experimental collections and different analysis methodologies.
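For readers unfamiliar with the measure, RBP evaluated to depth d with persistence parameter p is RBP = (1 - p) * sum over i = 1..d of r_i * p^(i-1), where r_i is the relevance of the document at rank i; a tiny reference implementation:

def rbp(relevances, p=0.8):
    # relevances: binary or graded relevance values, in rank order
    return (1 - p) * sum(r * p ** i for i, r in enumerate(relevances))

print(round(rbp([1, 0, 1, 1, 0], p=0.8), 4))   # depth-5 approximation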

Nicola Ferro, Gianmaria Silvello

Demonstrations

Knowledge Journey Exhibit: Towards Age-Adaptive Search User Interfaces

We describe an information terminal that supports interactive search with an age-adaptable search user interface whose main target group is young users. The terminal enables flexible adaptation of the search user interface to address the changing requirements of users in different age groups. The interface is operated using touch interactions, as these are considered more natural for children than using a mouse. Users search within a safe environment; for this purpose, a search index was created using a focused crawler.

Tatiana Gossen, Michael Kotzyba, Andreas Nürnberger
PopMeter: Linked-Entities in a Sentiment Graph

It is common for a celebrity, brand, or movie to become a reference in its domain and to be widely cited as an example of a highly reputable entity. Popmeter is a search/browsing application to visualize the reputation of an entity and its corresponding sentiment connections (in a hate-it or love-it manner). Popmeter is supported by a sentiment graph populated with named entities and sentiment words. The sentiment graph is constructed by a reputation analysis procedure that models the sentiment of each sentence where the entity is mentioned. This analysis leverages a sentiment lexicon that includes general sentiment words characterizing the overall sentiment towards the targeted named entity.

Filipa Peleja
Adaptive Faceted Ranking for Social Media Comments

Online social media systems (such as YouTube or Reddit) provide commenting features to support the augmentation of social objects (e.g. video clips or news articles). Unfortunately, many comments are not useful due to the varying intentions of comment authors as well as the perspectives of readers. In this paper, we present a framework and Web-based system for adaptive faceted ranking of social media comments, which enables users to explore different facets (e.g., subjectivity or topics) and select combinations of facets in order to extract and rank comments that match their interests and are useful to them. Based on an evaluation of the framework, we find that adaptive faceted ranking shows significant improvements, with respect to users' preferences, over the prevalent ranking methods utilized by many platforms. Demo: http://amowa.cs.univie.ac.at:8080/Frontend/

Elaheh Momeni, Simon Braendle, Eytan Adar
Signal: Advanced Real-Time Information Filtering

The overload of textual information is an ever-growing problem to be addressed by modern information filtering systems, not least because strategic decisions are heavily influenced by the news of the world. In particular, business opportunities as well as threats can arise from using up-to-date information coming from disparate sources, such as articles published by global news providers but equally those found in local newspapers or relevant blog posts. Common media monitoring approaches tend to rely on large-scale, manually created Boolean queries. However, in order to be effective and flexible in a business environment, user information needs require complex, adaptive representations that go beyond simple keywords. This demonstration illustrates the approach that Signal takes to this problem: a cloud-based architecture that processes and analyses, in real time, all the news of the world and allows its users to specify complex information requirements based on entities, topics, industry-specific terminology and keywords.

Miguel Martinez-Alvarez, Udo Kruschwitz, Wesley Hall, Massimo Poesio
The iCrawl Wizard – Supporting Interactive Focused Crawl Specification

Collections of Web documents about specific topics are needed for many areas of current research. Focused crawling enables the creation of such collections on demand. Current focused crawlers require the user to manually specify starting points for the crawl (seed URLs). These are also used to describe the expected topic of the collection. The choice of seed URLs influences the quality of the resulting collection and requires a lot of expertise. In this demonstration we present the iCrawl Wizard, a tool that assists users in defining focused crawls efficiently and semi-automatically. Our tool uses major search engines and Social Media APIs as well as information extraction techniques to find seed URLs and a semantic description of the crawl intent. Using the iCrawl Wizard, even non-expert users can create semantic specifications for focused crawlers interactively and efficiently.

Gerhard Gossen, Elena Demidova, Thomas Risse
Linguistically-Enhanced Search over an Open Diachronic Corpus

The BVC section of the impact-es diachronic corpus of historical Spanish compiles 86 books, containing approximately 2 million words. About 27% of the words, providing a representative coverage of the most frequent word forms, have been annotated with their lemma, part of speech, and modern equivalent following the Text Encoding Initiative guidelines. We describe how this type of annotation can be exploited to provide linguistically-enhanced search over historical documents. The advanced search supports queries whose search terms can be a combination of surface forms, lemmata, parts of speech and modern forms of historical variants.

Rafael C. Carrasco, Isabel Martínez-Sempere, Enrique Mollá-Gandía, Felipe Sánchez-Martínez, Gustavo Candela Romero, Maria Pilar Escobar Esteban
From Context-Aware to Context-Based: Mobile Just-In-Time Retrieval of Cultural Heritage Objects

Cultural content providers face the challenge of disseminating their content to the general public. Meanwhile, access to Web resources shifts from desktop to mobile devices and the wide range of contextual sensors of those devices can be used to proactively retrieve and present resources in an unobtrusive manner. This proactive process, also known as just-in-time retrieval, increases the amount of information viewed and hence is a viable way to increase the visibility of cultural content. We provide a contextual model for mobile just-in-time retrieval, discuss the role of sensor information for its contextual dimensions and show the model’s applicability with a prototypical implementation. Our proposed approach enriches a user’s web experience with cultural content and the developed model can provide guidance for other domains.

Jörg Schlötterer, Christin Seifert, Wolfgang Lutz, Michael Granitzer

Tutorials

Visual Analytics for Information Retrieval Evaluation (VAIRË 2015)

Measuring is a key to scientific progress. This is particularly true for research concerning complex systems, whether natural or human-built. The tutorial introduced basic and intermediate concepts of lab-based evaluation of information retrieval systems, together with its pitfalls and shortcomings, and complemented them with a recent and innovative angle on evaluation: the application of methodologies and tools from the Visual Analytics (VA) domain for better interacting with, understanding, and exploring experimental results and Information Retrieval (IR) system behaviour.

Marco Angelini, Nicola Ferro, Giuseppe Santucci, Gianmaria Silvello
A Tutorial on Measuring Document Retrievability

Retrievability is an important and interesting indicator that can be used in a number of ways to analyse Information Retrieval systems and document collections. Rather than focusing solely on relevance, retrievability examines what is retrieved, how often it is retrieved, and whether a user is likely to retrieve it or not. This is important because a document needs to be retrieved before it can be judged for relevance. In this tutorial, we explained the concept of retrievability along with a number of retrievability measures, how it can be estimated, and how it can be used for analysis. Since retrieval precedes relevance, we described how retrievability relates to effectiveness, along with some of the insights that researchers have discovered thus far. We also showed how retrievability relates to efficiency, and how the theory of retrievability can be used to improve both effectiveness and efficiency. An overview of the different applications of retrievability, such as search engine bias and corpus profiling, was then presented, before wrapping up with challenges and opportunities. The final session of the day examined example problems and techniques to analyse and apply retrievability to other problems and domains. This tutorial was designed for: (i) researchers curious about retrievability and wanting to see how it can impact their research, (ii) researchers who would like to expand their set of analysis techniques, and/or (iii) researchers who would like to use retrievability to perform their own analysis.
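The retrievability score commonly used in this line of work sums, over a query set, a query weight times a utility function of the rank at which the document is retrieved; a hedged sketch of the cumulative variant, assuming uniform query weights and a simple rank cutoff:

def retrievability(doc, queries, rank_of, c=100):
    # r(d) = sum over q of o_q * f(k_dq, c); here o_q = 1 and f is a rank cutoff,
    # where rank_of(doc, q) returns the rank of doc in the result list for q.
    return sum(1.0 if rank_of(doc, q) <= c else 0.0 for q in queries)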

Leif Azzopardi
A Formal Approach to Effectiveness Metrics for Information Access: Retrieval, Filtering, and Clustering

In this tutorial we present a formal account of evaluation metrics for three of the most salient information related tasks: Retrieval, Clustering, and Filtering. We focus on the most popular metrics and, by exploiting measurement theory, we show some constraints for suitable metrics in each of the three tasks. We also systematically compare metrics according to how they satisfy such constraints, we provide criteria to select the most adequate metric for each specific information access task, and we discuss how to combine and weight metrics.

Enrique Amigó, Julio Gonzalo, Stefano Mizzaro
Statistical Power Analysis for Sample Size Estimation in Information Retrieval Experiments with Users

One critical decision researchers must make when designing laboratory experiments with users is how many participants to study. In interactive information retrieval (IR), the determination of sample size is often based on heuristics and limited by practical constraints such as time and finances. As a result, many studies are underpowered, and it is common to see researchers make statements like "With more participants significance might have been detected," but what does this mean? What does it mean for a study to be underpowered? How does this affect what we are able to discover, how we interpret study results, and how we make choices about what to study next? How does one determine an appropriate sample size? What does it even mean for a sample size to be appropriate? This tutorial addressed these questions by introducing participants to the use of statistical power analysis for sample size estimation in laboratory experiments with users. In discussing this topic, the issues of effect size, Type I and Type II errors, and experimental design, including the choice of statistical procedures, were also addressed.
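For example, an a priori sample size estimate for a two-group between-subjects design can be obtained with statsmodels; the effect size, alpha, and power values below are illustrative, not recommendations from the tutorial.

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5,   # "medium" Cohen's d
                                   alpha=0.05,
                                   power=0.8,
                                   alternative="two-sided")
print(round(n_per_group))   # participants needed per group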

Diane Kelly
Join the Living Lab: Evaluating News Recommendations in Real-Time

Participants of this tutorial learnt how to participate in CLEF NEWSREEL, a living lab for the evaluation of news recommender algorithms. Various research challenges can be addressed within NEWSREEL, such as the development and evaluation of collaborative filtering or content-based filtering strategies. By satisfying information needs with techniques including preference elicitation, pattern recognition, and prediction, recommender systems connect the research areas of information retrieval and machine learning.

Frank Hopfgartner, Torben Brodt

Workshops

5th Workshop on Context-Awareness in Retrieval and Recommendation

Context-aware information is widely available in various forms and is becoming more and more important for enhancing retrieval performance and recommendation results. A primary challenge is not only recommending or retrieving the most relevant items and content, but defining them ad hoc. Other relevant issues include personalizing and adapting the information, and the way it is displayed, to the user's current situation and interests. Ubiquitous computing provides new means for capturing user feedback on items and offers information. This year we are particularly interested in contributions investigating how context can influence decision making in domains such as health, finance, food, education, etc., and how systems can exploit context to assist positive behavioral change.

Ernesto William De Luca, Alan Said, Fabio Crestani, David Elsweiler
Workshop Multimodal Retrieval in the Medical Domain (MRMD) 2015

The workshop on Multimodal Retrieval in the Medical Domain (MRMD) dealt with various approaches to information retrieval in the medical domain, including modalities such as text, structured data, semantic information, images, and videos. The goal was to bring together researchers from the various domains to combine approaches and compare experiences.

The workshop included a special session on the VISCERAL benchmark that works on the retrieval of similar cases from a collection of 3D volumes of mainly CT and MRI data. Results of the participants were compared and should complement the general topic of multimodal retrieval.

Henning Müller, Oscar Alfonso Jiménez del Toro, Allan Hanbury, Georg Langs, Antonio Foncubierta–Rodríguez
Second International Workshop on Gamification for Information Retrieval (GamifIR’15)

Gamification is a popular methodology describing the trend of applying game design principles and elements, such as feedback loops, points, badges or leaderboards, in non-gaming environments. Gamification can have several different objectives. Besides simply increasing the fun factor, these could be, for example, to achieve more accurate work, better retention rates and more cost-effective solutions by making the motivation for participating more intrinsic than with conventional methods. In the context of Information Retrieval (IR), there are various tasks that can benefit from gamification techniques, such as the manual annotation of documents in IR evaluation or participation in user studies to tackle interactive IR challenges. Gamification, however, comes with its own challenges, and its adoption in IR is still in its infancy. Given the enormous response to the first GamifIR workshop at ECIR 2014 and the broad range of topics discussed, it seemed timely and appropriate to organise a follow-up workshop.

Frank Hopfgartner, Gabriella Kazai, Udo Kruschwitz, Michael Meder, Mark Shovman
Supporting Complex Search Tasks
ECIR 2015 Workshop

There is broad consensus in the field of IR that search is complex in many use cases and applications, both on the Web and in domain-specific collections, and both professionally and in our daily lives. Yet our understanding of complex search tasks, in comparison to simple look-up tasks, is fragmented at best. The workshop addressed the many open research questions: What are the obvious use cases and applications of complex search? What are the essential features of work tasks and search tasks to take into account? And how do these evolve over time? With a multitude of information, varying from introductory to specialized, and from authoritative to speculative or opinionated, when should which sources of information be shown? How does the information seeking process evolve, and what are the relevant differences between its stages? With complex task and search process management, blending searching, browsing, and recommendations, and supporting exploratory search up to sensemaking and analytics, UI and UX design pose an overconstrained challenge. How do we know that our approach is any good? Supporting complex search tasks requires new collaborations across the whole field of IR, and the workshop brought together a diverse group of researchers to work on one of the greatest challenges of our field.

Maria Gäde, Mark Hall, Hugo Huurdeman, Jaap Kamps, Marijn Koolen, Mette Skov, Elaine Toms, David Walsh
Bibliometric-Enhanced Information Retrieval: 2nd International BIR Workshop

This workshop brought together experts from communities that have often been perceived as different: bibliometrics / scientometrics / informetrics on the one side and information retrieval on the other. Our motivation as organizers of the workshop started from the observation that the main discourses in both fields differ and that the communities overlap only partly, and from the belief that a knowledge transfer would be profitable for both sides. Bibliometric techniques are not yet widely used to enhance retrieval processes in digital libraries, although they offer value-added effects for users. On the other hand, more and more information professionals working in libraries and archives are confronted with applying bibliometric techniques in their services, which makes knowledge exchange all the more urgent. The first workshop set the research agenda by introducing methods, reporting on current research problems, and brainstorming about common interests. This follow-up workshop continued the overall communication, but also put one problem into focus. In particular, we explored how statistical modelling of scholarship can improve retrieval services for specific communities, as well as for large, cross-domain collections like Mendeley or ResearchGate. This second BIR workshop continued to raise awareness of the missing link between Information Retrieval (IR) and bibliometrics and contributed to creating a common ground for the incorporation of bibliometric-enhanced services into retrieval at the scholarly search engine interface.

Philipp Mayr, Ingo Frommholz, Andrea Scharnhorst, Peter Mutschke
Backmatter
Metadata
Title: Advances in Information Retrieval
Editors: Allan Hanbury, Gabriella Kazai, Andreas Rauber, Norbert Fuhr
Copyright Year: 2015
Publisher: Springer International Publishing
Electronic ISBN: 978-3-319-16354-3
Print ISBN: 978-3-319-16353-6
DOI: https://doi.org/10.1007/978-3-319-16354-3