## About this book

This book constitutes the refereed proceedings of the 38th European Conference on IR Research, ECIR 2016, held in Padua, Italy, in March 2016.

The 42 full papers and 28 poster papers, presented together with 3 keynote talks and 6 demonstration papers, were carefully reviewed and selected from 284 submissions. The volume also contains the outcome of 4 workshops as well as 4 tutorial papers. As the premier European forum for the presentation of new research results in the field of Information Retrieval, ECIR features a wide range of topics such as social context and news, machine learning, question answering, ranking, evaluation methodology, probabilistic modeling, evaluation issues, multimedia and collaborative filtering, and many more.

## Table of Contents

### SoRTESum: A Social Context Framework for Single-Document Summarization

The combination of web document contents, sentences and users’ comments from social networks provides a viewpoint of a web document towards a special event. This paper proposes a framework named SoRTESum that takes advantage of information from Twitter, viz. diversity and reflection of document content, to generate high-quality summaries by a novel sentence similarity measurement. The framework first formulates sentences and tweets by recognizing textual entailment (RTE) relations to incorporate social information. Next, they are modeled in a Dual Wing Entailment Graph, which captures the entailment relation to calculate the sentence similarity based on mutual reinforcement information. Finally, important sentences and representative tweets are selected by a ranking algorithm. By incorporating social information, SoRTESum obtained improvements of 0.51% to 8.8% in ROUGE-1 over state-of-the-art unsupervised baselines (e.g., Random, SentenceLead, LexRank) and results comparable with strong supervised methods (e.g., L2R and CrossL2R trained by RankBoost) for single-document summarization.

Minh-Tien Nguyen, Minh-Le Nguyen

### A Graph-Based Approach to Topic Clustering for Online Comments to News

This paper investigates graph-based approaches to labeled topic clustering of reader comments in online news. For graph-based clustering we propose a linear regression model of similarity between the graph nodes (comments) based on similarity features and weights trained using automatically derived training data. To label the clusters our graph-based approach makes use of DBpedia to abstract topics extracted from the clusters. We evaluate the clustering approach against gold standard data created by human annotators and compare its results against LDA, currently reported as the best method for the news comment clustering task. Evaluation of cluster labelling is set up as a retrieval task, where human annotators are asked to identify the best cluster given a cluster label. Our clustering approach significantly outperforms the LDA baseline and our evaluation of abstract cluster labels shows that graph-based approaches are a promising method of creating labeled clusters of news comments, although we still find cases where the automatically generated abstractive labels are insufficient to allow humans to correctly associate a label with its cluster.

Ahmet Aker, Emina Kurtic, A. R. Balamurali, Monica Paramita, Emma Barker, Mark Hepple, Rob Gaizauskas

### Leveraging Semantic Annotations to Link Wikipedia and News Archives

The incomprehensible amount of information available online has made it difficult to retrospect on past events. We propose a novel linking problem to connect excerpts from Wikipedia summarizing events to online news articles elaborating on them. To address this linking problem, we cast it into an information retrieval task by treating a given excerpt as a user query with the goal to retrieve a ranked list of relevant news articles. We find that Wikipedia excerpts often come with additional semantics, in their textual descriptions, representing the time, geolocations, and named entities involved in the event. Our retrieval model leverages text and semantic annotations as different dimensions of an event by estimating independent query models to rank documents. In our experiments on two datasets, we compare methods that consider different combinations of dimensions and find that the approach that leverages all dimensions suits our problem best.

Arunav Mishra, Klaus Berberich

### Deep Learning over Multi-field Categorical Data: A Case Study on User Response Prediction

Predicting user responses, such as click-through rate and conversion rate, is critical in many web applications including web search, personalised recommendation, and online advertising. Unlike the continuous raw features usually found in the image and audio domains, the input features in web space are always multi-field and are mostly discrete and categorical, while their dependencies are little known. Major user response prediction models have to either limit themselves to linear models or require manually building up high-order combination features. The former lose the ability to explore feature interactions, while the latter result in heavy computation in the large feature space. To tackle the issue, we propose two novel models using deep neural networks (DNNs) to automatically learn effective patterns from categorical feature interactions and make predictions of users’ ad clicks. To make our DNNs work efficiently, we propose to leverage three feature transformation methods, i.e., factorisation machines (FMs), restricted Boltzmann machines (RBMs) and denoising auto-encoders (DAEs). This paper presents the structure of our models and their efficient training algorithms. The large-scale experiments with real-world data demonstrate that our methods work better than major state-of-the-art models.

Weinan Zhang, Tianming Du, Jun Wang

### Supervised Local Contexts Aggregation for Effective Session Search

Existing research on web search has mainly focused on the optimization and evaluation of single queries. However, in some complex search tasks, users usually need to interact with the search engine multiple times before their needs can be satisfied, a process known as session search. The key to this problem lies in how to utilize the session context from preceding interactions to improve the search accuracy for the current query. Unfortunately, existing research on this topic has only formulated limited models of session contexts, which in fact can exhibit considerable variations. In this paper, we propose Supervised Local Context Aggregation (SLCA) as a principled framework for complex session context modeling. In SLCA, the global session context is formulated as the combination of local contexts between consecutive interactions. These local contexts are further weighted by multiple weighting hypotheses. Finally, a supervised ranking aggregation is adopted for effective optimization. Extensive experiments on the TREC11/12 session track show that our proposed SLCA algorithm outperforms many other session search methods and achieves state-of-the-art results.

Zhiwei Zhang, Jingang Wang, Tao Wu, Pengjie Ren, Zhumin Chen, Luo Si

### An Empirical Study of Skip-Gram Features and Regularization for Learning on Sentiment Analysis

The problem of deciding the overall sentiment of a user review is usually treated as a text classification problem. The simplest machine learning setup for text classification uses a unigram bag-of-words feature representation of documents, and this has been shown to work well for a number of tasks such as spam detection and topic classification. However, the problem of sentiment analysis is more complex and not as easily captured with unigram (single-word) features. Bigram and trigram features capture certain local context and short distance negations, thus outperforming unigram bag-of-words features for sentiment analysis. But higher order n-gram features are often overly specific and sparse, so they increase model complexity and do not generalize well. In this paper, we perform an empirical study of skip-gram features for large scale sentiment analysis. We demonstrate that skip-grams can be used to improve sentiment analysis performance in a model-efficient and scalable manner via regularized logistic regression. The feature sparsity problem associated with higher order n-grams can be alleviated by grouping similar n-grams into a single skip-gram: for example, “waste time” could match the n-gram variants “waste of time”, “waste my time”, “waste more time”, “waste too much time”, “waste a lot of time”, and so on. To promote model-efficiency and prevent overfitting, we demonstrate the utility of logistic regression incorporating both L1 regularization (for feature selection) and L2 regularization (for weight distribution).

Cheng Li, Bingyu Wang, Virgil Pavlu, Javed A. Aslam
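
The grouping of n-gram variants under one skip-gram described in this abstract can be illustrated with a minimal sketch. The function name `matches_skipgram` and the `max_gap` parameter (the number of tokens that may be skipped between consecutive pattern words) are assumptions made for illustration; the abstract does not specify how the authors define or extract their skip-grams.

```python
def matches_skipgram(tokens, pattern, max_gap):
    """Return True if the words of `pattern` occur in `tokens` in order,
    with at most `max_gap` skipped tokens between consecutive pattern words.
    (Hypothetical helper; not taken from the paper.)"""

    def search(prev_idx, pat_idx):
        if pat_idx == len(pattern):
            return True  # every pattern word has been placed
        if pat_idx == 0:
            candidates = range(len(tokens))  # first word may start anywhere
        else:
            # the next word must follow within max_gap skipped tokens
            candidates = range(prev_idx + 1,
                               min(len(tokens), prev_idx + 2 + max_gap))
        return any(tokens[i] == pattern[pat_idx] and search(i, pat_idx + 1)
                   for i in candidates)

    return search(-1, 0)


# The n-gram variants quoted in the abstract all collapse onto "waste time".
variants = ["waste of time", "waste my time",
            "waste too much time", "waste a lot of time"]
pattern = ["waste", "time"]
matched = [v for v in variants
           if matches_skipgram(v.split(), pattern, max_gap=3)]
```

With `max_gap=3` all four variants map to the same feature, which is the sparsity reduction the abstract refers to; a smaller gap keeps the feature more specific.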

### Multi-task Representation Learning for Demographic Prediction

Demographic attributes are important resources for market analysis and are widely used to characterize different types of users. However, such signals are only available for a small fraction of users because they are difficult for retailers to collect manually. Most previous work on this problem explores different types of features and usually predicts different attributes independently. However, manually defined features require professional knowledge and often suffer from under-specification. Meanwhile, modeling the tasks separately may lose the ability to leverage the correlations among different attributes. In this paper, we propose a novel Multi-task Representation Learning (MTRL) model to predict users’ demographic attributes. Compared with previous methods, our model has the following merits: (1) by using a multi-task approach to learn the tasks, our model leverages the large amounts of cross-task data, which is helpful for tasks with limited data; (2) MTRL learns the shared semantic representation across multiple tasks in a supervised way, so it can obtain a more general and robust representation by considering the constraints among tasks. Experiments are conducted on a real-world retail dataset where three attributes (gender, marital status, and education background) are predicted. The empirical results show that our MTRL model can improve the performance significantly compared with the state-of-the-art baselines.

Pengfei Wang, Jiafeng Guo, Yanyan Lan, Jun Xu, Xueqi Cheng

### Large-Scale Kernel-Based Language Learning Through the Ensemble Nyström Methods

Kernel methods have been used by many Machine Learning paradigms, achieving state-of-the-art performances in many Language Learning tasks. One drawback of expressive kernel functions, such as Sequence or Tree kernels, is the time and space complexity required both in learning and classification. In this paper, the Nyström methodology is studied as a viable solution to face these scalability issues. By mapping data into low-dimensional spaces as kernel space approximations, the proposed methodology improves scalability through a compact linear representation of highly structured data. Computation can also be distributed over several machines by adopting the so-called Ensemble Nyström Method. Experimental results show that an accuracy comparable with state-of-the-art kernel-based methods can be obtained while reducing the required operations by orders of magnitude and enabling the adoption of datasets containing more than one million examples.

Danilo Croce, Roberto Basili

### Beyond Factoid QA: Effective Methods for Non-factoid Answer Sentence Retrieval

Retrieving finer grained text units such as passages or sentences as answers for non-factoid Web queries is becoming increasingly important for applications such as mobile Web search. In this work, we introduce the answer sentence retrieval task for non-factoid Web queries, and investigate how this task can be effectively solved under a learning to rank framework. We design two types of features, namely semantic and context features, beyond traditional text matching features. We compare learning to rank methods with multiple baseline methods including query likelihood and the state-of-the-art convolutional neural network based method, using an answer-annotated version of the TREC GOV2 collection. Results show that features used previously to retrieve topical sentences and factoid answer sentences are not sufficient for retrieving answer sentences for non-factoid queries, but with semantic and context features, we can significantly outperform the baseline methods.

Liu Yang, Qingyao Ai, Damiano Spina, Ruey-Cheng Chen, Liang Pang, W. Bruce Croft, Jiafeng Guo, Falk Scholer

In many questions in Community Question Answering sites users look for the advice or opinion of other users who might offer diverse perspectives on a topic at hand. The novel task we address is providing supportive evidence for human answers to such questions, which will potentially help the asker in choosing answers that fit her needs. We present a support retrieval model that ranks sentences from Wikipedia by their presumed support for a human answer. The model outperforms a state-of-the-art textual entailment system designed to infer factual claims from texts. An important aspect of the model is the integration of relevance oriented and support oriented features.

Liora Braunstain, Oren Kurland, David Carmel, Idan Szpektor, Anna Shtok

### Does Selective Search Benefit from WAND Optimization?

Selective search is a distributed retrieval technique that reduces the computational cost of large-scale information retrieval. By partitioning the collection into topical shards, and using a resource selection algorithm to identify a subset of shards to search, selective search allows retrieval effectiveness to be maintained while evaluating fewer postings, often resulting in 90+% reductions in querying cost. However, there has been only limited attention given to the interaction between dynamic pruning algorithms and topical index shards. We demonstrate that the WAND dynamic pruning algorithm is more effective on topical index shards than it is on randomly-organized index shards, and that the savings generated by selective search and WAND are additive. We also compare two methods for applying WAND to topical shards: searching each shard with a separate top-k heap and threshold; and sequentially passing a shared top-k heap and threshold from one shard to the next, in the order established by a resource selection mechanism. Separate top-k heaps provide low query latency, whereas a shared top-k heap provides higher throughput.

Yubin Kim, Jamie Callan, J. Shane Culpepper, Alistair Moffat

### Efficient AUC Optimization for Information Ranking Applications

Adequate evaluation of an information retrieval system to estimate future performance is a crucial task. Area under the ROC curve (AUC) is widely used to evaluate the generalization of a retrieval system. However, the objective function optimized in many retrieval systems is the error rate and not the AUC value. This paper provides an efficient and effective non-linear approach to optimize AUC using additive regression trees, with a special emphasis on the use of multi-class AUC (MAUC) because multiple relevance levels are widely used in many ranking applications. Compared to a conventional linear approach, the performance of the non-linear approach is comparable on binary-relevance benchmark datasets and is better on multi-relevance benchmark datasets.

Sean J. Welleck
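
As background for this abstract, the quantity being optimized can be written down directly: binary AUC equals the fraction of (relevant, non-relevant) pairs that a scorer orders correctly, with ties counted as half. The sketch below only computes that value; it is not the paper's tree-based optimization method, it does not cover the multi-class variant (MAUC), and the function name `pairwise_auc` is an assumption.

```python
def pairwise_auc(scores, labels):
    """Binary AUC as the fraction of correctly ordered
    (positive, negative) pairs; ties count as half a win."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative item")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Four documents, two relevant (label 1) and two non-relevant (label 0).
auc = pairwise_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

In the usage line, three of the four relevant/non-relevant pairs are ordered correctly, so a classifier with a low error rate can still leave AUC well below 1.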

### Modeling User Interests for Zero-Query Ranking

Proactive search systems like Google Now and Microsoft Cortana have gained increasing popularity with the growth of mobile Internet. Unlike traditional reactive search systems where search engines return results in response to queries issued by the users, proactive systems actively push information cards to the users on mobile devices based on the context around time, location, environment (e.g., weather), and user interests. A proactive system is a zero-query information retrieval system, which makes user modeling critical for understanding user information needs. In this paper, we study user modeling in proactive search systems and propose a learning to rank method for proactive ranking. We explore a variety of ways of modeling user interests, ranging from direct modeling of historical interaction with content types to finer-grained entity-level modeling, and user demographical information. To reduce the feature sparsity problem in entity modeling, we propose semantic similarity features using word embedding and an entity taxonomy in knowledge base. Experiments performed with data from a large commercial proactive search system show that our method significantly outperforms a strong baseline method deployed in the production system.

Liu Yang, Qi Guo, Yang Song, Sha Meng, Milad Shokouhi, Kieran McDonald, W. Bruce Croft

### Adaptive Effort for Search Evaluation Metrics

We explain a wide range of search evaluation metrics as the ratio of users’ gain to effort for interacting with a ranked list of results. According to this explanation, many existing metrics measure users’ effort as linear to the (expected) number of examined results. This implicitly assumes that users spend the same effort to examine different results. We adapt current metrics to account for different effort on relevant and non-relevant documents. Results show that such adaptive effort metrics better correlate with and predict user perceptions on search quality.

Jiepu Jiang, James Allan

### Evaluating Memory Efficiency and Robustness of Word Embeddings

Skip-Gram word embeddings, estimated from large text corpora, have been shown to improve many NLP tasks through their high-quality features. However, little is known about their robustness against parameter perturbations and about their efficiency in preserving word similarities under memory constraints. In this paper, we investigate three post-processing methods for word embeddings to study their robustness and memory efficiency. We employ a dimensionality-based, a parameter-based and a resolution-based method to obtain parameter-reduced embeddings and we provide a concept that connects the three approaches. We contrast these methods with the relative accuracy loss on six intrinsic evaluation tasks and compare them with regard to the memory efficiency of the reduced embeddings. The evaluation shows that low Bit-resolution embeddings offer great potential for memory savings by alleviating the risk of accuracy loss. The results indicate that post-processed word embeddings could also enhance applications on resource limited devices with valuable word features.

Johannes Jurgovsky, Michael Granitzer, Christin Seifert
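
To make the idea of a low bit-resolution embedding concrete, here is a minimal sketch that assumes plain uniform quantization of each vector to `bits` bits. The abstract does not say which quantization scheme the authors use, so the scheme and the helper names `quantize` and `dequantize` are assumptions.

```python
def quantize(vector, bits):
    """Map each float to one of 2**bits integer codes on a uniform grid
    spanning [min(vector), max(vector)]. Assumed scheme, for illustration."""
    levels = 2 ** bits - 1
    lo, hi = min(vector), max(vector)
    if hi == lo:
        return [0] * len(vector), lo, 0.0  # constant vector: single code
    step = (hi - lo) / levels
    codes = [round((x - lo) / step) for x in vector]
    return codes, lo, step


def dequantize(codes, lo, step):
    """Reconstruct approximate floats from the integer codes."""
    return [lo + c * step for c in codes]


embedding = [-1.0, -0.4, 0.2, 1.0]
codes, lo, step = quantize(embedding, bits=2)
restored = dequantize(codes, lo, step)
max_error = max(abs(a - b) for a, b in zip(embedding, restored))
```

Each stored value then needs only `bits` bits instead of a 32-bit float, at the cost of a reconstruction error of at most half a grid step per dimension.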

### Characterizing Relevance on Mobile and Desktop

Relevance judgments are central to information retrieval evaluation. With the increasing number of handheld devices at users’ disposal today, and continuous improvement in web standards and browsers, it has become essential to evaluate whether such devices and dynamic page layouts affect users’ notion of relevance. Given dynamic web pages and content rendering, we know little about what kind of pages are relevant on devices other than desktop. With this work, we take the first step in characterizing relevance on mobile and desktop. We collect crowdsourced judgments on mobile and desktop to systematically determine whether the screen size of a device and page layouts impact judgments. Our study shows that there are certain differences between mobile and desktop judgments. We also observe different judging times, despite similar inter-rater agreement on both devices. Finally, we also propose and evaluate display and viewport specific features to predict relevance. Our results indicate that viewport based features can be used to reliably predict mobile relevance.

Manisha Verma, Emine Yilmaz

### Probabilistic Local Expert Retrieval

This paper proposes a range of probabilistic models of local expertise based on geo-tagged social network streams. We assume that frequent visits result in greater familiarity with the location in question. To capture this notion, we rely on spatio-temporal information from users’ online check-in profiles. We evaluate the proposed models on a large-scale sample of geo-tagged and manually annotated Twitter streams. Our experiments show that the proposed methods outperform both intuitive baselines and established models such as the iterative inference scheme.

Wen Li, Carsten Eickhoff, Arjen P. de Vries

### Probabilistic Topic Modelling with Semantic Graph

In this paper we propose a novel framework, topic model with semantic graph (TMSG), which couples a topic model with the rich knowledge from DBpedia. To begin with, we extract the disambiguated entities from the document collection using a document entity linking system, i.e., DBpedia Spotlight, from which two types of entity graphs are created from DBpedia to capture local and global contextual knowledge, respectively. Given the semantic graph representation of the documents, we propagate the inherent topic-document distribution with the disambiguated entities of the semantic graphs. Experiments conducted on two real-world datasets show that TMSG can significantly outperform the state-of-the-art techniques, namely, the author-topic model (ATM) and the topic model with biased propagation (TMBP).

Long Chen, Joemon M. Jose, Haitao Yu, Fajie Yuan, Huaizhi Zhang

### Estimating Probability Density of Content Types for Promoting Medical Records Search

Diseases and symptoms in medical records tend to appear in different content types: positive, negative, family history and others. Traditional information retrieval systems that depend on keyword matching are often adversely affected by the content types. In this paper, we propose a novel learning approach utilizing the content types as features to improve medical records search. In particular, the different contents from the medical records are identified using a Bayesian-based classification method. Then, we introduce our type-based weighting function to take advantage of the content types, in which the weights of the content types are automatically calculated by estimating the probability density functions in the documents. Finally, we evaluate the approach on the TREC 2011 and 2012 Medical Records data sets, in which our experimental results show that our approach is promising and superior.

Yun He, Qinmin Hu, Yang Song, Liang He

### The Curious Incidence of Bias Corrections in the Pool

Recently, it has been discovered that it is possible to mitigate the Pool Bias of Precision at cut-off (P@n) when used with the fixed-depth pooling strategy, by measuring the effect of the tested run against the pooled runs. In this paper we extend this analysis and test the existing methods on different pooling strategies, simulated on a selection of 12 TREC test collections. We observe how the different methodologies to correct the pool bias behave, and provide guidelines about which pooling strategy should be chosen.

Aldo Lipani, Mihai Lupu, Allan Hanbury

### Understandability Biased Evaluation for Information Retrieval

Although relevance is known to be a multidimensional concept, information retrieval measures mainly consider one dimension of relevance: topicality. In this paper we propose a method to integrate multiple dimensions of relevance in the evaluation of information retrieval systems. This is done within the gain-discount evaluation framework, which underlies measures like rank-biased precision (RBP), cumulative gain, and expected reciprocal rank. Although the proposal is general and applicable to any dimension of relevance, we study specific instantiations of the approach in the context of evaluating retrieval systems with respect to both the topicality and the understandability of retrieved documents. This leads to the formulation of understandability biased evaluation measures based on RBP. We study these measures using both simulated experiments and real human assessments. The findings show that considering both understandability and topicality in the evaluation of retrieval systems leads to claims about system effectiveness that differ from those obtained when considering topicality alone.

Guido Zuccon
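
The gain-discount framework mentioned in this abstract can be sketched with rank-biased precision, which in its standard form is (1 - p) times the sum of each document's gain discounted by p raised to its zero-based rank. The understandability-biased variant below assumes that the gain of a document is the product of its topicality and its understandability score; that combination rule and both function names are assumptions made for illustration, not a statement of the paper's exact measures.

```python
def rbp(gains, p=0.8):
    """Rank-biased precision: (1 - p) * sum_k gain_k * p**k,
    where k is the zero-based rank and p is the user persistence."""
    return (1 - p) * sum(g * p ** k for k, g in enumerate(gains))


def understandability_biased_rbp(topicality, understandability, p=0.8):
    """Assumed instantiation: per-document gain is the product of
    topical relevance and understandability, both in [0, 1]."""
    gains = [t * u for t, u in zip(topicality, understandability)]
    return rbp(gains, p)


topical = [1, 0, 1]            # relevant, non-relevant, relevant
readable = [1.0, 1.0, 0.5]     # third document is only half understandable
plain = rbp(topical, p=0.5)
biased = understandability_biased_rbp(topical, readable, p=0.5)
```

The biased score is lower than the plain one because the relevant document at rank three is hard to understand, which is how two systems can swap order once understandability is taken into account.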

### The Relationship Between User Perception and User Behaviour in Interactive Information Retrieval Evaluation

Measures of user behaviour and user perception have been used to evaluate interactive information retrieval systems. However, there have been few efforts taken to understand the relationship between these two. In this paper, we investigated both using user actions from log files, and the results of the User Engagement Scale, both of which came from a study of people interacting with a novel interface to an image collection, but with a non-purposeful task. Our results suggest that selected behavioural actions are associated with selected user perceptions (i.e., focused attention, felt involvement, and novelty), while typical search and browse actions have no association with aesthetics and perceived usability. This is a novel finding that can lead toward a more systematic user-centered evaluation.

Mengdie Zhuang, Elaine G. Toms, Gianluca Demartini

### Using Query Performance Predictors to Improve Spoken Queries

Query performance predictors estimate a query’s retrieval effectiveness without user feedback. We evaluate the usefulness of pre- and post-retrieval performance predictors for two tasks associated with speech-enabled search: (1) predicting the most effective query transcription from the recognition system’s n-best hypotheses and (2) predicting when to ask the user for a spoken query reformulation. We use machine learning to combine a wide range of query performance predictors as features and evaluate on 5,000 spoken queries collected using a crowdsourced study. Our results suggest that pre- and post-retrieval features are useful for both tasks, and that post-retrieval features are slightly better.

Jaime Arguello, Sandeep Avula, Fernando Diaz

### Fusing Web and Audio Predictors to Localize the Origin of Music Pieces for Geospatial Retrieval

Localizing the origin of a music piece around the world enables some interesting possibilities for geospatial music retrieval, for instance, location-aware music retrieval or recommendation for travelers, or exploring non-Western music, a task neglected for a long time in music information retrieval (MIR). While previous approaches to determining the origin of music focused solely on exploiting either the audio content or web resources, we propose a method that fuses features from both sources in a way that outperforms stand-alone approaches. To this end, we propose the use of block-level features inferred from the audio signal to model music content. We show that these features outperform timbral and chromatic features previously used for the task. On the other hand, we investigate a variety of strategies to construct web-based predictors from web pages related to music pieces. We assess different parameters for this kind of predictor (e.g., number of web pages considered) and define a confidence threshold for prediction. Fusing the proposed audio- and web-based methods by a weighted Borda rank aggregation technique, we show on a previously used dataset of music from 33 countries around the world that the median placing error can be reduced from 1,815 to 0 kilometers using K-nearest neighbor regression.

Markus Schedl, Fang Zhou
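
The fusion step named in this abstract, weighted Borda rank aggregation, can be sketched as follows. Each predictor contributes a ranked list of candidates, a candidate at zero-based position i in a list of length n earns n - i points, and the points are scaled by a per-predictor weight. The point scheme, the weights, and the country codes in the example are assumptions for illustration; the abstract does not give the authors' exact weighting.

```python
def weighted_borda(rankings, weights):
    """Fuse ranked lists: a candidate at zero-based position i in a list of
    length n earns weight * (n - i) points; highest total ranks first.
    Ties are broken alphabetically to keep the output deterministic."""
    scores = {}
    for ranking, weight in zip(rankings, weights):
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] = scores.get(candidate, 0.0) + weight * (n - position)
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical country rankings from an audio-based and a web-based predictor.
audio_ranking = ["BR", "IN", "JP"]
web_ranking = ["IN", "BR", "JP"]
fused = weighted_borda([audio_ranking, web_ranking], weights=[0.4, 0.6])
```

Because the web-based predictor carries the larger weight here, its top candidate wins the fused ranking; swapping the weights reverses the top two positions.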

### Key Estimation in Electronic Dance Music

In this paper we study key estimation in electronic dance music, an umbrella term referring to a variety of electronic music subgenres intended for dancing at nightclubs and raves. We start by defining notions of tonality and key before outlining the basic architecture of a template-based key estimation method. Then, we report on the tonal characteristics of electronic dance music, in order to infer possible modifications of the method described. We create new key profiles combining these observations with corpus analysis, and add two pre-processing stages to the basic algorithm. We conclude by comparing our profiles to existing ones, and testing our modifications on independent datasets of pop and electronic dance music, observing interesting improvements in the performance of our algorithms, and suggesting paths for future research.

Ángel Faraldo, Emilia Gómez, Sergi Jordà, Perfecto Herrera

### Evaluating Text Summarization Systems with a Fair Baseline from Multiple Reference Summaries

Text summarization is a challenging task. Maintaining linguistic quality and optimizing both compression and retention, all while avoiding redundancy and preserving the substance of a text, is a difficult process. Equally difficult is the task of evaluating such summaries. Interestingly, a summary generated from the same document can be different when written by different humans (or by the same human at different times). Hence, there is no convenient, complete set of rules to test a machine-generated summary. In this paper, we propose a methodology for evaluating extractive summaries. We argue that the overlap between two summaries should be compared against the average intersection size of two randomly generated baselines and propose ranking machine-generated summaries based on the concept of closeness with respect to reference summaries. The key idea of our methodology is the use of weighted relatedness towards the reference summaries, normalized by the relatedness of reference summaries among themselves. Our approach suggests a relative scale, and is tolerant towards the length of the summary.

Fahmida Hamid, David Haraburda, Paul Tarau

### Multi-document Summarization Based on Atomic Semantic Events and Their Temporal Relationships

Automatic multi-document summarization (MDS) is the process of extracting the most important information, such as events and entities, from multiple natural language texts focused on the same topic. In this paper, we experiment with the effects of different groups of information such as events and named entities in the domain of generic and update MDS. Our generic MDS system has outperformed the best recent generic MDS systems in DUC 2004 in terms of ROUGE-1 recall and $$f_1$$-measure. Update summarization is a new form of MDS, where novel yet salient sentences are chosen as summary sentences based on the assumption that the user has already read a given set of documents. We present an event based update summarization where the novelty is detected based on the temporal ordering of events, and the saliency is ensured by the event and entity distribution. To our knowledge, no other study has deeply experimented with the effects of the novelty information acquired from the temporal ordering of events (assuming that a sentence contains one or more events) in the domain of update multi-document summarization. Our update MDS system has outperformed the state-of-the-art update MDS system in terms of ROUGE-2 and ROUGE-SU4 recall measures. All our MDS systems also generate quality summaries which are manually evaluated based on popular evaluation criteria.

Yllias Chali, Mohsin Uddin

### Tweet Stream Summarization for Online Reputation Management

Producing online reputation reports for an entity (company, brand, etc.) is a focused summarization task with a distinctive feature: issues that may affect the reputation of the entity take priority in the summary. In this paper we (i) propose a novel methodology to evaluate summaries in the context of online reputation which profits from an analogy between reputation reports and the problem of diversity in search; and (ii) provide empirical evidence that incorporating priority signals may benefit this summarization task.

Jorge Carrillo-de-Albornoz, Enrique Amigó, Laura Plaza, Julio Gonzalo

### Who Wrote the Web? Revisiting Influential Author Identification Research Applicable to Information Retrieval

In this paper, we revisit author identification research by conducting a new kind of large-scale reproducibility study: we select 15 of the most influential papers for author identification and recruit a group of students to reimplement them from scratch. Since no open source implementations have been released for the selected papers to date, our public release will have a significant impact on researchers entering the field. This way, we lay the groundwork for integrating author identification with information retrieval to eventually scale the former to the web. Furthermore, we assess the reproducibility of all reimplemented papers in detail, and conduct the first comparative evaluation of all approaches on three well-known corpora.

Martin Potthast, Sarah Braun, Tolga Buz, Fabian Duffhauss, Florian Friedrich, Jörg Marvin Gülzow, Jakob Köhler, Winfried Lötzsch, Fabian Müller, Maike Elisa Müller, Robert Paßmann, Bernhard Reinke, Lucas Rettenmeier, Thomas Rometsch, Timo Sommer, Michael Träger, Sebastian Wilhelm, Benno Stein, Efstathios Stamatatos, Matthias Hagen

### Toward Reproducible Baselines: The Open-Source IR Reproducibility Challenge

The Open-Source IR Reproducibility Challenge brought together developers of open-source search engines to provide reproducible baselines of their systems in a common environment on Amazon EC2. The product is a repository that contains all code necessary to generate competitive ad hoc retrieval baselines, such that with a single script, anyone with a copy of the collection can reproduce the submitted runs. Our vision is that these results would serve as widely accessible points of comparison in future IR research. This project represents an ongoing effort, but we describe the first phase of the challenge that was organized as part of a workshop at SIGIR 2015. We have succeeded modestly so far, achieving our main goals on the Gov2 collection with seven open-source search engines. In this paper, we describe our methodology, share experimental results, and discuss lessons learned as well as next steps.

Jimmy Lin, Matt Crane, Andrew Trotman, Jamie Callan, Ishan Chattopadhyaya, John Foley, Grant Ingersoll, Craig Macdonald, Sebastiano Vigna

### Experiments in Newswire Summarisation

In this paper, we investigate extractive multi-document summarisation algorithms over newswire corpora. Examining recent findings, baseline algorithms, and state-of-the-art systems is pertinent given the current research interest in event tracking and summarisation. We first reproduce previous findings from the literature, validating that automatic summarisation evaluation is a useful proxy for manual evaluation, and validating that several state-of-the-art systems with similar automatic evaluation scores create different summaries from one another. Following this verification of previous findings, we then reimplement various baseline and state-of-the-art summarisation algorithms, and make several observations from our experiments. Our findings include: an optimised Lead baseline; indication that several standard baselines may be weak; evidence that the standard baselines can be improved; results showing that the most effective improved baselines are not statistically significantly less effective than the current state-of-the-art systems; and finally, observations that manually optimising the choice of anti-redundancy components, per topic, can lead to improvements in summarisation effectiveness.

### On the Reproducibility of the TAGME Entity Linking System

Reproducibility is a fundamental requirement of scientific research. In this paper, we examine the repeatability, reproducibility, and generalizability of TAGME, one of the most popular entity linking systems. By comparing results obtained from its public API with (re)implementations from scratch, we obtain the following findings. The results reported in the TAGME paper cannot be repeated due to the unavailability of data sources. Part of the results are reproducible through the provided API, while the rest are not reproducible. We further show that the TAGME approach is generalizable to the task of entity linking in queries. Finally, we provide insights gained during this process and formulate lessons learned to inform future reproducibility efforts.

Faegheh Hasibi, Krisztian Balog, Svein Erik Bratsberg

### Correlation Analysis of Reader’s Demographics and Tweet Credibility Perception

When searching on Twitter, readers have to determine the credibility level of tweets on their own. Previous work has mostly studied how the text content of tweets influences credibility perception. In this paper, we study reader demographics and information credibility perception on Twitter. We find that readers’ educational background and geo-location have a significant correlation with credibility perception. Further investigation reveals that combinations of demographic attributes correlating with credibility perception are insignificant. Despite differences in demographics, readers find features regarding topic keyword and the writing style of a tweet to be independently helpful in perceiving tweets’ credibility. While previous studies reported the use of features independently, our result shows that readers use a combination of features to help judge the credibility of tweets.

Shafiza Mohd Shariff, Mark Sanderson, Xiuzhen Zhang

### Topic-Specific Stylistic Variations for Opinion Retrieval on Twitter

Twitter has emerged as a popular platform for sharing information and expressing opinions. Twitter opinion retrieval is now recognized as a powerful tool for finding people’s attitudes on different topics. However, the vast amount of data and the informal language of tweets make opinion retrieval on Twitter very challenging. In this paper, we propose to leverage topic-specific stylistic variations to retrieve tweets that are both relevant and opinionated about a particular topic. Experimental results show that integrating topic-specific textual meta-communications, such as emoticons and emphatic lengthening, in a ranking function can significantly improve opinion retrieval performance on Twitter.

Anastasia Giachanou, Morgan Harvey, Fabio Crestani

### Inferring Implicit Topical Interests on Twitter

Inferring user interests from their activities in the social network space has been an emerging research topic in recent years. While much work has been done towards detecting explicit interests of the users from their social posts, less work is dedicated to identifying implicit interests, which are also very important for building an accurate user model. In this paper, a graph-based link prediction schema is proposed to infer implicit interests of the users towards emerging topics on Twitter. The underlying graph of our proposed work uses three types of information: user’s followerships, user’s explicit interests towards the topics, and the relatedness of the topics. To investigate the impact of each type of information on the accuracy of inferring user implicit interests, different variants of the underlying representation model are investigated along with several link prediction strategies. Our experimental results demonstrate that using topic relatedness information, especially when determined through semantic similarity measures, has considerable impact on improving the accuracy of user implicit interest prediction, compared to when only followership information is used.

Fattane Zarrinkalam, Hossein Fani, Ebrahim Bagheri, Mohsen Kahani

### Topics in Tweets: A User Study of Topic Coherence Metrics for Twitter Data

Twitter offers scholars new ways to understand the dynamics of public opinion and social discussions. However, in order to understand such discussions, it is necessary to identify coherent topics that have been discussed in the tweets. To assess the coherence of topics, several automatic topic coherence metrics have been designed for classical document corpora. However, it is unclear how suitable these metrics are for topic models generated from Twitter datasets. In this paper, we use crowdsourcing to obtain pairwise user preferences of topical coherences and to determine how closely each of the metrics aligns with human preferences. Moreover, we propose two new automatic coherence metrics that use Twitter as a separate background dataset to measure the coherence of topics. We show that our proposed Pointwise Mutual Information-based metric provides the highest levels of agreement with human preferences of topic coherence over two Twitter datasets.

Anjie Fang, Craig Macdonald, Iadh Ounis, Philip Habel
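
The idea of scoring a topic by Pointwise Mutual Information against a background dataset can be illustrated with a small sketch. It shows the generic pairwise-PMI formulation only; the abstract does not give the authors' exact metric, so the function name `pmi_coherence`, the document-level co-occurrence counting, the `eps` smoothing, and the toy background corpus are all assumptions made for this example.

```python
import math
from itertools import combinations


def pmi_coherence(topic_words, background_docs, eps=1e-12):
    """Average pairwise PMI of a topic's words over a background corpus.

    Probabilities are document-level (co-)occurrence frequencies.
    `eps` is an assumed smoothing constant that avoids log(0) for
    word pairs that never co-occur.
    """
    docs = [set(doc) for doc in background_docs]
    n = len(docs)

    def prob(*words):
        # Fraction of background documents containing all given words.
        return sum(all(w in doc for w in words) for doc in docs) / n

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        joint = prob(w1, w2)
        scores.append(math.log((joint + eps) / (prob(w1) * prob(w2) + eps)))
    return sum(scores) / len(scores)


# Toy background corpus standing in for a tokenised tweet collection.
background = [
    ["goal", "match", "team"],
    ["goal", "team"],
    ["vote", "election"],
    ["vote", "election", "poll"],
]
coherent = pmi_coherence(["goal", "team"], background)
mixed = pmi_coherence(["goal", "vote"], background)
print(coherent > mixed)
```

Words that tend to appear in the same background documents receive a positive score, while a topic mixing unrelated words is penalised heavily, which is the behaviour a coherence metric is expected to show.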

### Supporting Scholarly Search with Keyqueries

We deal with a problem faced by scholars every day: identifying relevant papers on a given topic. In particular, we focus on the scenario where a scholar can come up with a few papers (e.g., suggested by a colleague) and then wants to find “all” the other related publications. Our proposed approach to the problem is based on the concept of keyqueries: formulating keyqueries from the input papers and suggesting the top results as candidates of related work. We compare our approach to three baselines that also represent the different ways of how humans search for related work: (1) a citation-graph-based approach focusing on cited and citing papers, (2) a method formulating queries from the paper abstracts, and (3) the “related articles”-functionality of Google Scholar. The effectiveness is measured in a Cranfield-style user study on a corpus of 200,000 papers. The results indicate that our novel keyquery-based approach is on a par with the strong citation and Google Scholar baselines but returns substantially different results; a combination of the different approaches yields the best results.

Matthias Hagen, Anna Beyer, Tim Gollub, Kristof Komlossy, Benno Stein

### Pseudo-Query Reformulation

Automatic query reformulation refers to rewriting a user’s original query in order to improve the ranking of retrieval results compared to the original query. We present a general framework for automatic query reformulation based on discrete optimization. Our approach, referred to as pseudo-query reformulation, treats automatic query reformulation as a search problem over the graph of unweighted queries linked by minimal transformations (e.g. term additions, deletions). This framework allows us to test existing performance prediction methods as heuristics for the graph search process. We demonstrate the effectiveness of the approach on several publicly available datasets.

Fernando Diaz

### VODUM: A Topic Model Unifying Viewpoint, Topic and Opinion Discovery

The surge of opinionated on-line texts provides a wealth of information that can be exploited to analyze users’ viewpoints and opinions on various topics. This article presents VODUM, an unsupervised Topic Model designed to jointly discover viewpoints, topics, and opinions in text. We hypothesize that partitioning topical words and viewpoint-specific opinion words using part-of-speech helps to discriminate and identify viewpoints. Quantitative and qualitative experiments on the Bitterlemons collection show the performance of our model. It outperforms state-of-the-art baselines in generalizing data and identifying viewpoints. This result stresses how important topical and opinion words separation is, and how it impacts the accuracy of viewpoint identification.

Thibaut Thonet, Guillaume Cabanac, Mohand Boughanem, Karen Pinel-Sauvagnat

### Harvesting Training Images for Fine-Grained Object Categories Using Visual Descriptions

We harvest training images for visual object recognition by casting it as an IR task. In contrast to previous work, we concentrate on fine-grained object categories, such as the large number of particular animal subspecies, for which manual annotation is expensive. We use ‘visual descriptions’ from nature guides as a novel augmentation to the well-known use of category names. We use these descriptions in both the query process to find potential category images as well as in image reranking where an image is more highly ranked if web page text surrounding it is similar to the visual descriptions. We show the potential of this method when harvesting images for 10 butterfly categories: when compared to a method that relies on the category name only, using visual descriptions improves precision for many categories.

Josiah Wang, Katja Markert, Mark Everingham

### Do Your Social Profiles Reveal What Languages You Speak? Language Inference from Social Media Profiles

In the multilingual World Wide Web, it is critical for Web applications, such as multilingual search engines and targeted international advertisements, to know what languages the user understands. However, online users are often unwilling to make the effort to explicitly provide this information. Additionally, language identification techniques struggle when a user does not use all the languages they know to directly interact with the applications. This work proposes a method of inferring the language(s) online users comprehend by analyzing their social profiles. It is mainly based on the intuition that a user’s experiences could imply what languages they know. This is nontrivial, however, as social profiles are usually incomplete, and the languages that are regionally related or similar in vocabulary may share common features; this makes the signals that help to infer language scarce and noisy. This work proposes a language and social relation-based factor graph model to address this problem. To overcome these challenges, it explores external resources to bring in more evidential signals, and exploits the dependency relations between languages as well as social relations between profiles in modeling the problem. Experiments in this work are conducted on a large-scale dataset. The results demonstrate the success of our proposed approach in language inference and show that the proposed framework outperforms several alternative methods.

Yu Xu, M. Rami Ghorab, Zhongqing Wang, Dong Zhou, Séamus Lawless

### Retrieving Hierarchical Syllabus Items for Exam Question Analysis

Educators, institutions, and certification agencies often want to know if students are being evaluated appropriately and completely with regard to a standard. To help educators understand if examinations are well-balanced or topically correct, we explore the challenge of classifying exam questions into a concept hierarchy. While the general problems of text-classification and retrieval are quite commonly studied, our domain is particularly unusual because the concept hierarchy is expert-built but without actually having the benefit of being a well-used knowledge-base. We propose a variety of approaches to this “small-scale” Information Retrieval challenge. We use an external corpus of Q&A data for expansion of concepts, and propose a model of using the hierarchy information effectively in conjunction with existing retrieval models. This new approach is more effective than typical unsupervised approaches and more robust to limited training data than commonly used text-classification or machine learning methods. In keeping with the goal of providing a service to educators for better understanding their exams, we also explore interactive methods, focusing on low-cost relevance feedback signals within the concept hierarchy to provide further gains in accuracy.

John Foley, James Allan

### Implicit Look-Alike Modelling in Display Ads

Transfer Collaborative Filtering to CTR Estimation

User behaviour targeting is essential in online advertising. Compared with sponsored search keyword targeting and contextual advertising page content targeting, user behaviour targeting builds users’ interest profiles via tracking their online behaviour and then delivers the relevant ads according to each user’s interest, which leads to higher targeting accuracy and thus improved advertising performance. The current user profiling methods include building keywords and topic tags or mapping users onto a hierarchical taxonomy. However, to our knowledge, there is no previous work that explicitly investigates the similarity of users’ online visits and incorporates such similarity into their ad response prediction. In this work, we propose a general framework which learns the user profiles based on their online browsing behaviour, and transfers the learned knowledge onto prediction of their ad response. Technically, we propose a transfer learning model based on the probabilistic latent factor graphic models, where the users’ ad response profiles are generated from their online browsing profiles. The large-scale experiments based on real-world data demonstrate significant improvement of our solution over some strong baselines.

Weinan Zhang, Lingxi Chen, Jun Wang

### Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recommendation

Recently, Relevance-Based Language Models have been demonstrated to be an effective Collaborative Filtering approach. Nevertheless, this family of Pseudo-Relevance Feedback techniques is too computationally expensive to apply to web-scale data. Also, they require the use of smoothing methods which need to be tuned. These facts lead us to study other similar techniques with better trade-offs between effectiveness and efficiency. Specifically, in this paper, we analyse the applicability to the recommendation task of four well-known query expansion techniques with multiple probability estimates. Moreover, we analyse the effect of neighbourhood length and devise a new probability estimate that takes this property into account, yielding better recommendation rankings. Finally, we find that the proposed algorithms are dramatically faster than those based on Relevance-Based Language Models, they do not have any parameter to tune (apart from the ones of the neighbourhood) and they provide a better trade-off between accuracy and diversity/novelty.

Daniel Valcarce, Javier Parapar, Álvaro Barreiro

### Language Models for Collaborative Filtering Neighbourhoods

Language Models are state-of-the-art methods in Information Retrieval. Their sound statistical foundation and high effectiveness in several retrieval tasks are key to their current success. In this paper, we explore how to apply these models to deal with the task of computing user or item neighbourhoods in a collaborative filtering scenario. Our experiments showed that this approach is superior to other neighbourhood strategies and also very efficient. Our proposal, in conjunction with a simple neighbourhood-based recommender, showed a great performance compared to state-of-the-art methods (NNCosNgbr and PureSVD) while its computational complexity is low.

Daniel Valcarce, Javier Parapar, Álvaro Barreiro

### Adaptive Collaborative Filtering with Extended Kalman Filters and Multi-armed Bandits

It is now widely recognized that, as real-world recommender systems are often facing drifts in users’ preferences and shifts in items’ perception, collaborative filtering methods have to cope with these time-varying effects. Furthermore, they have to constantly control the trade-off between exploration and exploitation, whether in a cold start situation or during a change - possibly abrupt - in the user needs and item popularity. In this paper, we propose a new adaptive collaborative filtering method, coupling Matrix Completion, extended non-linear Kalman filters and Multi-Armed Bandits. The main goal of this method is exactly to tackle simultaneously both issues – adaptivity and exploitation/exploration trade-off – in a single consistent framework, while keeping the underlying algorithms efficient and easily scalable. Several experiments on real-world datasets show that these adaptation mechanisms significantly improve the quality of recommendations compared to other standard on-line adaptive algorithms and offer “fast” learning curves in identifying the user/item profiles, even when they evolve over time.

Jean-Michel Renders

### A Business Zone Recommender System Based on Facebook and Urban Planning Data

We present ZoneRec, a zone recommendation system for physical businesses in an urban city, which uses both public business data from Facebook and urban planning data. The system consists of machine learning algorithms that take in a business’ metadata and output a list of recommended zones to establish the business in. We evaluate our system using data of food businesses in Singapore and assess the contribution of different feature groups to the recommendation quality.

Jovian Lin, Richard J. Oentaryo, Ee-Peng Lim, Casey Vu, Adrian Vu, Agus T. Kwee, Philips K. Prasetyo

### On the Evaluation of Tweet Timeline Generation Task

The Tweet Timeline Generation (TTG) task aims to generate a timeline of relevant but novel tweets that summarizes the development of a given topic. A typical TTG system first retrieves tweets and then detects novel tweets among them to form a timeline. In this paper, we examine the dependency of TTG on retrieval quality, and its effect on having biased evaluation. Our study showed a considerable dependency; however, the ranking of systems is not highly affected if a common retrieval run is used.

Walid Magdy, Tamer Elsayed, Maram Hasanain

### Finding Relevant Relations in Relevant Documents

This work studies the combination of a document retrieval and a relation extraction system for the purpose of identifying query-relevant relational facts. On the TREC Web collection, we assess extracted facts separately for correctness and relevance. Despite some TREC topics not being covered by the relation schema, we find that this approach reveals relevant facts, and in particular those not yet known in the knowledge base DBpedia. The study confirms that mention frequency, document relevance, and entity relevance are useful indicators for fact relevance. Still, the task remains an open research problem.

Michael Schuhmacher, Benjamin Roth, Simone Paolo Ponzetto, Laura Dietz

### Probabilistic Multileave Gradient Descent

Online learning to rank methods aim to optimize ranking models based on user interactions. The dueling bandit gradient descent (DBGD) algorithm is able to effectively optimize linear ranking models solely from user interactions. We propose an extension of DBGD, called probabilistic multileave gradient descent (P-MGD), that builds on probabilistic multileave, a recently proposed highly sensitive and unbiased online evaluation method. We demonstrate that P-MGD significantly outperforms state-of-the-art online learning to rank methods in terms of online performance, without sacrificing offline performance and at greater learning speed.

Harrie Oosterhuis, Anne Schuth, Maarten de Rijke
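
For readers unfamiliar with the baseline that P-MGD extends, the following is a minimal sketch of plain dueling bandit gradient descent, not of P-MGD itself. The function name `dbgd`, the step sizes `delta` and `alpha`, and the `candidate_wins` oracle are assumptions made for illustration; in a real system the duel would be decided by interleaving the two rankings and observing user clicks.

```python
import math
import random


def dbgd(weights, candidate_wins, steps, delta=1.0, alpha=0.1, rng=None):
    """Dueling bandit gradient descent over a linear ranker's weights.

    `candidate_wins(current, candidate)` stands in for an interleaved
    comparison decided by user clicks; here it is a hypothetical oracle.
    """
    rng = rng or random.Random()
    w = list(weights)
    for _ in range(steps):
        # Sample a uniformly random unit direction.
        u = [rng.gauss(0.0, 1.0) for _ in w]
        norm = math.sqrt(sum(x * x for x in u)) or 1.0
        u = [x / norm for x in u]
        # Explore: propose a candidate ranker `delta` away from the current one.
        candidate = [wi + delta * ui for wi, ui in zip(w, u)]
        # Exploit: if the candidate wins the duel, take a small step towards it.
        if candidate_wins(w, candidate):
            w = [wi + alpha * ui for wi, ui in zip(w, u)]
    return w


# Simulated users prefer the ranker whose weights are closer to `target`.
target = [1.0, 1.0]


def distance(w):
    return math.dist(w, target)


learned = dbgd(
    [0.0, 0.0],
    lambda current, candidate: distance(candidate) < distance(current),
    steps=200,
    rng=random.Random(0),
)
print(distance(learned) < distance([0.0, 0.0]))
```

Each iteration compares the current ranker against a single perturbed candidate. Multileaving, as described in the abstract, compares several candidates at once.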

### Real-World Expertise Retrieval: The Information Seeking Behaviour of Recruitment Professionals

Recruitment professionals perform complex search tasks in order to find candidates that match client job briefs. In completing these tasks, they have to contend with many core Information Retrieval (IR) challenges such as query formulation and refinement and results evaluation. However, despite these and other similarities with more established information professions such as patent lawyers and healthcare librarians, this community has been largely overlooked in IR research. This paper presents results of a survey of recruitment professionals, investigating their information seeking behaviour and needs regarding IR systems and applications.

Tony Russell-Rose, Jon Chamberlain

### Compressing and Decoding Term Statistics Time Series

There is growing recognition that temporality plays an important role in information retrieval, particularly for timestamped document collections such as tweets. This paper examines the problem of compressing and decoding term statistics time series, or counts of terms within a particular time window across a large document collection. Such data are large—essentially the cross product of the vocabulary and the number of time intervals—but are also sparse, which makes them amenable to compression. We explore various integer compression techniques, starting with a number of coding schemes that are well-known in the information retrieval literature, and build toward a novel compression approach based on Huffman codes over blocks of term counts. We show that our Huffman-based methods are able to substantially reduce storage requirements compared to state-of-the-art compression techniques while still maintaining good decoding performance.

Jinfeng Rao, Xing Niu, Jimmy Lin
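
As an illustration of the kind of well-known integer coding scheme the abstract takes as its starting point, here is a small variable-byte encoder and decoder applied to a sparse series of term counts. It is a sketch of one classic technique only and does not reproduce the Huffman-based method proposed in the paper; the function names and the sample counts are made up for the example.

```python
def vbyte_encode(numbers):
    """Encode non-negative integers with 7 payload bits per byte.

    The high bit is set on the final byte of each integer.
    """
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)   # low 7 bits, more bytes follow
            n >>= 7
        out.append(n | 0x80)       # last byte carries the stop flag
    return bytes(out)


def vbyte_decode(data):
    """Invert vbyte_encode, returning the list of integers."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:            # stop flag: this integer is complete
            numbers.append(n | ((byte & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= byte << shift
            shift += 7
    return numbers


# Hypothetical per-interval counts of one term; mostly zeros (sparse).
counts = [0, 0, 3, 0, 0, 0, 300, 1]
encoded = vbyte_encode(counts)
print(len(encoded), vbyte_decode(encoded) == counts)  # prints: 9 True
```

Small counts, which dominate a sparse series, cost one byte each instead of a fixed four or eight, and only the occasional large count needs more.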

### Feedback or Research: Separating Pre-purchase from Post-purchase Consumer Reviews

Consumer reviews provide a wealth of information about products and services that, if properly identified and extracted, could be of immense value to businesses. While classification of reviews according to sentiment polarity has been extensively studied in previous work, more focused types of review analysis are needed to assist companies in making business decisions. In this work, we introduce a novel text classification problem of separating post-purchase from pre-purchase review fragments, which can facilitate identification of immediately actionable insights based on feedback from customers who actually purchased and own a product. To address this problem, we propose features based on dictionaries and part-of-speech (POS) tags. Experimental results on the publicly available gold standard indicate that the proposed features make it possible to achieve nearly 75 % accuracy for this problem and improve the performance of classifiers relative to using only lexical features.

Mehedi Hasan, Alexander Kotov, Aravind Mohan, Shiyong Lu, Paul M. Stieg

### Inferring the Socioeconomic Status of Social Media Users Based on Behaviour and Language

This paper presents a method to classify social media users based on their socioeconomic status. Our experiments are conducted on a curated set of Twitter profiles, where each user is represented by the posted text, topics of discussion, interactive behaviour and estimated impact on the microblogging platform. Initially, we formulate a 3-way classification task, where users are classified as having an upper, middle or lower socioeconomic status. A nonlinear, generative learning approach using a composite Gaussian Process kernel provides significantly better classification accuracy (75 %) than a competitive linear alternative. By turning this task into a binary classification – upper vs. medium and lower class – the proposed classifier reaches an accuracy of 82 %.

Vasileios Lampos, Nikolaos Aletras, Jens K. Geyti, Bin Zou, Ingemar J. Cox

### Two Scrolls or One Click: A Cost Model for Browsing Search Results

Modeling how people interact with search interfaces has been of particular interest and importance to the field of Interactive Information Retrieval. Recently, there has been a move to developing formal models of the interaction between the user and the system, whether it be to run a simulation, conduct an economic analysis, measure system performance, or simply to better understand the interactions. In this paper, we present a cost model that characterizes a user examining search results. The model shows under what conditions the interface should be more scroll based or more click based and provides ways to estimate the number of results per page based on the size of the screen and the various interaction costs. Further extensions to the model could be easily included to model different types of browsing and other costs.

Leif Azzopardi, Guido Zuccon

### Determining the Optimal Session Interval for Transaction Log Analysis of an Online Library Catalogue

Transaction log analysis at the level of a session is commonly used as a means of understanding user-system interactions. A key practical issue in the process of conducting session level analysis is the segmentation of the logs into appropriate user sessions (i.e., sessionisation). Methods based on time intervals are frequently used as a simple and convenient means of carrying out this segmentation task. However, little work has been carried out to determine whether the commonly applied 30-minute period is appropriate, particularly for the analysis of search logs from library catalogues. Comparison of a range of session intervals with human judgements demonstrates that the overall accuracy of session segmentation is relatively constant for session intervals between 26 and 57 min. However, a session interval of between 25 and 30 min minimises the chances of one error type (incorrect collation or incorrect segmentation) predominating.

Simon Wakeling, Paul Clough
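
Time-interval sessionisation of the kind discussed in this abstract can be sketched in a few lines. This is a minimal illustration rather than the authors' procedure: the function name `sessionise`, the input format (one user's chronologically sorted timestamps), and the sample log are assumptions, with the commonly applied 30-minute cutoff used as the default.

```python
from datetime import datetime, timedelta


def sessionise(timestamps, interval=timedelta(minutes=30)):
    """Split one user's chronologically sorted timestamps into sessions.

    A new session starts whenever the gap to the previous action
    exceeds `interval`.
    """
    sessions = []
    for ts in timestamps:
        if sessions and ts - sessions[-1][-1] <= interval:
            sessions[-1].append(ts)   # gap small enough: same session
        else:
            sessions.append([ts])     # first action, or gap too large
    return sessions


# Hypothetical catalogue log for a single user.
log = [
    datetime(2016, 3, 20, 9, 0),
    datetime(2016, 3, 20, 9, 10),
    datetime(2016, 3, 20, 9, 50),    # 40 minutes after the previous action
    datetime(2016, 3, 20, 10, 5),
]
print([len(s) for s in sessionise(log)])  # prints: [2, 2]
```

Changing `interval` moves the trade-off between the two error types named in the abstract: a longer cutoff merges distinct sessions (incorrect collation), while a shorter one splits genuine sessions (incorrect segmentation).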

### A Comparison of Deep Learning Based Query Expansion with Pseudo-Relevance Feedback and Mutual Information

Automatic query expansion techniques are widely applied for improving text retrieval performance, using a variety of approaches that exploit several data sources for finding expansion terms. Selecting expansion terms is challenging and requires a framework capable of extracting term relationships. Recently, several Natural Language Processing methods based on Deep Learning have been proposed for learning high quality vector representations of terms from a large amount of unstructured text with billions of words. These high quality vector representations capture a large number of term relationships. In this paper, we experimentally compare several expansion methods with expansion using these term vector representations. We use language models for information retrieval to evaluate expansion methods. Experiments conducted on four CLEF collections show a statistically significant improvement over the language models and other expansion models.

Mohannad ALMasri, Catherine Berrut, Jean-Pierre Chevallet

### A Full-Text Learning to Rank Dataset for Medical Information Retrieval

We present a dataset for learning to rank in the medical domain, consisting of thousands of full-text queries that are linked to thousands of research articles. The queries are taken from health topics described in layman’s English on the non-commercial www.NutritionFacts.org website; relevance links are extracted at 3 levels from direct and indirect links of queries to research articles on PubMed. We demonstrate that ranking models trained on this dataset by far outperform standard bag-of-words retrieval models. The dataset can be downloaded from: www.cl.uni-heidelberg.de/statnlpgroup/nfcorpus/.

Vera Boteva, Demian Gholipour, Artem Sokolov, Stefan Riezler

### Multi-label, Multi-class Classification Using Polylingual Embeddings

We propose a Polylingual text Embedding (PE) strategy that learns a language-independent representation of texts using Neural Networks. We study the effects of bilingual representation learning for text classification and we empirically show that the learned representations achieve better classification performance compared to traditional bag-of-words and other monolingual distributed representations. The performance gains are more significant in the interesting case where only a few labeled examples are available for training the classifiers.

Georgios Balikas, Massih-Reza Amini

### Learning Word Embeddings from Wikipedia for Content-Based Recommender Systems

In this paper we present a preliminary investigation towards the adoption of Word Embedding techniques in a content-based recommendation scenario. Specifically, we compared the effectiveness of three widespread approaches, namely Latent Semantic Indexing, Random Indexing and Word2Vec, in the task of learning a vector space representation of both items to be recommended and user profiles. To this aim, we developed a content-based recommendation (CBRS) framework which uses textual features extracted from Wikipedia to learn user profiles based on such Word Embeddings, and we evaluated this framework against two state-of-the-art datasets. The experimental results provided interesting insights, since our CBRS based on Word Embeddings showed results comparable to those of well-performing algorithms based on Collaborative Filtering and Matrix Factorization, especially in high-sparsity recommendation scenarios.

Cataldo Musto, Giovanni Semeraro, Marco de Gemmis, Pasquale Lops

### Tracking Interactions Across Business News, Social Media, and Stock Fluctuations

In this paper we study the interactions between how companies are mentioned in news, their presence on social media, and daily fluctuations in their stock prices. Our experiments demonstrate that for some entities these time series can be correlated in interesting ways, though for others the correspondences are more opaque. In this study, social media presence is measured by counting Wikipedia page hits. This work is done in the context of building a system for aggregating and analyzing news text, which aims to help the user track business trends; we show results obtainable by the system.

Ossi Karkulahti, Lidia Pivovarova, Mian Du, Jussi Kangasharju, Roman Yangarber

### Subtopic Mining Based on Three-Level Hierarchical Search Intentions

This paper proposes a subtopic mining method based on three-level hierarchical search intentions. Various subtopic candidates are extracted from web documents using a simple pattern, and higher-level and lower-level subtopics are selected from these candidates. The selected subtopics, treated as second-level subtopics, are ranked by a proposed measure, and are expanded and re-ranked considering the characteristics of resources. Using general terms in the higher-level subtopics, we make second-level subtopic groups and generate first-level subtopics. Our method achieved better performance than a state-of-the-art method.

Se-Jong Kim, Jaehun Shin, Jong-Hyeok Lee

### Cold Start Cumulative Citation Recommendation for Knowledge Base Acceleration

This paper studies cold start Cumulative Citation Recommendation (CCR) for Knowledge Base Acceleration (KBA), whose objective is to detect potential citations for target entities without existing KB entries from a volume of stream documents. Unlike routine CCR, in which target entities are identified by a reference KB, cold start CCR is more common since lots of less popular entities do not have any KB entry in practice. We propose a two-step strategy to address this problem: (1) event-based sentence clustering and (2) document ranking. In addition, to build effective rankers, we develop three kinds of features based on the clustering results: time range, local profile and action pattern. Empirical studies on the TREC-KBA-2014 dataset demonstrate the effectiveness of the proposed strategy and the novel features.

Jingang Wang, Jingtian Jiang, Lejian Liao, Dandan Song, Zhiwei Zhang, Chin-Yew Lin

### Cross Domain User Engagement Evaluation

Due to the applications of user engagement in recommender systems, predicting user engagement has recently attracted considerable attention. In this task, which was first proposed in the ACM Recommender Systems Challenge 2014, the posts containing users’ opinions about items (e.g., the tweets containing the users’ ratings about movies on the IMDb website) are studied. In this paper, we focus on user engagement evaluation for cold-start web applications in the extreme case, when there is no training data available for the target web application. We propose an adaptive model based on a transfer learning (TL) technique to train on the data from one web application and test on another. We study the problem of detecting tweets with positive engagement, which is a highly imbalanced classification problem. Therefore, we modify the loss function of the employed transfer learning method to cope with imbalanced data. We evaluate our method using a dataset including the tweets of four popular and diverse data sources, i.e., IMDb, YouTube, Goodreads, and Pandora. The experimental results show that in some cases transfer learning can transfer knowledge among domains to improve the user engagement evaluation performance. We further analyze the results to figure out when transfer learning can help to improve the performance.

Ali Montazeralghaem, Hamed Zamani, Azadeh Shakery

### An Empirical Comparison of Term Association and Knowledge Graphs for Query Expansion

Term graphs constructed from document collections as well as external resources, such as encyclopedias (DBpedia) and knowledge bases (Freebase and ConceptNet), have been individually shown to be effective sources of semantically related terms for query expansion, particularly in the case of difficult queries. However, it is not known how they compare with each other in terms of retrieval effectiveness. In this work, we use standard TREC collections to empirically compare the retrieval effectiveness of these types of term graphs for regular and difficult queries. Our results indicate that the term association graphs constructed from document collections using information theoretic measures are nearly as effective as knowledge graphs for Web collections, while the term graphs derived from DBpedia, Freebase and ConceptNet are more effective than term association graphs for newswire collections. We also found that the term graphs derived from ConceptNet generally outperformed the term graphs derived from DBpedia and Freebase.

Saeid Balaneshinkordan, Alexander Kotov

### Deep Learning to Predict Patient Future Diseases from the Electronic Health Records

The increasing cost of health care has motivated the drive towards preventive medicine, where the primary concern is recognizing disease risk and taking action at the earliest stage. We present an application of deep learning to derive robust patient representations from the electronic health records and to predict future diseases. Experiments showed promising results in different clinical domains, with the best performances for liver cancer, diabetes, and heart failure.

Riccardo Miotto, Li Li, Joel T. Dudley

### Improving Document Ranking for Long Queries with Nested Query Segmentation

In this research, we explore nested or hierarchical query segmentation, where segments are defined recursively as consisting of contiguous sequences of segments or query words, as a more effective representation of a query. We design a lightweight and unsupervised nested segmentation scheme, and propose how to use the tree arising out of the nested representation of a query to improve ranking performance. We show that nested segmentation can lead to significant gains over state-of-the-art flat segmentation strategies. An extended version of this paper is available at http://research.microsoft.com/pubs/259980/2015-msri-tr-nest-seg.pdf.

Rishiraj Saha Roy, Anusha Suresh, Niloy Ganguly, Monojit Choudhury

### Sketching Techniques for Very Large Matrix Factorization

Matrix factorization is a prominent technique for approximate matrix reconstruction and noise reduction. Its common appeal is attributed to its space efficiency and its ability to generalize with missing information. For these reasons, matrix factorization is central to collaborative filtering systems. In the real world, such systems must deal with millions of users and items, and they are highly dynamic as new users and new items are constantly added. Factorization techniques, however, have difficulty coping with such a demanding environment. Whereas they are well understood with static data, their ability to efficiently cope with new and dynamic data is limited. Scaling to extremely large numbers of users and items is also problematic. In this work, we propose to use the count sketching technique for representing the latent factors with extreme compactness, facilitating scaling.

Raghavendran Balu, Teddy Furon, Laurent Amsaleg
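To illustrate the count sketching idea named in the abstract above, here is a minimal, generic count sketch in Python. It is a sketch under stated assumptions, not the authors' implementation: the class name `CountSketch`, the parameters `depth`, `width` and `seed`, and the choice of keying entries by a hypothetical `(user_id, factor_index)` pair are all illustrative.

```python
import random
import statistics


class CountSketch:
    """Fixed-size approximate store for a sparse mapping key -> float."""

    def __init__(self, depth=5, width=256, seed=0):
        rng = random.Random(seed)
        self.width = width
        # One pair of salts per row: one for the bucket hash, one for the sign hash.
        self.salts = [(rng.getrandbits(32), rng.getrandbits(32)) for _ in range(depth)]
        self.table = [[0.0] * width for _ in range(depth)]

    def _locate(self, row, key):
        bucket_salt, sign_salt = self.salts[row]
        bucket = hash((bucket_salt, key)) % self.width
        sign = 1.0 if hash((sign_salt, key)) & 1 else -1.0
        return bucket, sign

    def update(self, key, delta):
        # Add a signed contribution to one bucket in every row.
        for row in range(len(self.table)):
            bucket, sign = self._locate(row, key)
            self.table[row][bucket] += sign * delta

    def estimate(self, key):
        # Each row gives one noisy guess; the median damps collisions.
        guesses = []
        for row in range(len(self.table)):
            bucket, sign = self._locate(row, key)
            guesses.append(sign * self.table[row][bucket])
        return statistics.median(guesses)


# Hypothetical usage: store factor 3 of user "user42", then apply an update step.
sketch = CountSketch()
sketch.update(("user42", 3), 2.5)
sketch.update(("user42", 3), -1.0)
print(sketch.estimate(("user42", 3)))
```

Memory use depends only on `depth * width`, not on the number of keys, which is what makes the structure attractive when users and items keep arriving. With a single stored key, as here, the estimate is exact; with many keys it becomes an approximation whose error grows as the table fills.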

### Diversifying Search Results Using Time

An Information Retrieval Method for Historians

Getting an overview of a historic entity or event can be difficult in search results, especially if important dates concerning the entity or event are not known beforehand. For such information needs, users benefit if the returned results cover diverse dates, thus giving an overview of what has happened throughout history. Such a method can be a building block for applications, for instance, in digital humanities. We describe an approach to diversify search results using temporal expressions (e.g., 1990s) from their contents. Our approach first identifies time intervals of interest to the given keyword query based on pseudo-relevant documents. It then re-ranks query results so as to maximize the coverage of identified time intervals. We present a novel and objective evaluation for our proposed approach. We test the effectiveness of our methods on The New York Times Annotated corpus and the Living Knowledge corpus, collectively consisting of around 6 million documents. Using history-oriented queries and encyclopedic resources we show that our method is able to present search results diversified along time.

Dhruv Gupta, Klaus Berberich
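The re-ranking step described in the abstract above (maximizing coverage of the identified time intervals) can be pictured with a simple greedy coverage heuristic. This is only a sketch under assumptions: the abstract does not state which optimization the authors use, and the function name `diversify_by_time`, the `(doc_id, score, intervals)` tuple layout and the `(start_year, end_year)` interval encoding are hypothetical.

```python
def diversify_by_time(results, intervals_of_interest, k):
    """Greedily pick k documents that cover as many target intervals as possible.

    results: list of (doc_id, retrieval_score, set_of_intervals) tuples.
    intervals_of_interest: set of (start_year, end_year) tuples.
    """
    uncovered = set(intervals_of_interest)
    remaining = list(results)
    ranking = []
    while remaining and len(ranking) < k:
        # Prefer the document covering the most still-uncovered intervals;
        # break ties by the original retrieval score.
        best = max(remaining, key=lambda r: (len(r[2] & uncovered), r[1]))
        ranking.append(best[0])
        uncovered -= best[2]
        remaining.remove(best)
    return ranking


nineties, forties, sixties = (1990, 1999), (1940, 1949), (1960, 1969)
results = [
    ("d1", 0.9, {nineties}),
    ("d2", 0.8, {nineties}),
    ("d3", 0.5, {forties}),
    ("d4", 0.4, {sixties, forties}),
]
print(diversify_by_time(results, {nineties, forties, sixties}, k=3))
# prints ['d4', 'd1', 'd2']
```

In this toy run the low-scoring `d4` is promoted because it alone covers two intervals. Once every interval is covered, the remaining slots fall back to the original score order.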

### On Cross-Script Information Retrieval

We address the problem of cross-script retrieval in the context of a microblog system such as Twitter. Specifically, we explore methods for using native Arabic script queries to retrieve Arabic tweets written in a Roman script known as Arabizi. For example, a query for “كتاب” would not match “kitab” even though an Arabic reader would see them as the same word. Moreover, because of the lack of Arabic script, automatic language identification methods fail to recognize the Arabizi text as Arabic and label it as English, Polish, or the like. We propose a cross-script retrieval system using automatic rule-based mapping and statistical selection of transliteration keywords. We show that our system can achieve effective cross-script retrieval with minimal knowledge of the target language and without the need to rely on external translation or transliteration tools or lexica. With minimal human annotation, our technique can be applied to other languages such as Hindi and Greek, which are commonly converted to a Roman character set similarly.

### LExL: A Learning Approach for Local Expert Discovery on Twitter

In this paper, we explore a geo-spatial learning-to-rank framework for identifying local experts. Three of the key features of the proposed approach are: (i) a learning-based framework for integrating multiple factors impacting local expertise that leverages the fine-grained GPS coordinates of millions of social media users; (ii) a location-sensitive random walk that propagates crowd knowledge of a candidate’s expertise; and (iii) a comprehensive controlled study over AMT-labeled local experts on eight topics and in four cities. We find significant improvements in local expert finding over two state-of-the-art alternatives.

Wei Niu, Zhijiao Liu, James Caverlee

### Clickbait Detection

This paper proposes a new model for the detection of clickbait, i.e., short messages that lure readers to click a link. Clickbait is primarily used by online content publishers to increase their readership, whereas its automatic detection will give readers a way of filtering their news stream. We contribute by compiling the first clickbait corpus of 2992 Twitter tweets, 767 of which are clickbait, and by developing a clickbait model based on 215 features that enables a random forest classifier to achieve 0.79 ROC-AUC at 0.76 precision and 0.76 recall.

Martin Potthast, Sebastian Köpsel, Benno Stein, Matthias Hagen

### Informativeness for Adhoc IR Evaluation: A Measure that Prevents Assessing Individual Documents

Informativeness measures have been used in interactive information retrieval and automatic summarization evaluation. Indeed, as opposed to adhoc retrieval, these two tasks cannot rely on the Cranfield evaluation paradigm in which retrieved documents are compared to static query relevance document lists. In this paper, we explore the use of informativeness measures to evaluate the adhoc task. The advantage of the proposed evaluation framework is that it does not rely on an exhaustive reference and can be used in a changing environment in which new documents occur, and for which relevance has not been assessed. We show that the correlation between the official system ranking and the informativeness measure is particularly high for most of the TREC adhoc tracks.

Romain Deveaud, Véronique Moriceau, Josiane Mothe, Eric SanJuan

### What Multimedia Sentiment Analysis Says About City Liveability

Recent developments allow for sentiment analysis on multimodal social media content. In this paper we analyse content posted on microblogging and content-sharing platforms to estimate the sentiment of a city’s neighbourhoods. The results of sentiment analysis are evaluated through investigation into the existence of relationships with the indicators of city liveability, collected by the local government. Additionally, we create a set of sentiment maps that may help discover possible sentiment patterns within the city. This study shows several important findings. First, utilizing multimedia data, i.e., both visual and text content, leads to more reliable sentiment scores. The microblogging platform Twitter further appears more suitable for sentiment analysis than the content-sharing website Flickr. However, for both platforms, the computed multimodal sentiment scores show significant relationships with the indicators of city liveability.

Joost Boonzajer Flaes, Stevan Rudinac, Marcel Worring

### Scenemash: Multimodal Route Summarization for City Exploration

The potential of mining tourist information from social multimedia data gives rise to new applications offering much richer impressions of the city. In this paper we propose Scenemash, a system that generates multimodal summaries of multiple alternative routes between locations in a city. To get insight into the geographic areas on the route, we collect a dataset of community-contributed images and their associated annotations from Foursquare and Flickr. We identify images and terms representative of a geographic area by jointly analysing distributions of a large number of semantic concepts detected in the visual content and latent topics extracted from associated text. The Scenemash prototype is implemented as an Android app for smartphones and smartwatches.

Jorrit van den Berg, Stevan Rudinac, Marcel Worring

### Exactus Like: Plagiarism Detection in Scientific Texts

The paper presents an overview of Exactus Like – a plagiarism detection system. Deep parsing for text alignment helps the system to find moderate forms of disguised plagiarism. The features of the system and its advantages are discussed. We describe the architecture of the system and present its performance.

Ilya Sochenkov, Denis Zubarev, Ilya Tikhomirov, Ivan Smirnov, Artem Shelmanov, Roman Suvorov, Gennady Osipov

### Jitter Search: A News-Based Real-Time Twitter Search Interface

In this demo we show how we can enhance real-time microblog search by monitoring news sources on Twitter. We improve retrieval through query expansion using pseudo-relevance feedback. However, instead of doing feedback on the original corpus we use a separate Twitter news index. This allows the system to find additional terms associated with the original query to find more “interesting” posts.

Flávio Martins, João Magalhães, Jamie Callan
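The query expansion idea in the abstract above, pseudo-relevance feedback drawn from a separate news index rather than from the searched corpus, can be sketched as follows. This is a toy illustration under assumptions, not the demo's actual pipeline: the function name `expand_query`, the term-overlap scoring and the parameters `top_docs` and `extra_terms` are hypothetical stand-ins for a real retrieval model.

```python
from collections import Counter


def expand_query(query_terms, news_docs, top_docs=2, extra_terms=2):
    """Expand a query with frequent terms from the best-matching news documents.

    news_docs: list of tokenized documents (lists of lowercase terms) that
    plays the role of the separate news index.
    """
    # Toy retrieval step: score each news document by query-term occurrences.
    scored = [(sum(doc.count(t) for t in query_terms), doc) for doc in news_docs]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    feedback_docs = [doc for _, doc in scored[:top_docs]]

    # Feedback step: count terms in the top documents that are not in the query.
    counts = Counter(
        term for doc in feedback_docs for term in doc if term not in query_terms
    )
    expansion = [term for term, _ in counts.most_common(extra_terms)]
    return list(query_terms) + expansion


news_index = [
    ["earthquake", "nepal", "rescue", "teams"],
    ["nepal", "earthquake", "rescue", "aid"],
    ["football", "final", "tonight"],
]
print(expand_query(["earthquake"], news_index))
```

The expanded query would then be run against the microblog index. Keeping the feedback source separate means the added terms come from edited news text rather than from noisy posts.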

### TimeMachine: Entity-Centric Search and Visualization of News Archives

We present a dynamic web tool that allows interactive search and visualization of large news archives using an entity-centric approach. Users are able to search entities using keyword phrases expressing news stories or events and the system retrieves the most relevant entities to the user query based on automatically extracted and indexed entity profiles. From the computational journalism perspective, TimeMachine allows users to explore media content through time using automatic identification of entity names, jobs, quotations and relations between entities from co-occurrence networks extracted from the news articles. The TimeMachine demo is available at http://maquinadotempo.sapo.pt/.

Pedro Saleiro, Jorge Teixeira, Carlos Soares, Eugénio Oliveira

### OPMES: A Similarity Search Engine for Mathematical Content

This paper presents details about a new mathematical search engine, i.e., OPMES. This search engine leverages operator trees in both representation and relevance modeling of the mathematical content. More specifically, OPMES represents mathematical expressions using operator trees, and then indexes each expression based on all the leaf-root paths of the generated operator tree. Such data structures enable OPMES to implement an efficient two-stage query processing technique. The system first identifies structurally relevant expressions based on the matching of the leaf-root paths, and then further ranks them based on their symbolic similarity to the query.

Wei Zhong, Hui Fang
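The leaf-root path indexing described in the abstract above can be illustrated with a small sketch. It assumes a hypothetical `(label, children)` tuple encoding of operator trees and a plain shared-path count as the structural match. Neither is taken from OPMES itself, and the second ranking stage (symbolic similarity) is not modeled.

```python
from collections import Counter, defaultdict


def leaf_root_paths(node, ancestors=()):
    """Return one path per leaf, listing labels from the leaf up to the root."""
    label, children = node
    if not children:
        return [(label,) + ancestors]
    paths = []
    for child in children:
        paths.extend(leaf_root_paths(child, (label,) + ancestors))
    return paths


def build_index(expressions):
    """Map each leaf-root path to the ids of the expressions containing it."""
    index = defaultdict(set)
    for expr_id, tree in expressions.items():
        for path in leaf_root_paths(tree):
            index[path].add(expr_id)
    return index


def structural_candidates(index, query_tree):
    """Stage one: count how many query paths each indexed expression shares."""
    hits = Counter()
    for path in leaf_root_paths(query_tree):
        for expr_id in index.get(path, ()):
            hits[expr_id] += 1
    return hits


def leaf(name):
    return (name, [])


expressions = {
    "e1": ("add", [leaf("a"), ("times", [leaf("b"), leaf("c")])]),  # a + b*c
    "e2": ("add", [leaf("a"), leaf("b")]),                          # a + b
}
index = build_index(expressions)
query = ("add", [leaf("a"), ("times", [leaf("c"), leaf("b")])])     # a + c*b
print(structural_candidates(index, query).most_common())
# prints [('e1', 3), ('e2', 1)]
```

Because each path ignores sibling order, the query `a + c*b` still matches all three paths of `a + b*c`, which is one reason a path-based index suits commutative operators.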

### SHAMUS: UFAL Search and Hyperlinking Multimedia System

In this paper, we describe SHAMUS, our system for easy search and navigation in multimedia archives. The system consists of three components. The Search component provides a text-based search in a multimedia collection, the Anchoring component determines the most important segments of videos, and segments topically related to the anchoring ones are retrieved by the Hyperlinking component. In the paper, we describe each component of the system as well as the online demo interface http://ufal.mff.cuni.cz/shamus which currently works with a collection of TED talks.

Petra Galuščáková, Shadi Saleh, Pavel Pecina

### Industry Day Overview

The Industry track aims to bring together information retrieval researchers, practitioners and analysts from industry and academia. Since ECIR 2006, these events have been very successful and provided many interesting talks.

Omar Alonso, Pavel Serdyukov

### Bibliometric-Enhanced Information Retrieval: 3rd International BIR Workshop

The BIR workshop brings together experts in Bibliometrics and Information Retrieval. While sometimes perceived as rather loosely related, these research areas share various interests and face similar challenges. Our motivation as organizers of the BIR workshop stemmed from a twofold observation. First, both communities only partly overlap, albeit sharing various interests. Second, it will be profitable for both sides to tackle some of the emerging problems that scholars face today when they have to identify relevant and high quality literature in the fast growing number of electronic publications available worldwide. Bibliometric techniques are not yet used widely to enhance retrieval processes in digital libraries, although they offer value-added effects for users. Information professionals working in libraries and archives, however, are increasingly confronted with applying bibliometric techniques in their services. The first BIR workshop in 2014 set the research agenda by introducing each group to the other, illustrating state-of-the-art methods, reporting on current research problems, and brainstorming about common interests. The second workshop in 2015 further elaborated these themes. This third BIR workshop aims to foster a common ground for the incorporation of bibliometric-enhanced services into scholarly search engine interfaces. In particular we will address specific communities, as well as studies on large, cross-domain collections like Mendeley and ResearchGate. This third BIR workshop explicitly addresses both scholarly and industrial researchers.

Philipp Mayr, Ingo Frommholz, Guillaume Cabanac

### MultiLingMine 2016: Modeling, Learning and Mining for Cross/Multilinguality

The increasing availability of text information coded in many different languages poses new challenges to modern information retrieval and mining systems in order to discover and exchange knowledge at a larger world-wide scale. The 1st International Workshop on Modeling, Learning and Mining for Cross/Multilinguality (dubbed MultiLingMine 2016) provides a venue to discuss research advances in cross-/multilingual related topics, focusing on new multidisciplinary research questions that have not been deeply investigated so far (e.g., in CLEF and related events relevant to CLIR). This includes theoretical and experimental on-going work on novel representation models, learning algorithms, and knowledge-based methodologies for emerging trends and applications, such as cross-view cross-/multilingual information retrieval and document mining, (knowledge-based) translation-independent cross-/multilingual corpora, applications in social network contexts, and more.

Dino Ienco, Mathieu Roche, Salvatore Romeo, Paolo Rosso, Andrea Tagarelli

### Proactive Information Retrieval: Anticipating Users’ Information Need

The ultimate goal of an IR system is to fulfill the user’s information need. Traditional search systems have been reactive in nature: they react to an input query and return a set of ranked documents most likely to contain the desired information. Because users are often unable to create effective queries expressing their information needs, or must expend considerable effort to do so, techniques such as query expansion, query suggestions, relevance feedback and click-through information, personalization, etc. have been used to better understand and satisfy users’ information needs. Given the increasing popularity of smartphones and Internet-enabled wearable devices, how can information retrieval systems use the additional data, and better interact with the user so as to better understand, and even anticipate, her precise information needs? Building such zero-query or minimum-user-effort systems requires research efforts from multiple disciplines covering algorithmic aspects of retrieval models, user modeling and profiling, evaluation, context modeling, novel user interface design, etc. The proposed workshop intends to bring together researchers from academia and practitioners from industry with these diverse backgrounds to share their experiences and opinions on the challenges and possibilities of developing such proactive information retrieval systems.

Sumit Bhatia, Debapriyo Majumdar, Nitish Aggarwal

### First International Workshop on Recent Trends in News Information Retrieval (NewsIR’16)

The news industry has gone through seismic shifts in the past decade with digital content and social media completely redefining how people consume news. Readers check for accurate fresh news from multiple sources throughout the day using dedicated apps or social media on their smartphones and tablets. At the same time, news publishers rely more and more on social networks and citizen journalism as a frontline to breaking news. In this new era of fast-flowing instant news delivery and consumption, publishers and aggregators have to overcome a great number of challenges. These include the verification or assessment of a source’s reliability; the integration of news with other sources of information; real-time processing of both news content and social streams in multiple languages, in different formats and in high volumes; deduplication; entity detection and disambiguation; automatic summarization; and news recommendation. Although Information Retrieval (IR) applied to news has been a popular research area for decades, fresh approaches are needed due to the changing type and volume of media content available and the way people consume this content. The goal of this workshop is to stimulate discussion around new and powerful uses of IR applied to news sources and the intersection of multiple IR tasks to solve real user problems. To promote research efforts in this area, we released a new dataset consisting of one million news articles to the research community and introduced a data challenge track as part of the workshop.

Miguel Martinez-Alvarez, Udo Kruschwitz, Gabriella Kazai, Frank Hopfgartner, David Corney, Ricardo Campos, Dyaa Albakour

### Collaborative Information Retrieval: Concepts, Models and Evaluation

Recent work has shown the potential of collaboration for solving complex or exploratory search tasks, achieving synergic effects with respect to individual search, which has been the prevalent information retrieval (IR) setting of the last decade. This interactive multi-user context gives rise to several challenges in IR. One main challenge lies in the adaptation of IR techniques or models [8] in order to build algorithmic support for collaboration by distributing documents among users. The second challenge is related to the design of Collaborative Information Retrieval (CIR) models and their effectiveness evaluation, since individual IR frameworks and measures do not totally fit the collaboration paradigms. In this tutorial, we address the second challenge and first present a general overview of collaborative search introducing the main underlying notions. Then, we focus on related work dealing with collaborative ranking models and their effectiveness evaluation. Our primary objective is to introduce these notions by highlighting how and why they should be different from individual IR in order to give participants the main clues for investigating new research directions in this domain with a deep understanding of current CIR work.

Lynda Tamine, Laure Soulier

### Group Recommender Systems: State of the Art, Emerging Aspects and Techniques, and Research Challenges

A recommender system aims at suggesting to users items that might interest them and that they have not considered yet. A class of systems, known as group recommendation, provides suggestions in contexts in which more than one person is involved in the recommendation process. The goal of this tutorial is to provide the ECIR audience with an overview of group recommendation. We will first illustrate the principles of recommender systems, then formally introduce the problem of producing recommendations to groups, and present a survey based on the tasks performed by these systems. We will also analyze challenging topics like their evaluation, and present emerging aspects and techniques in this area. The tutorial will end with a summary that highlights open issues and research challenges.

Ludovico Boratto

### Living Labs for Online Evaluation: From Theory to Practice

Experimental evaluation has always been central to Information Retrieval research. The field is increasingly moving towards online evaluation, which involves experimenting with real, unsuspecting users in their natural task environments, a so-called living lab. Specifically, with the recent introduction of the Living Labs for IR Evaluation initiative at CLEF and the OpenSearch track at TREC, researchers can now have direct access to such labs. With these benchmarking platforms in place, we believe that online evaluation will be an exciting area to work on in the future. This half-day tutorial aims to provide a comprehensive overview of the underlying theory and complement it with practical guidance.

Anne Schuth, Krisztian Balog

### Real-Time Bidding Based Display Advertising: Mechanisms and Algorithms

In display and mobile advertising, the most significant development in recent years is Real-Time Bidding (RTB), which allows selling and buying one ad impression at a time in real time. The ability to make impression-level bid decisions and to target an individual user in real time has fundamentally changed the landscape of digital media. The further demand for automation, integration and optimisation in RTB brings new research opportunities in the IR fields, including information matching with economic constraints, CTR prediction, user behaviour targeting and profiling, personalised advertising, and attribution and evaluation methodologies. In this tutorial, with presenters from both industry and academia, we aim to share knowledge from real-world systems and to provide an overview of the fundamental mechanisms and algorithms with a focus on the IR context. We will also introduce IR researchers to a few recently released datasets so that they can quickly get hands-on experience and pursue this research.

Jun Wang, Shuai Yuan, Weinan Zhang

### Backmatter
