
2010 | Book

Advances in Information Retrieval

32nd European Conference on IR Research, ECIR 2010, Milton Keynes, UK, March 28-31, 2010. Proceedings

Edited by: Cathal Gurrin, Yulan He, Gabriella Kazai, Udo Kruschwitz, Suzanne Little, Thomas Roelleke, Stefan Rüger, Keith van Rijsbergen

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science


About this Book

These proceedings contain the papers presented at ECIR 2010, the 32nd European Conference on Information Retrieval. The conference was organized by the Knowledge Media Institute (KMi), the Open University, in co-operation with Dublin City University and the University of Essex, and was supported by the Information Retrieval Specialist Group of the British Computer Society (BCS-IRSG) and the Special Interest Group on Information Retrieval (ACM SIGIR). It was held during March 28-31, 2010 in Milton Keynes, UK.

ECIR 2010 received a total of 202 full-paper submissions from Continental Europe (40%), UK (14%), North and South America (15%), Asia and Australia (28%), and the Middle East and Africa (3%). All submitted papers were reviewed by at least three members of the international Program Committee. Out of the 202 papers, 44 were selected as full research papers.

ECIR has always been a conference with a strong student focus. To allow as much interaction between delegates as possible and to keep in the spirit of the conference, we decided to run ECIR 2010 as a single-track event. As a result, we decided to have two presentation formats for full papers: some of them were presented orally, the others in poster format. The presentation format does not represent any difference in quality. Instead, the presentation format was decided after the full papers had been accepted at the Program Committee meeting held at the University of Essex. The views of the reviewers were then taken into consideration to select the most appropriate presentation format for each paper.

Table of Contents

Frontmatter

Recent Developments in Information Retrieval

Recent Developments in Information Retrieval

This paper summarizes the scientific work presented at the 32nd European Conference on Information Retrieval. It demonstrates that information retrieval (IR) as a research area continues to thrive, with progress being made in three complementary sub-fields: IR theory and formal methods, together with indexing and query representation issues; Web IR as a primary application area; and research into evaluation methods and metrics. It is the combination of these areas that gives IR its solid scientific foundations. The paper also illustrates that significant progress has been made in other areas of IR. The keynote speakers addressed three such subject fields: social search engines using personalization and recommendation technologies, the renewed interest in applying natural language processing to IR, and multimedia IR as another fast-growing area.

Cathal Gurrin, Yulan He, Gabriella Kazai, Udo Kruschwitz, Suzanne Little, Thomas Roelleke, Stefan Rüger, Keith van Rijsbergen

Invited Talks

Web Search Futures: Personal, Collaborative, Social

In this talk we will discuss where Web search may be heading, focusing on a number of large-scale research projects that are trying to develop the “next big thing” in Web search. We will consider some important recent initiatives on how to improve the quality of the Web search experience by helping search engines to respond to our individual needs and preferences. In turn, we will focus on some innovative work on how to take advantage of the inherently collaborative nature of Web search as we discuss recent attempts to develop so-called “social search engines”.

Barry Smyth
IR, NLP, and Visualization

In the last ten years natural language processing (NLP) has become an essential part of many information retrieval systems, mainly in the guise of question answering, summarization, machine translation and preprocessing such as decompounding. However, most of these methods are shallow. More complex natural language processing is not yet sufficiently reliable to be used in IR. I will discuss how new visualization technology and rich interactive environments offer new opportunities for complex NLP in IR.

Hinrich Schütze
Image and Natural Language Processing for Multimedia Information Retrieval

Image annotation, the task of automatically generating description words for a picture, is a key component in various image search and retrieval applications. Creating image databases for model development is, however, costly and time consuming, since the keywords must be hand-coded and the process repeated for new collections. In this work we exploit the vast resource of images and documents available on the web for developing image annotation models without any human involvement. We describe a probabilistic framework based on the assumption that images and their co-occurring textual data are generated by mixtures of latent topics. Applications of this framework to image annotation and retrieval show performance gains over previously proposed approaches, despite the noisy nature of our dataset. We also discuss how the proposed model can be used for story picturing, i.e., to find images that appropriately illustrate a text and demonstrate its utility when interfaced with an image caption generator.

Mirella Lapata

Regular Papers

NLP and Text Mining

A Language Modeling Approach for Temporal Information Needs

This work addresses information needs that have a temporal dimension conveyed by a temporal expression in the user’s query. Temporal expressions such as “in the 1990s” are frequent, easily extractable, but not leveraged by existing retrieval models. One challenge when dealing with them is their inherent uncertainty: it is often unclear which exact time interval a temporal expression refers to.

We integrate temporal expressions into a language modeling approach, thus making them first-class citizens of the retrieval model and considering their inherent uncertainty. Experiments on the New York Times Annotated Corpus, using Amazon Mechanical Turk to collect queries and obtain relevance assessments, demonstrate that our approach yields substantial improvements in retrieval effectiveness.

Klaus Berberich, Srikanta Bedathur, Omar Alonso, Gerhard Weikum
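
To make the uncertainty argument concrete, here is a minimal Python sketch (not the authors' code) of one way an uncertain temporal expression can be scored against a document's time interval: the expression is expanded into a set of plausible intervals, and a document is rewarded for overlapping any of them.

    # Score a document's time interval against an uncertain query
    # temporal expression: "in the 1990s" is treated as a set of
    # plausible intervals, averaged uniformly.

    def interval_overlap(a, b):
        """Length of overlap between two (begin, end) year intervals."""
        return max(0, min(a[1], b[1]) - max(a[0], b[0]))

    def temporal_score(query_intervals, doc_interval):
        """Average normalized overlap over all intervals the expression may mean."""
        scores = [interval_overlap(q, doc_interval) / (q[1] - q[0])
                  for q in query_intervals if q[1] > q[0]]
        return sum(scores) / len(scores) if scores else 0.0

    # "in the 1990s" could mean the whole decade or a sub-interval of it;
    # only a few plausible readings are enumerated here for illustration.
    query = [(1990, 2000), (1990, 1995), (1995, 2000)]
    print(temporal_score(query, (1992, 1997)))
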
Analyzing Information Retrieval Methods to Recover Broken Web Links

In this work we compare different techniques to automatically find candidate web pages to substitute broken links. We extract information from the anchor text, the content of the page containing the link, and the cached page in some digital library. The selected information is processed and submitted to a search engine. We have compared different information retrieval methods both for selecting the terms used to construct the queries submitted to the search engine, and for ranking the candidate pages it provides, in order to help the user find the best replacement. In particular, we have used term frequencies and a language model approach for the selection of terms, and co-occurrence measures and a language model approach for ranking the final results. To test the different methods, we have also defined a methodology that does not require user judgments, which increases the objectivity of the results.

Juan Martinez-Romo, Lourdes Araujo
Between Bags and Trees – Constructional Patterns in Text Used for Attitude Identification

This paper describes experiments to use non-terminological information to find attitudinal expressions in written English text. The experiments are based on an analysis of text with respect to not only the vocabulary of content terms present in it (which most other approaches use as a basis for analysis) but also with respect to presence of structural features of the text represented by constructional features (typically disregarded by most other analyses). In our analysis, following a construction grammar framework, structural features are treated as occurrences, similarly to the treatment of vocabulary features. The constructional features in play are chosen to potentially signify opinion but are not specific to negative or positive expressions.

The framework is used to classify clauses, headlines, and sentences from three different shared collections of attitudinal data. We find that constructional features transfer well across different text collections and that the information couched in them integrates easily with a vocabulary based approach, yielding improvements in classification without complicating the application end of the processing framework.

Jussi Karlgren, Gunnar Eriksson, Magnus Sahlgren, Oscar Täckström
Improving Medical Information Retrieval with PICO Element Detection

Without a well-formulated and structured question, it can be very difficult and time consuming for physicians to identify appropriate resources and search for the best available evidence for medical treatment in evidence-based medicine (EBM). In EBM, clinical studies and questions involve four aspects: Population/Problem (P), Intervention (I), Comparison (C) and Outcome (O), which are known as PICO elements. It is intuitively more advantageous to use these elements in Information Retrieval (IR). In this paper, we first propose an approach to automatically identify the PICO elements in documents and queries. We test several possible approaches to using the identified elements in IR. Experiments show that it is a challenging task to accurately determine PICO elements. However, even with noisy tagging results, we can still take advantage of some PICO elements, namely the I and P elements, to enhance the retrieval process, and this allows us to obtain significantly better retrieval effectiveness than the state-of-the-art methods.

Florian Boudin, Lixin Shi, Jian-Yun Nie
The Role of Query Sessions in Extracting Instance Attributes from Web Search Queries

Per-instance attributes are acquired using a weakly supervised extraction method which exploits anonymized Web-search query sessions, as an alternative to isolated, individual queries. Examples of these attributes are top speed for chevrolet corvette, or population density for brazil. Inherent challenges associated with using sessions for attribute extraction, such as the fact that the large majority of within-session queries are not related to attributes, are overcome by using attributes globally extracted from isolated queries as an unsupervised filtering mechanism. In a head-to-head qualitative comparison, the ranked lists of attributes generated by merging attributes extracted from query sessions, on the one hand, and from isolated queries, on the other, are about 12% more accurate on average than the attributes extracted from isolated queries by a previous method.

Marius Paşca, Enrique Alfonseca, Enrique Robledo-Arnuncio, Ricardo Martin-Brualla, Keith Hall
Transliteration Equivalence Using Canonical Correlation Analysis

We address the problem of Transliteration Equivalence, i.e. determining whether a pair of words in two different languages (e.g. Auden, ऑडेन) are name transliterations or not. This problem is at the heart of Mining Name Transliterations (MINT) from various sources of multilingual text data including parallel, comparable, and non-comparable corpora and multilingual news streams. MINT is useful in several cross-language tasks including Cross-Language Information Retrieval (CLIR), Machine Translation (MT), and Cross-Language Named Entity Retrieval. We propose a novel approach to Transliteration Equivalence using language-neutral representations of names. The key idea is to consider name transliterations in two languages as two views of the same semantic object and compute a low-dimensional common feature space using Canonical Correlation Analysis (CCA). Similarity of the names in the common feature space forms the basis for classifying a pair of names as transliterations. We show that our approach outperforms state-of-the-art baselines in the CLIR task for Hindi-English (3 collections) and Tamil-English (2 collections).

Raghavendra Udupa, Mitesh M. Khapra
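
A minimal sketch of the CCA idea described above, using scikit-learn; the feature vectors, dimensions, and the decision threshold are illustrative assumptions, not the paper's setup.

    # Project name representations from two scripts into a shared
    # low-dimensional space and compare them there by cosine similarity.
    import numpy as np
    from sklearn.cross_decomposition import CCA

    # Toy character n-gram count vectors for paired names in two languages;
    # rows are name pairs known to be transliterations (training data).
    X_en = np.random.rand(50, 30)   # English-side features (hypothetical)
    X_hi = np.random.rand(50, 40)   # Hindi-side features (hypothetical)

    cca = CCA(n_components=10)
    cca.fit(X_en, X_hi)

    def similarity(en_vec, hi_vec):
        """Cosine similarity in the common CCA space."""
        u, v = cca.transform(en_vec.reshape(1, -1), hi_vec.reshape(1, -1))
        u, v = u.ravel(), v.ravel()
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

    # A pair is classified as a transliteration if similarity exceeds a threshold.
    print(similarity(X_en[0], X_hi[0]) > 0.5)
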

Web IR

Explicit Search Result Diversification through Sub-queries

Queries submitted to a retrieval system are often ambiguous. In such a situation, a sensible strategy is to diversify the ranking of results to be retrieved, in the hope that users will find at least one of these results to be relevant to their information need. In this paper, we introduce xQuAD, a novel framework for search result diversification that builds such a diversified ranking by explicitly accounting for the relationship between documents retrieved for the original query and the possible aspects underlying this query, in the form of sub-queries. We evaluate the effectiveness of xQuAD using a standard TREC collection. The results show that our framework markedly outperforms state-of-the-art diversification approaches under a simulated best-case scenario. Moreover, we show that its effectiveness can be further improved by estimating the relative importance of each identified sub-query. Finally, we show that our framework can still outperform the simulated best-case scenario of the state-of-the-art diversification approaches using sub-queries automatically derived from the baseline document ranking itself.

Rodrygo L. T. Santos, Jie Peng, Craig Macdonald, Iadh Ounis
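
The greedy trade-off between relevance and novelty that xQuAD performs can be sketched as follows; the scoring formula follows the published xQuAD framework, but the probability estimates here are placeholder inputs rather than the paper's estimators.

    # Greedy xQuAD-style diversification: mix query relevance with
    # coverage of not-yet-satisfied sub-queries (query aspects).

    def xquad(rel, sub_rel, sub_imp, lam=0.5, k=10):
        """rel: {doc: P(d|q)}; sub_rel: {sub-query: {doc: P(d|q_i)}};
        sub_imp: {sub-query: P(q_i|q)}. Returns a diversified ranking."""
        selected, remaining = [], set(rel)
        # "novelty" of sub-query i: probability no selected doc covers it yet
        not_covered = {i: 1.0 for i in sub_rel}
        while remaining and len(selected) < k:
            def score(d):
                div = sum(sub_imp[i] * sub_rel[i].get(d, 0.0) * not_covered[i]
                          for i in sub_rel)
                return (1 - lam) * rel[d] + lam * div
            best = max(remaining, key=score)
            selected.append(best)
            remaining.remove(best)
            for i in sub_rel:
                not_covered[i] *= 1.0 - sub_rel[i].get(best, 0.0)
        return selected

    docs = {"d1": 0.9, "d2": 0.8, "d3": 0.7}
    subs = {"q1": {"d1": 0.9, "d2": 0.8}, "q2": {"d3": 0.9}}
    imp = {"q1": 0.6, "q2": 0.4}
    print(xquad(docs, subs, imp, k=3))  # d3 jumps ahead of d2: it covers q2
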
Interpreting User Inactivity on Search Results

The lack of user activity on search results was until recently perceived as a sign of user dissatisfaction with retrieval performance, and such inactivity was often referred to as a failed search (negative search abandonment). However, recent studies suggest that some search tasks can be accomplished within the contents of the displayed results, without the need to click through them (positive search abandonment); they thus emphasize the need to discriminate between successful and failed searches without follow-up clicks. In this paper, we study users’ inactivity on search results in relation to their pursued search goals and investigate the impact of displayed results on user clicking decisions. Our study examines two types of post-query user inactivity: pre-determined and post-determined, depending on whether the user started searching with a preset intention to look for answers only within the result snippets and did not intend to click through the results, or whether the inactivity was decided after the user had reviewed the list of retrieved documents. Our findings indicate that 27% of web searches in our sample are conducted with a pre-determined intention to look for answers in the results’ list, and 75% of them can be satisfied by the contents of the displayed results. Moreover, in nearly half the queries that did not yield result visits, the desired information is found in the result snippets.

Sofia Stamou, Efthimis N. Efthimiadis
Learning to Select a Ranking Function

Learning To Rank (LTR) techniques aim to learn an effective document ranking function by combining several document features. While the function learned may be uniformly applied to all queries, many studies have shown that different ranking functions favour different queries, and the retrieval performance can be significantly enhanced if an appropriate ranking function is selected for each individual query. In this paper, we propose a novel Learning To Select framework that selectively applies an appropriate ranking function on a per-query basis. The approach employs a query feature to identify similar training queries for an unseen query. The ranking function which performs the best on this identified training query set is then chosen for the unseen query. In particular, we propose the use of divergence, which measures the extent that a document ranking function alters the scores of an initial ranking of documents for a given query, as a query feature. We evaluate our method using tasks from the TREC Web and Million Query tracks, in combination with the LETOR 3.0 and LETOR 4.0 feature sets. Our experimental results show that our proposed method is effective and robust for selecting an appropriate ranking function on a per-query basis. In particular, it always outperforms three state-of-the-art LTR techniques, namely Ranking SVM, AdaRank, and the automatic feature selection method.

Jie Peng, Craig Macdonald, Iadh Ounis
Mining Anchor Text Trends for Retrieval

Anchor text has been considered as a useful resource to complement the representation of target pages and is broadly used in web search. However, previous research only uses anchor text of a single snapshot to improve web search. Historical trends of anchor text importance have not been well modeled in anchor text weighting strategies. In this paper, we propose a novel temporal anchor text weighting method to incorporate the trends of anchor text creation over time, which combines historical weights of anchor text by propagating the anchor text weights among snapshots over the time axis. We evaluate our method on a real-world web crawl from the Stanford WebBase. Our results demonstrate that the proposed method can produce a significant improvement in ranking quality.

Na Dai, Brian D. Davison
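
A hypothetical sketch of propagating anchor-text weights across crawl snapshots; the geometric decay kernel below is an illustrative assumption, not the paper's actual combination scheme.

    # Combine per-snapshot anchor-text weights along the time axis,
    # discounting older snapshots geometrically.

    def temporal_anchor_weight(snapshot_weights, decay=0.8):
        """snapshot_weights: per-snapshot anchor weights, oldest first."""
        w, total = 0.0, 0.0
        for age, weight in enumerate(reversed(snapshot_weights)):
            w += (decay ** age) * weight
            total += decay ** age
        return w / total if total else 0.0

    print(temporal_anchor_weight([0.1, 0.4, 0.9]))  # recent weight 0.9 dominates
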
Predicting Query Performance via Classification

We investigate using topic prediction data, as a summary of document content, to compute measures of search result quality. Unlike existing quality measures such as query clarity that require the entire content of the top-ranked results, class-based statistics can be computed efficiently online, because class information is compact enough to precompute and store in the index. In an empirical study we compare the performance of class-based statistics to their language-model counterparts for two performance-related tasks: predicting query difficulty and expansion risk. Our findings suggest that using class predictions can offer comparable performance to full language models while reducing computation overhead.

Kevyn Collins-Thompson, Paul N. Bennett

Evaluation

A Case for Automatic System Evaluation

Ranking a set of retrieval systems according to their retrieval effectiveness without relying on relevance judgments was first explored by Soboroff et al. [13]. Over the years, a number of alternative approaches have been proposed, all of which have been evaluated on early TREC test collections. In this work, we perform a wider analysis of system ranking estimation methods on sixteen TREC data sets which cover more tasks and corpora than previously. Our analysis reveals that the performance of system ranking estimation approaches varies across topics. This observation motivates the hypothesis that the performance of such methods can be improved by selecting the “right” subset of topics from a topic set. We show that using topic subsets improves the performance of automatic system ranking methods by 26% on average, with a maximum of 60%. We also observe that the commonly experienced problem of underestimating the performance of the best systems is data set dependent and not inherent to system ranking estimation. These findings support the case for automatic system evaluation and motivate further research.

Claudia Hauff, Djoerd Hiemstra, Leif Azzopardi, Franciska de Jong
Aggregation of Multiple Judgments for Evaluating Ordered Lists

Many tasks (e.g., search and summarization) result in an ordered list of items. In order to evaluate such an ordered list, we need to compare it with an ideal ordered list created by a human expert for the same set of items. To reduce bias, multiple human experts are often used to create multiple ideal ordered lists. An interesting challenge in such an evaluation method is thus how to aggregate these different ideal lists to compute a single score for the ordered list to be evaluated. In this paper, we propose three new methods for aggregating multiple order judgments to evaluate ordered lists: weighted correlation aggregation, rank-based aggregation, and frequent sequential pattern-based aggregation. Experiment results on ordering sentences for text summarization show that all three new methods outperform the state-of-the-art average correlation methods in terms of discriminativeness and robustness against noise. Among the three proposed methods, the frequent sequential pattern-based method performs the best, due to its flexible modeling of agreements and disagreements among human experts at various levels of granularity.

Hyun Duk Kim, ChengXiang Zhai, Jiawei Han
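
For context, the average-correlation baseline that the three proposed methods improve on can be sketched in a few lines; the aggregation below simply averages Kendall's tau against each expert ordering.

    # Score a system's ordered list by its mean rank correlation with
    # several expert orderings of the same items.
    from scipy.stats import kendalltau

    def average_correlation(system_order, expert_orders):
        """Each order is a permutation of the same item ids."""
        pos = {item: r for r, item in enumerate(system_order)}
        taus = []
        for expert in expert_orders:
            expert_ranks = [pos[item] for item in expert]
            tau, _ = kendalltau(range(len(expert)), expert_ranks)
            taus.append(tau)
        return sum(taus) / len(taus)

    experts = [["a", "b", "c", "d"], ["b", "a", "c", "d"]]
    print(average_correlation(["a", "b", "d", "c"], experts))
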
Evaluation and User Preference Study on Spatial Diversity

Spatial diversity is a relatively new branch of research in the context of spatial information retrieval. Although the assumption that spatially diversified results may meet users’ needs better seems reasonable, there has been little hard evidence in the literature indicating so. In this paper, we show the potential of spatial diversity through not only the traditional evaluation metrics (precision and cluster recall), but also a user preference study using Amazon Mechanical Turk. The encouraging results from the latter show that users do have a strong preference for spatially diversified results.

Jiayu Tang, Mark Sanderson
News Comments: Exploring, Modeling, and Online Prediction

Online news agents provide commenting facilities for their readers to express their opinions or sentiments with regards to news stories. The number of user supplied comments on a news article may be indicative of its importance, interestingness, or impact. We explore the news comments space, and compare the log-normal and the negative binomial distributions for modeling comments from various news agents. These estimated models can be used to normalize raw comment counts and enable comparison across different news sites. We also examine the feasibility of online prediction of the number of comments, based on the volume observed shortly after publication. We report on solid performance for predicting news comment volume in the long run, after short observation. This prediction can be useful for identifying news stories with the potential to “take off,” and can be used to support front page optimization for news sites.

Manos Tsagkias, Wouter Weerkamp, Maarten de Rijke
Query Performance Prediction: Evaluation Contrasted with Effectiveness

Query performance predictors are commonly evaluated by reporting correlation coefficients to denote how well the methods perform at predicting the retrieval performance of a set of queries. Despite the amount of research dedicated to this area, one aspect remains neglected: how strong does the correlation need to be in order to realize an improvement in retrieval effectiveness in an operational setting? We address this issue in the context of two settings: Selective Query Expansion and Meta-Search. In an empirical study, we control the quality of a predictor in order to examine how the strength of the correlation achieved affects the effectiveness of an adaptive retrieval system. The results of this study show that many existing predictors fail to achieve a correlation strong enough to reliably improve retrieval effectiveness in either the Selective Query Expansion or the Meta-Search setting.

Claudia Hauff, Leif Azzopardi, Djoerd Hiemstra, Franciska de Jong

Multimedia IR

A Framework for Evaluating Automatic Image Annotation Algorithms

Several Automatic Image Annotation (AIA) algorithms have been introduced recently, which have been found to outperform previous models. However, each one of them has been evaluated using either different descriptors, collections or parts of collections, or “easy” settings. This fact renders their results non-comparable, and we show that collection-specific properties, rather than the actual models, are responsible for the high reported performance measures. In this paper we introduce a framework for the evaluation of image annotation models, which we use to evaluate two state-of-the-art AIA algorithms. Our findings reveal that a simple Support Vector Machine (SVM) approach using global MPEG-7 features outperforms state-of-the-art AIA models across several collection settings. It seems that these models heavily depend on the set of features and the data used, and that it is easy to exploit collection-specific properties, such as tag popularity (especially in the commonly used Corel 5K dataset), and still achieve good performance.

Konstantinos Athanasakos, Vassilios Stathopoulos, Joemon M. Jose
BASIL: Effective Near-Duplicate Image Detection Using Gene Sequence Alignment

Finding near-duplicate images is a task often found in Multimedia Information Retrieval (MIR). Toward this effort, we propose a novel idea by bridging two seemingly unrelated fields: MIR and Biology. That is, we propose to use the popular gene sequence alignment algorithm in Biology, i.e. BLAST, in detecting near-duplicate images. Under the new idea, we study how various image features and gene sequence generation methods (using gene alphabets such as A, C, G, and T in DNA sequences) affect the accuracy and performance of detecting near-duplicate images. Our proposal, termed BLASTed Image Linkage (BASIL), is empirically validated using various real data sets. This work can be viewed as the “first” step toward bridging the MIR and Biology fields in the well-studied near-duplicate image detection problem.

Hung-sik Kim, Hau-Wen Chang, Jeongkyu Lee, Dongwon Lee
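
A hypothetical sketch of the sequence-generation step the abstract describes: quantizing image feature values into the four-letter DNA alphabet so that alignment tools such as BLAST can compare images. The binning scheme is an illustrative assumption, not the paper's method.

    # Map an image feature vector onto the {A, C, G, T} alphabet by
    # quantizing each feature value into one of four bins.

    def to_gene_sequence(features, alphabet="ACGT"):
        """Quantize each feature in [0, 1) into one of four letters."""
        n = len(alphabet)
        return "".join(alphabet[min(int(f * n), n - 1)] for f in features)

    seq1 = to_gene_sequence([0.05, 0.40, 0.60, 0.90])
    seq2 = to_gene_sequence([0.10, 0.35, 0.65, 0.95])
    print(seq1, seq2)  # near-duplicate images yield near-identical sequences
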
Beyond Shot Retrieval: Searching for Broadcast News Items Using Language Models of Concepts

Current video search systems commonly return video shots as results. We believe that users may better relate to longer, semantic video units and propose a retrieval framework for news story items, which consist of multiple shots. The framework is divided into two parts: (1) A concept based language model which ranks news items with known occurrences of semantic concepts by the probability that an important concept is produced from the concept distribution of the news item and (2) a probabilistic model of the uncertain presence, or risk, of these concepts. In this paper we use a method to evaluate the performance of story retrieval, based on the TRECVID shot-based retrieval groundtruth. Our experiments on the TRECVID 2005 collection show a significant performance improvement against four standard methods.

Robin Aly, Aiden Doherty, Djoerd Hiemstra, Alan Smeaton
Ranking Fusion Methods Applied to On-Line Handwriting Information Retrieval

This paper presents an empirical study on the application of ranking fusion methods in the context of handwriting information retrieval. Several works in the electronic text domain suggest that significant improvements in retrieval performance can be achieved by combining different approaches to IR. In the handwritten domain, two quite different families of retrieval approaches are encountered. The first family is based on standard approaches carried out on texts obtained through handwriting recognition, therefore regarded as noisy texts, while the second one is recognition-free, using word spotting algorithms. Given the large differences that exist between these two families of approaches (document and query representations, matching methods, etc.), we hypothesize that fusion methods applied to the handwritten domain can also bring significant effectiveness improvements. Results show that for texts having a word error rate (wer) lower than 23%, the performances achieved with the combined system are close to those obtained with clean digital texts, i.e. without transcription errors. In addition, for poorly recognized texts (wer > 52%), improvements can also be obtained with standard fusion methods. Furthermore, we present a detailed analysis of the fusion performances, and show that existing indicators of expected improvements are not accurate in our context.

Sebastián Peña Saldarriaga, Emmanuel Morin, Christian Viard-Gaudin

Distributed IR and Performance Issues

Improving Query Correctness Using Centralized Probably Approximately Correct (PAC) Search

A non-deterministic architecture for information retrieval, known as probably approximately correct (PAC) search, has recently been proposed. However, for equivalent storage and computational resources, the performance of PAC is only 63% of a deterministic system. We propose a modification to the PAC architecture, introducing a centralized query coordination node. To respond to a query, random sampling of computers is replaced with pseudo-random sampling using the query as a seed. Then, for queries that occur frequently, this pseudo-random sample is iteratively refined so that performance improves with each iteration. A theoretical analysis is presented that provides an upper bound on the performance of any iterative algorithm. Two heuristic algorithms are then proposed to iteratively improve the performance of PAC search. Experiments on the TREC-8 dataset demonstrate that performance can improve from 67% to 96% in just 10 iterations, and continues to improve with each iteration. Thus, for queries that occur 10 or more times, the performance of a non-deterministic PAC architecture can closely match that of a deterministic system.

Ingemar Cox, Jianhan Zhu, Ruoxun Fu, Lars Kai Hansen
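
The query-seeded sampling idea is easy to illustrate: hashing the query to seed a PRNG makes the "random" node sample deterministic per query, so repeated queries hit the same sample and can be refined iteratively. A minimal sketch:

    # Replace uniform random sampling of nodes with pseudo-random
    # sampling seeded by the query itself.
    import hashlib, random

    def sample_nodes(query, all_nodes, k):
        seed = int.from_bytes(hashlib.sha256(query.encode()).digest()[:8], "big")
        rng = random.Random(seed)
        return rng.sample(all_nodes, k)

    nodes = list(range(1000))                   # hypothetical node ids
    print(sample_nodes("ecir 2010", nodes, 5))
    print(sample_nodes("ecir 2010", nodes, 5))  # identical sample each time
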
Learning to Distribute Queries into Web Search Nodes

Web search engines are composed of a large set of search nodes and a broker machine that feeds them with queries. A location cache keeps minimal information in the broker to register the search nodes capable of producing the top-N results for frequent queries. In this paper we show that it is possible to use the location cache as a training dataset for a standard machine learning algorithm and build a predictive model of the search nodes expected to produce the best approximated results for queries. This can be used to prevent the broker from sending queries to all search nodes under situations of sudden peaks in query traffic and, as a result, avoid search node saturation. This paper proposes a logistic regression model to quickly predict the most pertinent search nodes for a given query.

Marcelo Mendoza, Mauricio Marín, Flavio Ferrarotti, Barbara Poblete
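
A minimal sketch (assumed features and labels, not the paper's setup) of the broker-side predictor: a logistic regression trained on location-cache entries, used to route a query to only its most promising nodes.

    # Train on (query features -> node that held the top results), then
    # route new queries to the top-scoring nodes only.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = np.random.rand(200, 16)          # hypothetical query feature vectors
    y = np.random.randint(0, 8, 200)     # node id that answered each query best

    clf = LogisticRegression(max_iter=1000).fit(X, y)

    def route(query_features, top=2):
        """Send the query only to the most promising nodes."""
        probs = clf.predict_proba(query_features.reshape(1, -1))[0]
        return np.argsort(probs)[::-1][:top]

    print(route(np.random.rand(16)))
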
Text Clustering for Peer-to-Peer Networks with Probabilistic Guarantees

Text clustering is an established technique for improving quality in information retrieval, for both centralized and distributed environments. However, for highly distributed environments, such as peer-to-peer networks, current clustering algorithms fail to scale. Our algorithm for peer-to-peer clustering achieves high scalability by using a probabilistic approach for assigning documents to clusters. It enables a peer to compare each of its documents only with very few selected clusters, without significant loss of clustering quality. The algorithm offers probabilistic guarantees for the correctness of each document assignment to a cluster. Extensive experimental evaluation with up to 100000 peers and 1 million documents demonstrates the scalability and effectiveness of the algorithm.

Odysseas Papapetrou, Wolf Siberski, Norbert Fuhr
XML Retrieval Using Pruned Element-Index Files

An element-index is a crucial mechanism for supporting content-only (CO) queries over XML collections. A full element-index that indexes each element along with the content of its descendants involves high redundancy and reduces query processing efficiency. A direct index, on the other hand, only indexes the content that is directly under each element and disregards the descendants. This results in a smaller index, but possibly at the cost of some reduction in system effectiveness. In this paper, we propose using static index pruning techniques for obtaining more compact index files that can still yield retrieval performance comparable to that of a full index. We also compare the retrieval performance of these pruning-based approaches to other strategies that make use of a direct element-index. Our experiments, conducted along the lines of the INEX evaluation framework, reveal that pruned index files yield retrieval performance comparable to, or even better than, the full index and the direct index for several tasks in the ad hoc track.

Ismail Sengor Altingovde, Duygu Atilgan, Özgür Ulusoy

IR Theory and Formal Models

Category-Based Query Modeling for Entity Search

Users often search for entities instead of documents, and in this setting are willing to provide extra input in addition to a query, such as category information and example entities. We propose a general probabilistic framework for entity search to evaluate and provide insight into the many ways of using these types of input for query modeling. We focus on the use of category information and show the advantage of a category-based representation over a term-based representation, and also demonstrate the effectiveness of category-based expansion using example entities. Our best performing model shows very competitive performance on the INEX-XER entity ranking and list completion tasks.

Krisztian Balog, Marc Bron, Maarten de Rijke
Maximum Margin Ranking Algorithms for Information Retrieval

Machine learning ranking methods are increasingly applied to ranking tasks in information retrieval (IR). However, ranking tasks in IR often differ from standard ranking tasks in machine learning, both in terms of problem structure and in terms of the evaluation criteria used to measure performance. Consequently, there has been much interest in recent years in developing ranking algorithms that directly optimize IR ranking measures. Here we propose a family of ranking algorithms that preserve the simplicity of standard pair-wise ranking methods in machine learning, yet show performance comparable to state-of-the-art IR ranking algorithms. Our algorithms optimize variations of the hinge loss used in support vector machines (SVMs); we discuss three variations, and in each case give simple and efficient stochastic gradient algorithms to solve the resulting optimization problems. Two of these are stochastic gradient projection algorithms, one of which relies on a recent method for ℓ1,∞-norm projections; the third is a stochastic exponentiated gradient algorithm. The algorithms are simple and efficient, have provable convergence properties, and in our preliminary experiments show performance close to state-of-the-art algorithms that directly optimize IR ranking measures.

Shivani Agarwal, Michael Collins
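
A minimal sketch of the standard pairwise hinge-loss ranking setup the abstract builds on, trained by plain stochastic gradient descent; the paper's l1,inf-projection and exponentiated-gradient variants are not reproduced here.

    # Learn a linear scoring function w so that, for each training pair,
    # the preferred document outscores the other by a margin of 1.
    import numpy as np

    def sgd_rank(pairs, dim, lr=0.1, epochs=10):
        """pairs: list of (x_pos, x_neg) feature vectors where x_pos should
        be ranked above x_neg."""
        w = np.zeros(dim)
        for _ in range(epochs):
            for x_pos, x_neg in pairs:
                margin = w @ (x_pos - x_neg)
                if margin < 1.0:                 # hinge loss is active
                    w += lr * (x_pos - x_neg)    # gradient step on the pair
        return w

    pairs = [(np.array([1.0, 0.2]), np.array([0.3, 0.1]))]
    print(sgd_rank(pairs, dim=2))
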
Query Aspect Based Term Weighting Regularization in Information Retrieval

Traditional retrieval models assume that query terms are independent and rank documents primarily based on various term weighting strategies including TF-IDF and document length normalization. However, query terms are related, and groups of semantically related query terms may form query aspects. Intuitively, the relations among query terms could be utilized to identify hidden query aspects and promote the ranking of documents covering more query aspects. Despite its importance, the use of semantic relations among query terms for term weighting regularization has been under-explored in information retrieval. In this paper, we study the incorporation of query term relations into existing retrieval models and focus on addressing the challenge, i.e., how to regularize the weights of terms in different query aspects to improve retrieval performance. Specifically, we first develop a general strategy that can systematically integrate a term weighting regularization function into existing retrieval functions, and then propose two specific regularization functions based on the guidance provided by constraint analysis. Experiments on eight standard TREC data sets show that the proposed methods are effective to improve retrieval accuracy.

Wei Zheng, Hui Fang
Using the Quantum Probability Ranking Principle to Rank Interdependent Documents

A known limitation of the Probability Ranking Principle (PRP) is that it does not cater for dependence between documents. Recently, the Quantum Probability Ranking Principle (QPRP) has been proposed, which implicitly captures dependencies between documents through “quantum interference”. This paper explores whether this new ranking principle leads to improved performance for subtopic retrieval, where novelty and diversity are required. In a thorough empirical investigation, models based on the PRP, as well as other recently proposed ranking strategies for subtopic retrieval (i.e. Maximal Marginal Relevance (MMR) and Portfolio Theory (PT)), are compared against the QPRP. On the given task, it is shown that the QPRP outperforms these other ranking strategies. And unlike MMR and PT, one of the main advantages of the QPRP is that no parameter estimation or tuning is required, making the QPRP both simple and effective. This research demonstrates that the application of quantum theory to problems within information retrieval can lead to significant improvements.

Guido Zuccon, Leif Azzopardi
Wikipedia-Based Semantic Smoothing for the Language Modeling Approach to Information Retrieval

Semantic smoothing for the language modeling approach to information retrieval is significant and effective to improve retrieval performance. In previous methods such as the translation model, individual terms or phrases are used to do semantic mapping. These models are not very efficient when faced with ambiguous words and phrases because they are unable to incorporate contextual information. To overcome this limitation, we propose a novel Wikipedia-based semantic smoothing method that decomposes a document into a set of weighted Wikipedia concepts and then maps those unambiguous Wikipedia concepts into query terms. The mapping probabilities from each Wikipedia concept to individual terms are estimated through the EM algorithm. Document models based on Wikipedia concept mapping are then derived. The new smoothing method is evaluated on the TREC Ad Hoc Track (Disks 1, 2, and 3) collections. Experiments show significant improvements over the two-stage language model, as well as the language model with translation-based semantic smoothing.

Xinhui Tu, Tingting He, Long Chen, Jing Luo, Maoyuan Zhang

Personalization and Recommendation

A Performance Prediction Approach to Enhance Collaborative Filtering Performance

Performance prediction has gained increasing attention in the IR field since the middle of the past decade and has become an established research topic. The present work restates the problem in the area of Collaborative Filtering (CF), where it has barely been researched so far. We investigate the adaptation of clarity-based query performance predictors to predict neighbor performance in CF. A predictor is proposed and introduced into a kNN CF algorithm to produce a dynamic variant where neighbor ratings are weighted based on their predicted performance. The properties of the predictor are empirically studied by, first, checking the correlation of the predictor output with a proposed measure of neighbor performance. Then, the performance of the dynamic kNN variant is examined under different sparsity and neighborhood size conditions, where the variant consistently outperforms the baseline algorithm, with increasing difference on small neighborhoods.

Alejandro Bellogín, Pablo Castells
Collaborative Filtering: The Aim of Recommender Systems and the Significance of User Ratings

This paper investigates the significance of numeric user ratings in recommender systems by considering their inclusion/exclusion in both the generation and evaluation of recommendations. When standard evaluation metrics are used, experimental results show that inclusion of numeric rating values in the recommendation process does not enhance the results. However, evaluating the accuracy of a recommender algorithm requires identifying the aim of the system. Evaluation metrics such as precision and recall evaluate how well a system performs at recommending items that have been previously rated by the user. By contrast, a new metric, known as Approval Rate, is intended to evaluate how well a system performs at recommending items that would be rated highly by the user. Experimental results demonstrate that these two aims are not synonymous and that for an algorithm to attempt both obscures the investigation. The results also show that appropriate use of numeric rating values in the process of calculating user similarity can enhance the performance when Approval Rate is used.

Jennifer Redpath, David H. Glass, Sally McClean, Luke Chen
Goal-Driven Collaborative Filtering – A Directional Error Based Approach

Collaborative filtering is one of the most effective techniques for making personalized content recommendations. In the literature, a common experimental setup in the modeling phase is to minimize, either explicitly or implicitly, the (expected) error between the predicted ratings and the true user ratings, while in the evaluation phase, the resulting model is again assessed by that error. In this paper, we argue that an error function that is fixed across rating scales is limited, and that different applications may have different recommendation goals, and thus error functions. For example, in some cases we might be more concerned about the highly predicted items than the ones with low ratings (precision minded), while in other cases we want to make sure not to miss any highly rated items (recall minded). Additionally, some applications might require producing a top-N recommendation list, where rank-based performance measures become valid. To address this issue, we propose a flexible optimization framework that can adapt to individual recommendation goals. We introduce a Directional Error Function to capture the cost (risk) of each individual prediction, and it can be learned from the specified performance measures at hand. Our preliminary experiments on a real data set demonstrate that significant performance gains have been achieved.

Tamas Jambor, Jun Wang
Personalizing Web Search with Folksonomy-Based User and Document Profiles

Web search personalization aims to adapt search results to a user based on their tastes, interests and needs. The way in which such personal preferences are captured, modeled and exploited distinguishes the different personalization strategies. In this paper, we propose to represent a user profile in terms of social tags, manually provided by users in folksonomy systems to describe, categorize and organize items of interest, and investigate a number of novel techniques that exploit the users’ social tags to re-rank results obtained with a Web search engine. An evaluation conducted with a dataset from the Delicious social bookmarking system shows that our personalization techniques clearly outperform state-of-the-art approaches.

David Vallet, Iván Cantador, Joemon M. Jose
Tripartite Hidden Topic Models for Personalised Tag Suggestion

Social tagging systems provide methods for users to categorise resources using their own choice of keywords (or “tags”) without being bound to a restrictive set of predefined terms. Such systems typically provide simple tag recommendations to increase the number of tags assigned to resources. In this paper we extend the latent Dirichlet allocation topic model to include user data and use the estimated probability distributions to provide personalised tag suggestions to users. We describe the resulting tripartite topic model in detail and show how it can be utilised to make personalised tag suggestions. Then, using data from a large-scale, real-life tagging system, we test our system against several baseline methods. Our experiments show a statistically significant increase in performance of our model over all key metrics, indicating that the model could be successfully used to provide further social tagging tools such as resource suggestion and collaborative filtering.

Morgan Harvey, Mark Baillie, Ian Ruthven, Mark J. Carman

Domain-Specific IR and CLIR

Extracting Multilingual Topics from Unaligned Comparable Corpora

Topic models have been studied extensively in the context of monolingual corpora. Though there are some attempts to mine topical structure from cross-lingual corpora, they require clues about document alignments. In this paper we present a generative model called JointLDA which uses a bilingual dictionary to mine multilingual topics from an unaligned corpus. Experiments conducted on different data sets confirm our conjecture that jointly modeling the cross-lingual corpora offers several advantages compared to individual monolingual models. Since the JointLDA model merges related topics in different languages into a single multilingual topic: (a) it can fit the data with relatively fewer topics, and (b) it has the ability to predict related words from a language different from that of the given document. In fact, it has better predictive power compared to the bag-of-words based translation model, leaving the possibility for JointLDA to be preferred over the bag-of-words model for Cross-Lingual IR applications. We also found that the monolingual models learnt while optimizing the cross-lingual corpora are more effective than the corresponding LDA models.

Jagadeesh Jagarlamudi, Hal Daumé III
Improving Retrievability of Patents in Prior-Art Search

Prior-art search is an important task in patent retrieval. The success of this task relies upon the selection of relevant search queries. Typically, terms for prior-art queries are extracted from the claim fields of query patents. However, due to the complex technical structure of patents, and the presence of term mismatch and vague terms, selecting relevant terms for queries is a difficult task. When evaluating the patent retrievability coverage of prior-art queries generated from query patents, a large bias toward a subset of the collection is experienced: a large number of patents either have a very low retrievability score or cannot be discovered via any query. To increase the retrievability of patents, in this paper we expand prior-art queries generated from query patents using query expansion with pseudo-relevance feedback. Missing terms from query patents are discovered from feedback patents, and better patents for relevance feedback are identified using a novel approach for checking their similarity with query patents. We specifically focus on how to automatically select better terms from query patents based on their proximity distribution with prior-art queries, which are used as features for computing similarity. Our results show that the coverage of prior-art queries can be increased significantly by incorporating relevant query terms using query expansion.

Shariq Bashir, Andreas Rauber
Mining OOV Translations from Mixed-Language Web Pages for Cross Language Information Retrieval

Translating Out-Of-Vocabulary (OOV) terms is crucial for Cross Language Information Retrieval (CLIR). In this paper, we propose a method that automatically acquires a large quantity of OOV translations from the web. Different from previous approaches that rely on a finite set of hand-crafted extraction rules, our method adaptively learns translation extraction patterns based on the observation that translation pairs on the same page tend to appear following similar layout patterns. The learned patterns are leveraged in a discriminative translation extraction model that treats translation extraction from a mixed-language bilingual web page as a sequence labeling task, in order to exploit useful relations among translation pairs on the page. Experiments demonstrate that our proposed method outperforms earlier work, with marked improvement in OOV translation mining quality.

Lei Shi
On Foreign Name Search

We address foreign name search in a highly diverse user community. User sophistication ranges from highly experienced archivists to apprehensive users who shy away from technology; apprehensive users dominate system use. Thus, all system interfaces must assume minimal dependency on the user.

Our foreign names search approach, called Segments, is language independent; thus, there is no need to determine the language of origin from the diverse candidate set of thirteen languages. We compare Segments against traditional n-gram and Soundex based solutions. Actual and synthetic queries are used to search a names data set resident in the United States Holocaust Memorial Museum. We also search a subset of the 1990 United States Census Bureau Surnames data set to evaluate the performance of Segments on a predominately language-specific (English) collection. Our results demonstrate statistically significant performance gains over both traditional approaches. The described approach supports search efforts at the United States Holocaust Memorial Museum.

Jason Soo, Ophir Frieder
Promoting Ranking Diversity for Biomedical Information Retrieval Using Wikipedia

In this paper, we propose a cost-based re-ranking method to promote ranking diversity for biomedical information retrieval. The proposed method is concerned with finding passages that cover many different aspects of a query topic. First, aspects covered by retrieved passages are detected and explicitly represented by Wikipedia concepts. Then, an aspect filter based on a two-stage model is introduced; it ranks the detected aspects in decreasing order of the probability that an aspect is generated by the query. Finally, retrieved passages are re-ranked using the proposed cost-based re-ranking method, which ranks a passage according to the number of new aspects covered by the passage and the query-relevance of the aspects covered by the passage. A series of experiments conducted on the TREC 2006 and 2007 Genomics collections demonstrates the effectiveness of the proposed method in promoting ranking diversity for biomedical information retrieval.

Xiaoshi Yin, Xiangji Huang, Zhoujun Li
Temporal Shingling for Version Identification in Web Archives

Building and preserving archives of the evolving Web has been an important problem in research. Given the huge volume of content that is added or updated daily, identifying the right versions of pages to store in the archive is an important building block of any large-scale archival system. This paper presents temporal shingling, an extension of the well-established shingling technique for measuring how similar two snapshots of a page are. This novel method considers the lifespan of shingles to differentiate between important updates that should be archived and transient changes that may be ignored. Extensive experiments demonstrate the tradeoff between archive size and version coverage, and show that the novel method yields better archive coverage at smaller sizes than existing techniques.

Ralf Schenkel
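
For reference, the underlying shingling technique can be sketched as follows; the paper's temporal extension additionally tracks each shingle's lifespan across snapshots to separate important updates from transient changes, which this illustration omits.

    # Classic w-shingling: represent a page snapshot as its set of
    # w-word shingles and compare snapshots by Jaccard similarity.

    def shingles(text, w=4):
        """Set of w-word shingles of a page snapshot."""
        words = text.split()
        return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

    def resemblance(a, b, w=4):
        """Jaccard similarity of two snapshots' shingle sets."""
        sa, sb = shingles(a, w), shingles(b, w)
        return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

    v1 = "breaking news storm hits coast thousands evacuated overnight"
    v2 = "breaking news storm hits coast thousands evacuated this morning"
    print(resemblance(v1, v2))  # high value: transient change, may skip archiving
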

User Issues

Biometric Response as a Source of Query Independent Scoring in Lifelog Retrieval

Personal lifelog archives contain digital records captured from an individual’s daily life, e.g. emails, web pages downloaded and SMSs sent or received. While capturing this information is becoming increasingly easy, subsequently locating relevant items in response to user queries from within these archives is a significant challenge. This paper presents a novel query independent static biometric scoring approach for re-ranking result lists retrieved from a lifelog using a BM25 model for content and content + context data. For this study we explored the utility of galvanic skin response (GSR) and skin temperature (ST) associated with past experience of items as a measure of potential future significance of items. Results obtained indicate that our static scoring techniques are useful in re-ranking retrieved result lists.

Liadh Kelly, Gareth J. F. Jones
Enabling Interactive Query Expansion through Eliciting the Potential Effect of Expansion Terms

Despite its potential to improve search effectiveness, previous research has shown that the uptake of interactive query expansion (IQE) is limited. In this paper, we investigate one method of increasing the uptake of IQE by displaying summary overviews that allow searchers to view the impact of their expansion decisions in real time, engage more with suggested terms, and support them in making good expansion decisions. Results from our user studies show that searchers use system-generated suggested terms more frequently if they know the impact of doing so on their results. We also present evidence that the usefulness of our proposed IQE approach is highest when searchers attempt unfamiliar or difficult information seeking tasks. Overall, our work presents strong evidence that searchers are more likely to engage with suggested terms if they are supported by the search interface.

Nuzhah Gooda Sahib, Anastasios Tombros, Ian Ruthven
Evaluation of an Adaptive Search Suggestion System

This paper describes an adaptive search suggestion system based on case-based reasoning techniques, and details an evaluation of its usefulness in helping users employ better search strategies. A user experiment with 24 participants was conducted using a between-subjects design. One group received search suggestions for the first two out of three tasks, while the other did not. Results indicate a correlation between search success, expressed as the number of relevant documents saved, and use of suggestions. In addition, users who received suggestions used significantly more of the advanced tools and options of the search system, even after suggestions were switched off during a later task.

Sascha Kriewel, Norbert Fuhr
How Different Are Language Models and Word Clouds?

Word clouds are a summarised representation of a document’s text, similar to tag clouds which summarise the tags assigned to documents. Word clouds are similar to language models in the sense that they represent a document by its word distribution. In this paper we investigate the differences between word cloud and language modelling approaches, and specifically whether effective language modelling techniques also improve word clouds. We evaluate the quality of the language model using a system evaluation test bed, and evaluate the quality of the resulting word cloud with a user study. Our experiments show that different language modelling techniques can be applied to improve a standard word cloud that uses a TF weighting scheme in combination with stopword removal. Including bigrams in the word clouds and a parsimonious term weighting scheme are the most effective in both the system evaluation and the user study.

Rianne Kaptein, Djoerd Hiemstra, Jaap Kamps

Posters

Colouring the Dimensions of Relevance

In this article we introduce a visualisation technique for analysing relevance and interaction data. It allows the researcher to quickly detect emerging patterns in both interactions and relevance criteria usage. The concept of “relevance criteria profile”, which provides a global view of user behaviour in judging the relevance of the retrieved information, is developed. We discuss by example, using data from a live search user study, how these tools support the data analysis.

Ulises Cerviño Beresi, Yunhyong Kim, Mark Baillie, Ian Ruthven, Dawei Song
On Improving Pseudo-Relevance Feedback Using Pseudo-Irrelevant Documents

Pseudo-Relevance Feedback (PRF) assumes that the top-ranking n documents of the initial retrieval are relevant and extracts expansion terms from them. In this work, we introduce the notion of pseudo-irrelevant documents, i.e. high-scoring documents outside the top n that are highly unlikely to be relevant. We show how pseudo-irrelevant documents can be used to extract better expansion terms from the top-ranking n documents: good expansion terms are those which discriminate the top-ranking n documents from the pseudo-irrelevant documents. Our approach gives substantial improvements in retrieval performance over Model-based Feedback on several test collections.

Karthik Raman, Raghavendra Udupa, Pushpak Bhattacharya, Abhijit Bhole
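
A minimal sketch (assumed scoring, not the paper's model) of the discrimination idea: rank candidate expansion terms by how much more often they occur in the top-ranking documents than in the pseudo-irrelevant ones.

    # Score expansion terms by relative document frequency in the
    # pseudo-relevant set versus the pseudo-irrelevant set.
    from collections import Counter

    def expansion_terms(pseudo_rel_docs, pseudo_irr_docs, k=5):
        """Each doc is a list of terms; higher score = more discriminative."""
        df_rel = Counter(t for d in pseudo_rel_docs for t in set(d))
        df_irr = Counter(t for d in pseudo_irr_docs for t in set(d))
        score = {t: df_rel[t] / len(pseudo_rel_docs)
                    - df_irr.get(t, 0) / max(len(pseudo_irr_docs), 1)
                 for t in df_rel}
        return sorted(score, key=score.get, reverse=True)[:k]

    rel = [["jaguar", "car", "speed"], ["jaguar", "car", "engine"]]
    irr = [["jaguar", "cat", "habitat"]]
    print(expansion_terms(rel, irr))  # favors "car" over the ambiguous "jaguar"
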
Laplacian Co-hashing of Terms and Documents

A promising way to accelerate similarity search is semantic hashing, which designs compact binary codes for a large number of documents so that semantically similar documents are mapped to similar codes within a short Hamming distance. In this paper, we introduce the novel problem of co-hashing, where both documents and terms are hashed simultaneously according to their semantic similarities. Furthermore, we propose a novel algorithm, Laplacian Co-Hashing (LCH), to solve this problem, which directly optimises the Hamming distance.

Dell Zhang, Jun Wang, Deng Cai, Jinsong Lu
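
A minimal sketch of the retrieval step that semantic hashing enables: once documents carry compact binary codes, similarity search reduces to Hamming distance. How LCH learns the codes is not shown here.

    # Rank documents by Hamming distance between their binary codes
    # and the query's code.

    def hamming(a: int, b: int) -> int:
        """Hamming distance between two equal-length binary codes."""
        return bin(a ^ b).count("1")

    codes = {"doc1": 0b10110010, "doc2": 0b10110011, "doc3": 0b01001100}
    query_code = 0b10110000

    ranked = sorted(codes, key=lambda d: hamming(codes[d], query_code))
    print(ranked)  # doc1 and doc2 are within a short Hamming distance
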
Query Difficulty Prediction for Contextual Image Retrieval

This paper explores how to predict query difficulty for contextual image retrieval. We reformulate the problem as the task of predicting how difficult it is to represent a query as images. We propose to use machine learning algorithms to learn query difficulty prediction models based on the characteristics of the query words as well as the query context. More specifically, we focus on noun word/phrase queries and propose four features based on several assumptions. We create an evaluation data set by hand and compare several machine learning algorithms on the prediction task. Our preliminary experimental results show the effectiveness of the proposed features and stable performance across different classification models.

Xing Xing, Yi Zhang, Mei Han
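
Since the abstract casts difficulty prediction as supervised classification over query features, a minimal scikit-learn sketch looks as follows. The four feature names and the toy labels are purely hypothetical stand-ins for the paper's feature set and hand-built data.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical per-query features: [concreteness, ambiguity, length, has_context]
X = [[0.9, 0.1, 3, 1],
     [0.2, 0.8, 1, 0],
     [0.7, 0.3, 2, 1],
     [0.1, 0.9, 1, 0]]
y = [0, 1, 0, 1]   # 0 = easy to depict as images, 1 = difficult

clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.5, 0.5, 2, 1]]))  # predicted difficulty class
```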
Estimating Translation Probabilities from the Web for Structured Queries on CLIR

We present two methods for estimating replacement probabilities without using parallel corpora. The first method exploits translation probabilities latent in Machine Readable Dictionaries (MRDs). The second method is more robust and exploits context-similarity-based techniques to estimate word translation probabilities, using the Internet as a bilingual comparable corpus. The experiments show a statistically significant improvement in MAP over non-weighted structured queries when using the replacement probabilities obtained with the proposed methods. The context-similarity-based method yields the largest improvement.

Xabier Saralegi, Maddalen Lopez de Lacalle
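
A sketch of the context-similarity step: each candidate translation from the MRD is scored by the cosine similarity between its context vector and that of the source word, and the scores are normalised into replacement probabilities. Making the context vectors comparable across languages (e.g., via a seed dictionary) is assumed to have been done already; this is an illustration, not the paper's exact estimator.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    num = sum(a[t] * b[t] for t in a)
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def translation_probs(src_context, candidate_contexts):
    """Normalise context similarities over the MRD's candidate
    translations to obtain replacement probabilities."""
    sims = {w: cosine(src_context, ctx) for w, ctx in candidate_contexts.items()}
    total = sum(sims.values()) or 1.0
    return {w: s / total for w, s in sims.items()}

src = Counter({"money": 3, "loan": 2})
cands = {"banco": Counter({"money": 2, "loan": 1}), "orilla": Counter({"river": 3})}
print(translation_probs(src, cands))  # 'banco' receives nearly all the mass
```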
Using Weighted Tagging to Facilitate Enterprise Search

Motivated by the success of social tagging in web communities, this paper proposes a novel document tagging method better suited to the enterprise environment, named weighted tagging. The method allows users to tag a document with weighted tags, which are then used as an additional source for query matching and relevance scoring to improve the search results. It enables user-driven ranking of search results by adapting the relevance score of a result through weighted tags based on user feedback. A prototype intranet search system has been built to demonstrate the viability of the method.

Shengwen Yang, Jianming Jin, Yuhong Xiong
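
The scoring adaptation can be pictured as the engine's base relevance score plus the aggregated weight of tags matching the query. The additive form and the alpha parameter below are illustrative choices, not the prototype's actual formula.

```python
def adapted_score(base_score, tag_weights, query_terms, alpha=0.5):
    """Boost a document's base retrieval score by the summed weight of
    its tags that match the query (illustrative linear combination)."""
    tag_boost = sum(w for tag, w in tag_weights.items() if tag in query_terms)
    return base_score + alpha * tag_boost

print(adapted_score(12.4, {"budget": 0.8, "travel": 0.3}, {"travel", "policy"}))
```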
An Empirical Study of Query Specificity

We analyse the statistical behaviour of query-associated quantities in query logs, namely the sum and the mean of the IDF of query terms, otherwise known as query specificity and query mean specificity. We narrow down the possibilities for modelling their distributions to gamma, log-normal, or log-logistic, depending on query length and on whether the sum or the mean is considered. The results have applications in query performance prediction and artificial query generation.

Avi Arampatzis, Jaap Kamps
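
Both quantities are straightforward to compute from a document-frequency table; a minimal sketch with toy numbers:

```python
import math

def specificity(query_terms, df, n_docs):
    """Query specificity (sum of term IDFs) and query mean specificity."""
    idfs = [math.log(n_docs / df[t]) for t in query_terms if df.get(t)]
    if not idfs:
        return 0.0, 0.0
    return sum(idfs), sum(idfs) / len(idfs)

df = {"information": 5000, "retrieval": 800, "okapi": 15}   # toy document frequencies
print(specificity(["okapi", "retrieval"], df, n_docs=100000))
```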
Semantically Enhanced Term Frequency

In this paper, we complement the term frequency, which is used in many bag-of-words based information retrieval models, with information about the semantic relatedness of query and document terms. Our experiments show that when employed in the standard probabilistic retrieval model BM25, the additional semantic information significantly outperforms the standard term frequency, and also improves the effectiveness when additional query expansion is applied. We further analyze the impact of different lexical semantic resources on the IR effectiveness.

Christof Müller, Iryna Gurevych
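
One way to picture the idea: the matching signal for a query term is no longer its raw count alone but also the semantic relatedness it has to the document's other terms. The additive combination below is a hedged sketch; the paper's exact integration into BM25 may differ.

```python
def semantic_tf(query_term, doc_terms, relatedness):
    """Raw term frequency plus the accumulated semantic relatedness
    between the query term and the document's other distinct terms."""
    tf = doc_terms.count(query_term)
    return tf + sum(relatedness(query_term, t)
                    for t in set(doc_terms) if t != query_term)

# Toy relatedness: 0.5 for one hand-picked synonym pair, 0 otherwise.
rel = lambda a, b: 0.5 if {a, b} == {"car", "automobile"} else 0.0
print(semantic_tf("car", ["automobile", "engine", "repair"], rel))  # 0.5
```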
Crowdsourcing Assessments for XML Ranked Retrieval

Crowdsourcing has gained a lot of attention as a viable approach for conducting IR evaluations. This paper shows through a series of experiments on INEX data that crowdsourcing can be a good alternative for relevance assessment in the context of XML retrieval.

Omar Alonso, Ralf Schenkel, Martin Theobald
Evaluating Server Selection for Federated Search

Previous evaluations of server selection methods for federated search have either used metrics which are unconnected with user satisfaction, or have not been able to account for confounding factors due to other search components.

We propose a new framework for evaluating federated search server selection techniques. In our model, we isolate the effect of other confounding factors such as server summaries and result merging. Our results suggest that state-of-the-art server selection techniques are generally effective but result merging methods can be significantly improved. Furthermore, we show that the performance differences among server selection techniques can be obscured by ineffective merging.

Paul Thomas, Milad Shokouhi
A Comparison of Language Identification Approaches on Short, Query-Style Texts

In a multi-language Information Retrieval setting, knowledge about the language of a user query is important for further processing. Hence, we compare the performance of some typical approaches for language detection on very short, query-style texts. The results show that even for single words an accuracy of more than 80% can be achieved; for slightly longer texts we observed accuracy values close to 100%.

Thomas Gottron, Nedim Lipka
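
A minimal sketch of one typical approach on such short inputs: character n-gram profiles per language, with the query assigned to the language whose profile it overlaps most. The overlap measure and the two-sentence training data are illustrative simplifications; out-of-place rank distance is a common alternative.

```python
from collections import Counter

def char_ngrams(text, n=3):
    text = f" {text.lower()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def detect_language(query, profiles):
    """Pick the language whose trigram profile shares the most mass
    with the query's trigrams."""
    q = char_ngrams(query)
    def overlap(profile):
        return sum(min(c, profile[g]) for g, c in q.items())
    return max(profiles, key=lambda lang: overlap(profiles[lang]))

profiles = {"en": char_ngrams("the quick brown fox jumps over the lazy dog"),
            "de": char_ngrams("der schnelle braune fuchs springt über den hund")}
print(detect_language("springt der hund", profiles))  # -> 'de'
```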
Filtering Documents with Subspaces

We propose an approach to build a subspace representation for documents. This more powerful representation is a first step towards the development of a quantum-based model for Information Retrieval (IR). To validate our methodology, we apply it to the adaptive document filtering task.

Benjamin Piwowarski, Ingo Frommholz, Yashar Moshfeghi, Mounia Lalmas, Keith van Rijsbergen
User’s Latent Interest-Based Collaborative Filtering

Memory-based collaborative filtering is one of the most popular methods used in recommendation systems. It predicts a user's preference based on his or her similarity to other users. Traditionally, the Pearson correlation coefficient is used to compute the similarity between users. In this paper we develop a novel memory-based approach that incorporates the user's latent interest. The interest level of a user is first estimated from his or her ratings for items through a latent trait model, and then used for computing the similarity between users. Experimental results show that the proposed method outperforms the traditional memory-based one.

Biyun Hu, Zhoujun Li, Jun Wang
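
For reference, the traditional memory-based baseline the paper builds on: Pearson-weighted neighbour ratings. In the paper, a latent-trait interest estimate enters the similarity computation; the sketch below shows only the classic baseline.

```python
import math

def pearson(u, v):
    """Pearson correlation over the items co-rated by users u and v."""
    common = set(u) & set(v)
    if len(common) < 2:
        return 0.0
    mu_u = sum(u[i] for i in common) / len(common)
    mu_v = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mu_u) * (v[i] - mu_v) for i in common)
    den = math.sqrt(sum((u[i] - mu_u) ** 2 for i in common)) * \
          math.sqrt(sum((v[i] - mu_v) ** 2 for i in common))
    return num / den if den else 0.0

def predict(user, item, ratings):
    """Similarity-weighted average of neighbours' ratings for the item."""
    neighbours = [(pearson(ratings[user], r), r)
                  for name, r in ratings.items() if name != user and item in r]
    num = sum(s * r[item] for s, r in neighbours)
    den = sum(abs(s) for s, _ in neighbours)
    return num / den if den else None

ratings = {"ann": {"m1": 5, "m2": 3, "m3": 4},
           "bob": {"m1": 4, "m2": 2, "m3": 5},
           "eve": {"m1": 5, "m2": 2}}
print(predict("eve", "m3", ratings))  # 4.5
```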
Evaluating the Potential of Explicit Phrases for Retrieval Quality

This paper evaluates the potential impact of explicit phrases on retrieval quality through a case study with the TREC Terabyte benchmark. It compares the performance of user- and system-identified phrases with a standard score and a proximity-aware score, and shows that an optimal choice of phrases, including term permutations, can significantly improve query performance.

Andreas Broschart, Klaus Berberich, Ralf Schenkel
Developing a Test Collection for the Evaluation of Integrated Search

The poster discusses the characteristics needed in an information retrieval (IR) test collection to facilitate the evaluation of integrated search, i.e. search across a range of different sources but with one search box and one ranked result list, and describes and analyses a new test collection constructed for this purpose. The test collection consists of approx. 18,000 monographic records, 160,000 papers and journal articles in PDF, and 275,000 abstracts with a varied set of metadata and vocabularies from the physics domain, together with 65 topics based on real work tasks and corresponding graded relevance assessments. The test collection may be used for systems-oriented as well as user-oriented evaluation.

Marianne Lykke, Birger Larsen, Haakon Lund, Peter Ingwersen
Retrieving Customary Web Language to Assist Writers

This paper introduces Netspeak, a Web service which assists writers in finding adequate expressions. To provide statistically relevant suggestions, the service indexes more than 1.8 billion n-grams, n ≤ 5, along with their occurrence frequencies on the Web. If in doubt about a wording, a user can specify a query with wildcards inserted at those positions where she feels uncertain.

Queries define patterns for which a ranked list of matching n-grams, along with usage examples, is retrieved. The ranking reflects the occurrence frequencies of the n-grams and informs about both absolute and relative usage. Given this choice of customary wordings, one can easily select the most appropriate. Second-language speakers in particular can learn about style conventions and language usage.

To guarantee response times within milliseconds, we have developed an index that considers occurrence probabilities, allowing for biased sampling during retrieval. Our analysis shows that the extreme speedup obtained with this strategy (a factor of 68) comes without significant loss in retrieval quality.

Benno Stein, Martin Potthast, Martin Trenkmann
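
The retrieval model is easy to picture with a toy index: a wildcard query is compiled to a pattern, matched against stored n-grams, and the hits are ranked by their Web frequencies, reported both absolutely and relative to the matching set. The n-grams and counts below are invented for illustration.

```python
import re
from collections import Counter

# Toy n-gram index with invented occurrence frequencies
NGRAMS = Counter({"waiting for you": 4500, "waiting for me": 3100,
                  "waiting for news": 800, "waiting for ever": 90})

def wildcard_query(pattern):
    """Resolve a query such as 'waiting for *' and rank matches by
    occurrence frequency, absolute and relative."""
    rx = re.compile("^" + re.escape(pattern).replace(r"\*", r"\S+") + "$")
    hits = [(ng, f) for ng, f in NGRAMS.items() if rx.match(ng)]
    total = sum(f for _, f in hits) or 1
    return [(ng, f, f / total) for ng, f in sorted(hits, key=lambda x: -x[1])]

for ng, f, rel in wildcard_query("waiting for *"):
    print(f"{ng}: {f} ({rel:.0%})")
```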
Enriching Peer-to-Peer File Descriptors Using Association Rules on Query Logs

We describe a P2P descriptor enrichment approach based on association rule mining over query logs that improves accuracy by more than 15% over the non-enriched baseline, a statistically significant gain. Unlike the state-of-the-art enrichment approach, however, the proposed solution does not introduce additional network load.

Nazli Goharian, Ophir Frieder, Wai Gen Yee, Jay Mundrawala
Cross-Language High Similarity Search: Why No Sub-linear Time Bound Can Be Expected

This paper contributes to an important variant of cross-language information retrieval, called cross-language high similarity search. Given a collection D of documents and a query q in a language different from the language of D, the task is to retrieve the documents that are highly similar to q. Use cases for this task include cross-language plagiarism detection and translation search.

The current line of research in cross-language high similarity search resorts to comparing q and the documents in D in a multilingual concept space, which, however, requires a linear scan of D. Monolingual high similarity search can be tackled in sub-linear time, either by fingerprinting or by "brute force n-gram indexing", as done by Web search engines. We argue that neither fingerprinting nor brute-force n-gram indexing can be applied to cross-language high similarity search, and that a linear scan is inevitable. Our findings are based on theoretical and empirical insights.

Maik Anderka, Benno Stein, Martin Potthast
Exploiting Result Consistency to Select Query Expansions for Spoken Content Retrieval

We propose a technique that predicts both whether and how expansion should be applied to individual queries. The prediction is made on the basis of the topical consistency of the top results of the initial result lists returned by the unexpanded query and several query expansion alternatives. We use the coherence score, known to capture the tightness of topical clustering structure, and also propose two simplified coherence indicators. We test our technique in a spoken content retrieval task, with the intention of helping to control the effects of speech recognition errors. Experiments use 46 semantic-theme-based queries defined by VideoCLEF 2009 over the TRECVid 2007 and 2008 video data sets. Our indicators make the best choice roughly 50% of the time; however, since they predict the right query expansion in critical cases, overall MAP improves. The approach is computationally lightweight and requires no training data.

Stevan Rudinac, Martha Larson, Alan Hanjalic
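
A hedged sketch of the coherence-based selection: coherence is computed here as the fraction of pairwise similarities among the top results that exceed a threshold, and the expansion alternative whose result list is most coherent wins. The fixed threshold and cosine similarity are illustrative simplifications of the published coherence score.

```python
import numpy as np

def coherence(doc_vectors, threshold=0.5):
    """Fraction of pairwise cosine similarities above a threshold:
    a proxy for how tightly the top results cluster around one topic."""
    vecs = np.asarray(doc_vectors, dtype=float)
    if len(vecs) < 2:
        return 1.0
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = vecs @ vecs.T
    iu = np.triu_indices(len(vecs), k=1)
    return float((sims[iu] > threshold).mean())

def pick_expansion(candidates):
    """candidates: {expansion_name: list of top-result vectors}.
    Keep the alternative whose top results are most topically consistent."""
    return max(candidates, key=lambda name: coherence(candidates[name]))
```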
Statistics of Online User-Generated Short Documents

User-generated short documents play an important role in online communication owing to the widespread use of social networks and real-time text messaging on the Internet. In this paper we compare the statistics of different online user-generated datasets and traditional TREC collections, investigating their similarities and differences. Our results support the applicability of traditional techniques to user-generated short documents as well, albeit with proper preprocessing.

Giacomo Inches, Mark J. Carman, Fabio Crestani
Mining Neighbors’ Topicality to Better Control Authority Flow

Web pages are often recognized by others through contexts. These contexts determine how linked pages influence and interact with each other. By differentiating such interactions, the authority of web pages can be better estimated through control of the authority flows among pages. In this work, we determine the authority distribution by examining the topicality relationship between associated pages. In addition, we find it is not enough to quantify the influence of authority propagation from only one type of neighbor, such as the parent pages in the PageRank algorithm, since web pages, like people, are influenced by diverse types of neighbors within the same network. We propose a probabilistic method to model authority flows from different sources of neighbor pages. In this way, we distinguish page authority interaction by incorporating the topical context and the relationship between associated pages. Experiments on the 2003 and 2004 TREC Web Tracks demonstrate that this approach outperforms other competitive topical ranking models and yields a more than 10% improvement over PageRank in the quality of the top 10 search results. Performance improves steadily as more types of neighbor sources are incorporated.

Na Dai, Brian D. Davison, Yaoshuang Wang
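
To make the idea of topically controlled authority flow concrete, here is a minimal sketch of a PageRank-style iteration in which each link's share of authority is weighted by the topical similarity of the two pages. This is a simplified stand-in for the paper's probabilistic multi-neighbor model, with invented toy matrices.

```python
import numpy as np

def topical_authority(adj, topic_sim, d=0.85, iters=50):
    """adj[i, j] = 1 if page j links to page i; topic_sim holds pairwise
    topical similarities in [0, 1]. Authority flows preferentially along
    topically related links."""
    n = adj.shape[0]
    w = adj * topic_sim                  # topicality-weighted link matrix
    col = w.sum(axis=0)
    col[col == 0] = 1.0                  # guard against sink columns
    m = w / col                          # column-stochastic transitions
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) / n + d * (m @ r)
    return r

adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
sim = np.array([[1.0, 0.9, 0.2], [0.9, 1.0, 0.3], [0.2, 0.3, 1.0]])
print(topical_authority(adj, sim))
```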
Finding Wormholes with Flickr Geotags

We propose a kernel convolution method to predict similar locations (wormholes) based on human travel behaviour. A scaling parameter is used to define a set of users relevant to the target location, and we show how the geotags of these users can be effectively aggregated to predict a ranking of similar locations. We evaluate the results at world and city level using several independent test collections.

Maarten Clements, Pavel Serdyukov, Arjen P. de Vries, Marcel J. T. Reinders
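
A two-step sketch of the kernel convolution idea under simplifying assumptions: (1) weight each user by the Gaussian kernel mass of their geotags around the target location, with sigma as the scale parameter mentioned above; (2) rank other locations by the summed weights of the users who tagged there. The data layout and the plain Gaussian kernel are illustrative choices.

```python
import numpy as np

def rank_similar_locations(target, user_geotags, sigma=1.0):
    """user_geotags: {user: [((x, y), location_name), ...]}.
    Returns location names ranked by aggregated relevant-user weight."""
    target = np.asarray(target, dtype=float)
    user_weight = {}
    for user, tags in user_geotags.items():
        pts = np.asarray([coord for coord, _ in tags], dtype=float)
        d2 = ((pts - target) ** 2).sum(axis=1)
        user_weight[user] = float(np.exp(-d2 / (2 * sigma ** 2)).sum())
    scores = {}
    for user, tags in user_geotags.items():
        for _, name in tags:
            scores[name] = scores.get(name, 0.0) + user_weight[user]
    return sorted(scores, key=scores.get, reverse=True)
```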
Enhancing N-Gram-Based Summary Evaluation Using Information Content and a Taxonomy

In this paper we propose a novel information-theoretic metric for automatic summary evaluation when model summaries are available, as in the setting of the AESOP task of the Update Summarization track of the Text Analysis Conference (TAC). The metric is based on the concept of information content, operationalized by using a taxonomy. We also present and discuss the results obtained at TAC 2009.

Mijail Kabadjov, Josef Steinberger, Ralf Steinberger, Massimo Poesio, Bruno Pouliquen
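
A hedged sketch of the information-content ingredient: in the Resnik tradition, a concept's information content is the negative log of its probability, with counts propagating up the taxonomy from descendants, so general concepts score low and specific ones high. Whether the paper uses exactly this operationalisation is not stated in the abstract.

```python
import math

def information_content(concept, freq, descendants, total):
    """IC(c) = -log p(c), with p(c) covering the concept and all of its
    taxonomy descendants (Resnik-style operationalisation)."""
    count = freq.get(concept, 0) + sum(freq.get(d, 0)
                                       for d in descendants.get(concept, ()))
    return -math.log(count / total) if count else float("inf")

freq = {"dog": 30, "cat": 25, "animal": 5}
descendants = {"animal": ["dog", "cat"]}
print(information_content("animal", freq, descendants, total=1000))  # general: low IC
print(information_content("dog", freq, descendants, total=1000))     # specific: higher IC
```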

Demos

NEAT: News Exploration Along Time

There are a number of efforts towards building applications that leverage temporal information in documents. The demonstration of our NEAT (News Exploration Along Time) prototype system proposed here is an attempt towards building an intuitive and exploratory interface for search results over large news archives using timelines. The demonstration uses the New York Times Annotated Corpus as an illustrative example of such a news archive.

The NEAT system consists of two parts. The back-end server extracts all the temporal information from documents, stores it in an index, and performs important-phrase discovery on sentences that contain time-sensitive information. The front-end user interface anchors the results of a keyword search along the timeline, where the user can explore and browse results at different points in time. To aid in this exploration, the interesting phrases discovered in the result documents are displayed on the timeline to provide an overview.

Another key feature of NEAT, which distinguishes it from other timeline-based approaches, is the adoption of semantic temporal annotations to anchor results on the timeline. An appropriate choice of personally identifiable temporal annotations can enable users to contextualize results more effectively; for example, Barack Obama was elected in 2008 and Germany hosted the FIFA World Cup in 2006. We gathered temporal annotations at large scale by crowdsourcing the task over Amazon Mechanical Turk (AMT). Each HIT (Human Intelligence Task) on AMT consists of a request to expand a temporal expression (such as a year, a time interval, or a decade) with an entity (e.g., a person, country, or organization). Based on the agreement level among workers, we derive key entities for constructing a semantic temporal annotation layer on top of the timeline. The outcome is a manually annotated timeline that can be very useful for anchoring search results. Examples of annotations produced by crowdsourcing include (1969: Woodstock, Moon landing), (1970: Nixon), and (2003-2009: Iraq war), covering different time granularities.

The demonstration consists of an exploratory search interface where we show how queries can produce different timelines and how one can use temporal information to discover interesting facts.

Omar Alonso, Klaus Berberich, Srikanta Bedathur, Gerhard Weikum
Opinion Summarization of Web Comments

All kinds of Web sites invite visitors to provide feedback on comment boards. Typically, submitted comments are published immediately on the same page, so that new visitors can get an idea of the opinions of previous visitors. Popular multimedia items, such as videos and images, frequently attract thousands of comments, which is too much to read in a reasonable time. That is, visitors read, if at all, only the newest comments and hence get an incomplete and possibly misleading picture of the overall opinion. To address this issue we introduce OPINIONCLOUD, a technology to summarize and visualize opinions that are expressed in the form of Web comments.

Martin Potthast, Steffen Becker
EUROGENE: Multilingual Retrieval and Machine Translation Applied to Human Genetics

The objective of Eurogene is to collect a critical mass of educational content in the field of human genetics in nine European languages and to build a platform that supports the retrieval, sharing and navigation of the learning content. The Eurogene platform is already operational and is being used by the genetics community. In this paper, the part of the Eurogene platform related to the retrieval and machine translation of domain-specific content is described. Our contribution lies in an approach for domain-specific adaptation of cross-language information retrieval (CLIR) and machine translation (MT). The CLIR system is based on a multilingual domain ontology which also serves as a synchronization component between CLIR and MT. The MT system is adapted to the target domain using the terminology represented in the ontology and statistical training performed on a collection of parallel texts. In the statistical training phase, new translations of a term can be discovered and used for updating the ontology. The paper is organized as follows. First, we describe the motivation for our approach and the multilingual domain ontology. Then, the CLIR and MT components and their domain adaptation and synchronization are discussed.

Petr Knoth, Trevor Collins, Elsa Sklavounou, Zdenek Zdrahal
Netspeak—Assisting Writers in Choosing Words

Netspeak is a Web service which helps writers find alternative expressions for what they want to say. It provides a large index of writing samples in the form of n-grams, n ≤ 5, along with an efficient means to retrieve them via wildcard queries. When in doubt about a phrasing, a user can get additional evidence by retrieving samples that match a given context. For example, a user interested in the two most frequently written words between "looks" and "me" issues a wildcard query; the frequency figures reported for each result give an idea of its customariness, and the user can select the one most appropriate for her sentence.

Martin Potthast, Martin Trenkmann, Benno Stein
A Data Analysis and Modelling Framework for the Evaluation of Interactive Information Retrieval

Over the last two decades, Interactive Information Retrieval (IIR) has established a new direction within the long tradition of IR, one that places the user at its center and poses new challenges for system evaluation. IR systems can improve performance by utilizing information about the entire interactive process of search. This approach has so far only been explored initially [1,2], with much potential for the future. This demonstration describes an extensible data analysis and modelling framework that enables researchers to integrate, explore and analyze interactive experiment data obtained from task-based IIR experiments, and to build and test models of interactive user behavior.

Ralf Bierig, Michael Cole, Jacek Gwizdka, Nicholas J. Belkin
Backmatter
Metadata
Title
Advances in Information Retrieval
Edited by
Cathal Gurrin
Yulan He
Gabriella Kazai
Udo Kruschwitz
Suzanne Little
Thomas Roelleke
Stefan Rüger
Keith van Rijsbergen
Copyright Year
2010
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-12275-0
Print ISBN
978-3-642-12274-3
DOI
https://doi.org/10.1007/978-3-642-12275-0
