
2009 | Book

Evaluating Systems for Multilingual and Multimodal Information Access

9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, Aarhus, Denmark, September 17-19, 2008, Revised Selected Papers

Editors: Carol Peters, Thomas Deselaers, Nicola Ferro, Julio Gonzalo, Gareth J. F. Jones, Mikko Kurimo, Thomas Mandl, Anselmo Peñas, Vivien Petras

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science


About this book

The ninth campaign of the Cross-Language Evaluation Forum (CLEF) for European languages was held from January to September 2008. There were seven main evaluation tracks in CLEF 2008 plus two pilot tasks. The aim, as usual, was to test the performance of a wide range of multilingual information access (MLIA) systems or system components. This year, 100 groups, mainly but not only from academia, participated in the campaign. Most of the groups were from Europe but there was also a good contingent from North America and Asia plus a few participants from South America and Africa. Full details regarding the design of the tracks, the methodologies used for evaluation, and the results obtained by the participants can be found in the different sections of these proceedings. The results of the CLEF 2008 campaign were presented at a two-and-a-half day workshop held in Aarhus, Denmark, September 17–19, and attended by 150 researchers and system developers. The annual workshop, held in conjunction with the European Conference on Digital Libraries, plays an important role by providing the opportunity for all the groups that have participated in the evaluation campaign to get together to compare approaches and exchange ideas. The schedule of the workshop was divided between plenary track overviews and parallel, poster and breakout sessions presenting this year's experiments and discussing ideas for the future. There were several invited talks.

Table of Contents

Frontmatter

What Happened in CLEF 2008

What Happened in CLEF 2008

The organization of the CLEF 2008 evaluation campaign is described and details are provided concerning the tracks, test collections, evaluation infrastructure, and participation. The main results are commented on and future developments in the organization of CLEF are discussed.

Carol Peters

Part I: Multilingual Textual Document Retrieval (Ad Hoc)

CLEF 2008: Ad Hoc Track Overview

We describe the objectives and organization of the CLEF 2008 Ad Hoc track and discuss the main characteristics of the tasks offered to test monolingual and cross-language textual document retrieval systems. The track was changed considerably this year with the introduction of tasks with new document collections consisting of (i) library catalog records derived from The European Library and (ii) non-European language data, plus a task offering the chance to test retrieval with word sense disambiguated data. The track was thus structured in three distinct streams: TEL@CLEF, Persian@CLEF and Robust WSD. The results obtained for each task are presented and statistical analyses are given.

Eneko Agirre, Giorgio Maria Di Nunzio, Nicola Ferro, Thomas Mandl, Carol Peters

TEL@CLEF

Logistic Regression for Metadata: Cheshire Takes on Adhoc-TEL

In this paper we briefly describe the approaches taken by the Berkeley Cheshire Group for the Adhoc-TEL 2008 tasks (mono- and bilingual retrieval). Since the Adhoc-TEL task is new this year, we took the approach of using methods that have performed fairly well in other tasks. In particular, the approach this year used probabilistic text retrieval based on logistic regression and incorporating blind relevance feedback for all of the runs. All translation for bilingual tasks was performed using the LEC Power Translator PC-based MT system. This approach seems to be a good fit for the limited TEL records, since the overall results show Cheshire runs in the top five submitted runs for all languages and tasks except for Monolingual German.

Ray R. Larson
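The abstract above mentions blind relevance feedback layered on a logistic-regression ranking. The following is a rough, hedged sketch of the feedback step only, not Cheshire's actual logistic-regression formula or term-selection weights; the function name and the tf·idf-style term weighting are assumptions made for the example.

```python
from collections import Counter
import math

def blind_relevance_feedback(query_terms, ranked_docs, top_docs=10, add_terms=5):
    """Expand a query with salient terms from the top-ranked documents.

    ranked_docs: list of token lists, ordered by an initial retrieval run.
    Generic pseudo-relevance feedback; not the Cheshire system's formula.
    """
    pool = ranked_docs[:top_docs]
    tf = Counter(t for doc in pool for t in doc)
    # document frequency over the whole ranked list, used as a crude IDF
    df = Counter(t for doc in ranked_docs for t in set(doc))
    n_docs = max(len(ranked_docs), 1)
    scores = {t: f * math.log(n_docs / df[t]) for t, f in tf.items()
              if t not in query_terms}
    expansion = [t for t, _ in sorted(scores.items(), key=lambda x: -x[1])[:add_terms]]
    return list(query_terms) + expansion

# toy usage
docs = [["library", "catalog", "record", "metadata"],
        ["metadata", "record", "search"],
        ["weather", "forecast"]]
print(blind_relevance_feedback(["catalog"], docs))
```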
Query Expansion via Library Classification System

Managing the development and delivery of multilingual electronic library services is one of the major current challenges for making digital content in Europe more accessible, usable and exploitable. Digital libraries and OPAC-based traditional libraries are the most important source of reliable information used daily by scholars, researchers, knowledge workers and citizens to carry on their working (and leisure) activities. Facilitating access to multilingual document collections is therefore an important way of supporting the dissemination of knowledge and cultural content. CACAO offers an innovative approach for accessing, understanding and navigating multilingual textual content in digital libraries and OPACs, enabling European users to better exploit the available European electronic content. This paper describes the participation of the CACAO project consortium in the TEL@CLEF 2008 task and proposes a novel approach for exploiting library classification systems as a means to drive query expansion.

Alessio Bosca, Luca Dini
Experiments on a Multinomial Language Model versus Lucene’s Off-the-Shelf Ranking Scheme and Rocchio Query Expansion (TEL@CLEF Monolingual Task)

We describe our participation in the TEL@CLEF task of the CLEF 2008 ad-hoc track, where we measured the retrieval performance of the IR service that is currently under development as part of the DIGMAP project. DIGMAP's IR service is mostly based on Lucene, together with extensions for using query expansion and multinomial language modelling. In our runs, we experimented with combinations of query expansion, Lucene's off-the-shelf ranking scheme and the ranking scheme based on multinomial language modelling. Results show that query expansion and multinomial language modelling both result in increased performance.

Jorge Machado, Bruno Martins, José Borbinha
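For readers unfamiliar with multinomial language modelling as a ranking scheme, below is a minimal, textbook-style sketch of Dirichlet-smoothed query likelihood. It is not the DIGMAP/Lucene extension itself; the function name and the default value of mu are assumptions.

```python
import math
from collections import Counter

def query_likelihood(query, doc, collection, mu=2000):
    """Score a document with a Dirichlet-smoothed multinomial language model.

    query, doc: lists of tokens; collection: list of token lists.
    A textbook formulation, shown only to make the ranking scheme concrete.
    """
    doc_tf = Counter(doc)
    coll_tf = Counter(t for d in collection for t in d)
    coll_len = sum(coll_tf.values())
    doc_len = len(doc)
    score = 0.0
    for t in query:
        p_coll = coll_tf.get(t, 0) / coll_len
        if p_coll == 0:          # term unseen in the collection: skip to avoid log(0)
            continue
        p = (doc_tf.get(t, 0) + mu * p_coll) / (doc_len + mu)
        score += math.log(p)
    return score
```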
WikiTranslate: Query Translation for Cross-Lingual Information Retrieval Using Only Wikipedia

This paper presents WikiTranslate, a system which performs query translation for cross-lingual information retrieval (CLIR) using only Wikipedia to obtain translations. Queries are mapped to Wikipedia concepts and the corresponding translations of these concepts in the target language are used to create the final query. WikiTranslate is evaluated by searching with topics formulated in Dutch, French and Spanish in an English data collection. The system achieved a performance of 67% compared to the monolingual baseline.

Dong Nguyen, Arnold Overwijk, Claudia Hauff, Dolf R. B. Trieschnigg, Djoerd Hiemstra, Franciska de Jong
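A much-simplified sketch of the WikiTranslate idea: map a query to Wikipedia articles and follow their cross-language links into the target language. The list=search and prop=langlinks parameters are standard MediaWiki API calls, but the paper's concept-selection heuristics are not reproduced, the function name is invented for the example, and network access is required.

```python
import requests

def translate_query_via_wikipedia(query, source_lang="nl", target_lang="en"):
    """Collect target-language Wikipedia titles for articles matching a query."""
    api = f"https://{source_lang}.wikipedia.org/w/api.php"
    # 1. find candidate concepts (articles) for the query
    hits = requests.get(api, params={
        "action": "query", "list": "search", "srsearch": query,
        "srlimit": 3, "format": "json"}).json()["query"]["search"]
    translations = []
    for hit in hits:
        # 2. follow each article's language link into the target language
        pages = requests.get(api, params={
            "action": "query", "prop": "langlinks", "titles": hit["title"],
            "lllang": target_lang, "format": "json"}).json()["query"]["pages"]
        for page in pages.values():
            for link in page.get("langlinks", []):
                translations.append(link["*"])
    return translations

# usage: translate_query_via_wikipedia("zonne-energie")
```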
UFRGS@CLEF2008: Using Association Rules for Cross-Language Information Retrieval

For UFRGS's participation in the TEL task at CLEF 2008, our aim was to assess the validity of using algorithms for mining association rules to find mappings between concepts in a Cross-Language Information Retrieval scenario. Our approach requires a sample of parallel documents to serve as the basis for the generation of the association rules. The results of the experiments show that the performance of our approach is not statistically different from the monolingual baseline in terms of mean average precision. This is an indication that association rules can be effectively used to map concepts between languages. We have also tested a modification to BM25 that aims at increasing the weight of rare terms. The results show that this modified version achieved better performance. The improvements were considered to be statistically significant in terms of MAP on our monolingual runs.

André Pinto Geraldo, Viviane P. Moreira
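As an illustration of mining association rules between source- and target-language terms from a small sample of parallel documents, the sketch below computes simple support and confidence values. The thresholds and the brute-force mining procedure are placeholders, not the paper's actual algorithm or settings.

```python
from collections import Counter
from itertools import product

def mine_translation_rules(parallel_docs, min_support=2, min_confidence=0.5):
    """Mine simple rules  source_term -> target_term  from parallel documents.

    parallel_docs: list of (source_tokens, target_tokens) pairs.
    support    = number of document pairs containing both terms,
    confidence = support / number of pairs containing the source term.
    """
    pair_count = Counter()
    src_count = Counter()
    for src_doc, tgt_doc in parallel_docs:
        src_terms, tgt_terms = set(src_doc), set(tgt_doc)
        for s in src_terms:
            src_count[s] += 1
        for s, t in product(src_terms, tgt_terms):
            pair_count[(s, t)] += 1
    rules = {}
    for (s, t), sup in pair_count.items():
        conf = sup / src_count[s]
        if sup >= min_support and conf >= min_confidence:
            rules.setdefault(s, []).append((t, conf))
    return rules
```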
CLEF 2008 Ad-Hoc Track: Comparing and Combining Different IR Approaches

This article describes post-workshop experiments that were conducted after our first participation in the TEL@CLEF task. We used the Xtrieval framework [5], [4] for the preparation and execution of the experiments. We ran 69 experiments in the setting of the CLEF 2008 task, whereof 39 were monolingual and 30 were cross-lingual. We investigated the capabilities of the current version of Xtrieval, which can now use the two retrieval cores Lucene and Lemur. Our main goal was to compare and combine the results from those retrieval engines. The translation of the topics for the cross-lingual experiments was realized with a plug-in to access the Google AJAX language API. The performance of our monolingual experiments was better than the best experiments we submitted during the evaluation campaign. Our cross-lingual experiments performed very well for all target collections and achieved between 87% and 100% of the monolingual retrieval effectiveness. The combination of the results from the Lucene and the Lemur retrieval cores showed very consistent performance.

Jens Kürsten, Thomas Wilhelm, Maximilian Eibl
Multi-language Models and Meta-dictionary Adaptation for Accessing Multilingual Digital Libraries

Accessing digital libraries raises the important issue of how to deal with the multilinguality of the documents. Inside a target collection, documents can be written in very different languages, and the record associated with a particular document often contains field descriptors in different languages. This paper proposes a principled way to address this issue: a multi-language model approach to information retrieval, as well as an extension of the dictionary adaptation mechanism to cover multiple languages (including the source language). In experiments related to the TEL task of the CLEF 2008 Ad-hoc track, runs based on a purely bilingual approach, translating the query only into the official language of the collection, resulted in performance (mean average precision) greater than or equal to that of the other participants. However, contrary to our initial intuition, the experiments showed that, in the case of the TEL task, exploiting information in languages other than the official language of the collection offers no advantage.

Stephane Clinchant, Jean-Michel Renders

Persian@CLEF

Improving Persian Information Retrieval Systems Using Stemming and Part of Speech Tagging

With the emergence of vast resources of information, it is necessary to develop methods that retrieve the most relevant information according to users' needs. These retrieval methods may benefit from natural language constructs to boost their results by achieving higher precision and recall rates. In this study, we have used part of speech properties of terms as an extra source of information about document and query terms and have evaluated the impact of such data on the performance of Persian retrieval algorithms. Furthermore, the effect of stemming has been evaluated as a complement to this research. Our findings indicate that part of speech tags may have a small influence on the effectiveness of the retrieved results. However, when this information is combined with stemming it improves the accuracy of the outcomes considerably.

Reza Karimpour, Amineh Ghorbani, Azadeh Pishdad, Mitra Mohtarami, Abolfazl AleAhmad, Hadi Amiri, Farhad Oroumchian
Fusion of Retrieval Models at CLEF 2008 Ad Hoc Persian Track

Metasearch engines submit the user query to several underlying search engines and then merge their retrieved results to generate a single list that better satisfies the users' information needs. Following the idea behind metasearch engines, merging the results retrieved by different retrieval models should improve search coverage and precision. In this study, we have investigated the effect of the fusion of different retrieval techniques on the performance of Persian retrieval. We use an extension of the Ordered Weighted Average (OWA) operator called IOWA, and a weighting schema, NOWA, for merging the results. Our experimental results show that merging by OWA operators produces better MAP.

Zahra Aghazade, Nazanin Dehghani, Leili Farzinvash, Razieh Rahimi, Abolfazl AleAhmad, Hadi Amiri, Farhad Oroumchian
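For readers unfamiliar with OWA-based fusion, the basic operator can be sketched in a few lines: the weights are applied to the sorted scores of a document rather than to particular retrieval models. The IOWA and NOWA weighting schemes from the paper are not reproduced here, and the weights in the usage line are arbitrary.

```python
def owa(scores, weights):
    """Ordered Weighted Average of one document's scores from several models.

    scores:  relevance scores for one document, one per retrieval model.
    weights: non-negative weights summing to 1, applied by rank position
             (largest score first), not by model identity.
    """
    assert len(scores) == len(weights)
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))

# fuse normalised scores of one document from three retrieval models
print(owa([0.8, 0.3, 0.6], [0.5, 0.3, 0.2]))   # 0.8*0.5 + 0.6*0.3 + 0.3*0.2
```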
Cross Language Experiments at Persian@CLEF 2008

In this study we discuss our cross-language text retrieval experiments in the Persian ad hoc track at CLEF 2008. Two teams from the University of Tehran were involved in the cross-language text retrieval part of the track, using two different CLIR approaches: query translation and document translation. For query translation we use a method named Combinatorial Translation Probability (CTP) calculation for the estimation of translation probabilities. In the document translation part, we use the Shiraz machine translation system to translate documents into English. We then create a hybrid CLIR system by score-based merging of the two retrieval systems' results. In addition, we investigated N-grams and a light stemmer in our monolingual experiments.

Abolfazl AleAhmad, Ehsan Kamalloo, Arash Zareh, Masoud Rahgozar, Farhad Oroumchian

Robust-WSD

Evaluating Word Sense Disambiguation Tools for Information Retrieval Task

The main interest of this paper is the characterization of queries for which WSD is a useful tool; that is, which conditions must a query fulfil in order for a state-of-the-art WSD tool to be applicable? In addition, we have evaluated several approaches to applying WSD. We have used several types of indices: we generated 13 indices and carried out 39 different experiments, finding that some indices based on WSD tools even slightly outperform the non-disambiguated baseline. After interpreting our experiments, we conclude that only queries containing highly polysemous terms with very high IDF values are improved by using WSD.

Fernando Martínez-Santiago, José M. Perea-Ortega, Miguel A. García-Cumbreras
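A hypothetical sketch of the paper's conclusion recast as a selection rule: apply the WSD-based index only to queries containing a highly polysemous, high-IDF term. The thresholds and the function name are invented for illustration, and the sketch requires the NLTK WordNet data.

```python
import math
from nltk.corpus import wordnet as wn   # requires the NLTK WordNet corpus

def should_use_wsd(query_terms, doc_freq, n_docs, min_senses=4, min_idf=3.0):
    """Decide whether to route a query to a WSD-based index.

    doc_freq: dict term -> number of documents containing it.
    Returns True if any query term is both highly polysemous (many WordNet
    senses) and rare (high IDF); thresholds are arbitrary placeholders.
    """
    for t in query_terms:
        senses = len(wn.synsets(t))
        idf = math.log(n_docs / max(doc_freq.get(t, 0), 1))
        if senses >= min_senses and idf >= min_idf:
            return True
    return False
```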
IXA at CLEF 2008 Robust-WSD Task: Using Word Sense Disambiguation for (Cross Lingual) Information Retrieval

This paper describes experiments for the CLEF 2008 Robust-WSD task, both for the monolingual (English) and the bilingual (Spanish to English) subtasks. We tried several query and document expansion and translation strategies, with and without the use of the word sense disambiguation results provided by the organizers. All expansions and translations were done using the English and Spanish wordnets as provided by the organizers and no other resource was used. We used Indri as the search engine, which we tuned on the training part. Our main goal was to improve (Cross Lingual) Information Retrieval results using WSD information, and we attained improvements in both mono and bilingual subtasks, with statistically significant differences for the latter. Our best systems ranked 4th and 3rd overall in the monolingual and bilingual subtasks, respectively.

Eneko Agirre, Arantxa Otegi, German Rigau
SENSE: SEmantic N-levels Search Engine at CLEF2008 Ad Hoc Robust-WSD Track

This paper presents the results of the experiments conducted at the University of Bari for the Ad Hoc Robust-WSD track of the Cross-Language Evaluation Forum (CLEF) 2008. The evaluation was performed using SENSE (SEmantic N-levels Search Engine), a semantic search engine that tries to overcome the limitations of the ranked keyword approach by introducing semantic levels, which integrate (and do not simply replace) the lexical level represented by keywords. We show how SENSE is able to manage documents indexed at two separate levels, keyword and word meaning, in an attempt to improve retrieval performance.

Two types of experiments have been performed, exploiting either only one indexing level or all indexing levels at the same time. The experiments performed combining keywords and word meanings, extracted from the WordNet lexical database, show the promise of the idea and point out the value of our intuition.

In particular, the results confirm our hypothesis: the combination of two indexing levels outperforms a single level. Indeed, an improvement of 35% in precision has been obtained by adopting the N-levels model with respect to the results obtained by exploiting the indexing level based only on keywords.

Annalina Caputo, Pierpaolo Basile, Giovanni Semeraro
IR-n in the CLEF Robust WSD Task 2008

In our approach to the Robust WSD task we have used a passage-based system together with WordNet- and WSD-based term expansion for the documents and queries. Furthermore, we have experimented with two well-known relevance feedback methods, LCA and PRF, in order to determine which is more suitable for taking advantage of the WordNet-based WSD query expansion. Our best run ranked 4th, with a MAP of 0.4008. A major finding is that LCA fits this task better than PRF because it is able to take advantage of the expanded documents and queries.

Sergio Navarro, Fernando Llopis, Rafael Muñoz
Query Clauses and Term Independence

Much current research in IR, Web-Search and Semantic-Web technologies aims at enriching the user query to gain a richer, more semantic understanding of the information need. In almost all cases this query enrichment step is approached independently of the ranking function; however, this may be far from optimal. In this paper we discuss the problem of term dependency in the context of query expansion and show its dangers in a number of empirical evaluations. Furthermore, we propose a simple method (query clauses) that can be applied to several standard ranking functions to exploit a simple type of term dependency.

José R. Pérez-Agüera, Hugo Zaragoza
Analysis of Word Sense Disambiguation-Based Information Retrieval

Several studies have tried to improve retrieval performance using automatic Word Sense Disambiguation techniques. So far, most attempts have failed. In this paper, we try to give a deep analysis of the reasons behind these failures. During our participation in the Robust WSD task at CLEF 2008, we performed experiments on monolingual (English) and bilingual (Spanish to English) collections. Our official results and a detailed analysis are described below, along with our conclusions and perspectives.

Jacques Guyot, Gilles Falquet, Saïd Radhouani, Karim Benzineb
Crosslanguage Retrieval Based on Wikipedia Statistics

In this paper we present the methodology, implementation and evaluation results of the cross-language retrieval system we have developed for the Robust WSD task at CLEF 2008. Our system is based on query preprocessing for the translation and homogenisation of queries. The preprocessing of queries includes two stages: first, a query translation step based on term statistics of co-occurring articles in Wikipedia; second, different disjunct query composition techniques to search in the CLEF corpus. We apply the same preprocessing steps for the monolingual as well as the cross-lingual task, thereby treating both tasks fairly and in a similar way. The evaluation revealed that this common processing comes at nearly no cost for monolingual retrieval, while enabling cross-language retrieval and making a comparison of our system's performance on these two tasks feasible.

Andreas Juffinger, Roman Kern, Michael Granitzer

Ad Hoc Mixed: TEL and Persian

Sampling Precision to Depth 10000 at CLEF 2008

We conducted an experiment to test the completeness of the relevance judgments for the monolingual German, French, English and Persian (Farsi) information retrieval tasks of the Ad Hoc Track of the Cross-Language Evaluation Forum (CLEF) 2008. In the ad hoc retrieval tasks, the system was given 50 natural language queries, and the goal was to find all of the relevant documents (with high precision) in a particular document set. For each language, we submitted a sample of the first 10000 retrieved items to investigate the frequency of relevant items at deeper ranks than the official judging depth (of 60). The results suggest that, on average, the percentage of relevant items assessed was less than 55% for German, French and English and less than 25% for Persian.

Stephen Tomlinson
JHU Ad Hoc Experiments at CLEF 2008

For CLEF 2008 JHU conducted monolingual and bilingual experiments in the ad hoc TEL and Persian tasks. Additionally we performed several post hoc experiments using previous CLEF ad hoc test sets in 13 languages.

In all three tasks we explored alternative methods of tokenizing documents, including plain words, stemmed words, automatically induced segments, a single selected n-gram from each word, and all n-grams from words (i.e., traditional character n-grams). Character n-grams demonstrated consistent gains over ordinary words in each of these three diverse sets of experiments. Using mean average precision, relative gains of 50-200% on the TEL task, 5% on the Persian task, and 18% averaged over 13 languages from past CLEF evaluations were observed.

Paul McNamee
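A minimal sketch of the character n-gram tokenization compared in these experiments; the padding character and the default n are assumptions, not necessarily the exact settings used by JHU.

```python
def char_ngrams(word, n=4, pad="_"):
    """Return all character n-grams of a word, padded at the boundaries."""
    padded = pad + word.lower() + pad
    if len(padded) <= n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(char_ngrams("retrieval"))   # ['_ret', 'retr', 'etri', ...]
```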
UniNE at CLEF 2008: TEL, and Persian IR

In our participation in this evaluation campaign, our first objective was to analyze retrieval effectiveness when using The European Library (TEL) corpora, composed of very short descriptions (library catalog records), and also to evaluate the retrieval effectiveness of several IR models. As a second objective we wanted to design and evaluate a stopword list and a light stemming strategy for Persian (Farsi), a member of the Indo-European language family whose morphology is more complex than that of English.

Ljiljana Dolamic, Claire Fautsch, Jacques Savoy

Part II: Mono- and Cross-Language Scientific Data Retrieval (Domain-Specific)

The Domain-Specific Track at CLEF 2008

The domain-specific track evaluates retrieval models for structured scientific bibliographic collections in English, German and Russian. Documents contain textual elements (titles, abstracts) as well as subject keywords from controlled vocabularies, which can be used in query expansion and bilingual translation. Mappings between the different controlled vocabularies are provided. In 2008, new Russian language resources were provided, among them Russian-English and Russian-German terminology lists as well as a mapping table between the Russian and German controlled vocabularies. Six participants experimented with different retrieval systems and query expansion schemes. Compared to previous years, the queries were more discriminating, so that fewer relevant documents were found per query. The year 2008 marked the last year of the domain-specific track; a special issue presenting important experiments and results is planned.

Categories and Subject Descriptors

H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval.

General Terms

Measurement, Performance, Experimentation.

Vivien Petras, Stefan Baerisch
UniNE at Domain-Specific IR - CLEF 2008

Our first objective in participating in this domain-specific evaluation campaign is to propose and evaluate various indexing and search strategies for the German, English and Russian languages, and thus obtain retrieval effectiveness superior to that of language-independent approaches (n-gram). To do so we evaluated the GIRT-4 test collection using the Okapi model, various IR models based on the Divergence from Randomness (DFR) paradigm, and the statistical language model (LM), together with the classical tf·idf vector-processing scheme.

Claire Fautsch, Ljiljana Dolamic, Jacques Savoy
Back to Basics – Again – for Domain-Specific Retrieval

In this paper we describe Berkeley's approach to the Domain-Specific (DS) track for CLEF 2008. Last year we used Entry Vocabulary Indexes and thesaurus expansion approaches for DS, but found in later testing that some simple text retrieval approaches had better results than these more complex query expansion approaches. This year we decided to revisit our basic text retrieval approaches and see how they would stack up against the various expansion approaches used by other groups. The results are now in and the answer is clear: they perform pretty badly compared to other groups' approaches.

All of the runs submitted were performed using the Cheshire II system. This year the Berkeley/Cheshire group submitted a total of twenty-four runs, including two for each subtask of the DS track. These include six monolingual runs for English, German, and Russian, twelve bilingual runs (four X2EN, four X2DE, and four X2RU), and six multilingual runs (two EN, two DE, and two RU). The overall results include Cheshire runs in the top five participants for each task, but usually as the lowest of the five (and often fewer) groups.

Ray R. Larson
Concept Models for Domain-Specific Search

We describe our participation in the 2008 CLEF Domain-Specific track. We evaluate blind relevance feedback models and concept models on the CLEF domain-specific test collection. Applying relevance modeling techniques is found to have a positive effect on the 2008 topic set, in terms of mean average precision and precision@10. Applying concept models for blind relevance feedback results in even bigger improvements over a query-likelihood baseline, in terms of mean average precision and early precision.

Edgar Meij, Maarten de Rijke
The Xtrieval Framework at CLEF 2008: Domain-Specific Track

This article describes our participation in the Domain-Specific track. We used the Xtrieval framework for the preparation and execution of the experiments. The translation of the topics for the cross-lingual experiments was realized with a plug-in to access the Google AJAX language API. This year, we submitted 20 experiments in total. In all our experiments we applied a standard top-k pseudo-relevance feedback algorithm. We used merged monolingual runs as the baseline for comparison with all our cross-lingual experiments. Translating the topics for the bilingual experiments decreased retrieval effectiveness by only 8 to 15 percent.

Jens Kürsten, Thomas Wilhelm, Maximilian Eibl
Using Wikipedia and Wiktionary in Domain-Specific Information Retrieval

The main objective of our experiments in the domain-specific track at CLEF 2008 is utilizing semantic knowledge from collaborative knowledge bases such as Wikipedia and Wiktionary to improve the effectiveness of information retrieval. While Wikipedia has already been used in IR, the application of Wiktionary in this task is new. We evaluate two retrieval models, i.e. SR-Text and SR-Word, based on semantic relatedness by comparing their performance to a statistical model as implemented by Lucene. We refer to Wikipedia article titles and Wiktionary word entries as concepts and map query and document terms to concept vectors which are then used to compute the document relevance. In the bilingual task, we translate the English topics into the document language, i.e. German, by using machine translation. For SR-Text, we alternatively perform the translation process by using cross-language links in Wikipedia, whereby the terms are directly mapped to concept vectors in the target language. The evaluation shows that the latter approach especially improves the retrieval performance in cases where the machine translation system incorrectly translates query terms.

Christof Müller, Iryna Gurevych
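To make the concept-vector idea concrete, the sketch below maps terms to tf·idf-weighted vectors over a set of concept articles (e.g. Wikipedia pages or Wiktionary entries) and compares texts by cosine similarity. It is only a bare-bones stand-in for the SR-Text/SR-Word models described above, with invented function names and a simplistic weighting.

```python
import math
from collections import defaultdict

def build_concept_index(concept_articles):
    """concept_articles: {concept_title: list of tokens}.
    Returns an inverted mapping term -> {concept: tf*idf weight}."""
    index = defaultdict(dict)
    n = len(concept_articles)
    df = defaultdict(int)
    for tokens in concept_articles.values():
        for t in set(tokens):
            df[t] += 1
    for concept, tokens in concept_articles.items():
        for t in set(tokens):
            index[t][concept] = tokens.count(t) * math.log(n / df[t])
    return index

def concept_vector(terms, index):
    """Sum the concept vectors of the individual terms of a text."""
    vec = defaultdict(float)
    for t in terms:
        for concept, w in index.get(t, {}).items():
            vec[concept] += w
    return vec

def cosine(a, b):
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```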

Part III: Interactive Cross-Language Retrieval (iCLEF)

Overview of iCLEF 2008: Search Log Analysis for Multilingual Image Retrieval

This paper summarises activities from the iCLEF 2008 task. In an attempt to encourage greater participation in user-orientated experiments, a new task was organised based on users participating in an interactive cross-language image search experiment. Organizers provided a default multilingual search system which accessed images from Flickr, with the whole iCLEF experiment run as an online game. Interaction by users with the system was recorded in log files which were shared with participants for further analyses, and provide a future resource for studying various effects on user-orientated cross-language search. In total six groups participated in iCLEF, providing a combined effort in generating results for a shared experiment on user-orientated cross-language retrieval.

Julio Gonzalo, Paul Clough, Jussi Karlgren
Log Analysis of Multilingual Image Searches in Flickr

In this paper, we summarize our analysis of the logs of multilingual image searches in Flickr provided to iCLEF 2008 participants. We have studied: a) correlations between the language skills of searchers in the target language and other session parameters, such as success (was the image found?), number of query refinements, etc.; b) usage of specific cross-language search facilities; and c) users' perceptions of the task (questionnaire analysis).

We have studied 4,302 complete search sessions (searcher/target image pairs) from the logs provided by the organization. Our analysis shows that when users have active competence in the target language, their success rate is 18% higher than if they do not know the language at all. If the user has passive competence in the language (i.e. can partially understand texts but cannot formulate queries), the success rate equals that of users with active competence, but at the expense of executing more interactions with the system.

Finally, the usage of specific cross-language facilities (such as refining translations offered by the system) is low, but significantly higher than standard relevance feedback facilities, and is perceived as useful by searchers.

Víctor Peinado, Julio Gonzalo, Javier Artiles, Fernando López-Ostenero
Cross-Lingual Image Retrieval Interactions Based on a Game Competition

This is the first year of participation of the University of Padua in the interactive CLEF track. A group of students of Linguistics at the Faculty of Humanities were asked to participate in the experiment. The interaction of the users with a cross-lingual system, the solutions they find for a given task, and the tools that a system should provide in order to assist the user in the task are studied by means of questionnaire analysis together with some log analysis. Interesting insights and results emerged and can be summarized as follows: the hardest obstacles in finding the given image are the size of the set of images retrieved, the difficulty in describing the image, and the difficulty of finding suitable keywords in one or more languages.

Giorgio Maria Di Nunzio
A Study of Users’ Image Seeking Behaviour in FlickLing

This study aims to explore users' image seeking behaviour when searching for a known, non-annotated image in the FlickLing search interface provided by the iCLEF 2008 track. The main focus of our study was threefold: a) to identify the reasons that determined users' choice of a specific interface mode, b) to examine whether, and to what extent, users were thinking about languages when searching for images and c) to examine how helpful the translations, if used, proved to be for finding the images. This study used questionnaires, retrospective thinking aloud, observation and interviews to address its research questions.

Evgenia Vassilakaki, Frances Johnson, Richard J. Hartley, David Randall
SICS at iCLEF 2008: User Confidence and Satisfaction Tentatively Inferred from iCLEF Logs

This paper gives a brief description of some initial experiments performed at SICS using the interactive image search query logs provided for participants in the interactive track of CLEF. The SICS experiments attempt to establish whether user confidence and trust in results can be related to logged behaviour.

Jussi Karlgren

Part IV: Multiple Language Question Answering (QA@CLEF)

Overview of the CLEF 2008 Multilingual Question Answering Track

The QA campaign at CLEF 2008 [1] was mainly the same as that proposed last year. The results and the analyses reported by last year's participants suggested that the changes introduced in the previous campaign had led to a drop in systems' performance. So for this year's competition it was decided to practically replicate last year's exercise. Following last year's experience, some QA pairs were grouped in clusters. Every cluster was characterized by a topic (not given to participants). The questions from a cluster contained co-references between one of them and the others. Moreover, as last year, the systems were given the possibility to search for answers in Wikipedia as a document corpus beside the usual newswire collection. In addition to the main task, three additional exercises were offered, namely the Answer Validation Exercise (AVE) and the Question Answering on Speech Transcriptions (QAST), which continued last year's successful pilots, together with the new Word Sense Disambiguation for Question Answering (QA-WSD) exercise. As a general remark, it must be said that the main task still proved to be very challenging for participating systems. As a rough comparison with last year's results, the best overall accuracy dropped significantly from 42% to 19% in the multilingual subtasks, but increased a little in the monolingual subtasks, going from 54% to 63%.

Pamela Forner, Anselmo Peñas, Eneko Agirre, Iñaki Alegria, Corina Forăscu, Nicolas Moreau, Petya Osenova, Prokopis Prokopidis, Paulo Rocha, Bogdan Sacaleanu, Richard Sutcliffe, Erik Tjong Kim Sang
Overview of the Answer Validation Exercise 2008

The Answer Validation Exercise at the Cross Language Evaluation Forum (CLEF) is aimed at developing systems able to decide whether the answer of a Question Answering (QA) system is correct or not. We present here the exercise description, the changes in the evaluation with respect to the last edition and the results of this third edition (AVE 2008). Last year's changes allowed us to measure the possible gain in performance obtained by using AV systems as the selection method of QA systems. In this edition we also wanted to reward AV systems able to detect whether all the candidate answers to a question are incorrect. Nine groups participated with 24 runs in 5 different languages, and, compared with the QA systems, the results show evidence of the potential gain that more sophisticated AV modules might introduce in the task of QA.

Álvaro Rodrigo, Anselmo Peñas, Felisa Verdejo
Overview of QAST 2008

This paper describes the experience of QAST 2008, the second time a pilot track of CLEF has been held aiming to evaluate the task of Question Answering in Speech Transcripts. Five sites submitted results for at least one of the five scenarios (lectures in English, meetings in English, broadcast news in French and European Parliament debates in English and Spanish). In order to assess the impact of potential errors of automatic speech recognition, contrastive conditions with manual and automatically produced transcripts are provided for each task. The QAST 2008 evaluation framework is described, along with descriptions of the five scenarios and their associated data, the system submissions for this pilot track and the official evaluation results.

Jordi Turmo, Pere R. Comas, Sophie Rosset, Lori Lamel, Nicolas Moreau, Djamel Mostefa

Mono and Bilingual QA

Assessing the Impact of Thesaurus-Based Expansion Techniques in QA-Centric IR

We study the impact of using thesaurus-based query expansion methods at the Information Retrieval (IR) stage of a Question Answering (QA) system. We focus on expanding queries for questions regarding actions and events, where verbs have a central role. Two different thesauri are used: the OpenOffice thesaurus and an automatically generated verb thesaurus. The performance of the thesaurus-based methods is compared against what is obtained by (i) executing no expansion and (ii) applying a simple query generalization method. Results show that thesaurus-based approaches help improve recall at retrieval, while keeping satisfactory precision. However, we confirm that the positive impact on final QA performance is mostly achieved due to the increase in recall, which can also be obtained by using simpler methods. Nevertheless, because of its better relative precision, thesaurus-based expansion is effective in selectively reducing the number of irrelevant text passages retrieved, thus reducing the computational load in the answer extraction stage.

Luís Sarmento, Jorge Teixeira, Eugénio Oliveira
Using AliQAn in Monolingual QA@CLEF 2008

This paper describes the participation of the system AliQAn in the CLEF 2008 Spanish monolingual QA task. This time, the main goals of the current version of AliQAn were to deal with topic-related questions and to decrease the number of inexact answers. We have also explored the use of the Wikipedia corpora, which have posed some new challenges for the QA task.

Sandra Roger, Katia Vila, Antonio Ferrández, María Pardiño, José Manuel Gómez, Marcel Puchol-Blasco, Jesús Peral
Priberam’s Question Answering System in QA@CLEF 2008

This paper describes the changes implemented in Priberam’s question answering (QA) system, followed by the discussion of the results obtained in Portuguese and Spanish monolingual runs at QA@CLEF 2008. We enhanced the syntactic analysis of the question and improved the indexing process by using question categories at the sentence retrieval level. The fine-tuning of the syntactic analysis allowed the system to more precisely match the pivots of the question with their counterparts in the answer. As a result, in QA@CLEF 2008, Priberam’s system achieved a considerable overall accuracy increase in the Portuguese run.

Carlos Amaral, Adán Cassan, Helena Figueira, André Martins, Afonso Mendes, Pedro Mendes, José Pina, Cláudia Pinto
IdSay: Question Answering for Portuguese

IdSay is an open domain Question Answering (QA) system for Portuguese. Its current version can be considered a baseline version, using mainly techniques from the area of Information Retrieval (IR). The only external information it uses besides the text collections is lexical information for Portuguese. It was submitted to the monolingual Portuguese task of the QA track of the Cross-Language Evaluation Forum 2008 (QA@CLEF) for the first time, and it answered 65 of the 200 questions correctly with its first answer, and 85 when considering the three answers that could be returned per question. Generally, the types of questions that are answered best by the IdSay system are measure factoids, count factoids and definitions, but there is still work to be done in these areas, as well as in the treatment of time. List questions, location and people/organization factoids are the types of question with the most room for improvement.

Gracinda Carvalho, David Martins de Matos, Vitor Rocio
Dublin City University at QA@CLEF 2008

We describe our participation in Multilingual Question Answering at CLEF 2008 using German and English as our source and target languages, respectively. The system was built using UIMA (Unstructured Information Management Architecture) as the underlying framework.

Sisay Fissaha Adafre, Josef van Genabith
Using Answer Retrieval Patterns to Answer Portuguese Questions

Esfinge is a general domain Portuguese question answering system which has been participating in QA@CLEF since 2004. It uses the information available in the “official” document collections used in QA@CLEF (newspaper text and Wikipedia) and information from the Web as an additional resource when searching for answers. As regards external tools, Esfinge uses a syntactic analyzer, a morphological analyzer and a named entity recognizer. This year an alternative approach to retrieving answers was tested: whereas in previous years search patterns were used to retrieve relevant documents, this year a new type of search pattern was also used to extract the answers themselves. We also evaluated the second and third best answers returned by Esfinge. This evaluation showed that when Esfinge answers a question correctly, it usually does so with its first answer. Furthermore, the experiments revealed that the answer retrieval patterns created for this participation improve the results, but only for definition questions.

Luís Fernando Costa
Ihardetsi: A Basque Question Answering System at QA@CLEF 2008

This paper describes Ihardetsi, a question answering system for Basque. We present the results of our first participation in the QA@CLEF 2008 evaluation task. We participated in three subtasks using Basque, English and Spanish as source languages, and Basque as the target language. We approached the Spanish-Basque and English-Basque cross-lingual tasks with a machine translation system that first processes a question in the source language (i.e. Spanish, English), then translates it into the target language (i.e. Basque) and, finally, sends the obtained Basque question as input to the monolingual module.

Olatz Ansa, Xabier Arregi, Arantxa Otegi, Ander Soraluze
Question Interpretation in QA@L2F

The Question Interpretation module of QA@L2F, the question-answering system from L2F/INESC-ID, is thoroughly described in this paper, as well as the frame formalism it employs. Moreover, the anaphora resolution process introduced this year, based on frame manipulation, is detailed.

The overall results QA@L2F achieved at the CLEF competition and a brief overview of the system's evolution throughout the two years of joint evaluation are presented. The results of an evaluation of the QI module alone are also detailed here.

Luísa Coheur, Ana Mendes, João Guimarães, Nuno J. Mamede, Ricardo Ribeiro
UAIC Participation at QA@CLEF2008

2008 marked UAIC's third consecutive participation in the QA@CLEF competition, with continually improving results. The most significant change to our system with regard to the previous edition is the partial transition to a real-time QA system, as a consequence of the simplification or elimination of the main time-consuming tasks such as linguistic pre-processing. A brief description of our system and an analysis of the errors introduced by each module are given in this paper.

Adrian Iftene, Diana Trandabăţ, Ionuţ Pistol, Alex-Mihai Moruz, Maria Husarciuc, Dan Cristea
RACAI’s QA System at the Romanian–Romanian QA@CLEF2008 Main Task

This paper describes the participation of the Research Institute for Artificial Intelligence of the Romanian Academy (RACAI) in the Multiple Language Question Answering Main Task at the CLEF 2008 competition. We present our Question Answering system, which answers Romanian questions from Romanian Wikipedia documents, focusing on the implementation details. The presentation also emphasizes the fact that question analysis, snippet selection and ranking provide a useful basis for any answer extraction mechanism.

Radu Ion, Dan Ştefănescu, Alexandru Ceauşu, Dan Tufiş
Combining Logic and Machine Learning for Answering Questions

LogAnswer is a logic-oriented question answering system developed by the AI research group at the University of Koblenz-Landau and by the IICS at the University of Hagen. The system addresses two notorious problems of the logic-based approach: achieving robustness and acceptable response times. Its main innovation is the use of logic for simultaneously extracting answer bindings and validating the corresponding answers. In this way the inefficiency of the classical answer extraction/answer validation pipeline is avoided. The prototype of the system, which can be tested on the web, demonstrates response times suitable for real-time querying. Robustness to gaps in the background knowledge and errors of linguistic analysis is achieved by combining the optimized deductive subsystem with shallow techniques based on machine learning.

Ingo Glöckner, Björn Pelzer
The MIRACLE Team at the CLEF 2008 Multilingual Question Answering Track

The MIRACLE team participated in the monolingual Spanish and cross-language French to Spanish subtasks at QA@CLEF 2008. For the Spanish subtask, we used an almost completely rebuilt version of our system, designed with the aim of flexibly combining information sources and linguistic annotators for different languages. To allow easy development for new languages, most of the modules do not make any language dependent assumptions. The language dependent knowledge is encapsulated in a rule language developed within the MIRACLE team. By the time of submitting the runs, work on the new version was still ongoing, so we consider the results as a partial test of the possibilities of the new architecture. Subsystems for other languages were not yet available, so we tried a very simple approach for the French to Spanish subtask: questions were translated to Spanish with Babylon, and the output of this translation was fed into our system. The results had an accuracy of 16% for the monolingual Spanish task and 5% for the cross-language task.

Ángel Martínez-González, César de Pablo-Sánchez, Concepción Polo-Bayo, María Teresa Vicente-Díez, Paloma Martínez-Fernández, José Luís Martínez-Fernández
Efficient Question Answering with Question Decomposition and Multiple Answer Streams

The German question answering (QA) system IRSAW (formerly: InSicht) participated in QA@CLEF for the fifth time. IRSAW was introduced in 2007 by integrating the deep answer producer InSicht, several shallow answer producers, and a logical validator. InSicht builds on a deep QA approach: it transforms documents to semantic representations using a parser, draws inferences on semantic representations with rules, and matches semantic representations derived from questions and documents. InSicht was improved for QA@CLEF 2008 mainly in the following two areas. The coreference resolver was trained on question series instead of newspaper texts in order to be better applicable for follow-up questions. Questions are decomposed by several methods on the level of semantic representations. On the shallow processing side, the number of answer producers was increased from two to four by adding FACT, a fact index, and SHASE, a shallow semantic network matcher. The answer validator introduced in 2007 was replaced by the faster RAVE validator designed for logic-based answer validation under time constraints. Using RAVE for merging the results of the answer producers, monolingual German runs and bilingual runs with source language English and Spanish were produced by applying the machine translation web service Promt. An error analysis shows the main problems for the precision-oriented deep answer producer InSicht and the potential offered by the recall-oriented shallow answer producers.

Sven Hartrumpf, Ingo Glöckner, Johannes Leveling
DFKI-LT at QA@CLEF 2008

The paper describes QUANTICO, a cross-language open domain factoid question answering system for German and English document collections. The main features of the system are: use of preemptive off-line document annotation with information like Named Entities, abbreviation-extension pairs and appositional constructions; use of online translation services for the cross-language scenarios; use of redundancy as an indicator of good answer candidates; selection of the best answers based on distance metrics defined over graph representations. The results of evaluating the system’s performance by QA@CLEF 2008 were as follows: for the German-German run we achieved a best overall accuracy (ACC) of 37%; for the English-German run 14.5% (ACC); and for the German-English run 14% (ACC).

Bogdan Sacaleanu, Günter Neumann, Christian Spurk
Integrating Logic Forms and Anaphora Resolution in the AliQAn System

This paper deals with the AliQAn QA system in the multilingual (English - Spanish) task. It highlights the translation module of the QA system, which applies two methods: the first one based on logic forms, and the other on machine translation techniques. Moreover, the system is able to solve the anaphora resolution problem by applying linguistic techniques. According to the results, machine translation techniques perform slightly better than techniques based on logic forms for question translation.

Rafael Muñoz-Terol, Marcel Puchol-Blasco, María Pardiño, José Manuel Gómez, Sandra Roger, Katia Vila, Antonio Ferrández, Jesús Peral, Patricio Martínez-Barco
Some Experiments in Question Answering with a Disambiguated Document Collection

This paper describes our approach to the Question Answering - Word Sense Disambiguation task. This task consists in carrying out Question Answering over a disambiguated document collection. In our approach, disambiguated documents are used to improve the accuracy of the retrieval phase. In order to do this, we added a WordNet-expanded index to the document collection. The expanded index contains synonyms, hypernyms and holonyms of the words already in the documents. Question words are searched for in both the expanded WordNet index and the default index. The obtained results show that the system that exploited disambiguation obtained better precision than the non-WSD one.

Davide Buscaldi, Paolo Rosso
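A rough sketch of building the WordNet-expanded entries described above, pooling synonyms, hypernyms and holonyms for a term with NLTK. Unlike the approach in the paper, it ignores the disambiguated sense and simply expands over all senses; the function name is invented for the example.

```python
from nltk.corpus import wordnet as wn   # requires the NLTK WordNet corpus

def expand_term(term):
    """Collect synonyms, hypernyms and holonyms of a term from WordNet."""
    expansion = set()
    for synset in wn.synsets(term):
        expansion.update(l.replace("_", " ") for l in synset.lemma_names())
        for hyper in synset.hypernyms():
            expansion.update(l.replace("_", " ") for l in hyper.lemma_names())
        holonyms = (synset.member_holonyms()
                    + synset.part_holonyms()
                    + synset.substance_holonyms())
        for holo in holonyms:
            expansion.update(l.replace("_", " ") for l in holo.lemma_names())
    expansion.discard(term)
    return sorted(expansion)

# usage: expand_term("wheel") returns related lemmas pooled over all senses
```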

Answer Validation Exercise (AVE)

Answer Validation on English and Romanian Languages

This article describes the system built for participation in the AVE 2008 track, stressing the new features added to the approach we took in AVE 2007. The current version, while still based on the Textual Entailment system we built for the RTE-3 competition, adds and combines specific techniques used by Question Answering systems to improve answer classification. We outline the performance of this approach, presenting the high results obtained for both English and Romanian. Finally, we perform a critical analysis of the detected errors and propose lines for future work.

Adrian Iftene, Alexandra Balahur
The Answer Validation System ProdicosAV Dedicated to French

In this paper, we present the ProdicosAV answer validation system, which was developed by the NLP team at the LINA institute. ProdicosAV is based on the Prodicos system, which participated two years ago in the Question Answering CLEF evaluation campaign for French. We first present the modifications made to Prodicos to improve it and adapt it to a new kind of exercise, describing in detail the passage ranking module and the temporal validator module. Secondly, the answer-validation module dedicated to the AVE task is presented. Finally, the evaluation is presented to justify the results obtained.

Christine Jacquin, Laura Monceaux, Emmanuel Desmontils
Studying the Influence of Semantic Constraints in AVE

This paper discusses the participation of the University of Alicante in the Answer Validation Exercise (AVE) track. The proposed system first uses a set of regular expressions in order to join the question and the answer into a declarative sentence, and afterwards applies several lexical-semantic inferences to attempt to detect whether the meaning of this sentence can be inferred from the meaning of the supporting text. Throughout the paper, we describe a basic system configuration and how it is enriched by the addition of semantic constraints. Moreover, we place special emphasis on the language-independent capabilities of some system components. As a result, we were able to apply our techniques over both Spanish and English corpora, achieving the first and second positions in the AVE ranking.

Óscar Ferrández, Rafael Muñoz, Manuel Palomar
RAVE: A Fast Logic-Based Answer Validator

RAVE (Real-time Answer Validation Engine) is a logic-based answer validator/selector designed for real-time question answering. Instead of proving a hypothesis for each answer, RAVE uses logic only for checking if a considered passage supports a correct answer at all. In this way parsing of the answers is avoided, yielding low validation/selection times. Machine learning is used for assigning local validation scores based on logical and shallow features. The subsequent aggregation of these local scores strives to be robust to duplicated information in the support passages. To achieve this, the effect of aggregation is controlled by the lexical diversity of the support passages for a given answer.

Ingo Glöckner
Information Synthesis for Answer Validation

This paper proposes an integration of Recognizing Textual Entailment (RTE) with additional information to deal with the Answer Validation task. The additional information used in our participation in the Answer Validation Exercise (AVE 2008) comes from a named-entity (NE) recognizer, a question analysis component, etc. We submitted two runs, one for English and the other for German, achieving f-measures of 0.64 and 0.61 respectively. Compared with our system last year, which purely depended on the output of the RTE system, the extra information does show its effectiveness.

Rui Wang, Günter Neumann
Analyzing the Use of Non-overlap Features for Supervised Answer Validation

This year we evaluated our supervised answer validation method at both the Spanish Answer Validation Exercise (AVE) and the Spanish Question Answering Main Task. This paper describes and analyzes our evaluation results from both tracks. In summary, the F-measure of the proposed method outperformed the baseline result of the AVE 2008 task by more than 100%, and enhanced the performance of our question answering system, showing a gain in accuracy of 22% for answering factoid questions. A detailed analysis of the results shows that the proposed non-overlap features are more discriminative than the traditional overlap ones. In particular, these novel features allowed us to increase the F-measure result of our method by 26%.

Alberto Téllez-Valero, Antonio Juárez-González, Manuel Montes-y-Gómez, Luis Villaseñor-Pineda

Question Answering on Speech Transcription (QAST)

The LIMSI Multilingual, Multitask QAst System

In this paper, we present the LIMSI question-answering system which participated in the Question Answering on Speech Transcripts 2008 evaluation. This system is based on a complete and multi-level analysis of both queries and documents. It uses an automatically generated research descriptor. A score based on those descriptors is used to select documents and snippets. The extraction and scoring of candidate answers is based on proximity measurements within the research descriptor elements and a number of secondary factors. We participated in all the subtasks and submitted 18 runs (for 16 sub-tasks). The evaluation results range from 31% to 45% accuracy for manual transcripts, depending on the task, and from 16% to 41% for automatic transcripts.

Sophie Rosset, Olivier Galibert, Guillaume Bernard, Eric Bilinski, Gilles Adda
IBQAst: A Question Answering System for Text Transcriptions

This paper shows the results of adapting a modular domain English QA system (called IBQAS, whose initials correspond to Interchangeable Blocks Question Answering System) to work with both manual and automatic text transcriptions. This system provides a generic and modular framework using an approach based on the recognition of named entities as a method of extracting answers.

María Pardiño, José M. Gómez, Héctor Llorens, Rafael Muñoz-Terol, Borja Navarro-Colorado, Estela Saquete, Patricio Martínez-Barco, Paloma Moreda, Manuel Palomar
Robust Question Answering for Speech Transcripts: UPC Experience in QAst 2008

This paper describes the participation of the Technical University of Catalonia in the CLEF 2008 Question Answering on Speech Transcripts track. We have participated in the English and Spanish scenarios of QAst. For the processing of manual transcripts we have deployed a robust factoid Question Answering that uses minimal syntactic information. For the handling of automatic transcripts we modify the QA system with a Passage Retrieval and Answer Extraction engine based on a sequence alignment algorithm that searches for “sounds like” sequences. We perform a detailed analysis of our results and draw conclusions relating QA performance to word error rate in transcripts.

Pere R. Comas, Jordi Turmo

Part V: Cross-Language Retrieval in Image Collections (ImageCLEF)

Overview of the ImageCLEFphoto 2008 Photographic Retrieval Task

ImageCLEFphoto 2008 is an ad-hoc photo retrieval task and part of the ImageCLEF evaluation campaign. This task provides both the resources and the framework necessary to perform comparative laboratory-style evaluation of visual information retrieval systems. In 2008, the evaluation task concentrated on promoting diversity within the top 20 results from a multilingual image collection. This new challenge attracted a record number of submissions: a total of 24 participating groups submitting 1,042 system runs. Among the findings are that the choice of annotation language has an almost negligible effect and that the best runs combine concept- and content-based retrieval methods.

Thomas Arni, Paul Clough, Mark Sanderson, Michael Grubinger
Overview of the ImageCLEFmed 2008 Medical Image Retrieval Task

The medical image retrieval task of ImageCLEF is in its fifth year and participation continues to increase to a total of 37 registered research groups. About half the registered groups finally submit results. Main change in 2008 was the use of a new databases containing images of the medical scientific literature (articles from the Journals Radiology and Radiographics). Besides the images, the figure captions and the part of the caption referring to a particular sub–figure were supplied as well as access to the full text articles in html. All texts were in English and the topics were supplied in German, French, and English. 30 topics were made available, ten of each of the categories visual, mixed, semantic.

Most groups concentrated on fully automatic retrieval. Only three groups submitted a total of six manual or interactive runs, which did not show an increase in performance over automatic approaches. In previous years, multi-modal combinations were the most frequent submissions, but in 2008 text-only runs were clearly more numerous. Only very few fully visual runs were submitted, and none of the fully visual runs performed extremely well. Part of these tendencies might be due to the semantic topics and the extremely well annotated database. The best results in terms of MAP were similar for textual and multi-modal approaches, whereas early precision was better for some multi-modal approaches.

Henning Müller, Jayashree Kalpathy-Cramer, Charles E. Kahn Jr., William Hatt, Steven Bedrick, William Hersh
Medical Image Annotation in ImageCLEF 2008

The ImageCLEF 2008 medical image annotation task is designed to assess the quality of content-based image retrieval and image classification by means of global signatures. In contrast to previous years, the 2008 task was designed such that the hierarchy of reference IRMA code classifications is essential for good performance. In total, 12,076 images were used, and 24 runs from 6 groups were submitted. Multi-class classification schemes for support vector machines outperformed the other methods. A scoring scheme was defined to penalise wrong classification in early code positions more heavily than errors in later branches of the code hierarchy, and to penalise false category association more heavily than the assignment of a “not known” code. The obtained scores range from 74.92 (best) through 182.77 (baseline) to 313.01 (worst).
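The abstract names the two key properties of the scoring scheme (errors in early code positions cost more; abstaining costs less than being wrong) without giving the formula. The sketch below is a toy illustration of such a hierarchy-aware error, not the official IRMA evaluation code; the 1/(i+1) position weights and the 0.5 abstention factor are assumptions chosen only for illustration.

```python
def hierarchical_error(predicted, truth, not_known="*"):
    """Toy hierarchy-aware error for a positional classification code.

    Positions are compared left to right; a mistake in an early (coarse)
    position costs more than one in a later position, and predicting the
    'not known' symbol is charged half as much as a wrong prediction.
    Once the prediction goes wrong or abstains, the remaining positions
    can no longer be judged and are charged at the same rate.
    """
    error = 0.0
    state = "correct"            # "correct" | "unknown" | "wrong"
    for i, (p, t) in enumerate(zip(predicted, truth)):
        weight = 1.0 / (i + 1)   # assumption: earlier positions weigh more
        if state == "correct":
            if p == t:
                continue
            state = "unknown" if p == not_known else "wrong"
        error += weight * (0.5 if state == "unknown" else 1.0)
    return error

# A mistake in the last position costs less than one in the first position:
print(hierarchical_error("1121", "1123"))   # 0.25
print(hierarchical_error("3121", "1121"))   # 1.0 + 0.5 + 0.333... + 0.25
```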

Thomas Deselaers, Thomas M. Deserno
The Visual Concept Detection Task in ImageCLEF 2008

The Visual Concept Detection Task (VCDT) of ImageCLEF 2008 is described. A database of 2,827 images was manually annotated with 17 concepts. Of these, 1,827 images were used for training and 1,000 for testing the automated assignment of categories. In total, 11 groups participated and submitted 53 runs. The runs were evaluated using ROC curves, from which the Area Under the Curve (AUC) and Equal Error Rate (EER) were calculated. For each concept, the best runs obtained an AUC of 80% or above.
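As a reminder of how these two figures relate to a run's per-image concept scores, here is a small self-contained sketch (a pairwise AUC estimate and a simple threshold sweep for the EER); it is not the track's official evaluation tool, and the example data are invented.

```python
def auc_and_eer(scores, labels):
    """AUC and Equal Error Rate for one concept from per-image scores.

    scores: classifier confidences, higher = more likely to show the concept
    labels: 1 if the image truly shows the concept, else 0
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]

    # AUC as the probability that a random positive outranks a random negative
    # (ties count half) -- equivalent to the area under the ROC curve.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    # EER: sweep thresholds and take the point where the false positive rate
    # and false negative rate are (approximately) equal.
    def rates(t):
        fpr = sum(s >= t for s in neg) / len(neg)
        fnr = sum(s < t for s in pos) / len(pos)
        return fpr, fnr

    best_t = min(set(scores), key=lambda t: abs(rates(t)[0] - rates(t)[1]))
    fpr, fnr = rates(best_t)
    return auc, (fpr + fnr) / 2.0

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,   1,   0,   0]
print(auc_and_eer(scores, labels))  # (0.8125, 0.25)
```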

Thomas Deselaers, Allan Hanbury
Overview of the WikipediaMM Task at ImageCLEF 2008

The wikipediaMM task provides a testbed for the system-oriented evaluation of ad hoc retrieval from a large collection of Wikipedia images. It became part of the ImageCLEF evaluation campaign in 2008 with the aim of investigating the combined use of visual and textual sources to improve retrieval performance. This paper presents an overview of the task’s resources, topics, assessments, participants’ approaches, and main results.

Theodora Tsikrika, Jana Kludas

ImageCLEFphoto

Meiji University at ImageCLEF2008 Photo Retrieval Task: Evaluation of Image Retrieval Methods Integrating Different Media

This paper describes the participation of the Human Interface Laboratory of Meiji University in the ImageCLEF 2008 photo retrieval task. We submitted eight retrieval runs taking two main approaches. The first approach combined Text-Based Image Retrieval (TBIR) and Content-Based Image Retrieval (CBIR). The second approach applied query expansion using conceptual fuzzy sets (CFS). CFS is a method that expresses meaning depending on the context, which an ordinary fuzzy set does not capture. A conceptual dictionary is necessary to perform query expansion using CFS, and this is constructed by clustering. We propose the use of query expansion with CFS and other techniques for image retrieval that integrates different media, and we verify the utility of the system through our experimental results. In this evaluation, the TBIR+CFS configuration of our proposed system was ranked first among the “Text Only” runs, demonstrating that query expansion with CFS produces better search results.

Kosuke Yamauchi, Takuya Nomura, Keiko Usui, Yusuke Kamoi, Tomohiro Takagi
Building a Diversity Featured Search System by Fusing Existing Tools

This paper describes a diversity-featured retrieval system built for the ImageCLEFphoto 2008 task. Two existing tools are used: Solr and Carrot2. We have experimented with different settings of the system to see how the performance changes. The results suggest that the system can indeed increase the diversity of the retrieved results without sacrificing too much precision.

Jiayu Tang, Thomas Arni, Mark Sanderson, Paul Clough
Some Results Using Different Approaches to Merge Visual and Text-Based Features in CLEF’08 Photo Collection

This paper describes the participation of the MIRACLE team in the ImageCLEF Photographic Retrieval task of CLEF 2008. We submitted 41 runs. As in previous MIRACLE campaigns [5, 6], which used different software, the results obtained with text-based retrieval are better than those obtained with content-based retrieval. Our main aim was to experiment with several merging approaches to fuse text-based and content-based retrieval results; we improved the text-based baseline when applying one of the three merging algorithms, although the visual results remain lower than the textual ones.

Ana García-Serrano, Xaro Benavent, Ruben Granados, José Miguel Goñi-Menoyo
MIRACLE-GSI at ImageCLEFphoto 2008: Different Strategies for Automatic Topic Expansion

This paper describes the participation of the MIRACLE-GSI research consortium in the ImageCLEFphoto task of ImageCLEF 2008. For this campaign, the main purpose of our experiments was to evaluate different strategies for topic expansion in a purely textual retrieval context. Two approaches were used: methods based on linguistic information such as thesauri, and statistical methods that use term frequency. First, a common baseline algorithm was used in all experiments to process the document collection; then the different expansion techniques were applied. For the semantic expansion, we used WordNet to expand topic terms with related terms. The statistical method expanded the topics using Agrawal’s apriori algorithm. Relevance-feedback techniques were also used. Finally, the result list was re-ranked using an implementation of the k-Medoids clustering algorithm with the target number of clusters set to 20. 14 fully automatic runs were submitted. The MAP values achieved are average compared to other groups. However, the results show a significant improvement in cluster precision (6% at CR10, 12% at CR20, for runs in English) when clustering is applied, thus proving it to be valuable.

Julio Villena-Román, Sara Lana-Serrano, José Carlos González-Cristóbal
Using Visual Concepts and Fast Visual Diversity to Improve Image Retrieval

In this article, we focus our efforts (i) on the study of how to automatically extract and exploit visual concepts and (ii) on fast visual diversity. First, in the Visual Concept Detection Task (VCDT), we look at the mutual exclusion and implication relations between VCDT concepts in order to improve automatic image annotation by Forests of Fuzzy Decision Trees (FFDTs). Second, in the ImageCLEFphoto task, we use the FFDTs learnt in the VCDT task and WordNet to improve image retrieval. Third, we apply a fast visual diversity method based on space clustering to improve the cluster recall score. This study shows that there is a clear improvement, in terms of precision or cluster recall at 20, when using the visual concepts explicitly appearing in the query, and that space clustering can be efficiently used to improve cluster recall.

Sabrina Tollari, Marcin Detyniecki, Ali Fakeri-Tabrizi, Christophe Marsala, Massih-Reza Amini, Patrick Gallinari
A Comparative Study of Diversity Methods for Hybrid Text and Image Retrieval Approaches

This article compares eight different diversity methods: three based on visual information, one based on date information, three adapted to each topic based on location and visual information, and finally, for completeness, one based on random permutation. To compare the effectiveness of these methods, we apply them to 26 runs obtained with varied methods from different research teams and based on different modalities. We then discuss the results of the more than 200 runs obtained. The results show that query-adapted methods are more effective than non-adapted methods, that visual-only runs are more difficult to diversify than text-only and text-image runs, and finally that only a few methods maximize both precision and cluster recall at 20 documents.

Sabrina Tollari, Philippe Mulhem, Marin Ferecatu, Hervé Glotin, Marcin Detyniecki, Patrick Gallinari, Hichem Sahbi, Zhong-Qiu Zhao
University of Jaén at ImagePhoto 2008: Filtering the Results with the Cluster Term

This paper describes the University of Jaén system presented at ImagePhoto CLEF 2008. Previous systems used translation approaches and different information retrieval systems to obtain good results. The queries used this year are monolingual, so translation methods are not necessary. The new system uses the parameters that obtained the best results in the past. The novelty of our method consists of several filtering methods used to improve the results with the cluster terms and their WordNet synonyms. The combination of different weighting functions (Okapi and TF-IDF), the results obtained by the information retrieval systems (Lemur and JIRS), and the use or not of automatic feedback complete the experimentation.

Miguel Angel García-Cumbreras, Manuel Carlos Díaz-Galiano, María Teresa Martín-Valdivia, L. Alfonso Ureña-López
Combining TEXT-MESS Systems at ImageCLEF 2008

This paper describes the joint work of two teams belonging to the TEXT-MESS project. The system presented at the ImageCLEFphoto task combines one module based on filtering and another based on clustering. The main objective was to study the behavior of these methods with a large number of configurations in order to increase our chances of success. The system presented at the ImageCLEFmed task uses the IR-n system with a negative query expansion based on the acquisition type of the image, combined with the SINAI system with a MeSH-based query expansion.

Sergio Navarro, Miguel Angel García-Cumbreras, Fernando Llopis, Manuel Carlos Díaz-Galiano, Rafael Muñoz, María Teresa Martín-Valdivia, L. Alfonso Ureña-López, Arturo Montejo-Ráez
Image Retrieval by Inter-media Fusion and Pseudo-relevance Feedback

This paper presents our participation in the ImageCLEF Photo 2008 task. We submitted six runs, experimenting with our own block-based visual retrieval as well as with query expansion. The results we obtained show that, despite the poor performance of the visual and text retrieval components, better results can be obtained through pseudo-relevance feedback and the inter-media fusion of the results.

Osama El Demerdash, Leila Kosseim, Sabine Bergler
Increasing Precision and Diversity in Photo Retrieval by Result Fusion

This paper considers strategies of query expansion, relevance feedback and result fusion to increase both precision and diversity in photo retrieval. In the text-based-retrieval-only experiments, the run with query expansion has better MAP and P20 than the run without it, with only a 0.85% decrease in CR20. Although the relevance feedback run increases both MAP and P20, its CR20 decreases by 10.18% compared with the non-feedback run, showing that relevance feedback brings in relevant but similar images, so that diversity may decrease. The run with both query expansion and relevance feedback is the best of the four text-based runs: its F1-measure is 0.2791, a 20.8% increase over the baseline model. In the content-based-retrieval-only experiments, the run without feedback outperforms the run with feedback; the latter shows performance decreases of 10.84%, 9.13%, 20.46%, and 16.7% in MAP, P20, CR20, and F1-measure, respectively. In the fusion experiment, integrating text-based and content-based retrieval returns not only more relevant images, but also more diverse ones; its F1-measure is 0.3189.
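The F1-measure quoted in this and several neighbouring abstracts is, as we understand the ImageCLEFphoto 2008 setup, the harmonic mean of precision at 20 (P20) and cluster recall at 20 (CR20). A one-line sketch, with purely hypothetical P20/CR20 values rather than the paper's actual runs:

```python
def f1(p20, cr20):
    """Harmonic mean of precision@20 and cluster recall@20."""
    return 2 * p20 * cr20 / (p20 + cr20)

# Hypothetical values, purely to illustrate the measure (not the paper's runs):
print(round(f1(0.32, 0.42), 4))  # 0.3632
```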

Yih-Chen Chang, Hsin-Hsi Chen
Diversity in Image Retrieval: DCU at ImageCLEFPhoto 2008

DCU participated in the ImageCLEF 2008 photo retrieval task, which aimed to evaluate diversity in image retrieval, submitting runs for both the English and Random language annotation conditions. Our approaches used text-based and image-based retrieval to give baseline runs, with the highest-ranked images from these baseline runs clustered using K-Means clustering of the text annotations and representative images from each cluster ranked for the final submission. For random language annotations, we compared results from translated runs with untranslated runs. Our results show that combining image and text outperforms text alone and image alone, both for general retrieval performance and for diversity. Our baseline image and text runs give our best overall balance between retrieval and diversity; indeed, our baseline text and image run was the 2nd best automatic run in the ImageCLEF 2008 Photographic Retrieval task. We found that clustering consistently gives a large improvement in diversity performance over the baseline, unclustered results, while degrading retrieval performance. Pseudo-relevance feedback consistently improved retrieval, but always at the cost of diversity. We also found that the diversity of untranslated random runs was quite close to that of translated random runs, indicating that, for this dataset at least, if diversity is our main concern it may not be necessary to translate the image annotations.
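The clustering-then-representatives idea described here (and in several other entries of this part) can be sketched generically as follows. The cluster labels are assumed to come from, e.g., K-Means over the text annotations; this is a minimal illustration of the re-ranking step, not DCU's actual implementation.

```python
def diversify(ranking, cluster_of):
    """Re-rank so the best-ranked item of each cluster comes first.

    ranking:    document ids, best first (initial retrieval order)
    cluster_of: dict mapping document id -> cluster label
    Returns one representative per cluster (in original order), followed by
    the remaining documents in original order.
    """
    seen_clusters = set()
    representatives, rest = [], []
    for doc in ranking:
        c = cluster_of.get(doc)
        if c not in seen_clusters:
            seen_clusters.add(c)
            representatives.append(doc)
        else:
            rest.append(doc)
    return representatives + rest

# Tiny illustration with made-up ids and cluster labels:
ranking = ["d1", "d2", "d3", "d4", "d5"]
clusters = {"d1": "A", "d2": "A", "d3": "B", "d4": "B", "d5": "C"}
print(diversify(ranking, clusters))  # ['d1', 'd3', 'd5', 'd2', 'd4']
```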

Neil O’Hare, Peter Wilkins, Cathal Gurrin, Eamonn Newman, Gareth J. F. Jones, Alan F. Smeaton
Visual Affinity Propagation Improves Sub-topics Diversity without Loss of Precision in Web Photo Retrieval

This paper demonstrates that Affinity Propagation (AP) outperforms K-means for sub-topic clustering in web image retrieval. An SVM-based visual image retrieval system is built, and clustering is then performed on the results of each topic. We then heighten the diversity of the top 20 results by moving to the top the image with the lowest rank in each cluster. Using 45-dimensional Profile Entropy visual features, we show, for the 39 topics of the ImageCLEF 2008 web image retrieval clustering campaign on 20K IAPR images, that the Cluster Recall (CR) after AP is 13% better than the baseline without clustering, while the Precision stays almost the same. Moreover, K-means degrades both CR and Precision compared to no clustering. We finally discuss that some high-level topics require text information for good CR, and that more discriminant visual features would also allow Precision enhancement after AP.

Hervé Glotin, Zhong-Qiu Zhao
Exploiting Term Co-occurrence for Enhancing Automated Image Annotation

This paper describes an application of statistical co-occurrence techniques that, built on top of a probabilistic image annotation framework, is able to increase the precision of an image annotation system. We observe that probabilistic image analysis by itself is not enough to describe the rich semantics of an image. Our hypothesis is that more accurate annotations can be produced by introducing additional knowledge in the form of statistical co-occurrence of terms. This is provided by the context of images, which otherwise independent keyword generation would miss. We applied our algorithm to the dataset provided by ImageCLEF 2008 for the Visual Concept Detection Task (VCDT). Our algorithm not only obtained better results but also appeared in the top quartile of all methods submitted to ImageCLEF 2008.

Ainhoa Llorente, Simon Overell, Haiming Liu, Rui Hu, Adam Rae, Jianhan Zhu, Dawei Song, Stefan Rüger
Enhancing Visual Concept Detection by a Novel Matrix Modular Scheme on SVM

A novel Matrix Modular Support Vector Machine (MMSVM) classifier is proposed to partition a visual concept problem into many easier two-class problems. This MMSVM shows significant detection improvements on the ImageCLEF 2008 VCDT task, with a relative reduction of 15% of the classification error compared with usual SVMs.

Zhong-Qiu Zhao, Hervé Glotin
SZTAKI @ ImageCLEF 2008: Visual Feature Analysis in Segmented Images

We describe our image processing system used in the ImageCLEF 2008 Photo Retrieval and Visual Concept Detection tasks. Our method consists of image segmentation followed by feature generation over the segments based on color, shape and texture. In the paper we elaborate on the importance of choices in the segmentation procedure with emphasis on edge detection. We also measure the relative importance of the visual features as well as the right choice of the distance function. Finally, given a very large number of parameters in our image processing system, we give a method for parameter optimization by measuring how well the similarity measures separate sample images of the same topic from those of different topics.

Bálint Daróczy, Zsolt Fekete, Mátyás Brendel, Simon Rácz, András Benczúr, Dávid Siklósi, Attila Pereszlényi
THESEUS Meets ImageCLEF: Combining Evaluation Strategies for a New Visual Concept Detection Task 2009

Automatic methods for archiving, indexing and retrieving multimedia content are becoming more and more important with the steadily increasing amount of digital data on the web and at home. THESEUS, a German research program, focuses on developing sophisticated algorithms and evaluation strategies for the automated processing of digital data. In this paper we present how evaluation is performed in THESEUS and introduce a generic framework for the evaluation of various video and image analysis algorithms. In addition, evaluation campaigns like the Cross-Language Evaluation Forum (CLEF) and subprojects like ImageCLEF deal with the evaluation of such algorithms and provide an objective comparison of their performance. We relate the THESEUS tasks to the work done in ImageCLEF and propose a new task for ImageCLEF 2009.

Stefanie Nowak, Peter Dunker, Ronny Paduschek
Query Types and Visual Concept-Based Post-retrieval Clustering

In the photo retrieval task of ImageCLEF 2008, we examined the influence of image representations, clustering methods, and query types on result diversity. Two types of visual concept vectors were compared, as were hierarchical and partitioning clustering as post-retrieval clustering methods. We used the title fields of the search topics, and either only the title field or both the title and description fields of the annotations; all were in English. The experimental results showed that one type of visual concept representation dominated the other except under one condition. We also found that hierarchical clustering can enhance instance recall while preserving precision when the threshold parameters are appropriately set; in contrast, partitioning clustering degraded the results. We also categorized the queries into geographical and non-geographical, and found that the geographical queries are relatively easy in terms of the precision of the retrieval results and that post-retrieval clustering also works better for them.

Masashi Inoue, Piyush Grover
Annotation-Based Expansion and Late Fusion of Mixed Methods for Multimedia Image Retrieval

This paper describes experimental results of two approaches to multimedia image retrieval: annotation-based expansion and late fusion of mixed methods. The former consists of expanding manual annotations with labels generated by automatic annotation methods. Experimental results show that the performance of text-based methods can be improved with this strategy, especially for visual topics, motivating further research in several directions. The second approach consists of combining the outputs of diverse image retrieval models based on different information. Experimental results show that competitive performance, in both retrieval and result diversification, can be obtained with this simple strategy. Interestingly, and contrary to previous work, the best fusion results were obtained by assigning a high weight to visual methods. Furthermore, a probabilistic modeling approach to result diversification is proposed; experimental results reveal that some modifications are needed to achieve satisfactory results with this method.

Hugo Jair Escalante, Jesús A. Gonzalez, Carlos A. Hernández, Aurelio López, Manuel Montes, Eduardo Morales, Luis E. Sucar, Luis Villaseñor-Pineda
Evaluation of Diversity-Focused Strategies for Multimedia Retrieval

In this paper, we propose and evaluate different strategies to promote diversity in the top results of multimedia retrieval systems. These strategies consist in clustering, explicitly or implicitly, the elements of the top list of some initial ranking and producing a re-ranking that favours elements belonging to different clusters. We evaluate these strategies in the particular case of the ImageCLEFphoto 2008 collection. Results show that most of these strategies succeed in increasing a diversity performance measure while keeping or only slightly degrading the precision of the top list and, more interestingly, that they achieve this in complementary ways.

Julien Ah-Pine, Gabriela Csurka, Jean-Michel Renders
Clustering for Photo Retrieval at Image CLEF 2008

This paper presents the first participation of the University of Ottawa group in the Photo Retrieval task at ImageCLEF 2008. Our system uses Lucene for text indexing and LIRE for image indexing. We experiment with several clustering methods in order to retrieve images from diverse clusters: k-means clustering, hierarchical clustering, and our own method based on WordNet. We present results for thirteen runs, comparing retrieval based on text descriptions, image-only retrieval, and merged retrieval, and comparing results for the different clustering methods.

Diana Inkpen, Marc Stogaitis, François DeGuire, Muath Alzghool

ImageCLEFmed

Methods for Combining Content-Based and Textual-Based Approaches in Medical Image Retrieval

This paper describes our participation in the Medical Image Retrieval task of ImageCLEF 2008. Our aim was to evaluate different combination approaches for context-based and content-based image retrieval. Our test set is composed of 30 queries, which were classified by the organizers into three categories: visual, textual (semantic) and mixed.

Our most interesting conclusion is that combining the results provided by both methods using a classical combination function on all query types obtains higher retrieval accuracy than combining according to query type. Moreover, it is more successful than using only textual retrieval or only visual retrieval.

Mouna Torjmen, Karen Pinel-Sauvagnat, Mohand Boughanem
An SVM Confidence-Based Approach to Medical Image Annotation

This paper presents the algorithms and results of the “idiap” team’s participation in the ImageCLEFmed annotation task in 2008. On the basis of our successful experience in 2007, we decided to integrate two different local structural and textural descriptors. Cues are combined through concatenation of feature vectors and through the Multi-Cue Kernel. The challenge this year was to annotate images coming mainly from classes with only few training examples. We tackled the problem on two fronts: (1) we introduced a further integration strategy using SVM as an opinion maker; (2) we enriched the poorly populated classes by adding virtual examples. We submitted several runs considering different combinations of the proposed techniques. The run jointly using feature concatenation, the confidence-based opinion fusion and the virtual examples ranked first among all submissions.

Tatiana Tommasi, Francesco Orabona, Barbara Caputo
LIG at ImageCLEF 2008

This paper describes the work of the LIG for ImageCLEF 2008. For ImageCLEFphoto, two non-diversified runs (text only and text + image) and two diversified runs were officially submitted. We add in this paper results on image-only runs. The text retrieval part is based on a language model of information retrieval, and the image part uses RGB histograms. Text+image results are obtained by late fusion, merging text and image results. We tested three strategies for promoting diversity using date/location or visual features. Diversification of image-only runs does not perform well; diversification of image and text+image runs outperforms the non-diversified runs. In a second part, this paper describes the runs and results obtained by the LIG at ImageCLEFmed 2008. This contribution incorporates knowledge into the language modeling approach to information retrieval (IR) through a previously proposed graph modeling approach. Our model makes use of the textual part of the corpus and of the medical knowledge found in the Unified Medical Language System (UMLS) knowledge sources. The model is extended to combine different graph detection methods on queries and documents. The results show that combining detection methods improves performance.

Loic Maisonnasse, Philippe Mulhem, Eric Gaussier, Jean Pierre Chevallet
The MedGIFT Group at ImageCLEF 2008

This article describes the participation of the MedGIFT research group in the 2008 ImageCLEFmed image retrieval benchmark. We concentrated on the two tasks concerning medical imaging. The visual information analysis is mainly based on the GNU Image Finding Tool (GIFT). Other information, such as textual information and aspect ratio, was integrated to improve our results. The main techniques are similar to those of past years, with a few parameters tuned to improve results.

For the visual tasks it becomes clear that the baseline GIFT runs do not reach the performance of some more sophisticated and more modern techniques. GIFT can be seen as a baseline for visual retrieval, as it has been used for the past five years in ImageCLEF. Due to time constraints not all optimizations could be performed, and no relevance feedback, one of the strong points of GIFT, was used. Still, a clear difference in performance can be observed depending on the various optimizations applied, and the difference with the best groups is smaller than in past years.

Xin Zhou, Julien Gobeill, Henning Müller
MIRACLE at ImageCLEFmed 2008: Semantic vs. Statistical Strategies for Topic Expansion

This paper describes the participation of the MIRACLE research consortium in the ImageCLEFmed task of ImageCLEF 2008. The main goal of our participation this year was to evaluate different text-based topic expansion approaches: methods based on linguistic information such as thesauri or knowledge bases, and statistical techniques based mainly on term frequency. First, a common baseline algorithm is used to process the document collection: text extraction, medical-vocabulary recognition, tokenization, conversion to lowercase, filtering, stemming, indexing and retrieval. Then the different expansion techniques are applied. For the semantic expansion, the MeSH concept hierarchy, using UMLS entities as basic root elements, was used. The statistical method expanded the topics using the apriori algorithm. Relevance-feedback techniques were also used.

Sara Lana-Serrano, Julio Villena-Román, José Carlos González-Cristóbal
Experiments in Calibration and Validation for Medical Content-Based Images Retrieval

We present a Content-Based Image Retrieval (CBIR) system. The system establishes a set of visual features which are automatically generated. The features are diverse and relate to various concepts. After the visual features are calculated, a calibration process is performed whereby the system estimates the best weight for each feature. It uses a calibration algorithm (an iterative process) and a set of experiments, and the result is the influence of each feature in the main function used for the retrieval process. In the validation step, the modifications to the main function are verified so as to ensure that the new function is better than the preceding one. Finally, the image retrieval process is performed according to the ImageCLEFmed rules, fully described in [2, 5]. The retrieval results were not as good as expected, but they are a good starting point for future work.

Jose L. Delgado, Covadonga Rodrigo, Gonzalo León
MIRACLE at ImageCLEFannot 2008: Nearest Neighbour Classification of Image Feature Vectors for Medical Image Annotation

This paper describes the participation of the MIRACLE research consortium in the ImageCLEF Medical Image Annotation task of ImageCLEF 2008. During the last year, our own image analysis system was developed, based on MATLAB. This system extracts a variety of global and local features including histogram, image statistics, Gabor features, fractal dimension, DCT and DWT coefficients, Tamura features and co-occurrence matrix statistics. A classifier based on the k-Nearest Neighbour algorithm is trained on the extracted image feature vectors to determine the IRMA code associated with a given image. The focus of our participation was mainly to test and evaluate this system in depth and to compare diverse configuration parameters, such as the number of images used for relevance feedback in the classification module.

Sara Lana-Serrano, Julio Villena-Román, José Carlos González-Cristóbal, José Miguel Goñi-Menoyo
Query Expansion on Medical Image Retrieval: MeSH vs. UMLS

In this paper we describe experiments in the medical information retrieval task (ImageCLEFmed). We experimented with query expansion and with the amount of textual information obtained from the collection. For expansion, we carried out experiments using the MeSH ontology and UMLS separately. With respect to the textual collection, we produced three different collections: the first with caption and title, the second with caption, title and the text of the section where the image appears, and the third with the full-text article. Moreover, we experimented with textual and visual search, along with the combination of the two sets of results. For image retrieval we used the results generated by the FIRE software. The best results were obtained using MeSH query expansion on the shortest textual collection (caption and title only), merged with the FIRE results.

Manuel Carlos Díaz-Galiano, Miguel Angel García-Cumbreras, María Teresa Martín-Valdivia, L. Alfonso Ureña-López, Arturo Montejo-Ráez
Query and Document Expansion with Medical Subject Headings Terms at Medical Imageclef 2008

In this paper, we report on query and document expansion using Medical Subject Headings (MeSH) terms for medical ImageCLEF 2008. In this collection, the MeSH terms describing an image could be obtained in two different ways: either collected from the associated MEDLINE paper, or extracted from the associated caption. We compared document expansion using both. From a baseline of 0.136 Mean Average Precision (MAP), we reached a MAP of 0.176 (+29%) with the first method, and 0.154 (+13%) with the second. In-depth analyses show how both strategies were beneficial, as they covered different aspects of the image. Finally, we combined them in order to produce a significantly better run (0.254 MAP, +86%). Combining the MeSH terms obtained with both methods hence gives a better representation of the images for document expansion.

Julien Gobeill, Patrick Ruch, Xin Zhou
Multimodal Medical Image Retrieval OHSU at ImageCLEF 2008

We present results from the Oregon Health & Science University’s participation in the medical retrieval task of ImageCLEF 2008. Our web-based retrieval system was built using a Ruby on Rails framework. Ferret, a Ruby port of Lucene, was used to create the full-text index and search engine. In addition to the textual index of annotations, supervised machine learning techniques using visual features were used to classify the images based on image acquisition modality. Our system provides the user with a number of search options including the ability to limit the search by modality, UMLS-based query expansion, and Natural Language Processing-based techniques. Purely textual runs as well as mixed runs using the purported modality were submitted. We also submitted interactive runs using user-specified search options. Although the use of the UMLS metathesaurus increased our recall, our system is geared towards early precision. Consequently, many of our multimodal automatic runs using the custom parser, as well as our interactive runs, had high early precision, including the highest P10 and P30 among the official runs. Our runs also performed well on the bpref metric, a measure that is more robust in the case of incomplete judgments.

Jayashree Kalpathy-Cramer, Steven Bedrick, William Hatt, William Hersh
Baseline Results for the ImageCLEF 2008 Medical Automatic Annotation Task in Comparison over the Years

This work reports baseline results for the CLEF 2008 Medical Automatic Annotation Task (MAAT) by applying a classifier with a fixed parameter set to all tasks from 2005 to 2008. A nearest-neighbor (NN) classifier is used, which uses a weighted combination of three distance and similarity measures operating on global image features: scaled-down representations of the images are compared using models for the typical variability in the image data, mainly translation, local deformation, and radiation dose. In addition, a distance measure based on texture features is used. In 2008, the baseline classifier yields error scores of 170.34 and 182.77 for k = 1 and k = 5 when the full code is reported, which corresponds to error rates of 51.3% and 52.8% for 1-NN and 5-NN, respectively. Judging by the relative increases in the number of classes and the error rates over the years, MAAT 2008 is estimated to be the most difficult of the four years.
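The baseline is described as a weighted combination of distance and similarity measures over global features, fed into a nearest-neighbour decision. A minimal 1-NN sketch of that idea is given below; the feature representation, the individual measures and the weights are illustrative assumptions, not those of the baseline classifier.

```python
import math

def combined_distance(img_a, img_b, weights=(0.5, 0.3, 0.2)):
    """Weighted combination of three dissimilarities between two images.

    Each image is a dict with 'pixels' (grey values of a scaled-down version,
    as a flat list) and 'texture' (a texture feature vector).  The three
    components and their weights stand in for the measures mentioned in the
    abstract; they are not the baseline's actual parameters.
    """
    w1, w2, w3 = weights
    d_euclid = math.dist(img_a["pixels"], img_b["pixels"])
    d_city = sum(abs(x - y) for x, y in zip(img_a["pixels"], img_b["pixels"]))
    d_texture = math.dist(img_a["texture"], img_b["texture"])
    return w1 * d_euclid + w2 * d_city + w3 * d_texture

def classify_1nn(query, training_set):
    """Label of the training image with the smallest combined distance."""
    return min(training_set, key=lambda r: combined_distance(query, r))["label"]
```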

Mark O. Güld, Petra Welter, Thomas M. Deserno

ImageCLEFWiki

Evaluating the Impact of Image Names in Context-Based Image Retrieval

This paper describes our work for the CLEF 2008 WikipediaMM Task. We study the use of the image name in a context-based image retrieval approach. This factor is evaluated in three ways. The first consists of using image names explicitly: we compute a similarity score between the query and the names of images using the vector space model. The second consists of combining results obtained using the textual content of documents with results obtained using the first method. Finally, in our last approach, image names are used less explicitly: we propose to use all the textual content of the image annotations, but we increase the weight of terms occurring in the image name. Results show that the image name can be an interesting factor for improving image retrieval results.

Mouna Torjmen, Karen Pinel-Sauvagnat, Mohand Boughanem
Large-Scale Cross-Media Retrieval of WikipediaMM Images with Textual and Visual Query Expansion

In this paper, we present our approaches for the WikipediaMM task at ImageCLEF 2008. We first experimented with a text-based image retrieval approach with query expansion, where the expansion terms were automatically selected from a knowledge base that was semi-automatically constructed from Wikipedia. Encouragingly, the experimental results rank first among all submitted runs. We also implemented a content-based image retrieval approach with query-dependent visual concept detection. Cross-media retrieval was then carried out by independently applying the two meta-search tools and combining the results through a weighted summation of scores. Though not submitted, this approach outperforms our text-based and content-based approaches remarkably.

Zhi Zhou, Yonghong Tian, Yuanning Li, Tiejun Huang, Wen Gao
Conceptual Image Retrieval over a Large Scale Database

Image retrieval in large-scale databases is currently based on textual string matching. However, this approach requires accurate annotation of images, which is not the case on the Web. To tackle this issue, we propose a reformulation method that reduces the influence of noisy image annotations. We extract a ranked list of related concepts for the terms in the query from WordNet and Wikipedia, and use them to expand the initial query. Then some visual concepts are used to re-rank the results for queries containing, explicitly or implicitly, visual cues. First evaluations on a diversified corpus of 150,000 images were convincing, since the proposed system was ranked 4th and 2nd at the WikipediaMM task of the ImageCLEF 2008 campaign [1].

Adrian Popescu, Hervé Le Borgne, Pierre-Alain Moëllic
UJM at ImageCLEFwiki 2008

This paper reports our multimedia information retrieval experiments carried out for the ImageCLEF track (ImageCLEFwiki) [10]. We propose a new multimedia model combining textual and/or visual information which makes it possible to perform textual, visual, or multimedia queries. We experiment with the model on the ImageCLEF data and compare the results obtained using the different modalities.

Our multimedia document model is based on a vector of textual and visual terms. Textual terms correspond to textual words while the visual ones are computed using local colour features. We obtain good results using only the textual part and we show that the visual information is useful in some particular cases.

Christophe Moulin, Cécile Barat, Mathias Géry, Christophe Ducottet, Christine Largeron

Part VI: Multilingual Web Track (WebCLEF)

Overview of WebCLEF 2008

We describe the WebCLEF 2008 task. Similarly to the 2007 edition of WebCLEF, the 2008 edition implements a multilingual “information synthesis” task, where, for a given topic, participating systems have to extract important snippets from web pages. We detail the task, the assessment procedure, the evaluation measures and results.

Valentin Jijkoun, Maarten de Rijke
On the Evaluation of Snippet Selection for WebCLEF

WebCLEF is about supporting a user, an expert writing a survey article on a specific topic with a clear goal and audience, by generating a ranked list of relevant snippets. This paper focuses on the evaluation methodology of WebCLEF. We show that the evaluation method and test set used for WebCLEF 2007 cannot be used to evaluate new systems, and we give recommendations on how to improve the evaluation.

Arnold Overwijk, Dong Nguyen, Claudia Hauff, Dolf Trieschnigg, Djoerd Hiemstra, Franciska de Jong
UNED at WebCLEF 2008: Applying High Restrictive Summarization, Low Restrictive Information Retrieval and Multilingual Techniques

This paper describes our participation in the WebCLEF 2008 task, targeted at snippet retrieval from new data. Our system assumes that the task can be tackled as a summarization problem and that the document retrieval and multilingual processing steps can be ignored. Our approach also assumes that the redundancy of information on the Web allows the system to be very restrictive when picking information pieces. Our evaluation results suggest that, while the first assumption is feasible, the second one is not always true.

Enrique Amigó, Juan Martinez-Romo, Lourdes Araujo, Víctor Peinado
Retrieval of Snippets of Web Pages Converted to Plain Text. More Questions Than Answers

This year’s WebCLEF task was to retrieve snippets and pieces from documents on various topics. The extraction and selection of the most relevant snippets can be carried out using various methods. However, the way in which web pages are usually converted to plain text introduces a series of problems that cause inefficiency in retrieval: duplicate information and absolutely irrelevant, or even meaningless, snippets are some of these problems. This paper also explores the real impact of using several languages on obtaining relevant fragments.

Carlos G. Figuerola, José Luis Alonso Berrocal, Ángel F. Zazo Rodríguez, Montserrat Mateos

Part VII: Cross-Language Geographical Retrieval (GeoCLEF)

GeoCLEF 2008: The CLEF 2008 Cross-Language Geographic Information Retrieval Track Overview

GeoCLEF is an evaluation task running under the scope of the Cross-Language Evaluation Forum (CLEF). The purpose of GeoCLEF is to test and evaluate cross-language geographic information retrieval (GIR). The GeoCLEF 2008 task presented twenty-five geographically challenging search topics for English, German and Portuguese. Eleven participants submitted 131 runs, based on a variety of approaches, including sample documents, named entity extraction and ontology-based retrieval. The evaluation methodology and results are presented in the paper.

Thomas Mandl, Paula Carvalho, Giorgio Maria Di Nunzio, Fredric Gey, Ray R. Larson, Diana Santos, Christa Womser-Hacker
GIR with Language Modeling and DFR Using Terrier

This paper reports on additional experiments on the monolingual English, German and Portuguese collections, beyond those described in the CLEF 2008 Working Notes. Experiments were performed using the language modeling approach and the Divergence From Randomness (DFR) InL2 model as implemented in Terrier (TERabyte RetrIEveR) version 2.1. The main purpose was twofold: 1) to compare these approaches to determine their impact on retrieval performance, and 2) to compare the results from these experiments with the results generated in the first set of experiments to determine whether query expansion and the presence or absence of diacritic marks have an impact on retrieval performance. The stopword list provided by Terrier was used to index all the collections. We removed diacritic marks from the topics and collections for German and Portuguese before indexing and retrieval. Topics were processed automatically and the query tags specified were the title and the description. Query expansion was included using the 20 top-ranked documents and 40 terms; these parameters were selected arbitrarily. Results show that the DFR InL2 model outperformed language modeling for all the languages. Results of the new experiments with query expansion show an improvement in retrieval performance for all the languages. They also suggest that removing diacritic marks may have an impact in the case of German and Portuguese.

Rocio Guillén
Cheshire at GeoCLEF 2008: Text and Fusion Approaches for GIR

In this paper we briefly describe the approaches taken by the Berkeley Cheshire group for the main GeoCLEF 2008 tasks (mono- and bilingual retrieval), and present some analyses of the fusion approach used. This year our submissions used probabilistic text retrieval based on logistic regression, incorporating blind relevance feedback for all of the runs; in addition, we ran a number of tests combining this type of search with Okapi BM25 searches using a fusion approach. We did not, however, use any explicit geographic processing. All translation for the bilingual tasks was performed using the LEC Power Translator PC-based MT system.

Ray R. Larson
Geographic and Textual Data Fusion in Forostar

In this paper we provide some analysis of data fusion techniques employed at GeoCLEF 2008 to merge textual and geographic relevance. These methods are compared to our own experiments, where, using our GIR system Forostar, we show that an aggressive filter-based data fusion method can outperform a more sophisticated penalisation method.

Simon Overell, Adam Rae, Stefan Rüger
Query Expansion for Effective Geographic Information Retrieval

We developed two methods for the monolingual GeoCLEF 2008 task. The GCEC method aims to test the effectiveness of our online geographic coordinate extraction and clustering algorithm, and the WIKIGEO method examines the usefulness of the geographic coordinate information in Wikipedia for identifying geo-locations. We propose a measure of topic distance to evaluate these two methods. The experimental results show that: 1) our online geographic coordinate extraction and clustering algorithm is useful for the type of locations that do not have clear corresponding coordinates; 2) expansion based on the geo-locations generated by GCEC is effective in improving geographic retrieval; 3) Wikipedia can help in finding the coordinates of many geo-locations, but its use for query expansion still needs further study; 4) query expansion based on the title only obtained better results than expansion based on the title and narrative parts, even though the latter contain more related geographic information; further study is needed on this point.

Qiang Pu, Daqing He, Qi Li
Integrating Methods from IR and QA for Geographic Information Retrieval

This paper describes the participation of GIRSA at GeoCLEF 2008, the geographic information retrieval task at CLEF. GIRSA combines information retrieval (IR) on geographically annotated data and question answering (QA) employing query decomposition.

For the monolingual German experiments, several parameter settings were varied: using a single index or separate indexes for content and geographic annotation, using complex term weighting, adding location names from the topic narrative, and merging results from IR and QA, which yields the highest mean average precision (0.2608 MAP).

For bilingual experiments, English and Portuguese topics were translated via the web services Applied Language Solutions, Google Translate, and Promt Online Translator. For both source languages, Google Translate seems to return the best translations. For English (Portuguese) topics, 60.2% (80.0%) of the maximum MAP for monolingual German experiments, or 0.1571 MAP (0.2085 MAP), is achieved.

As a post-official experiment, translations of English topics were analysed with a parser. The results were employed to select the best translation for topic titles and descriptions. The corresponding retrieval experiment achieved 69.7% of the MAP of the best monolingual experiment.

Johannes Leveling, Sven Hartrumpf
Using Query Reformulation and Keywords in the Geographic Information Retrieval Task

This paper describes the use of query reformulation to improve the Geographic Information Retrieval (GIR) task. This technique also includes the geographic expansion of the topics. Moreover, several experiments related to the use of keywords and hyponyms in the filtering process are performed. We also use a new approach in the re-ranking process based on the original position of each document in the ranking. The results obtained show that our query reformulation sometimes retrieves valid documents that the default query is not able to find, but on average it does not improve the baseline case. The best result is obtained considering the geographic entities in the traditional retrieval process.

José Manuel Perea-Ortega, L. Alfonso Ureña-López, Manuel García-Vega, Miguel Angel García-Cumbreras
Using GeoWordNet for Geographical Information Retrieval

We present a method that uses GeoWordNet for Geographical Information Retrieval. During the indexing phase, all places are disambiguated and assigned their coordinates on the world map. Documents are first retrieved by means of a term-based search method, and then re-ranked according to the geographical information. The results show that map-based re-ranking improves the results obtained by the base system, which relies only on textual information.
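A minimal sketch of the map-based re-ranking idea: each document's disambiguated place coordinates are compared to the topic's coordinates, and a distance-based score is blended with the text score. The haversine distance is standard; the score blending and decay constants below are assumptions for illustration, not the paper's formula.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def geo_rerank(results, topic_coord, alpha=0.7):
    """Blend the text score with a distance-based score (illustrative weights).

    results: list of dicts with 'text_score' in [0, 1] and 'coord' = (lat, lon)
    topic_coord: (lat, lon) of the topic's geographic focus
    """
    scored = []
    for doc in results:
        d = haversine_km(*topic_coord, *doc["coord"])
        geo_score = 1.0 / (1.0 + d / 100.0)   # decays with distance (assumption)
        scored.append((alpha * doc["text_score"] + (1 - alpha) * geo_score, doc))
    return [doc for _, doc in sorted(scored, key=lambda x: x[0], reverse=True)]
```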

Davide Buscaldi, Paolo Rosso
GeoTextMESS: Result Fusion with Fuzzy Borda Ranking in Geographical Information Retrieval

In this paper we discuss the integration of different GIR systems by means of a fuzzy Borda method for result fusion. Two of the systems, the one from the Universidad Politécnica de Valencia and the one from the Universidad de Jaén, participated in the GeoCLEF task under the name TextMess. The proposed result fusion method takes as input the document lists returned by the different systems and returns a document list where the documents are ranked according to the fuzzy Borda voting scheme. The results obtained show that the fusion method improves the results of the component systems, although the fusion is not optimal: it is effective only if the components return a similar set of relevant documents.
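The fusion idea can be illustrated with a plain (crisp) Borda count over the component systems' ranked lists; the fuzzy variant used in the paper additionally weights the pairwise votes by the systems' relevance scores, which this simplified sketch omits.

```python
def borda_fuse(rankings):
    """Fuse several ranked lists with a simple Borda count.

    rankings: list of ranked document-id lists (best first), one per system.
    A document receives (list_length - position) points from each list that
    contains it; documents are returned sorted by total points.
    """
    points = {}
    for ranking in rankings:
        n = len(ranking)
        for pos, doc in enumerate(ranking):
            points[doc] = points.get(doc, 0) + (n - pos)
    return sorted(points, key=points.get, reverse=True)

# Illustration with three hypothetical system outputs:
fused = borda_fuse([["d1", "d2", "d3"], ["d2", "d1", "d4"], ["d2", "d3", "d1"]])
print(fused)  # ['d2', 'd1', 'd3', 'd4'] -- d2 accumulates the most points
```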

Davide Buscaldi, José Manuel Perea Ortega, Paolo Rosso, L. Alfonso Ureña López, Daniel Ferrés, Horacio Rodríguez
A Ranking Approach Based on Example Texts for Geographic Information Retrieval

This paper focuses on the problem of ranking documents for Geographic Information Retrieval. It aims to demonstrate that by using some query-related example texts it is possible to improve the final ranking of the retrieved documents. Experimental results indicated that our approach could improve the MAP of some sets of retrieved documents using only two example texts.

Esaú Villatoro-Tello, Manuel Montes-y-Gómez, Luis Villaseñor-Pineda
Ontology-Based Query Construction for GeoCLEF

This paper describes experiments with geographical information retrieval (GIR). Differently from traditional IR, we focus more on query expansion than on document ranking. We parse each topic into an event part and a geographic part and use different ontologies to expand each part respectively. The results of our strategy for this task are promising.

Rui Wang, Günter Neumann
Experiments with Geographic Evidence Extracted from Documents

For the 2008 participation at GeoCLEF, we focused on improving the extraction of geographic signatures from documents and optimising their use for GIR. The results show that the detection of explicit geographic named entities for including their terms in a tuned weighted index field significantly improves retrieval performance when compared to classic text retrieval.

Nuno Cardoso, Patrícia Sousa, Mário J. Silva
GikiP at GeoCLEF 2008: Joining GIR and QA Forces for Querying Wikipedia

This paper reports on the GikiP pilot that took place in 2008 within GeoCLEF. This pilot task requires a combination of methods from geographical information retrieval and question answering to answer queries against Wikipedia. We start with the task description, providing details on topic choice and evaluation measures. Then we offer a brief motivation from several perspectives, and we present the results in detail. A comparison of the participants’ approaches is then presented, and the paper concludes with improvements for the next edition.

Diana Santos, Nuno Cardoso, Paula Carvalho, Iustin Dornescu, Sven Hartrumpf, Johannes Leveling, Yvonne Skalban

Part VIII: Cross-Language Video Retrieval (VideoCLEF)

Overview of VideoCLEF 2008: Automatic Generation of Topic-Based Feeds for Dual Language Audio-Visual Content

The VideoCLEF track, introduced in 2008, aims to develop and evaluate tasks related to analysis of and access to multilingual multimedia content. In its first year, VideoCLEF piloted the Vid2RSS task, whose main subtask was the classification of dual language video (Dutch-language television content featuring English-speaking experts and studio guests). The task offered two additional discretionary subtasks: feed translation and automatic keyframe extraction. Task participants were supplied with Dutch archival metadata, Dutch speech transcripts, English speech transcripts and ten thematic category labels, which they were required to assign to the test set videos. The videos were grouped by class label into topic-based RSS-feeds, displaying title, description and keyframe for each video.

Five groups participated in the 2008 VideoCLEF track. Participants were required to collect their own training data; both Wikipedia and general web content were used. Groups deployed various classifiers (SVM, Naive Bayes and k-NN) or treated the problem as an information retrieval task. Both the Dutch speech transcripts and the archival metadata performed well as sources of indexing features, but no group succeeded in exploiting combinations of feature sources to significantly enhance performance. A small-scale fluency/adequacy evaluation of the translation task output revealed the translations to be of sufficient quality to be valuable to a non-Dutch-speaking English speaker. For keyframe extraction, the strategy chosen was to select the keyframe from the shot with the most representative speech transcript content. The automatically selected shots were shown, in a small user study, to be competitive with manually selected shots. Future years of VideoCLEF will aim to expand the corpus and the class label list, as well as to extend the track to additional tasks.

Martha Larson, Eamonn Newman, Gareth J. F. Jones
MIRACLE at VideoCLEF 2008: Topic Identification and Keyframe Extraction in Dual Language Videos

This paper describes the participation of the MIRACLE research consortium in the VideoCLEF track at CLEF 2008. We took part both in the main mandatory Classification task (classifying videos of television episodes using speech transcripts and metadata) and in the Keyframe Extraction task (selecting keyframes that represent individual episodes from a set of supplied keyframes). Our system for the first task is composed of two main blocks: the core system knowledge base and the set of operational elements needed to classify the speech transcripts of the topic episodes and generate the output in RSS format. For the second task, our approach is based on the assumption that the most representative fragment (shot) of each episode is the one with the lowest distance to the whole episode, considering a vector space model. In the classification task, our runs ranked 3rd (out of 6 participants) in terms of precision.
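The selection rule described above (pick the shot whose transcript is closest to the whole episode in a vector space model) can be sketched as follows; raw term frequencies and cosine similarity are assumptions for illustration rather than the system's exact weighting, and the example shots are invented.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def most_representative_shot(shot_transcripts):
    """Index of the shot whose transcript is closest to the whole episode.

    shot_transcripts: one list of tokens per shot; the episode representation
    is simply the concatenation of all shot transcripts.
    """
    episode = Counter(tok for shot in shot_transcripts for tok in shot)
    return max(range(len(shot_transcripts)),
               key=lambda i: cosine(Counter(shot_transcripts[i]), episode))

shots = [["weather", "forecast"],
         ["interview", "climate", "expert", "climate"],
         ["credits"]]
print(most_representative_shot(shots))  # 1
```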

Julio Villena-Román, Sara Lana-Serrano
DCU at VideoClef 2008

We describe a baseline system for the VideoCLEF Vid2RSS task in which videos are to be classified into thematic categories based on their content. The system uses an off-the-shelf Information Retrieval system. Speech transcripts generated using automated speech recognition are indexed using default stemming and stopping methods. The categories are populated by using the category theme (or label) as a query on the collection, and assigning the retrieved items to that particular category. Run 4 of our system achieved the highest f-score in the task by maximising recall. We discuss this in terms of the primary aims of the task, i.e., automating video classification.

Eamonn Newman, Gareth J. F. Jones
Using an Information Retrieval System for Video Classification

This paper describes a simple approach to the video classification task, which consists in applying an Information Retrieval (IR) system as a classifier. We generated a document collection for each predefined topic class; each collection was composed of documents retrieved using the Google search engine. Following the IR strategy, we used the speech transcriptions of the videos as textual queries. The results obtained show that an IR system can perform well as a video classifier if the speech transcriptions of the videos to be classified are of good quality.

José Manuel Perea-Ortega, Arturo Montejo-Ráez, Manuel Carlos Díaz-Galiano, María Teresa Martín-Valdivia, L. Alfonso Ureña-López
VideoCLEF 2008: ASR Classification with Wikipedia Categories

This article describes our participation in the VideoCLEF track. We designed and implemented a prototype for the classification of the video ASR data. Our approach was to regard the task as a text classification problem, using terms from Wikipedia categories as training data for our text classifiers. For the text classification, the Naive Bayes and kNN classifiers from the WEKA toolkit were used. We submitted experiments for classification tasks 1 and 2. For the translation of the feeds into English (translation task), Google’s AJAX language API was used. Although our experiments achieved only a low precision of 10 to 15 percent, we assume those results will be useful in a combined setting with the retrieval approach that was widely used. Interestingly, we could not improve the quality of the classification by using the provided metadata.

Jens Kürsten, Daniel Richter, Maximilian Eibl
Metadata and Multilinguality in Video Classification

The VideoCLEF 2008 Vid2RSS task involves the assignment of thematic category labels to dual language (Dutch/English) television episode videos. The University of Amsterdam chose to focus on exploiting archival metadata and speech transcripts generated by both Dutch and English speech recognizers. A Support Vector Machine (SVM) classifier was trained on training data collected from Wikipedia. The results provide evidence that combining archival metadata with speech transcripts can improve classification performance, but that adding speech transcripts in an additional language does not yield performance gains.

Jiyin He, Xu Zhang, Wouter Weerkamp, Martha Larson

Part IX: Multilingual Information Filtering (INFILE@CLEF)

Overview of CLEF 2008 INFILE Pilot Track

The INFILE campaign was run for the first time as a pilot track in CLEF 2008. Its purpose was the evaluation of cross-language adaptive filtering systems. It used a corpus of 300,000 newswires from Agence France Presse (AFP) in three languages (Arabic, English and French) and a set of 50 topics in general and specific domains (scientific and technological information). Due to delays in the organization of the task, the campaign had only 3 submissions (from one participant), which are presented in this article.

Romaric Besançon, Stéphane Chaudiron, Djamel Mostefa, Olivier Hamon, Ismaïl Timimi, Khalid Choukri
Online Document Filtering Using Adaptive k-NN

We propose in this paper an adaptation of the k-Nearest Neighbor (k-NN) algorithm using category-specific thresholds in a multiclass environment where a document can belong to more than one class. Our method uses feedback to tune the thresholds, and in turn the classification performance, over time. The experiments were run on the INFILE data, comprising 100,000 English documents and 50 topics.

Vincent Bodinier, Ali Mustafa Qamar, Eric Gaussier
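
A schematic sketch of the adaptive-thresholding idea described above; the authors' actual k-NN, similarity measure and update rule are not reproduced here, and the topics, step size and scores are hypothetical.

```python
# Sketch: per-topic thresholds on averaged k-NN similarity, nudged by feedback.
topic_thresholds = {t: 0.5 for t in ("topic_101", "topic_102")}   # hypothetical topics
STEP = 0.05

def filter_document(topic, knn_similarities, thresholds):
    """Accept the document for `topic` if the mean similarity of its k nearest
    neighbours exceeds the topic-specific threshold."""
    score = sum(knn_similarities) / len(knn_similarities)
    return score >= thresholds[topic]

def update_threshold(topic, accepted, relevant, thresholds):
    """Simple feedback rule: raise the threshold after a false alarm,
    lower it after a missed relevant document."""
    if accepted and not relevant:
        thresholds[topic] += STEP
    elif not accepted and relevant:
        thresholds[topic] -= STEP

# Toy usage: similarities of a new document to its 3 nearest neighbours.
decision = filter_document("topic_101", [0.62, 0.55, 0.40], topic_thresholds)
update_threshold("topic_101", decision, relevant=False, thresholds=topic_thresholds)
print(decision, topic_thresholds["topic_101"])
```

The per-topic thresholds are what make the filter adaptive: each feedback signal adjusts only the threshold of the topic it concerns.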

Part X: Morpho Challenge at CLEF 2008

Overview of Morpho Challenge 2008

This paper gives an overview of the Morpho Challenge 2008 competition and its results. The goal of the challenge was to evaluate unsupervised algorithms that provide morpheme analyses for words in different languages. For morphologically complex languages, such as Finnish, Turkish and Arabic, morpheme analysis is particularly important for lexical modeling of words in speech recognition, information retrieval and machine translation. The evaluation in Morpho Challenge competitions consisted of both a linguistic and an application-oriented performance analysis. In addition to the Finnish, Turkish, German and English evaluations performed in Morpho Challenge 2007, the competition this year included an additional evaluation for Arabic. The results of the 2008 linguistic evaluation show that although the level of precision and recall varies substantially between the tasks in different languages, the best methods seem to deal quite well with all the languages involved. The results of the information retrieval evaluation indicate that morpheme analysis has a significant effect in all the tested languages (Finnish, English and German). The best unsupervised and language-independent morpheme analysis methods can also rival the best language-dependent word normalization methods. The Morpho Challenge was part of the EU Network of Excellence PASCAL Challenge Program and was organized in collaboration with CLEF.

Mikko Kurimo, Ville Turunen, Matti Varjokallio
ParaMor and Morpho Challenge 2008

We summarize the strong performance of ParaMor, an unsupervised morphology induction system, at Morpho Challenge 2008. When ParaMor’s morphological analyses, which specialize at identifying inflectional morphology, are added to the analyses from the general-purpose unsupervised morphology induction system, Morfessor, the combined system identifies the morphemes of all five Morpho Challenge languages at recall scores higher than those of any other system which competed in the Challenge. These strong recall scores lead to F1 values for morpheme identification as high as or higher than those of any competing system for all the competition languages but English.

Categories and Subject Descriptors: I.2 [Artificial Intelligence]: I.2.7 Natural Language Processing.
General Terms: Experimentation.

Christian Monson, Jaime Carbonell, Alon Lavie, Lori Levin
Allomorfessor: Towards Unsupervised Morpheme Analysis

We extend the unsupervised morpheme segmentation method Morfessor Baseline to account for the linguistic phenomenon of allomorphy, where one morpheme has several different surface forms. Our method discovers common base forms for allomorphs from an unannotated corpus. We evaluate the method by participating in the Morpho Challenge 2008 competition, where inferred analyses are compared against a linguistic gold standard. While our competition entry achieves high precision but low recall, and therefore low F-measure scores, we show that a small model change gives state-of-the-art results.

Oskar Kohonen, Sami Virpioja, Mikaela Klami
Using Unsupervised Paradigm Acquisition for Prefixes

We describe a simple method for the unsupervised morpheme segmentation of words in an unknown language. All that is needed is a raw text corpus (or a list of words) in the given language. The algorithm identifies word parts occurring in many words and interprets them as morpheme candidates (prefixes, stems and suffixes). A new treatment of prefixes is the main innovation in comparison to [1]. After filtering out spurious hypotheses, the list of morphemes is applied to segment input words. The official Morpho Challenge 2008 evaluation is given together with some additional experiments. Processing of prefixes improved the F-score by 5 to 11 points for German, Finnish and Turkish, while it failed to improve the results for English and Arabic. We also analyze and discuss errors with respect to the evaluation method.

Daniel Zeman
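
A heavily simplified sketch of the general idea described above (word parts that occur in many words become affix candidates), not the paper's algorithm or its filtering of spurious hypotheses; the word list and frequency cutoffs are toy assumptions.

```python
# Sketch: collect frequent prefixes/suffixes from a word list, then segment
# words into prefix + stem + suffix.
from collections import Counter

words = ["unhappy", "unkind", "kindness", "happiness", "unlikely", "likely"]

prefix_counts, suffix_counts = Counter(), Counter()
for w in words:
    for i in range(1, min(4, len(w))):          # candidate affixes up to 3 chars
        prefix_counts[w[:i]] += 1
        suffix_counts[w[-i:]] += 1

# Keep parts that occur in "many" words (toy cutoffs).
prefixes = {p for p, c in prefix_counts.items() if c >= 3}
suffixes = {s for s, c in suffix_counts.items() if c >= 2 and len(s) > 1}

def segment(word):
    """Greedily peel one known prefix and one known suffix off the word."""
    pre = next((p for p in sorted(prefixes, key=len, reverse=True)
                if word.startswith(p)), "")
    rest = word[len(pre):]
    suf = next((s for s in sorted(suffixes, key=len, reverse=True)
                if rest.endswith(s) and len(rest) > len(s)), "")
    stem = rest[:len(rest) - len(suf)] if suf else rest
    return [m for m in (pre, stem, suf) if m]

print(segment("unhappiness"))   # -> ['un', 'happin', 'ess'] with this toy data
```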
Morpho Challenge Evaluation by Information Retrieval Experiments

In the Morpho Challenge competitions, the objective has been to design statistical machine learning algorithms that discover which morphemes (smallest individually meaningful units of language) words consist of. Ideally, these are basic vocabulary units suitable for different tasks, such as text understanding, machine translation, information retrieval (IR), and statistical language modeling. In this paper, we propose to evaluate the morpheme analyses by performing IR experiments, where the words in the documents and queries are replaced by their proposed morpheme representations and the search is based on morphemes instead of words. The evaluations are run for three languages, Finnish, German, and English, using the queries, texts, and relevance judgments available from the CLEF forum. The results show that morpheme analysis has a significant effect on IR performance in all languages, and that the performance of the best unsupervised methods can be superior to the supervised reference methods.

Mikko Kurimo, Mathias Creutz, Ville Turunen
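
A minimal sketch of the substitution step described above: every word in documents and queries is replaced by its proposed morpheme analysis before indexing, so matching happens on morphemes rather than surface forms. The analysis dictionary here is hypothetical; in the evaluation it comes from the competing algorithms.

```python
# Sketch: rewrite documents and queries as morpheme sequences before indexing.
analyses = {
    "talot": ["talo", "t"],          # toy Finnish-style analyses
    "talossa": ["talo", "ssa"],
    "taloja": ["talo", "ja"],
}

def morphemize(text, analyses):
    """Replace each word by its morphemes (falling back to the word itself)."""
    out = []
    for word in text.lower().split():
        out.extend(analyses.get(word, [word]))
    return " ".join(out)

document = "talossa taloja"
query = "talot"
print(morphemize(document, analyses))   # -> "talo ssa talo ja"
print(morphemize(query, analyses))      # -> "talo t"  (now shares the morpheme 'talo')
```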
Backmatter
Metadata
Title
Evaluating Systems for Multilingual and Multimodal Information Access
Editors
Carol Peters
Thomas Deselaers
Nicola Ferro
Julio Gonzalo
Gareth J. F. Jones
Mikko Kurimo
Thomas Mandl
Anselmo Peñas
Vivien Petras
Copyright Year
2009
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-04447-2
Print ISBN
978-3-642-04446-5
DOI
https://doi.org/10.1007/978-3-642-04447-2
