
About this book

This book constitutes the proceedings of the 6th International Information Retrieval Facility Conference, IRFC 2013, held in Limassol, Cyprus, October 2013.

The 8 papers presented together with 2 short papers were carefully reviewed and selected from 16 high-quality submissions. IRF conferences wish to bring young researchers into contact with industry at an early stage. This sixth conference aimed to tackle four complementary research areas: information retrieval, machine translation for search solutions, and interactive information access.



Multilingual and Cross-Lingual News Analysis in the Europe Media Monitor (EMM) (Extended Abstract)

We give an overview of the highly multilingual news analysis system Europe Media Monitor (EMM), which gathers an average of 175,000 online news articles per day in tens of languages, categorises the news items and extracts named entities and various other information from them. We explain how users benefit from media monitoring and why it is so important to monitor the news in many different languages. We also describe the challenge of developing text mining tools for tens of languages and in particular that of dealing with highly inflected languages, such as those of the Balto-Slavonic and Finno-Ugric language families.
Ralf Steinberger

Ontology Based Query Expansion with a Probabilistic Retrieval Model

This paper examines the use of ontologies for defining query context. The information retrieval system used is based on the probabilistic retrieval model. We extend the use of relevance feedback (RFB) and pseudo-relevance feedback (PF) query expansion techniques using information from a news domain ontology. The aim is to assess the impact of the ontology on the query expansion results with respect to recall and precision. We also tested how the results vary with the relevance feedback parameters (number of terms or number of documents). The factors which influence the success of ontology based query expansion are outlined. Our findings show that ontology based query expansion has had mixed success. The use of the ontology has vastly increased the number of relevant documents retrieved; however, we conclude that for both types of query expansion, the PF results are better than the RFB results.
Jagdev Bhogal, Andrew Macfarlane
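
To make the two feedback parameters mentioned in this abstract concrete (number of documents and number of terms), here is a minimal sketch of pseudo-relevance feedback in Python. It is not the authors' method: the function name `expand_query` is hypothetical, and raw term frequency is used as a stand-in for the term selection of a probabilistic retrieval model. No ontology is involved.

```python
import re
from collections import Counter


def expand_query(query_terms, ranked_docs, num_docs=2, num_terms=3):
    """Pseudo-relevance feedback sketch (hypothetical helper).

    Assumes the top `num_docs` documents of an initial ranking are relevant
    and adds their `num_terms` most frequent new terms to the query.
    Raw frequency is an illustrative simplification.
    """
    original = [t.lower() for t in query_terms]
    counts = Counter()
    for doc in ranked_docs[:num_docs]:
        tokens = re.findall(r"[a-z]+", doc.lower())
        # Count only terms that are not already part of the query.
        counts.update(t for t in tokens if t not in original)
    expansion = [term for term, _ in counts.most_common(num_terms)]
    return original + expansion


if __name__ == "__main__":
    ranking = [
        "election results vote count vote",
        "vote election turnout",
        "football match score",
    ]
    print(expand_query(["election"], ranking, num_docs=2, num_terms=1))
```

Varying `num_docs` and `num_terms` in such a sketch corresponds to the parameter variation the abstract describes; true relevance feedback would replace the top-ranked documents with documents a user has judged relevant.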

A Machine Learning Approach for Subjectivity Classification Based on Positional and Discourse Features

In recent years, several machine learning methods have been proposed to detect subjective (opinionated) expressions within on-line documents. This task is important in many Opinion Mining and Sentiment Analysis applications. However, the opinion extraction process is often done with rough content-based features. In this paper, we study the role of structural features to guide sentence-level subjectivity classification. More specifically, we combine classical n-gram features with novel features defined from positional information and from the discourse structure of the sentences. Our experiments show that these new features are beneficial in the classification of subjective sentences.
Jose M. Chenlo, David E. Losada

Recall-Oriented Evaluation for Information Retrieval Systems

In a recall context, the user is interested in retrieving all relevant documents rather than retrieving a few that are at the top of the results list. In this article we propose ROM (Recall Oriented Measure), which takes into account the main elements that should be considered in evaluating information retrieval systems while ordering them in a way explicitly adapted to a recall context.
Bissan Audeh, Philippe Beaune, Michel Beigbeder

Using ‘Search Transitions’ to Study Searchers’ Investment of Effort: Experiences with Client and Server Side Logging

We are investigating the value of using the concept ‘search transition’ for studying effort invested in information search processes. In this paper we present findings from a comparative study of data collected from client and server side logging. The purpose is to see what factors of effort can be captured by the two logging methods. The data stems from studies of searchers’ interaction with an XML information retrieval system. The searchers’ interaction was simultaneously logged by screen capturing software and the IR system’s logging facility. In order to identify the advantages and disadvantages of each method, we have compared the data gathered from a selection of sessions. We believe there is value in identifying the effort investment in a search process, both to evaluate the quality of the search system and to suggest areas of system intervention in the search process, if effort investment can be detected dynamically.
Nils Pharo, Ragnar Nordlie

An IR-Inspired Approach to Recovering Named Entity Tags in Broadcast News

We propose a new approach to improving named entity recognition (NER) in broadcast news speech data. The approach proceeds in two key steps: (1) we automatically detect document alignments between highly similar speech documents and corresponding written news stories that are easily obtainable from the Web; (2) we employ term expansion techniques commonly used in information retrieval to recover named entities that were initially missed by the speech transcriber. We show that our method is able to find named entities missing in the transcribed speech data, and additionally to correct incorrectly assigned named entity tags. Consequently, our novel approach improves state-of-the-art NER results from speech data both in terms of recall and precision.
Niraj Shrestha, Ivan Vulić, Marie-Francine Moens

An Exploratory Study on Content-Based Filtering of Call for Papers

Due to the increasing number of conferences, researchers need to spend more and more time browsing through the respective calls for papers (CFPs) to identify those conferences which might be of interest to them. In this paper we study several content-based techniques to filter CFPs retrieved from the web. To this end, we explore how to exploit the information available in a typical CFP: a short introductory text, topics in the scope of the conference, and the names of the people in the program committee. While the introductory text and the topics can be directly used to model the document (e.g. to derive a tf-idf weighted vector), the names of the members of the program committee can be used in several indirect ways. One strategy we pursue in particular is to take into account the papers that these people have recently written. Along similar lines, to find out the research interests of the users, and thus to decide which CFPs to select, we look at the abstracts of the papers that they have recently written. We compare and contrast a number of approaches based on the vector space model and on generative language models.
Germán Hurtado Martín, Steven Schockaert, Chris Cornelis, Helga Naessens
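
The vector space idea mentioned in this abstract (tf-idf weighted vectors for CFPs, with user interests derived from recent abstracts) can be sketched as follows. This is an illustration under assumptions, not the authors' system: the names `tfidf_vectors`, `cosine` and `rank_cfps` are hypothetical, the idf formula is one common smoothed variant, and program committee information is ignored.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphabetic tokens only; a deliberately crude tokenizer.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed idf (an assumed variant); always positive.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_cfps(user_abstracts, cfps):
    """Return CFP indices ordered from most to least similar to the user."""
    profile_text = " ".join(user_abstracts)
    vectors = tfidf_vectors([profile_text] + cfps)
    profile, cfp_vectors = vectors[0], vectors[1:]
    scores = [cosine(profile, v) for v in cfp_vectors]
    return sorted(range(len(cfps)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    abstracts = ["Query expansion for ranking in information retrieval"]
    cfps = [
        "Topics: protein folding and molecular dynamics",
        "Topics: information retrieval, query processing, ranking evaluation",
    ]
    print(rank_cfps(abstracts, cfps))
```

In this sketch the user profile is simply the concatenation of the user's abstracts; a generative language model approach, which the abstract also mentions, would replace the cosine score with a document likelihood.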

Domain Adaptation of General Natural Language Processing Tools for a Patent Claim Visualization System

In this study we present a first step towards domain adaptation of Natural Language Processing (NLP) tools, which we use in a pipeline for a system to create a dependency claim graph (DCG). Our system takes advantage of patterns occurring in the patent domain, notably the way patent claims combine technical terminology with legal rhetorical structure. Such patterns make the sentences generally difficult for people to understand, but can be leveraged by our system to assist the cognitive process of understanding the innovation described in the claim. We present this set of patterns, together with an extensive evaluation showing that the results are, even for this relatively difficult genre, at least 90% correct, as identified by both expert and non-expert users. The assessment of each generated DCG is based upon completeness, connection and a set of pre-defined relations.
Linda Andersson, Mihai Lupu, Allan Hanbury

Concept Extraction from Patent Images Based on Recursive Hybrid Classification

Recently, the intellectual property and information retrieval communities have shown interest in patent image analysis, which could augment the current practices of patent search by image classification and concept extraction. This article presents an approach for concept extraction from patent images, which relies upon recursive hybrid (text and visual-based) classification. To evaluate this approach, we selected a dataset from the footwear domain.
Anastasia Moumtzidou, Stefanos Vrochidis, Ioannis Kompatsiaris

Towards a Framework for Human (Manual) Information Retrieval

Information retrieval work has mostly focused on the automatic process of filtering and retrieving documents based on a query search. The subsequent manual process by which the information seeker scrutinises and triages the retrieved documents is not thoroughly understood. Limited work has been reported, particularly on human factors in web searching, but it is usually case specific and difficult to cross-reference, cross-examine and compare. Furthermore, the majority of the work is reported qualitatively, and there are no clear measures for empirically and quantitatively evaluating user behaviour and interactive systems. In this work, we introduce a universal framework which conceptualises the behavioural and procedural human process. Beyond the scholarly contribution, the framework can be employed and adapted to give practitioners and researchers a foundation for evaluating both user performance and interactive systems.
Fernando Loizides, George Buchanan

A Generalized Framework for Integrated Professional Search Systems

This paper presents a framework for Integrated Professional Search (IPS) systems. The framework provides a context to better classify and characterize what IPS systems are, but it is also used to better understand the design space of IPS systems. The framework suggests an architecture and methodology to build loosely coupled IPS systems in which each search tool has little or no knowledge of the details of other search tools or components. As a case study of using the proposed framework, the paper also describes the architecture and main functionalities of a patent search system and a medical search system. The integration of different search tools into these search systems is discussed to demonstrate the flexibility of the framework, which makes it easy to integrate external search tools into the search systems.
Michail Salampasis, Allan Hanbury

