2012 | Book

Data Fusion in Information Retrieval

Author: Shengli Wu

Publisher: Springer Berlin Heidelberg

Book series: Adaptation, Learning, and Optimization

Included in: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"

About this book

Data fusion has been used extensively in information retrieval because of the complexity and diversity of the tasks involved, which span web and social networks, legal, enterprise, and many other domains. This book presents both a theoretical and an empirical approach to data fusion. Several typical data fusion algorithms are discussed, analyzed, and evaluated. Readers will find answers to the following questions, among others:

What are the key factors that affect the performance of data fusion algorithms significantly?

What conditions are favorable to data fusion algorithms?

Which is better, CombSum or CombMNZ, and why?

What is the rationale of using the linear combination method?

How can the best fusion option be found under any given circumstances?

Table of contents

Frontmatter

Introduction

Data fusion has been used in many different application areas. In information retrieval, however, it is special mainly because of the way results are presented: as a ranked list of documents. In this introductory part of the book, we discuss several typical data fusion methods in information retrieval, including CombSum, CombMNZ, the linear combination method, the Borda count, Condorcet voting, and others. The section ends with introductory remarks on several major issues of data fusion in information retrieval.

Shengli Wu
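
The two score-based methods named in the introduction can be sketched in a few lines. This is a minimal illustration, not the book's implementation: it assumes each component result is a dict mapping a document id to an already normalized score, and that a document counts as retrieved by a system whenever it appears in that system's dict. The names `comb_sum`, `comb_mnz`, and `ranked` are hypothetical.

```python
from collections import defaultdict


def comb_sum(results):
    """CombSum: add up the scores a document receives from all systems."""
    fused = defaultdict(float)
    for scores in results:
        for doc, score in scores.items():
            fused[doc] += score
    return dict(fused)


def comb_mnz(results):
    """CombMNZ: CombSum score times the number of systems retrieving the document."""
    sums = comb_sum(results)
    hits = defaultdict(int)
    for scores in results:
        for doc in scores:
            hits[doc] += 1  # assumption: presence in the dict means "retrieved"
    return {doc: sums[doc] * hits[doc] for doc in sums}


def ranked(fused):
    """Order documents by fused score, highest first (ties broken by id)."""
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


# Hypothetical, already normalized scores from two retrieval systems.
run_a = {"d1": 1.0, "d2": 0.5}
run_b = {"d2": 0.6, "d3": 0.9}
print(ranked(comb_sum([run_a, run_b])))
print(ranked(comb_mnz([run_a, run_b])))
```

In this toy input, document d2 is retrieved by both systems, so CombMNZ doubles its summed score, while d1 and d3 keep theirs.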

Evaluation of Retrieval Results

Abstract

Evaluating retrieval results is a key issue for information retrieval systems as well as for data fusion methods. One common assumption is that the retrieval result is presented as a ranked list of documents. Under this assumption, we review several retrieval evaluation systems, including binary relevance judgment, graded relevance judgment, and incomplete relevance judgment. We also introduce some metrics that will be used later in this book.

Shengli Wu
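
As an illustration of evaluating a ranked list under binary relevance judgments, the sketch below computes average precision, a widely used metric. The abstract does not say which metrics the chapter covers, so the choice of metric, the function name `average_precision`, and the handling of a query with no relevant documents are all assumptions.

```python
def average_precision(ranking, relevant):
    """Average precision of a ranked list under binary relevance judgments."""
    if not relevant:
        return 0.0  # assumption: no relevant documents yields a score of 0
    hits = 0
    precision_sum = 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k  # precision at this relevant document
    return precision_sum / len(relevant)


# Hypothetical ranked list and relevance judgments.
ranking = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3"}
print(average_precision(ranking, relevant))
```

Here the relevant documents sit at ranks 1 and 3, so the score is the mean of the precision values 1/1 and 2/3.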

Score Normalization

Abstract

Score normalization is relevant to data fusion because the scores provided by component systems are very often not comparable, or no scoring information is provided at all. Score normalization therefore very often serves as a preliminary step to data fusion. Score normalization methods can be divided into two categories: linear and non-linear. If no scores are provided, it is possible to use methods that transform rankings into scores. In each case, we discuss several different methods.

Shengli Wu
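
A minimal sketch of the two situations described above: a linear normalization when scores exist, and a rank-to-score conversion when they do not. Min-max scaling and reciprocal-rank scores are used here only as familiar examples; the abstract does not name the specific methods the chapter discusses. The function names and the tie-handling rule are assumptions.

```python
def min_max(scores):
    """Linear normalization: map raw scores onto the interval [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: if all raw scores are equal, give every document 1.0.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def reciprocal_rank_scores(ranking):
    """Turn a ranking without scores into scores: rank r receives 1/r."""
    return {doc: 1.0 / r for r, doc in enumerate(ranking, start=1)}


# Hypothetical raw scores from one system and a score-free ranking from another.
print(min_max({"d1": 2.0, "d2": 4.0, "d3": 6.0}))
print(reciprocal_rank_scores(["d7", "d2", "d5"]))
```

After either step, scores from different systems lie on a comparable scale and can be passed to a score-based fusion method.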

Observations and Analyses

Abstract

Because of the uncertainty involved, it is difficult to answer questions such as why and how data fusion can improve retrieval performance, which data fusion method is better, or under what conditions data fusion methods can improve retrieval performance. In the last two decades, however, some effort has been made to answer these questions, and statistical analysis understandably plays a very important role. In this chapter, we discuss the progress made so far.

Shengli Wu

The Linear Combination Method

Abstract

In data fusion, the linear combination method is very flexible, since different weights can be assigned to different systems. When using the linear combination method, how to decide the weights is a key issue. Profitable weight assignment depends on a few factors, chiefly the performance of all component results and the similarity among them. In this chapter, we discuss several methods for weight assignment. Extensive experimental results with TREC data are given to evaluate the effectiveness of these weight assignment methods and to reveal the properties of the linear combination data fusion methods.

Shengli Wu
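
The linear combination method itself is a weighted sum of scores, sketched below. The sketch assumes normalized scores held in dicts and takes the weights as given; how to choose them is the subject of the chapter and is not reproduced here. The name `linear_combination` and the example weights are hypothetical.

```python
def linear_combination(results, weights):
    """Fuse results as a weighted sum: each system's scores are scaled by its weight."""
    if len(results) != len(weights):
        raise ValueError("need exactly one weight per component result")
    fused = {}
    for scores, weight in zip(results, weights):
        for doc, score in scores.items():
            fused[doc] = fused.get(doc, 0.0) + weight * score
    return fused


# Hypothetical normalized scores; the first system is trusted twice as much.
run_a = {"d1": 1.0, "d2": 0.5}
run_b = {"d2": 1.0, "d3": 0.5}
print(linear_combination([run_a, run_b], [2.0, 1.0]))
```

With all weights set to 1, this reduces to CombSum.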

A Geometric Framework for Data Fusion

Abstract

Quite a few data fusion methods have been proposed, but questions such as why data fusion can improve effectiveness, and what the favourable conditions for data fusion algorithms are, have been answered only partially or vaguely because of the uncertainty of the problem. In this chapter, we set up a geometric framework to formally describe score-based data fusion methods, in which each component result returned by an information retrieval system for a given query is represented as a point in a multi-dimensional space. The performance of any result and the similarity between any pair of results can be evaluated by the same metric, the Euclidean distance. All the component results and the fused results can then be explained using geometrical principles. In such a framework, data fusion becomes a deterministic problem: the performance of the fused result is determined by the performances of all component results and the similarities among them. Several interesting features of the centroid-based data fusion method and the linear combination method can be deduced. As a formal model of data fusion, this framework enables us to better understand the nature of data fusion and to use the technique more precisely and effectively [105].

Shengli Wu
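
The geometric view can be illustrated with a small numeric sketch. It assumes, beyond what the abstract states, that each result is a vector with one score per document and that performance is the Euclidean distance to a hypothetical ideal point made of relevance judgments (smaller is better). Centroid-based fusion is then the coordinate-wise mean of the component points. All names and numbers are illustrative.

```python
import math


def centroid(points):
    """Centroid-based fusion: the coordinate-wise mean of the component points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


# Hypothetical ideal point (relevance of three documents) and two component results.
ideal = [1.0, 0.0, 1.0]
runs = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

fused = centroid(runs)
component_distances = [math.dist(run, ideal) for run in runs]
fused_distance = math.dist(fused, ideal)
mean_distance = sum(component_distances) / len(component_distances)

print(component_distances, fused_distance, mean_distance)
```

Because the Euclidean norm is convex, the centroid's distance to the ideal point never exceeds the mean of the component distances. In this toy input it is also smaller than the best component distance, which need not hold in general. `math.dist` requires Python 3.8 or later.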

Ranking-Based Fusion

Abstract

In information retrieval, retrieval results are usually presented as a ranked list of documents for a given information need. Ranking-based fusion methods are therefore applicable even if no scoring information is provided for the documents involved. In this chapter, we investigate ranking-based fusion methods, especially the Borda count, Condorcet voting, and weighted Condorcet voting.

Shengli Wu
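
The two voting schemes can be sketched as follows, treating each component system as a voter and each document as a candidate. This is a simplified illustration under stated assumptions: a document missing from a ranking receives no Borda points from that voter and is treated as ranked below every listed document in pairwise comparisons, and the Condorcet part only counts pairwise majority wins rather than building the full fused ranking. The function names are hypothetical.

```python
import math
from itertools import combinations


def borda_count(rankings):
    """Borda count: with n candidates, position 0 earns n points, position 1 earns n - 1, ..."""
    docs = set().union(*rankings)
    n = len(docs)
    points = {doc: 0 for doc in docs}
    for ranking in rankings:
        for pos, doc in enumerate(ranking):
            points[doc] += n - pos  # assumption: unlisted documents earn nothing
    return points


def condorcet_wins(rankings):
    """Count, for each document, how many pairwise majority contests it wins."""
    docs = sorted(set().union(*rankings))
    positions = [{doc: i for i, doc in enumerate(r)} for r in rankings]
    wins = {doc: 0 for doc in docs}
    for x, y in combinations(docs, 2):
        # Assumption: an unlisted document is ranked below every listed one.
        x_votes = sum(1 for p in positions if p.get(x, math.inf) < p.get(y, math.inf))
        y_votes = sum(1 for p in positions if p.get(y, math.inf) < p.get(x, math.inf))
        if x_votes > y_votes:
            wins[x] += 1
        elif y_votes > x_votes:
            wins[y] += 1
    return wins


# Three hypothetical systems ranking the same three documents.
rankings = [["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]]
print(borda_count(rankings))
print(condorcet_wins(rankings))
```

Neither function needs scores, only the order of documents in each list, which is what makes these methods usable when component systems return plain rankings.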

Fusing Results from Overlapping Databases

Abstract

With the rapid development of the Internet and the WWW, numerous online resources have become available through the efforts of research groups, companies, government agencies, and individuals all over the globe. Giving end users effective and efficient access to such a huge collection of resources is a demanding problem, and it is the major goal of federated search, also known as distributed information retrieval. In this section, we discuss the two key issues involved, namely resource selection and, especially, result merging, under the assumption that partial overlap exists among different resources. In a sense, merging results from overlapping databases resembles the data fusion problem, in which results are obtained from identical databases.

Shengli Wu

Application of the Data Fusion Technique

Abstract

Different aspects of the data fusion technique have been addressed so far. Surprisingly, data fusion can be used in many different situations. In this chapter, we are going to discuss some applications of it.

Shengli Wu

Backmatter

Title: Data Fusion in Information Retrieval
Author: Shengli Wu
Publisher: Springer Berlin Heidelberg
Electronic ISBN: 978-3-642-28866-1
Print ISBN: 978-3-642-28865-4
DOI: https://doi.org/10.1007/978-3-642-28866-1
