2011 | Book

Advanced Topics in Information Retrieval

Edited by: Massimo Melucci, Ricardo Baeza-Yates

Publisher: Springer Berlin Heidelberg

Book series: The Information Retrieval Series


About this book

Information retrieval is the science concerned with the effective and efficient retrieval of documents starting from their semantic content. It is employed to fulfill some information need from a large number of digital documents. Given the ever-growing number of documents available and the heterogeneous data structures used for storage, information retrieval has recently faced and tackled novel applications.

In this book, Melucci and Baeza-Yates present a wide-spectrum illustration of recent research results in advanced areas related to information retrieval. Readers will find chapters on, for example, aggregated search, digital advertising, digital libraries, discovery of spam and opinions, information retrieval in context, multimedia resource discovery, quantum mechanics applied to information retrieval, scalability challenges in web search engines, and interactive information retrieval evaluation. All chapters are written by well-known researchers, are completely self-contained and comprehensive, and are complemented by an integrated bibliography and subject index.

With this selection, the editors provide the most up-to-date survey of topics usually not addressed in depth in traditional (text)books on information retrieval. The presentation is intended for a wide audience of people interested in information retrieval: undergraduate and graduate students, post-doctoral researchers, lecturers, and industrial researchers.

Table of Contents

Frontmatter
Chapter 1. Digital Libraries
Abstract
The chapter opens with a report on early approaches to designing library automation systems, which can be considered “ancestors” of present-day systems. After presenting this background, it introduces the main concepts that underlie present digital library systems, together with a report on the efforts to define the Digital Library Manifesto and the DELOS Digital Library Reference Model. Considerations on a possible way of improving present digital library systems to make them more user-centered are subsequently given. Finally, interoperability and evaluation issues are addressed. The presentation ends with a concluding remark.
Maristella Agosti
Chapter 2. Scalability Challenges in Web Search Engines
Abstract
Continuous growth of the Web and user bases forces web search engine companies to make costly investments in very large compute infrastructures. The scalability of these infrastructures requires careful performance optimizations in every major component of the search engine. Herein, we try to provide a fairly comprehensive coverage of the literature on scalability challenges in large-scale web search engines. We present the identified challenges through an architectural classification, starting from a simple single-node search system and moving towards a hypothetical multi-site web search architecture. We also discuss a number of open research problems and provide recommendations to researchers in the field.
Berkant Barla Cambazoglu, Ricardo Baeza-Yates
Chapter 3. Spam, Opinions, and Other Relationships: Towards a Comprehensive View of the Web Knowledge Discovery
Abstract
“Web mining” or “Web Knowledge Discovery” is the analysis of web resources with data-mining techniques such as classification, clustering, association-rule or graph-structure methods. Its applications pervade much of the software web users interact with on a daily basis: search engines’ indexing and ranking choices, recommender systems’ recommendations, targeted advertising, and many others. An understanding of this fast-moving field is therefore a key component of digital information literacy for everyone and a useful and fascinating extension of knowledge and skills for Information Retrieval researchers and practitioners. This chapter proposes an integrating model of learning cycles involving data, information and knowledge, explains how this model subsumes Information Retrieval and Knowledge Discovery and relates them to one another. We illustrate the usefulness of this model in an introduction to web content/text mining, using the model to structure the activities in this form of Knowledge Discovery. We focus on spam detection, opinion mining and relation mining. The chapter aims at complementing other books and articles that focus on the computational aspects of web mining, by emphasizing the often-neglected context in which these computational analyses take place: the full cycle of Knowledge Discovery, which ranges from application understanding via data understanding, data preparation, modeling and evaluation to deployment.
Bettina Berendt
Chapter 4. The User in Interactive Information Retrieval Evaluation
Abstract
This chapter initially defines what characterizes and distinguishes research frameworks from research models. The Laboratory Research Framework for IR illustrates the case. We define briefly what is meant by the concept of research design, including research questions, and what this chapter regards as central IIR evaluation research settings and variables. This is followed by a description of IIR components, pointing to the elements of the Integrated Cognitive Research Framework for IR that incorporates the Laboratory Framework in a contextual manner. The following sections describe and exemplify: (1) Request types, test persons, task-based simulations of search situations and relevance or performance measures in IIR; (2) Ultra-Light Interactive IR experiments; (3) Interactive-Light IR studies; and (4) Naturalistic field investigations of IIR. The chapter concludes with a summary section, a reference list and a thematically classified bibliography.
Peter Ingwersen
Chapter 5. Aggregated Search
Abstract
To support broad queries or ambiguous information needs, providing diverse search results to users has become increasingly necessary. Aggregated search attempts to achieve diversity by presenting search results from different information sources, so-called verticals (image, video, blog, news, etc.), in addition to the standard web results, on one result page. This contrasts with the common search paradigm, where users are provided with a list of information sources, which they have to examine in turn to find relevant content. All major search engines now perform some level of aggregated search. This chapter provides an overview of the current developments in aggregated search.
Mounia Lalmas
Chapter 6. Quantum Mechanics and Information Retrieval
Abstract
This chapter aims at providing a survey of the body of scientific literature relevant to Quantum Mechanics (QM) and Information Retrieval (IR). The survey is illustrated with a common notation to fully grasp the contribution of each paper. In particular, the probability aspects of IR and those of QM are emphasized because probability is one of the most important topics of both disciplines.
Massimo Melucci, Keith van Rijsbergen
Chapter 7. Multimedia Resource Discovery
Abstract
This chapter examines the challenges and opportunities of Multimedia Information Retrieval and corresponding search engine applications. Computer technology has changed our access to information tremendously: We used to search authors or titles (which we had to know) in library cards in order to locate relevant books; now we can issue keyword searches within the full text of whole book repositories in order to identify authors, titles and locations of relevant books. What about the corresponding challenge of finding multimedia by fragments, examples and excerpts? Rather than asking for a music piece by artist and title, can we hum its tune to find it? Can doctors submit scans of a patient to identify medically similar images of diagnosed cases in a database? Can your mobile phone take a picture of a statue and tell you about its artist and significance via a service that it sends this picture to?
In an attempt to answer some of these questions we get to know basic concepts of multimedia resource discovery technologies for a number of different query and document types: piggy-back text search, i.e., reducing the multimedia to pseudo text documents; automated annotation of visual components; content-based retrieval where the query is an image; and fingerprinting to match near duplicates.
Some of the research challenges are given by the semantic gap between the simple pixel properties computers can readily index and high-level human concepts; related to this is an inherent technological limitation of automated annotation of images from pixels alone. Other challenges are given by polysemy, i.e., the many meanings and interpretations that are inherent in visual material and the corresponding wide range of a user’s information need.
This chapter demonstrates how these challenges can be tackled by automated processing and machine learning and by utilising the skills of the user, for example through browsing or through a process that is called relevance feedback, thus putting the user at centre stage. The latter is made easier by “added value” technologies, exemplified here by summaries of complex multimedia objects such as TV news, information visualisation techniques for document clusters, visual search by example, and methods to create browsable structures within the collection.
Stefan Rüger
Chapter 8. Information Retrieval in Context
Abstract
The situations in which we search form a context: a complex set of variables describing our intentions, our personal characteristics, the data and systems available for searching, and our physical, social and organizational environments. Different contexts can mean that we want search systems to behave differently or to offer different responses. Creating search systems and search interfaces to be contextually sensitive raises many research challenges: what aspects of a searcher’s context are useful to know about, how can we model context for use by retrieval systems and how do we evaluate search systems in context? In this chapter we will look at why differences in context can affect how we want search systems to operate and ways that we can use contextual information to help search systems behave more intelligently to our changing context. We will examine some new types of system that use different types of user context to learn about users, to adapt their response to different users or to help us make better search decisions.
Ian Ruthven
Chapter 9. Digital Advertising: An Information Scientist’s Perspective
Abstract
Digital online advertising is a form of promotion that uses the Internet and Web for the express purpose of delivering marketing messages to attract customers. Examples of online advertising include text ads that appear on search engine results pages, banner ads, in-text ads, or Rich Media ads that appear on regular web pages, portals, or applications. Over the past 15 years online advertising, a $65 billion industry worldwide in 2009, has been pivotal to the success of the Web. That being said, the field of advertising has been equally revolutionized by the Internet, the Web, and more recently, by the emergence of the social web and mobile devices. This success has arisen largely from the transformation of the advertising industry from a low-tech, human-intensive, “Mad Men” way of doing work to highly optimized, quantitative, mathematical, computer- and data-centric processes that enable highly targeted, personalized, performance-based advertising. This chapter provides a clear and detailed overview of the technologies and business models that are transforming the field of online advertising primarily from statistical machine learning and information science perspectives.
James G. Shanahan, Goutham Kurra
Backmatter
Metadata
Title
Advanced Topics in Information Retrieval
Edited by
Massimo Melucci
Ricardo Baeza-Yates
Copyright year
2011
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-20946-8
Print ISBN
978-3-642-20945-1
DOI
https://doi.org/10.1007/978-3-642-20946-8
