Information Retrieval Models

Foundations & Relationships

  • Book
  • © 2013

Table of contents (4 chapters)

About this book

Information Retrieval (IR) models are a core component of IR research and IR systems. The past decade brought a consolidation of the family of IR models, which by 2000 consisted of relatively isolated views on TF-IDF (Term-Frequency times Inverse-Document-Frequency) as the weighting scheme in the vector-space model (VSM), the probabilistic relevance framework (PRF), the binary independence retrieval (BIR) model, BM25 (Best-Match Version 25, the main instantiation of the PRF/BIR), and language modelling (LM). Also, the early 2000s saw the arrival of divergence from randomness (DFR). Regarding intuition and simplicity, though LM is clear from a probabilistic point of view, several people stated: "It is easy to understand TF-IDF and BM25. For LM, however, we understand the math, but we do not fully understand why it works." This book takes a horizontal approach, gathering the foundations of TF-IDF, PRF, BIR, Poisson, BM25, LM, probabilistic inference networks (PINs), and divergence-based models. The aim is to create a consolidated and balanced view on the main models. A particular focus of this book is on the "relationships between models." This includes an overview of the main frameworks (PRF, logical IR, VSM, generalized VSM) and a pairing of TF-IDF with other models. It becomes evident that TF-IDF and LM measure the same thing, namely the dependence (overlap) between document and query. The Poisson probability helps to establish probabilistic, non-heuristic roots for TF-IDF, and the Poisson parameter, average term frequency, is a binding link between several retrieval models and model parameters.

Table of Contents: List of Figures / Preface / Acknowledgments / Introduction / Foundations of IR Models / Relationships Between IR Models / Summary & Research Outlook / Bibliography / Author's Biography / Index
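As a rough illustration of two of the models named above, the following minimal sketch scores a toy collection with a plain TF-IDF sum and with a query-likelihood language model. It is not taken from the book: the function names (`tf_idf_score`, `lm_score`, `rank`), the toy documents, the unnormalised `tf * log(N / df)` weighting, and the choice of Jelinek-Mercer smoothing with `lam=0.5` are all assumptions made for illustration only.

```python
import math
from collections import Counter


def tf_idf_score(query, doc, docs):
    """Sum over query terms of tf(t, d) * idf(t), with idf(t) = log(N / df(t)).

    Assumption: raw term frequency and unsmoothed IDF, one simple variant
    among many.
    """
    n_docs = len(docs)
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in docs if term in d)
        if df == 0:
            continue  # term does not occur in the collection
        score += tf[term] * math.log(n_docs / df)
    return score


def lm_score(query, doc, docs, lam=0.5):
    """Query log-likelihood under a mixture of document and collection models.

    Assumption: Jelinek-Mercer smoothing with mixing weight `lam`.
    """
    tf = Counter(doc)
    coll = Counter(t for d in docs for t in d)
    coll_len = sum(coll.values())
    score = 0.0
    for term in query:
        p = lam * tf[term] / len(doc) + (1 - lam) * coll[term] / coll_len
        if p == 0:
            continue  # term does not occur in the collection
        score += math.log(p)
    return score


def rank(score_fn, query, docs):
    """Return document indices ordered from highest to lowest score."""
    return sorted(range(len(docs)), key=lambda i: -score_fn(query, docs[i], docs))


# Hypothetical toy collection, already tokenised.
docs = [
    ["sailing", "boats", "sailing"],
    ["boats", "harbour"],
    ["east", "anglia", "harbour"],
]
query = ["sailing", "boats"]

tf_idf_ranking = rank(tf_idf_score, query, docs)
lm_ranking = rank(lm_score, query, docs)
print(tf_idf_ranking, lm_ranking)
```

On this toy data both scores order the documents by how much they overlap with the query. That is only an illustration of the intuition, not a demonstration of the formal relationship the book develops.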

Authors and Affiliations

  • Queen Mary University of London, United Kingdom

    Thomas Roelleke

About the author

Thomas Roelleke holds a Dr rer nat (Ph.D.) and a Diplom der Ingenieur-Informatik (MSc in Engineering & Computer Science) from the University of Dortmund. After school education in Meschede, Germany, he attended the b.i.b., the Nixdorf Computer school for professions in informatics, in Paderborn. Nixdorf Computer awarded him a sales and management trainee program, after which he was appointed as product consultant in the Unix/DB/4GL marketing of Nixdorf Computer. He studied Diplom-Ingenieur-Informatik at the University of Dortmund (UniDo), and was later a lecturer/researcher at UniDo. His research focused on probabilistic reasoning and knowledge representations, hypermedia retrieval, and the integration of retrieval and database technologies. His lecturing included information/database systems, object-oriented design and programming, and software engineering. He obtained his Ph.D. in 1999 for the thesis titled "POOL: A probabilistic object-oriented logic for the representation and retrieval of complex objects - a model for hypermedia retrieval." Since 1999, he has been working as a strategic IT consultant, founder and director of small businesses, research fellow, and lecturer at the Queen Mary University of London (QMUL). Research contributions include a probabilistic relational algebra (PRA), a probabilistic object-oriented logic (POOL), the relational Bayes, a matrix-based framework for IR, a parallel derivation of IR models, a probabilistic interpretation of the BM25-TF based on "semi-subsumed" event occurrences, and theoretical studies of retrieval models. Thomas Roelleke lives in England, in a village midway between buzzy London and beautiful East Anglia.
