2003 | Original Paper | Book Chapter
An Unbiased Generative Model for Setting Dissemination Thresholds
Authors: Yi Zhang, Jamie Callan
Published in: Language Modeling for Information Retrieval
Publisher: Springer Netherlands
Included in: Professional Book Archive
Information filtering systems based on statistical retrieval models usually compute a numeric score that indicates how well each document matches each profile. Documents with scores above profile-specific dissemination thresholds are delivered. Optimal dissemination thresholds are usually difficult to determine a priori, so they are often learned during filtering, using relevance feedback about disseminated documents. However, the scores of disseminated documents are a biased sample of the complete distribution of document scores, which causes some algorithms to learn suboptimal thresholds.

This chapter presents a generative method of adjusting dissemination thresholds that explicitly models and compensates for this bias. The new algorithm, which is based on the Maximum Likelihood principle, jointly estimates the parameters of the density distributions for relevant and non-relevant documents and the ratio of relevant to non-relevant documents in the region around the dissemination threshold. Experiments demonstrate its effectiveness when its underlying assumptions about document scores are true, and illustrate its behavior when its assumptions do not match the actual distribution of document scores.
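The core idea — that observing only scores above the threshold biases naive parameter estimates, and that a maximum-likelihood fit of a truncated density can compensate — can be illustrated with a minimal sketch. This is not the chapter's algorithm (which jointly models relevant and non-relevant score densities); it is a simplified single-distribution example, assuming Gaussian scores and using a grid search in place of a proper optimizer, to show how normalizing the likelihood by the probability mass above the threshold removes the bias.

```python
import numpy as np
from math import erf, sqrt, log, pi

rng = np.random.default_rng(0)

# Toy setup: true score distribution N(0, 1); only documents with
# score > theta are disseminated, so only those scores are observed.
mu_true, sigma_true, theta = 0.0, 1.0, 0.5
scores = rng.normal(mu_true, sigma_true, 20_000)
observed = scores[scores > theta]          # biased (truncated) sample

# Naive estimate ignores the truncation and is biased upward.
naive_mu = observed.mean()

def truncated_loglik(mu, sigma, x, theta):
    """Log-likelihood of x under N(mu, sigma) truncated to (theta, inf).

    Each observation's density is divided by P(X > theta), i.e. the
    log-likelihood subtracts n * log(tail mass) -- this is the
    bias-compensation term.
    """
    z = (x - mu) / sigma
    log_pdf = -0.5 * z**2 - log(sigma) - 0.5 * log(2 * pi)
    tail = 0.5 * (1.0 - erf((theta - mu) / (sigma * sqrt(2))))
    return log_pdf.sum() - len(x) * log(tail)

# Crude grid search over (mu, sigma); a real implementation would use
# a numerical optimizer or EM-style updates.
mus = np.linspace(-1.0, 1.0, 81)
sigmas = np.linspace(0.5, 1.5, 41)
_, mu_hat, sigma_hat = max(
    (truncated_loglik(m, s, observed, theta), m, s)
    for m in mus for s in sigmas
)

print(f"naive mean of observed scores: {naive_mu:.3f}")
print(f"truncated-MLE estimate of mu:  {mu_hat:.3f}")
```

The naive mean lands well above the true mean of 0 (only high scores are ever seen), while the truncation-corrected likelihood recovers a value near 0. The chapter's method applies the same principle to a two-component model (relevant and non-relevant documents) and simultaneously estimates the mixing ratio near the threshold.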