2012 | OriginalPaper | Book chapter
A Log-Logistic Model-Based Interpretation of TF Normalization of BM25
Authors: Yuanhua Lv, ChengXiang Zhai
Published in: Advances in Information Retrieval
Publisher: Springer Berlin Heidelberg
The effectiveness of the BM25 retrieval function is mainly due to its sub-linear term frequency (TF) normalization component, which is controlled by a parameter $k_1$. Although BM25 was derived from the classic probabilistic retrieval model, it has so far been unclear how to interpret its parameter $k_1$ probabilistically, which makes it hard to optimize the setting of this parameter. In this paper, we provide a novel probabilistic interpretation of the BM25 TF normalization and its parameter $k_1$ based on a log-logistic model for the probability of seeing a document in the collection with a given level of TF. The proposed interpretation allows us to derive different approaches to estimating $k_1$ based solely on the current collection, without requiring any training data, thus effectively eliminating one free parameter from BM25. Our experimental results show that the proposed approaches can accurately predict the optimal $k_1$ without requiring training data and achieve retrieval performance better than or comparable to that of a well-tuned BM25 in which $k_1$ is optimized based on training data.
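
For readers unfamiliar with the TF normalization the abstract refers to, the following is a minimal sketch of the commonly cited BM25 term-frequency saturation, tf * (k1 + 1) / (tf + k1), with document length normalization left out for simplicity. It is not the estimation method proposed in the chapter. The function name `bm25_tf` and the default `k1 = 1.2` are assumptions made for illustration (1.2 is a frequently used default, not a value taken from this abstract).

```python
def bm25_tf(tf: float, k1: float = 1.2) -> float:
    """Saturating TF component of BM25, ignoring document length normalization.

    tf: raw term frequency in the document (assumed non-negative).
    k1: saturation parameter (assumed non-negative); 1.2 is only a
        commonly used default, chosen here for illustration.
    """
    if tf < 0 or k1 < 0:
        raise ValueError("tf and k1 must be non-negative")
    if tf == 0:
        return 0.0
    # Grows sub-linearly in tf and approaches the upper bound k1 + 1.
    return tf * (k1 + 1.0) / (tf + k1)


if __name__ == "__main__":
    # Show how a larger k1 delays saturation as tf grows.
    for k1 in (0.5, 1.2, 2.0):
        scores = [round(bm25_tf(tf, k1), 3) for tf in (1, 2, 5, 10, 50)]
        print(f"k1={k1}: {scores}")
```

In this form a single occurrence always scores 1, each additional occurrence adds less than the previous one, and the score never exceeds k1 + 1, which is why the choice of k1 determines how quickly repeated occurrences stop mattering.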