2003 | OriginalPaper | Book chapter
Applications of Score Distributions in Information Retrieval
Written by: R. Manmatha
Published in: Language Modeling for Information Retrieval
Publisher: Springer Netherlands
Included in: Professional Book Archive
Researchers have recently shown that the document scores of a number of different text search engines can be fitted on a per-query basis using an exponential distribution for the set of non-relevant documents and a normal distribution for the set of relevant documents. This model fits a wide range of search engines, including probabilistic search engines like INQUERY, vector space search engines like SMART, LSI search engines, and a language model engine. The model also appears to hold for search engines operating on a number of different languages. This leads to the hypothesis that all ‘good’ text search engines operating on any language have similar characteristics.

We then show that, given a query for which relevance information is not available, a mixture model consisting of an exponential and a normal distribution can be fitted to the score distribution. These distributions can be used to map the scores of a search engine to probabilities.

This model has many possible applications. For example, the outputs of different search engines can be combined by averaging the probabilities (optimal if the search engines are independent) or by using the probabilities to select the best engine for each query. It has also been applied to filtering. We discuss these and other applications of score modeling in information retrieval.
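
To make the mixture model concrete, here is a minimal Python sketch of how an exponential-plus-normal mixture might be fitted to the scores of a single query without relevance judgments, and how the fit maps a score to a probability of relevance. The abstract does not say which fitting procedure the chapter uses. Expectation-maximization (EM) is assumed here as a standard choice. The function names, the top-10% initialization, and the synthetic scores are all illustrative assumptions, not taken from the chapter.

```python
import math
import random


def exp_pdf(s, lam):
    """Exponential density, used here for non-relevant scores."""
    return lam * math.exp(-lam * s) if s >= 0 else 0.0


def norm_pdf(s, mu, sigma):
    """Normal density, used here for relevant scores."""
    z = (s - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def prob_relevant(s, params):
    """Bayes' rule: posterior probability that a score s is relevant."""
    pi, lam, mu, sigma = params
    rel = pi * norm_pdf(s, mu, sigma)
    non = (1.0 - pi) * exp_pdf(s, lam)
    total = rel + non
    return rel / total if total > 0 else 0.0


def fit_mixture(scores, iters=200, top_frac=0.1):
    """EM fit of p(s) = (1 - pi) * Exp(s; lam) + pi * Normal(s; mu, sigma).

    Assumes non-negative scores. The initial guess (hypothetical choice)
    treats the top `top_frac` of scores as relevant.
    """
    n = len(scores)
    top = sorted(scores)[-max(1, int(top_frac * n)):]
    pi = top_frac
    lam = n / sum(scores)
    mu = sum(top) / len(top)
    sigma = max(math.sqrt(sum((s - mu) ** 2 for s in top) / len(top)), 1e-3)
    for _ in range(iters):
        # E-step: responsibility of the normal (relevant) component.
        resp = [prob_relevant(s, (pi, lam, mu, sigma)) for s in scores]
        w = sum(resp)
        if w < 1e-9 or n - w < 1e-9:
            break  # one component has vanished; stop rather than divide by zero
        # M-step: weighted maximum-likelihood updates.
        pi = w / n
        mu = sum(r * s for r, s in zip(resp, scores)) / w
        var = sum(r * (s - mu) ** 2 for r, s in zip(resp, scores)) / w
        sigma = max(math.sqrt(var), 1e-3)  # floor avoids a collapsed component
        lam = (n - w) / sum((1.0 - r) * s for r, s in zip(resp, scores))
    return pi, lam, mu, sigma


def average_probabilities(per_engine_probs):
    """Combine engines by averaging each document's probabilities."""
    return [sum(ps) / len(ps) for ps in zip(*per_engine_probs)]


# Synthetic scores for illustration only (not data from the chapter).
random.seed(0)
scores = [random.expovariate(5.0) for _ in range(900)]
scores += [random.gauss(1.5, 0.2) for _ in range(100)]
params = fit_mixture(scores)
print("pi, lambda, mu, sigma:", params)
print("P(relevant | s=1.5):", prob_relevant(1.5, params))
print("P(relevant | s=0.05):", prob_relevant(0.05, params))
```

Because the probabilities from different engines share a common scale, `average_probabilities` can combine them directly, whereas raw scores from different engines are generally not comparable.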