2004 | OriginalPaper | Book chapter

Choosing the Right Bigrams for Information Retrieval

Authors: Maojin Jiang, Eric Jensen, Steve Beitzel, Shlomo Argamon

Published in: Classification, Clustering, and Data Mining Applications

Publisher: Springer Berlin Heidelberg

Included in: Professional Book Archive

After more than 30 years of research in information retrieval, the dominant paradigm remains the “bag-of-words,” in which query terms are considered independently of their co-occurrences with each other. Although there has been some work on incorporating phrases or other syntactic information into IR, such attempts have given modest and inconsistent improvements at best. This paper is a first step toward investigating more deeply the question of using bigrams for information retrieval. Our results indicate that only certain kinds of bigrams are likely to aid retrieval. We used linear regression methods on data from TREC 6, 7, and 8 to identify which bigrams are able to help retrieval at all. Our characterization was then tested through retrieval experiments using our information retrieval engine, AIRE, which implements many standard ranking functions and retrieval utilities.
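
To illustrate the distinction the abstract draws, here is a minimal sketch of what a bag-of-words representation discards and what bigrams retain. It is not taken from the paper and does not reflect how AIRE works; the whitespace tokenizer and the function names `unigram_counts` and `bigram_counts` are assumptions made for illustration only.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; real IR systems
    # typically also apply stop-word removal, stemming, and so on.
    return text.lower().split()


def unigram_counts(text):
    # Bag-of-words: count each term, ignoring order and adjacency.
    return Counter(tokenize(text))


def bigram_counts(text):
    # Bigrams: count each pair of adjacent terms, so order matters.
    tokens = tokenize(text)
    return Counter(zip(tokens, tokens[1:]))


a = "information retrieval engine"
b = "engine retrieval information"

# Same terms in a different order: identical as bags of words...
print(unigram_counts(a) == unigram_counts(b))
# ...but different once adjacent pairs are taken into account.
print(bigram_counts(a) == bigram_counts(b))
```

The two strings are indistinguishable as bags of words but differ as bigram sets, which is the extra signal a bigram-aware ranking function could exploit. Which bigrams actually help retrieval is the question the paper addresses.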
