
2004 | OriginalPaper | Book chapter

Choosing the Right Bigrams for Information Retrieval

Authors: Maojin Jiang, Eric Jensen, Steve Beitzel, Shlomo Argamon

Published in: Classification, Clustering, and Data Mining Applications

Publisher: Springer Berlin Heidelberg


After more than 30 years of research in information retrieval (IR), the dominant paradigm remains the "bag-of-words" model, in which query terms are considered independent of their co-occurrences with each other. Although there has been some work on incorporating phrases or other syntactic information into IR, such attempts have yielded modest and inconsistent improvements at best. This paper is a first step toward investigating more deeply the question of using bigrams for information retrieval. Our results indicate that only certain kinds of bigrams are likely to aid retrieval. We applied linear regression methods to data from TREC 6, 7, and 8 to identify which bigrams are able to help retrieval at all. We then tested this characterization through retrieval experiments using our information retrieval engine, AIRE, which implements many standard ranking functions and retrieval utilities.
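To make the contrast in the abstract concrete, here is a minimal Python sketch, not the authors' AIRE code, of how a query looks under the bag-of-words model versus as a list of adjacent-word bigrams. The whitespace tokenizer and the function names `bag_of_words` and `adjacent_bigrams` are hypothetical illustrations; the paper's actual question, which of these candidate bigrams are worth keeping, is not attempted here.

```python
from collections import Counter


def bag_of_words(query: str) -> Counter:
    """Unigram (bag-of-words) representation: term order and adjacency are discarded."""
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return Counter(query.lower().split())


def adjacent_bigrams(query: str) -> list[tuple[str, str]]:
    """Candidate bigrams: every pair of adjacent query terms, in query order."""
    terms = query.lower().split()
    # Pair each term with its right-hand neighbour.
    return list(zip(terms, terms[1:]))


if __name__ == "__main__":
    query = "information retrieval bigram selection"
    print(bag_of_words(query))
    print(adjacent_bigrams(query))
    # prints [('information', 'retrieval'), ('retrieval', 'bigram'), ('bigram', 'selection')]
```

Note that reordering the query leaves the bag-of-words representation unchanged but produces different bigrams; that lost word-adjacency information is what a bigram-aware retrieval model tries to exploit.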

Metadata
Title
Choosing the Right Bigrams for Information Retrieval
Authors
Maojin Jiang
Eric Jensen
Steve Beitzel
Shlomo Argamon
Copyright year
2004
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-642-17103-1_50