2001 | Original Paper | Book Chapter

Declustering Web Content Indices for Parallel Information Retrieval

Written by: Yoojin Chung, Hyuk-Chul Kwon, Sang-Hwa Chung, Kwang Ryel Ryu

Published in: Web Intelligence: Research and Development

Publisher: Springer Berlin Heidelberg

We consider an information retrieval (IR) system running on a low-cost, high-performance PC cluster. The IR system replicates Web pages locally, indexes them with an inverted-index file (IIF), and uses the vector space model as its ranking strategy. The IIF is partitioned into pieces using either the lexical or the greedy declustering method, and the pieces are distributed to the cluster nodes, where each is stored on the node's hard disk. The lexical method assigns the terms in the IIF, taken in lexicographic order, to the processing nodes in turn; the greedy method is based on the probability of co-occurrence of an arbitrary pair of terms in the IIF. For each incoming user query with multiple terms, the terms are sent to the nodes that hold the relevant pieces of the IIF, where they are evaluated in parallel. We study how query performance is affected by the two declustering methods for IIFs of various sizes. According to the experiments, the greedy method shows an overall improvement of about 3.7% over the lexical method.
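To make the two partitioning ideas concrete, here is a minimal Python sketch. It is not the authors' implementation. The lexical function follows the abstract directly (sorted terms dealt to nodes in turn). The greedy function is only one plausible reading: it assumes that terms which often co-occur should land on different nodes so they can be evaluated in parallel, and it assumes co-occurrence counts are estimated from a sample of queries. All function names, the tie-breaking rule, and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def lexical_decluster(terms, num_nodes):
    """Deal lexicographically sorted terms to nodes in round-robin order."""
    return {term: i % num_nodes for i, term in enumerate(sorted(terms))}


def cooccurrence_counts(term_sets):
    """Count how often each unordered term pair appears in the same set.

    Assumption: the sets are sample queries; the abstract does not say
    how the co-occurrence probabilities are estimated.
    """
    counts = defaultdict(int)
    for term_set in term_sets:
        for a, b in combinations(sorted(set(term_set)), 2):
            counts[(a, b)] += 1
    return counts


def greedy_decluster(terms, num_nodes, counts):
    """Hypothetical greedy heuristic, not the paper's exact algorithm.

    Each term goes to the node whose current terms co-occur with it
    least; ties are broken by node load, then by node index.
    """
    assignment = {}
    members = [[] for _ in range(num_nodes)]
    for term in sorted(terms):
        def cost(node):
            shared = sum(
                counts.get(tuple(sorted((term, other))), 0)
                for other in members[node]
            )
            return (shared, len(members[node]), node)

        best = min(range(num_nodes), key=cost)
        assignment[term] = best
        members[best].append(term)
    return assignment


def route_query(query_terms, assignment):
    """Group a query's terms by the node that stores their postings."""
    plan = defaultdict(list)
    for term in query_terms:
        if term in assignment:
            plan[assignment[term]].append(term)
    return dict(plan)


# Illustrative data only; these terms and queries are made up.
terms = ["apple", "banana", "cherry", "date"]
sample_queries = [["apple", "cherry"], ["apple", "cherry"], ["banana", "date"]]
lexical = lexical_decluster(terms, 2)
greedy = greedy_decluster(terms, 2, cooccurrence_counts(sample_queries))
print(route_query(["apple", "cherry"], lexical))
print(route_query(["apple", "cherry"], greedy))
```

In this toy setup the lexical assignment places "apple" and "cherry" on the same node, so a query containing both is served by a single node, while the co-occurrence-aware assignment splits them across two nodes. That is the kind of effect the comparison in the abstract is concerned with, though the sketch says nothing about the measured 3.7% figure.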

Metadata
Title
Declustering Web Content Indices for Parallel Information Retrieval
Written by
Yoojin Chung
Hyuk-Chul Kwon
Sang-Hwa Chung
Kwang Ryel Ryu
Copyright year
2001
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/3-540-45490-X_41