2010 | OriginalPaper | Buchkapitel
Combining Wikipedia-Based Concept Models for Cross-Language Retrieval
verfasst von : Benjamin Roth, Dietrich Klakow
Erschienen in: Advances in Multidisciplinary Retrieval
Verlag: Springer Berlin Heidelberg
Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.
Wählen Sie Textabschnitte aus um mit Künstlicher Intelligenz passenden Patente zu finden. powered by
Markieren Sie Textabschnitte, um KI-gestützt weitere passende Inhalte zu finden. powered by
As a low-cost ressource that is up-to-date, Wikipedia recently gains attention as a means to provide cross-language brigding for information retrieval. Contradictory to a previous study, we show that standard Latent Dirichlet Allocation (LDA) can extract cross-language information that is valuable for IR by simply normalizing the training data. Furthermore, we show that LDA and Explicit Semantic Analysis (ESA) complement each other, yielding significant improvements when combined. Such a combination can significantly contribute to retrieval based on machine translation, especially when query translations contain errors. The experiments were perfomed on the Multext JOC corpus und a CLEF dataset.