2013 | OriginalPaper | Chapter
Improving ESA with Document Similarity
Authors : Tamara Polajnar, Nitish Aggarwal, Kartik Asooja, Paul Buitelaar
Published in: Advances in Information Retrieval
Publisher: Springer Berlin Heidelberg
Activate our intelligent search to find suitable subject content or patents.
Select sections of text to find matching patents with Artificial Intelligence. powered by
Select sections of text to find additional relevant content using AI-assisted search. powered by
Explicit semantic analysis (ESA) is a technique for computing semantic relatedness between natural language texts. It is a document-based distributional model similar to latent semantic analysis (LSA), which is often built on the Wikipedia database when it is required for general English usage. Unlike LSA, however, ESA does not use dimensionality reduction, and therefore it is sometimes unable to account for similarity between words that do not co-occur with same concepts, even if their concepts themselves cover similar subjects. In the Wikipedia implementation ESA concepts are Wikipedia articles, and the Wikilinks between the articles are used to overcome the concept-similarity problem. In this paper, we provide two general solutions for integration of concept-concept similarities into the ESA model, ones that do not rely on a particular corpus structure and do not alter the explicit concept-mapping properties that distinguish ESA from models like LSA and latent Dirichlet allocation (LDA).