2015 | OriginalPaper | Book chapter
MIST: Top-k Approximate Sub-string Mining Using Triplet Statistical Significance
Written by: Sourav Dutta
Published in: Advances in Information Retrieval
Publisher: Springer International Publishing
Efficient extraction of strings or sub-strings similar to an input query string is essential in applications such as instant search and record linkage, where the similarity between two strings is usually quantified by edit distance. This paper proposes a novel top-k approximate sub-string matching algorithm, MIST, for a given query, based on the Chi-squared statistical significance of string triplets, thereby avoiding expensive edit distance computation. Experiments with real-life data validate the run-time effectiveness and accuracy of our algorithm.
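For context, the edit-distance computation that the abstract describes as expensive can be sketched as follows. This is a minimal illustration of the conventional baseline only, not the MIST algorithm; the function names `edit_distance` and `top_k_substrings_baseline`, and the choice of fixed-length sliding windows, are assumptions made for this example.

```python
import heapq


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # match or substitute
        prev = curr
    return prev[-1]


def top_k_substrings_baseline(text: str, query: str, k: int):
    """Hypothetical brute-force baseline (not MIST): score every window of
    len(query) characters by edit distance and keep the k closest.

    Returns a list of (distance, start_index, substring) tuples.
    """
    m = len(query)
    windows = ((edit_distance(text[i:i + m], query), i, text[i:i + m])
               for i in range(len(text) - m + 1))
    return heapq.nsmallest(k, windows)


if __name__ == "__main__":
    print(top_k_substrings_baseline("the quick brown fox", "quack", 1))
```

With a text of length n and a query of length m, this baseline runs one O(m²) dynamic program per window, or roughly O(n·m²) in total, which is the kind of cost a statistics-based approach aims to avoid.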