2015 | Original Paper | Book Chapter
Computing Probability Threshold Set Similarity on Probabilistic Sets
Authors: Lei Wang, Ming Gao, Rong Zhang, Cheqing Jin, Aoying Zhou
Published in: Web-Age Information Management
The computation of set similarity has become an increasingly important tool in many real-world applications, such as near-duplicate detection, data cleaning, and record linkage, in which sets are often uncertain because of missing, imprecise, or noisy data. The challenge of evaluating similarity between probabilistic sets stems mainly from the exponential blowup in the number of possible worlds induced by uncertainty. In this paper, we define the probability threshold set similarity (PTSS) between two probabilistic sets based on the possible world semantics and propose an exact solution that computes PTSS via dynamic programming. To speed up the computation of the probability threshold set query (PTSQ), we derive an efficient and effective pruning rule for PTSQ. Finally, we conduct extensive experiments on both real and synthetic datasets to verify the effectiveness and efficiency of our algorithms.
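To make the possible-world blowup mentioned in the abstract concrete, the following is a minimal brute-force sketch, not the chapter's dynamic-programming algorithm and not its formal PTSS definition. It assumes an element-level uncertainty model in which each element belongs to a set independently with a given probability, and it assumes Jaccard similarity as the measure; the function names `possible_worlds`, `jaccard`, and `prob_similarity_at_least` are hypothetical and chosen only for illustration.

```python
from itertools import product


def possible_worlds(prob_set):
    """Yield (world, probability) pairs for a probabilistic set.

    Assumption: prob_set maps each element to an independent
    membership probability, so n elements produce 2**n worlds.
    """
    items = list(prob_set.items())
    for mask in product([False, True], repeat=len(items)):
        world = frozenset(e for (e, _), keep in zip(items, mask) if keep)
        prob = 1.0
        for (_, p), keep in zip(items, mask):
            prob *= p if keep else 1.0 - p
        yield world, prob


def jaccard(a, b):
    """Jaccard similarity; two empty sets are treated as identical (a convention assumed here)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def prob_similarity_at_least(ps_a, ps_b, threshold):
    """Probability, over all pairs of possible worlds, that the similarity reaches the threshold."""
    total = 0.0
    for world_a, p_a in possible_worlds(ps_a):
        for world_b, p_b in possible_worlds(ps_b):
            if jaccard(world_a, world_b) >= threshold:
                total += p_a * p_b
    return total


# Hypothetical example data: "y" is in A with probability 0.5.
A = {"x": 1.0, "y": 0.5}
B = {"x": 1.0}
print(prob_similarity_at_least(A, B, 0.8))
```

The nested loops visit 2^|A| · 2^|B| world pairs, which is the exponential cost that an exact dynamic-programming solution and a pruning rule are meant to avoid.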