2015 | OriginalPaper | Chapter
Efficient String Similarity Search: A Cross Pivotal Based Approach
Authors : Fei Bi, Lijun Chang, Wenjie Zhang, Xuemin Lin
Published in: Database Systems for Advanced Applications
Publisher: Springer International Publishing
Activate our intelligent search to find suitable subject content or patents.
Select sections of text to find matching patents with Artificial Intelligence. powered by
Select sections of text to find additional relevant content using AI-assisted search. powered by
In this paper, we study the problem of string similarity search with edit distance constraint; it retrieves all strings in a string database that are similar to a query string. The state-of-the-art approaches employ the concept of pivotal set, which is a set of non-overlapping signatures, for indexing and query processing. However, they do not fully exploit the pruning power potential of the pivotal sets by using only the pivotal set of the query string or the data strings. To remedy this issue, in this paper we propose a cross pivotal based approach to fully exploiting the pruning power of multiple pivotal sets. We prove theoretically that our cross pivotal filter has stronger pruning power than state-of-the-art filters. We also propose a more efficient algorithm with better time complexity for pivotal selection. Moreover, we further develop two advanced filters to prune unpromising single-match candidates which are the set of candidates introduced by one and only one of the probing signatures. Our experimental results on real datasets demonstrate that our cross pivotal based approach significantly outperforms the state-of-the-art approaches.