2014 | OriginalPaper | Chapter
Holistic and Compact Selectivity Estimation for Hybrid Queries over RDF Graphs
Authors: Andreas Wagner, Veli Bicer, Thanh Tran, Rudi Studer
Published in: The Semantic Web – ISWC 2014
Publisher: Springer International Publishing
Many RDF descriptions today are text-rich: besides structured data they also feature much unstructured text. Text-rich RDF data is frequently queried via predicates matching structured data, combined with string predicates for textual constraints (hybrid queries). Evaluating hybrid queries efficiently requires means for selectivity estimation. Previous works on selectivity estimation, however, suffer from inherent drawbacks, which are reflected in efficiency and effectiveness issues. We propose a novel estimation approach, TopGuess, which exploits topic models as data synopsis. This way, we capture correlations between structured and unstructured data in a holistic and compact manner. We study TopGuess in a theoretical analysis and show it to guarantee a linear space complexity w.r.t. text data size. Further, we show selectivity estimation time complexity to be independent from the synopsis size. In experiments on real-world data, TopGuess allowed for great improvements in estimation accuracy, without sacrificing efficiency.