Many RDF descriptions today are text-rich: besides structured data, they also feature much unstructured text.
text. Text-rich RDF data is frequently queried via predicates matching structured data, combined with string predicates for textual constraints (
). Evaluating hybrid queries efficiently requires means for
. Previous works on selectivity estimation, however, suffer from inherent drawbacks, which are reflected in efficiency and effectiveness issues. We propose a novel estimation approach,
, which exploits topic models as data synopsis. This way, we capture correlations between structured and unstructured data in a
holistic and compact
manner. We study TopGuess in a theoretical analysis and show it to guarantee a linear space complexity w.r.t. text data size. Further, we show selectivity estimation time complexity to be independent from the synopsis size. In experiments on real-world data, TopGuess allowed for great improvements in estimation accuracy, without sacrificing efficiency.
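The abstract does not spell out how a topic model turns into a selectivity estimate. The following is a minimal Python sketch under stated assumptions, not TopGuess itself: the hypothetical `TopicSynopsis` stores, per topic, the fraction of entities assigned to it, a word distribution, and a distribution over structured values (e.g., class memberships), and the hypothetical `estimate_selectivity` assumes a structured predicate and a keyword predicate are conditionally independent given the topic. Because both predicates are conditioned on the same topic, their correlation is preserved instead of being multiplied away as under a global independence assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TopicSynopsis:
    """Hypothetical topic-model synopsis (illustrative assumption, not the paper's design)."""
    topic_weights: List[float]           # theta_k: fraction of entities assigned to topic k
    word_probs: List[Dict[str, float]]   # phi_k(w): P(word | topic k)
    value_probs: List[Dict[str, float]]  # P(structured value | topic k), e.g. a class
    avg_text_len: int                    # L: assumed average number of tokens per entity text


def estimate_selectivity(syn: TopicSynopsis, value: str, keyword: str) -> float:
    """Estimate the fraction of entities that match `value` AND contain `keyword`.

    Assumption: the two predicates are conditionally independent given the topic,
    so any correlation between them is carried by the topic mixture.
    """
    sel = 0.0
    for k, theta in enumerate(syn.topic_weights):
        p_value = syn.value_probs[k].get(value, 0.0)
        phi = syn.word_probs[k].get(keyword, 0.0)
        # Probability that at least one of L tokens drawn from topic k equals `keyword`.
        p_keyword = 1.0 - (1.0 - phi) ** syn.avg_text_len
        sel += theta * p_value * p_keyword
    return sel


if __name__ == "__main__":
    # Toy synopsis with two topics (made-up numbers for illustration only).
    syn = TopicSynopsis(
        topic_weights=[0.6, 0.4],
        word_probs=[{"goal": 0.05}, {"film": 0.04}],
        value_probs=[{"Athlete": 0.9}, {"Actor": 0.8, "Athlete": 0.05}],
        avg_text_len=20,
    )
    print(estimate_selectivity(syn, "Athlete", "goal"))
```

In this toy synopsis, multiplying the global selectivities of `Athlete` and `goal` would treat them as independent and underestimate the joint selectivity, because both concentrate in the same topic; the per-topic sum avoids that.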