Skip to main content


Swipe to navigate through the articles of this issue

20-07-2020 | Special Issue Paper | Issue 6/2020 Open Access

The VLDB Journal 6/2020

Faster & strong: string dictionary compression using sampling and fast vectorized decompression

The VLDB Journal > Issue 6/2020
Robert Lasch, Ismail Oukid, Roman Dementiev, Norman May, Suleyman S. Demirsoy, Kai-Uwe Sattler
Important notes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


String dictionaries constitute a large portion of the memory footprint of database applications. While strong string dictionary compression algorithms exist, these come with impractical access and compression times. Therefore, lightweight algorithms such as front coding (PFC) are favored in practice. This paper endeavors to make strong string dictionary compression practical. We focus on Re-Pair Front Coding (RPFC), a grammar-based compression algorithm, since it consistently offers better compression ratios than other algorithms in the literature. To accelerate compression times, we propose block-based RPFC (BRPFC) which consists in independently compressing small blocks of the dictionary. For further accelerated compression times especially on large string dictionaries, we also propose an alternative version of BRPFC that uses sampling to speed up compression. Moreover, to accelerate access times, we devise a vectorized access method, using \(\hbox {Intel}^{\circledR }\) Advanced Vector Extensions 512 (\(\hbox {Intel}^{\circledR }\) AVX-512). Our experimental evaluation shows that sampled BRPFC offers compression times up to 190 \(\times \) faster than RPFC, and random string lookups 2.3 \(\times \) faster than RPFC on average. These results move our modified RPFC into a practical range for use in database systems because the overhead of Re-Pair-based compression for access times can be reduced by 2 \(\times \).
About this article

Other articles of this Issue 6/2020

The VLDB Journal 6/2020 Go to the issue

Premium Partner

    Image Credits