1 Introduction
- a direct, fast look-up of string queries on the GPU,
- hybrid scenarios handling shorter keys up to a fixed maximum length on the GPU and longer keys on the CPU, and
- an extensive evaluation of these approaches on a high-performance computing system running recent many-core CPUs and GPUs for scientific calculations.
- APU-based search with SYCL, and APU look-up of the insert position with separate CPU/GPU insertion for libcuckoo.
2 Related work
3 Basics
3.1 Hash tables
3.2 Hardware acceleration
4 Parallel hybrid GPU/CPU hash table for string keys
4.1 Approach
4.2 Data structure and search
5 Evaluation
5.1 Benchmark framework
5.2 Benchmark environment
5.3 Benchmark results
- libcuckoo CPU/libcuckoo GPU (\(L_C\)/\(L_G\)),
- robin-map CPU/robin-map GPU (\(R_C\)/\(R_G\)),
- libcuckoo GPU approach on CPU (\(L_{GC}\)),
- robin-map GPU approach on CPU (\(R_{GC}\)).
6 APU acceleration
6.1 Overview
6.2 Shared memory
6.3 APU implementation
7 APU benchmark results
- Original insert on CPU (\(LI_C\)).
- Finding on iGPU and insert on CPU (\(LI_{GC}\)).
- Finding on iGPU and insert on iGPU in separate kernels (\(LI_{GGS}\)).
- Finding on iGPU and insert on iGPU in one kernel (\(LI_{GGC}\)).