Searching databases of protein sequences for proteins that match patterns represented as profile HMMs is a widely performed bioinformatics task. The standard tool for this task is HMMER version 3 from Sean Eddy. HMMER3 achieved significant performance improvements over version 2 by introducing a heuristic filter called the Multiple Segment Viterbi (MSV) algorithm and by using the native SIMD instruction sets of modern CPUs. Our objective was to improve performance further by using a general-purpose graphics processing unit (GPU) and the CUDA software environment from Nvidia.
An execution profile of HMMER3 identifies the MSV filter as a code hotspot that consumes over 75% of the total execution time. We applied a number of well-known GPU optimization strategies to implement a CUDA version of the MSV filter.
The results show that our implementation achieved, on average, a 1.8x speedup over the single-threaded HMMER3 CPU SSE2 implementation. The experiments used a modern Kepler architecture GPU from Nvidia with 768 cores running at 811 MHz and an Intel Core i7-3960X 3.3 GHz CPU overclocked to 4.6 GHz.
For HMMER2, GPU implementations obtained speedups of an order of magnitude. Such gains seem out of reach for HMMER3.