2014 | OriginalPaper | Buchkapitel
Fast Parallel Connected Components Algorithms on GPUs
verfasst von : Guojing Cong, Paul Muzio
Erschienen in: Euro-Par 2014: Parallel Processing Workshops
Verlag: Springer International Publishing
Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.
Wählen Sie Textabschnitte aus um mit Künstlicher Intelligenz passenden Patente zu finden. powered by
Markieren Sie Textabschnitte, um KI-gestützt weitere passende Inhalte zu finden. powered by
We study parallel connected components algorithms on GPUs in comparison with CPUs. Although straightforward implementation of PRAM algorithms performs relatively better on GPUs than on CPUs, the GPU memory subsystem performance is poor due to non-coalesced random accesses.
We argue that generic sort-based access coalescing is too costly on GPUs. We propose a new coalescing technique and a new meta algorithm to improve locality and performance. Our optimization achieves up to 2.7 times speedup over the straightforward implementation. Interestingly, our optimization also works well on CPUs.
Comparing the best-performing algorithms on GPUs and CPUs, we find our new algorithm is the fastest on GPUs and the second fastest on CPUs, while the parallel Rem’s algorithm is the fastest on CPUs but does not perform well on GPUs due to path divergence.