ABSTRACT
We discuss the mapping of elementary ray tracing operations---acceleration structure traversal and primitive intersection---onto wide SIMD/SIMT machines. Our focus is on NVIDIA GPUs, but some of the observations should be valid for other wide machines as well. While several fast GPU tracing methods have been published, very little is actually understood about their performance. Nobody knows whether the methods are anywhere near the theoretically obtainable limits, and if not, what might be causing the discrepancy. We study this question by comparing the measurements against a simulator that gives an upper bound on the performance of a given kernel. We observe that previously known methods are a factor of 1.5--2.5 off from the theoretical optimum, and that most of the gap is explained not by memory bandwidth but by previously unidentified inefficiencies in hardware work distribution. We then propose a simple solution that significantly narrows the gap between simulation and measurement. This results in the fastest GPU ray tracer to date. We provide results for primary, ambient occlusion, and diffuse interreflection rays.