Abstract
We introduce GRAMPS, a programming model that generalizes concepts from modern real-time graphics pipelines by exposing a model of execution containing both fixed-function and application-programmable processing stages that exchange data via queues. GRAMPS allows the number, type, and connectivity of these processing stages to be defined by software, permitting arbitrary processing pipelines or even processing graphs. Applications achieve high performance using GRAMPS by expressing advanced rendering algorithms as custom pipelines, then using the pipeline as a rendering engine. We describe the design of GRAMPS, then evaluate it by implementing three pipelines (Direct3D, a ray tracer, and a hybrid of the two) and running them on emulations of two different GRAMPS implementations: a traditional GPU-like architecture and a CPU-like multicore architecture. In our tests, our GRAMPS schedulers run our pipelines with 500 to 1500 KB of queue usage at their peaks.
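To make the idea of software-defined stages connected by queues concrete, here is a minimal sketch in Python. It is not the GRAMPS API: the `Queue`, `Stage`, and `run` names, the toy `rasterize` and `shade` functions, and the naive scheduling policy are all assumptions made for illustration. The sketch tracks peak queue occupancy in items only to echo the queue-usage metric mentioned above.

```python
from collections import deque


class Queue:
    """Hypothetical FIFO linking two stages; records its peak occupancy."""

    def __init__(self, name):
        self.name = name
        self.items = deque()
        self.peak = 0

    def push(self, item):
        self.items.append(item)
        self.peak = max(self.peak, len(self.items))

    def pop(self):
        return self.items.popleft()

    def __len__(self):
        return len(self.items)


class Stage:
    """Hypothetical stage: reads one input queue, writes to zero or more outputs."""

    def __init__(self, name, fn, inp, outs):
        self.name, self.fn, self.inp, self.outs = name, fn, inp, outs

    def step(self):
        # fn consumes one item and returns (output_index, value) pairs.
        for idx, value in self.fn(self.inp.pop()):
            self.outs[idx].push(value)


def run(stages):
    """Naive scheduler (an assumption, not the paper's): always run the
    latest stage in the list that has input, which tends to drain queues
    before refilling them. Stops when every input queue is empty."""
    progress = True
    while progress:
        progress = False
        for stage in reversed(stages):
            if len(stage.inp) > 0:
                stage.step()
                progress = True
                break


# A two-stage toy pipeline: "rasterize" expands each primitive into
# several fragments, "shade" writes one value per fragment.
prims = Queue("primitives")
frags = Queue("fragments")
framebuffer = []


def rasterize(prim):
    # A primitive of size n yields n fragments, all sent to output 0.
    return [(0, (prim, i)) for i in range(prim)]


def shade(frag):
    framebuffer.append(frag[0] * 10 + frag[1])
    return []  # terminal stage: nothing is forwarded


stages = [
    Stage("rasterize", rasterize, prims, [frags]),
    Stage("shade", shade, frags, []),
]
for p in (2, 3):
    prims.push(p)
run(stages)
print(framebuffer, {q.name: q.peak for q in (prims, frags)})
```

Because the graph is just a list of `Stage` objects wired together through `Queue` objects, adding a stage or rerouting an output is a data change rather than a code change, which is the property the abstract attributes to software-defined pipelines.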