ABSTRACT
General-purpose computing on graphics processing units (GPUs) has recently emerged as an exciting trend in high-performance computing, making it appealing to study the potential of GPUs for Electronic Design Automation (EDA) applications. However, EDA generally involves irregular data structures, such as sparse matrices and graphs, which pose significant challenges for efficient GPU implementation. In this paper, we propose high-performance GPU implementations of two important irregular EDA computing patterns: Sparse Matrix-Vector Product (SMVP) and graph traversal. On a wide range of EDA problem instances, our SMVP implementations outperform all published work and achieve a speedup of one order of magnitude over the CPU baseline. On this basis, both timing analysis and linear system solution can be considerably accelerated. We also introduce an SMVP-based formulation of Breadth-First Search and observe considerable speedup in GPU implementations. Our results suggest that the power of GPU computing can be successfully unleashed by designing GPU-friendly algorithms and/or re-organizing the computing structures of existing algorithms.
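The SMVP-based BFS formulation mentioned above can be illustrated with a serial sketch: each BFS level is computed as one sparse matrix-vector product of the graph's adjacency matrix (stored in CSR form) with the current frontier vector, over a Boolean semiring. This is a hypothetical illustration, not the paper's GPU code; the function name and CSR layout below are assumptions for the example.

```python
def bfs_levels(row_ptr, col_idx, n, source):
    """Return the BFS level of every vertex (-1 if unreachable).

    The graph is given as a CSR adjacency structure: the neighbors of
    vertex u are col_idx[row_ptr[u]:row_ptr[u + 1]].
    """
    level = [-1] * n
    level[source] = 0
    frontier = [False] * n  # Boolean frontier vector x
    frontier[source] = True
    depth = 0
    while any(frontier):
        depth += 1
        next_frontier = [False] * n
        # Boolean SMVP: next_frontier = A^T * frontier,
        # masked so that already-visited vertices are skipped.
        for u in range(n):
            if frontier[u]:
                for e in range(row_ptr[u], row_ptr[u + 1]):
                    v = col_idx[e]
                    if level[v] == -1:
                        level[v] = depth
                        next_frontier[v] = True
        frontier = next_frontier
    return level

# CSR adjacency of a small directed graph: 0->1, 0->2, 1->3, 2->3
row_ptr = [0, 2, 3, 4, 4]
col_idx = [1, 2, 3, 3]
print(bfs_levels(row_ptr, col_idx, 4, 0))  # [0, 1, 1, 2]
```

On a GPU, the inner two loops map naturally to a parallel SMVP kernel, which is what makes this formulation attractive compared to queue-based BFS.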