Abstract
Efficient implementation of DSP applications is critical for many embedded systems. Optimizing compilers for application programs, written in C, largely focus on code generation and scheduling, which, with their growing maturity, are providing diminishing returns. As DSP applications typically make extensive use of pointer arithmetic, the alternative use of high-level, source-to-source, transformations has been largely ignored. This article develops an array recovery technique that automatically converts pointers to arrays, enabling the empirical evaluation of high-level transformations. High-level techniques were applied to the DSPstone benchmarks on three platforms: TriMedia TM-1000, Texas Instruments TMS320C6201, and the Analog Devices SHARC ADSP-21160. On average, the best transformation gave a factor of 2.43 improvement across the platforms. In certain cases, a speedup of 5.48 was found for the SHARC, 7.38 for the TM-1, and 2.3 for the C6201. These preliminary results justify pointer to array conversion and further investigation into the use of high-level techniques for embedded compilers.
- Allen, R. and Johnson, S. 1988. Compiling C for vectorization, parallelization, and inline expansion, In Proceedings of the SIGPLAN '88 Conference of Programming Languages Design and Implementation (Atlanta, GA, June 1988).]] Google ScholarDigital Library
- Analog Devices Inc. 1999. ADSP-21160 SHARC DSP Hardware Reference. http://www.analog.com.]]Google Scholar
- Andreyev, A. L., Balyakhov, D. I., and Russakov, V. P. 1996. The technique of high-level optimization of DSP algorithms implementation, International Conference on Signal Processing Applications & Technology (ICSPAT), 1996.]]Google Scholar
- de Araujo, G. C. S. 1997. Code generation algorithms for digital signal processor. Dissertation, Princeton University, Dep. Electrical Engineering.]] Google ScholarDigital Library
- Bacon, D. F., Graham, S. L., and Sharp, O. J. 1994. Compiler transformations for high-performance computing. ACM Comput. Surv. 26, 4, (1994), 345--420.]] Google ScholarDigital Library
- Barreteau, M. et al. 1998. OCEANS optimising compilers for embedded applications, In Proceedings Euro-Par 98, LNCS 1470 (1998), 1123--1130.]] Google ScholarDigital Library
- Bhattacharyya, S. S., Leupers, R., and Marwedel, P. 2000. Software synthesis and code generation for signal processing systems. IEEE Trans. Circuits Syst. II: Analog and Digital Signal Processing, 47, 9.]]Google Scholar
- Bodin, F., Chamski, Z., Eisenbeis, C., Rohou, E., and Seznec, A. 1998. GCDS: A compiler strategy for trading code size against performance in embedded applications. INRIA, Tech. Rep. RR-3346.]]Google Scholar
- Callahan, D., Cooper K. D., Kennedy, K., and Torczon, L. 1986. Interprocedural constant propagation. In Proceedings of the SIGPLAN Symposium on Compiler Construction (1986), 152--161.]] Google ScholarDigital Library
- Clauss, P. 1996. Counting solutions to linear and nonlinear constraints through ehrhart polynomials: applications to analyze and transform scientific programs. In Proceedings of the 1996 International Conference on Supercomputing (Philadelphia, PA, May 1996), ACM Press, 278--285.]] Google ScholarDigital Library
- Duesterwald, E., Gupta, R., and Soffa, M. 1993. A practical data flow framework for array reference analysis and its use in optimizations. In Proceedings of the SIGPLAN Conference on Programming Languages Design and Implementation, (Albuquerque, NM, 1993), 67--77.]] Google ScholarDigital Library
- Franke, B. and O'Boyle, M. 2001. Compiler transformation of pointers to explicit array accesses in DSP applications. In Proceedings of ETAPS CC 2001 (Genova, Italy, 2001).]] Google ScholarDigital Library
- Frederiksen, A., Christiansen, R., Bier, J., and Koch, P. 2000. An evaluation of compiler-processor interaction for DSP applications. In Proceedings of the 34th IEEE Asilomar Conference on Signals, Systems, and Computers (Pacific Grove, CA, Oct. 2000).]]Google Scholar
- Grove,D. and Torczon, L. 1993. Interprocedural constant propagation: A study of jump function implementations. In Proceedings of the ACM SIGPLAN '93 Conference on Programming Language Design and Implementation (1993), 90--99.]] Google ScholarDigital Library
- Gupta, S., Miranda, M., Catthoor, F., and Gupta, R. 2000. Analysis of high-level address code transformations for programmable processors. In Proceedings of Design and Test in Europe Conference (DATE, 2000).]] Google ScholarDigital Library
- Kandemir, M., Vijaykrishnan, N., Irwin, M. J., and Kim, H. S. 2000. Experimental evaluation of energy behavior of iteration space tiling. In Proceedings of the International Workshop on Languages and Compilers for Parallel Computing (Yorktown Heights, NY, Aug. 2000).]] Google ScholarDigital Library
- Kandemir, M., Ramanujam, J., and Choudhary, A. 1999. Improving cache locality by a combination of loop and data transformations. IEEE Trans. Comput. 48, 2 (Feb. 1999), 159--167.]] Google ScholarDigital Library
- Kisuki, T., Knijnenburg, P. M. W., and O'Boyle, M. F. P. 2000. Combined selection of tile sizes and unroll factors using iterative compilation. In Proceedings of PACT 2000, Parallel Architectures and Compiler Technology (Oct. 2000), IEEE Press, 237--248.]] Google ScholarDigital Library
- Liem, C., Paulin, P., and Jerraya, A. 1996. Address calculation for retargetable compilation and exploration of instruction set architecture. In Proceedings of the 33rd ACM Design Automation Conference (DAC, Las Vegas, NV, 1996).]] Google ScholarDigital Library
- Leupers, R. 1998. Novel code optimzation techniques for DSPs. In Proceedings of the 2nd European DSP Education and Research Conference (Paris, 1998).]]Google Scholar
- Maydan, D. E., Hennessy, J. L., and Lam, M. S. 1995. Effectiveness of data dependence analysis, Int. J. Parallel Program. 23, 1, (1995), 63--81.]] Google ScholarDigital Library
- Morgan, R. 1998. Building an Optimizing Compiler. Butterworth-Heinemann, Boston, MA.]] Google ScholarDigital Library
- Numerix-DSP Digital Signal Processing Web Site, http://www.numerix-dsp.com/c_coding.pdf, 2000.]]Google Scholar
- O'Boyle M. F. P. and Knijnenburg, P. M. W. 1998. Integrating loop and data transformations for global optimisation. In Proceedings of PACT '98, Parallel Architectures and Compiler Technology (Oct. 1998), IEEE Press, 12--19.]] Google ScholarDigital Library
- Philips Semiconductors 1999. TriMedia TM-1300 http://www.semiconductors.philips.com, 1999.]]Google Scholar
- Pugh, W. 1994. Counting solutions to presburger formulas: How and why. In Proceedings of the SIGPLAN Conference on Programming Languages Design and Implementation, ACM Press, 1994, 121--134.]] Google ScholarDigital Library
- Sagiv, M., Reps, T., and Horwitz, S. 1995. Precise interprocedural dataflow analysis with applications to constant propagation. In Proceedings of the TAPSOFT'95 Conference (Arhus, Denmark), LNCS, Springer-Verlag, 1995, 49--61.]] Google ScholarDigital Library
- Sair, S., Kaeli, D. K., and Meleis, W. 1998. A study of loop unrolling for VLIW-based DSP processors. In Proceedings of the 1998 IEEE Workshop on Signal Processing Systems (SiPS '98, Oct. 1998), 519--527.]]Google Scholar
- Song, Y. and Lin, Y. 2000. Unroll-and-jam for imperfectly-nested loops in DSP applications. In Proceedings of the ACM International Conference on Compilers, Architectures, Synthesis for Embedded Systems (Nov. 2000).]] Google ScholarDigital Library
- Su, B., Wang, J., and Esguerra, A. 1999. Source-level loop optimization for DSP code generation. In Proceedings of the 1999 IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP '99, Phoenix, AZ, 1999), 2155--2158.]] Google ScholarDigital Library
- van Swaaij, M. F. X. B., Franssen, F. H. M., Catthoor, F. V. M., and De Man, H. J. 1992. Automatimg high level control flow transformations for DSP memory management. In Proceedings of the IEEE Workshop on VLSI Signal Processing, (Napa Valley, CA, Oct. 1992). Also in VLSI Signal Processing V, K. Yao, R. Jain, W. Przytula (Eds.), IEEE Press, New York, 397--406.]]Google ScholarCross Ref
- Timmer, A., Strik, M., van Meerbergen, J., and Jess, J. 1995. Conflict modelling and instruction scheduling in code generation for in-house DSP cores. In Proceedings of the 1995 Design Automation Conference (1995), 593--598.]] Google ScholarDigital Library
- Texas Instruments 2001. TMS320C6201B Digital Signal Processor. http://www.ti.com.]]Google Scholar
- van Engelen, R. and Gallivan, K. 2001. An efficient algorithm for pointer-to-array access conversion for compiling and optimizing DSP applications. In Proceedings of the International Workshop on Innovative Architecture (IWIA 2001, Maui, 2001).]]Google Scholar
- Wang, J. and Su, B. 1998. Software pipelining of nested loops for real-time DSP applications, In Proceedings of the 1998 IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP '98, Seattle, WA, 1998).]]Google Scholar
- Zivojnovic, V., Velarde, J. M., Schlager, C., and Meyr, H. 1994. DSPstone: A DSP-oriented benchmarking methodology, In Proceedings of Signal Processing Applications & Technology Conference (Dallas, TX, 1994).]]Google Scholar
Index Terms
- Array recovery and high-level transformations for DSP applications
Recommendations
Register coalescing techniques for heterogeneous register architecture with copy sifting
Optimistic coalescing has been proven as an elegant and effective technique that provides better chances of safely coloring more registers in register allocation than other coalescing techniques. Its algorithm originally assumes homogeneous registers, ...
Optimistic coalescing for heterogeneous register architectures
LCTES '07: Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systemsIn this paper, Optimistic coalescing has been proven as an elegant and effective technique that provides better chances of safely coloring more registers in register allocation than other coalescing techniques. Its algorithm originally assumes ...
A retargetable register allocation framework for embedded processors
LCTES '04This paper describes the FlexCC2 register allocation framework. FlexCC2 is an optimizing retargetable C compiler for embedded processors, and in particular for DSP processors. Embedded processors often contain features such as irregular and constrained ...
Comments