- {1} A. Agarwal, D. Kranz, V. Natarajan, "Automatic partitioning of parallel loops and data arrays for distributed shared-memory multiprocessors", IEEE Trans. on Parallel and Distributed Systems, Vol. 6, No. 9, pp. 943-962, Sep. 1995.
- {2} S. Amarasinghe, J. Anderson, M. Lam, and C. Tseng, "The SUIF compiler for scalable parallel machines", Proc. of the 7th SIAM Conf. on Parallel Proc. for Scientific Computing, 1995.
- {3} C. Ancourt, F. Irigoin and Y. Yang, "Minimal data dependence abstractions for loop transformations", Int. J. of Parallel Programming, Vol. 23, No. 4, pp. 359-388, 1995.
- {4} U. Banerjee, R. Eigenmann, A. Nicolau, D. Padua, "Automatic program parallelisation", Proc. of the IEEE, invited paper, Vol. 81, No. 2, Feb. 1993.
- {5} E. Brockmeyer, L. Nachtergaele, F. Catthoor, J. Bormans, H. De Man, "Low power memory storage and transfer organization for the MPEG-4 full pel motion estimation on a multi media processor", IEEE Trans. on Multi-Media, Vol. 1, No. 2, pp. 202-216, June 1999.
- {6} F. Catthoor, S. Wuytack, E. De Greef, F. Franssen, L. Nachtergaele, H. De Man, "System-level transformations for low power data transfer and storage", in paper collection on "Low power CMOS design" (eds. A. Chandrakasan, R. Brodersen), IEEE Press, pp. 609-618, 1998.
- {7} B. Creusillet, F. Irigoin, "Interprocedural array region analysis", Int. J. of Parallel Programming, Vol. 24, No. 6, pp. 513-546.
- {8} K. Danckaert, K. Masselos, F. Catthoor, H. De Man, C. Goutis, "Strategy for power efficient design of parallel systems", IEEE Trans. on VLSI Systems, Vol. 7, No. 2, pp. 258-265, June 1999.
- {9} K. Danckaert, C. Kulkarni, F. Catthoor, H. De Man, V. Tiwari, "A systematic approach for system bus load reduction applied to medical imaging", accepted for Proc. IEEE Int. Conf. on VLSI Design, Bangalore, India, Jan. 2001.
- {10} E. De Greef, F. Catthoor, H. De Man, "Memory Size Reduction through Storage Order Optimization for Embedded Parallel Multimedia Applications", Intnl. Parallel Proc. Symp. (IPPS) in Proc. Workshop on "Parallel Processing and Multimedia", Geneva, Switzerland, pp. 84-98, 1997.
- {11} H. De Man, F. Catthoor, G. Goossens, J. Vanhoof, J. Van Meerbergen, S. Note, J. Huisken, "Architecture-driven synthesis techniques for VLSI implementation of DSP algorithms", Proc. of the IEEE, special issue on "The future of computer-aided Design", Vol. 78, No. 2, pp. 319-335, Feb. 1990.
- {12} M. Dion, Y. Robert, "Mapping affine loop nests: new results", Lecture Notes in Computer Science, Vol. 919 on "High-Performance Computing and Networking", pp. 184-189, 1995.
- {13} P. Feautrier, "Some efficient solutions to the affine scheduling problem", Int. J. of Parallel Programming, Vol. 21, No. 5, pp. 389-420, 1992.
- {14} P. Feautrier, "Automatic parallelization in the polytope model", to appear.
- {15} D. Gannon, W. Jalby, K. Gallivan, "Strategies for cache and local memory management by global program optimizations", J. of Parallel and Distributed Computing, Vol. 5, pp. 587-616, 1988.
- {16} M. Gupta, E. Schonberg, H. Srinivasan, "A Unified Framework for Optimizing Communication in Data-Parallel Programs", IEEE Trans. on Parallel and Distributed Systems, Vol. 7, No. 7, pp. 689-704, July 1996.
- {17} M. Kandemir, J. Ramanujam, A. Choudhary, "Improving cache locality by a combination of loop and data transformations", IEEE Trans. on Computers, Vol. 48, No. 2, pp. 159-167, 1999.
- {18} W. Kelly, W. Pugh, "A framework for unifying reordering transformations", Technical report CS-TR-3193, Dept. of CS, Univ. of Maryland, College Park, April 1993.
- {19} C. Kulkarni, K. Danckaert, F. Catthoor, M. Gupta, "Interaction between data parallel compilation and data transfer and storage cost for multimedia applications", Proc. EuroPar Conf., Toulouse, France, September 1999.
- {20} L. Lamport, "The parallel execution of DO loops", Communications of the ACM, Vol. 17, No. 2, pp. 83-93, Feb. 1974.
- {21} C. Lengauer, "Loop parallelization in the polytope model", Proc. of the Fourth Intnl. Conf. on Concurrency Theory, Hildesheim, Germany, Aug. 1993.
- {22} P. Lippens, J. van Meerbergen, W. Verhaegh, A. van der Werf, "Allocation of multiport memories for hierarchical data streams", Proc. IEEE Int. Conf. Comp. Aided Design, Santa Clara CA, Nov. 1993.
- {23} K. McKinley, "A compiler optimization algorithm for shared-memory multiprocessors", IEEE Trans. on Parallel and Distributed Systems, Vol. 9, No. 8, pp. 769-787, Aug. 1998.
- {24} I. Verbauwhede, F. Catthoor, J. Vandewalle, H. De Man, "In-place memory management of algebraic algorithms on application-specific IC's", Journal of VLSI signal processing, Vol. 3, Kluwer, Boston, pp. 193-200, 1991.
- {25} M. van Swaaij, F. Franssen, F. Catthoor, H. De Man, "Automating high-level control flow transformations for DSP memory management", Proc. IEEE workshop on VLSI signal processing, Napa Valley CA, Oct. 1992.
- {26} D. Wilde, S. Rajopadhye, "Memory reuse analysis in the polyhedral model", Proc. Euro-Par Conf., Lyon, France, Aug. 1996. Lecture Notes in Computer Science, Vol. 1123, pp. 389-397, Springer, 1996.
- {27} M. Wolfe, U. Banerjee, "Data Dependence and its Application to Parallel Processing", Int. J. of Parallel Programming, Vol. 16, No. 2, pp. 137-178, 1987.
- {28} M. Wolf, "Improving locality and parallelism in nested loops", Ph.D. dissertation, Aug. 1992.
- {29} S. Wuytack, F. Catthoor, L. Nachtergaele, H. De Man, "Power Exploration for Data Dominated Video Applications", Proc. IEEE Intnl. Symp. on Low Power Design, Monterey CA, pp. 359-364, Aug. 1996.