Article

A preprocessing step for global loop transformations for data transfer optimization

Authors:
Koen Danckaert, IMEC, Kapeldreef 75, B-3001 Leuven, Belgium
Francky Catthoor, IMEC, Kapeldreef 75, B-3001 Leuven, Belgium
Hugo De Man, IMEC, Kapeldreef 75, B-3001 Leuven, Belgium

CASES '00: Proceedings of the 2000 international conference on Compilers, architecture, and synthesis for embedded systems, November 2000, pages 34–40. https://doi.org/10.1145/354880.354886

Published: 1 November 2000

Published in
CASES '00: Proceedings of the 2000 international conference on Compilers, architecture, and synthesis for embedded systems
November 2000
200 pages
ISBN: 1581133383
DOI: 10.1145/354880
Copyright © 2000 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall acceptance rate: 52 of 230 submissions, 23%
Article Metrics
- Total citations: 9
- Total downloads: 294
- Downloads (last 12 months): 17
- Downloads (last 6 weeks): 2