ABSTRACT
In this paper, we propose memory reduction as a new approach to data locality enhancement. Under this approach, the compiler reduces the size of the data repeatedly referenced in a collection of nested loops. Between reuses, the data are then more likely to remain in higher-speed memory devices such as the cache. Specifically, we present an optimal algorithm that combines loop shifting, loop fusion, and array contraction to reduce the temporary array storage required to execute a collection of loops. When applied to 20 benchmark programs, our technique reduces the memory requirement, counting both data and code, by 51% on average. The transformed programs achieve a speedup of 1.40 on average, due to the reduced footprint and, consequently, the improved data locality.
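As a hypothetical illustration (not taken from the paper, with invented array names and sizes), the sketch below shows the kind of transformation the abstract describes: two loops that communicate through a temporary array are fused, and the array is contracted to a scalar, so each intermediate value is consumed immediately after it is produced.

```c
#include <stdio.h>

#define N 1000

/* Before: the temporary array t holds N elements. The first loop writes
 * all of t, and the second loop reads it back, by which time much of t
 * may already have been evicted from cache. */
void before(const double a[N], const double b[N], double c[N])
{
    double t[N];                     /* N-element temporary array */
    for (int i = 0; i < N; i++)
        t[i] = a[i] + b[i];
    for (int i = 0; i < N; i++)
        c[i] = 2.0 * t[i];
}

/* After loop fusion and array contraction: the loops are fused and t
 * shrinks to a single scalar, eliminating the N-element temporary and
 * keeping the value in a register between its definition and its use. */
void after(const double a[N], const double b[N], double c[N])
{
    for (int i = 0; i < N; i++) {
        double t = a[i] + b[i];      /* contracted temporary */
        c[i] = 2.0 * t;
    }
}

int main(void)
{
    static double a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }
    after(a, b, c);
    printf("c[10] = %g\n", c[10]);   /* expect 60 */
    return 0;
}
```

If the second loop also read t[i-1], direct fusion would be illegal; shifting the consumer loop by one iteration restores legality and contracts t to two scalars instead of one. Choosing such shifts and fusions to minimize the remaining temporary storage is what the paper's optimal algorithm addresses.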