DOI: 10.1145/377792.377806
Article

Data locality enhancement by memory reduction

Authors: Yonghong Song, Rong Xu, Cheng Wang, Zhiyuan Li

Published: 17 June 2001

ABSTRACT

In this paper, we propose memory reduction as a new approach to data locality enhancement. Under this approach, the compiler reduces the size of the data repeatedly referenced in a collection of nested loops, so that between reuses the data are more likely to remain in higher-speed memory devices such as the cache. Specifically, we present an optimal algorithm that combines loop shifting, loop fusion, and array contraction to reduce the temporary array storage required to execute a collection of loops. Applied to 20 benchmark programs, our technique reduces the memory requirement, counting both data and code, by 51% on average. The transformed programs achieve an average speedup of 1.40, due to the reduced footprint and, consequently, the improved data locality.
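To make the transformation concrete, the following is a minimal C sketch of the idea with hypothetical arrays and loop bodies; it is not the paper's algorithm, which chooses shift amounts and a fusion partition optimally across a whole collection of loops.

    #define N 1000

    /* Before: loop 1 fills an N-element temporary t; loop 2 reads t[i-1].
     * Direct fusion is illegal, because iteration i of loop 2 needs a value
     * produced by iteration i-1 of loop 1. */
    void original(const double *a, double *c) {
        static double t[N];
        for (int i = 0; i < N; i++)
            t[i] = 2.0 * a[i];          /* produce temporary */
        for (int i = 1; i < N; i++)
            c[i] = t[i - 1] + a[i];     /* consume temporary */
    }

    /* After shifting loop 2 by one iteration, fusing, and contracting:
     * the N-element temporary shrinks to a single scalar carried between
     * iterations. */
    void transformed(const double *a, double *c) {
        double t_prev = 2.0 * a[0];     /* peeled first iteration of loop 1 */
        for (int i = 1; i < N; i++) {
            double t_cur = 2.0 * a[i];  /* shifted loop-1 body */
            c[i] = t_prev + a[i];       /* loop-2 body */
            t_prev = t_cur;
        }
    }

Here the N-element array t is contracted to the scalars t_prev and t_cur, so the value reused across the two original loops stays in a register or cache rather than in memory. This shrinking of the repeatedly referenced footprint is the memory-reduction effect the abstract describes.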


Published in

ICS '01: Proceedings of the 15th International Conference on Supercomputing
June 2001, 510 pages
ISBN: 1-58113-410-X
DOI: 10.1145/377792
Copyright © 2001 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ICS '01 paper acceptance rate: 45 of 133 submissions (34%).
Overall ICS acceptance rate: 584 of 2,055 submissions (28%).
