
2018 | Original Paper | Book Chapter

4. Symbolic Multi-Level Parallelization

Authors: Alexandru-Petru Tanase, Frank Hannig, Jürgen Teich

Published in: Symbolic Parallelization of Nested Loop Programs

Publisher: Springer International Publishing


Abstract

Today’s multiprocessor systems-on-chips (MPSoCs) have brought up massively parallel processor array accelerators that may achieve high computational efficiency by exploiting multiple levels of parallelism and different memory hierarchies. However, existing parallelization techniques are often unable to exploit multiple levels of parallelism and are either I/O- or memory-bounded. Furthermore, if the number of available processing elements becomes known only at runtime, as in adaptive systems, static approaches fail. In this chapter, we solve these problems by proposing a hybrid compile-time/runtime multi-level symbolic parallelization technique that is able to (a) exploit multiple levels of parallelism, (b) exploit different memory hierarchies, and (c) match the I/O or memory capabilities of the target architecture in scenarios where the number of available processing elements becomes known only at runtime. The proposed technique consists of two compile-time transformations: (1) symbolic hierarchical tiling, followed by (2) symbolic multi-level scheduling. The tiling levels scheduled in parallel exploit different levels of parallelism, whereas the sequential one exploits different memory hierarchies. Furthermore, by tuning the tile sizes on the individual levels, a tradeoff between the necessary I/O bandwidth and memory becomes possible, which facilitates obeying resource constraints. The resulting schedules are symbolic with respect to the problem size and the tile sizes; thus, the number of processing elements to map onto need not be known at compile time. At runtime, once the number of available processors becomes known, a simple prologue chooses a schedule that is feasible with respect to the I/O and memory constraints and latency-optimal for the chosen tile size. This approach exploits multiple levels of parallelism, is independent of the problem size of the loop nest, and thereby avoids any expensive recompilation at runtime.
This is particularly important for low-cost, memory-scarce embedded MPSoC platforms that may not afford to host a just-in-time compiler.
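The runtime prologue described above can be sketched as follows. The function name `choose_schedule` and all cost models (the per-PE memory formula, the per-tile I/O formula, and the pipelined latency term) are illustrative assumptions for this sketch, not the chapter's actual symbolic expressions; in the real approach, these formulas are derived symbolically at compile time and merely evaluated here.

```python
# A minimal sketch (illustrative cost models, not the chapter's formulas)
# of the runtime prologue: once the number of available processing
# elements (PEs) is known, pick a tile size that satisfies the memory
# and I/O constraints and minimizes latency.

def choose_schedule(n_iterations, n_pes, mem_capacity, io_bandwidth):
    """Return (tile_size, latency) for the best feasible candidate,
    or None if no candidate obeys the constraints."""
    best = None
    for p in range(1, n_pes + 1):            # candidate PEs working in parallel
        tiles = -(-n_iterations // p)        # ceil division: sequential tile count
        mem_needed = 4 * p                   # toy per-PE memory model (words)
        io_needed = n_iterations // tiles    # toy I/O words per sequential step
        if mem_needed > mem_capacity or io_needed > io_bandwidth:
            continue                         # infeasible on this platform
        latency = tiles + p - 1              # toy pipelined latency model
        if best is None or latency < best[1]:
            best = (p, latency)
    return best

# Example: 1024 loop iterations, 16 PEs discovered at runtime
print(choose_schedule(1024, 16, mem_capacity=128, io_bandwidth=256))
```

Because only closed-form expressions are evaluated, the selection runs in time proportional to the number of candidate tile sizes, which is what makes it cheap enough for memory-scarce platforms without a just-in-time compiler.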


Footnotes
1
We again assume for simplicity that a single iteration of a given loop may be executed in a unit of time. We envision generalizing this as part of future work (see Sect. 6.2).
 
2
Representing the size of a certain feedback register.
 
Metadata
Title
Symbolic Multi-Level Parallelization
Authors
Alexandru-Petru Tanase
Frank Hannig
Jürgen Teich
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-73909-0_4
