
2018 | Original Paper | Book Chapter

4. Symbolic Multi-Level Parallelization

Authors: Alexandru-Petru Tanase, Frank Hannig, Jürgen Teich

Published in: Symbolic Parallelization of Nested Loop Programs

Publisher: Springer International Publishing


Abstract

Today’s multiprocessor systems-on-chips (MPSoCs) have brought up massively parallel processor array accelerators that may achieve high computational efficiency by exploiting multiple levels of parallelism and different memory hierarchies. However, existing parallelization techniques are often unable to exploit multiple levels of parallelism and are either I/O- or memory-bounded. Furthermore, if the number of available processing elements becomes known only at runtime, as in adaptive systems, static approaches fail. In this chapter, we solve these problems by proposing a hybrid compile-time/runtime multi-level symbolic parallelization technique that is able to (a) exploit multiple levels of parallelism, (b) exploit different memory hierarchies, and (c) match the I/O or memory capabilities of the target architecture in scenarios where the number of available processing elements becomes known only at runtime. The proposed technique consists of two compile-time transformations: (1) symbolic hierarchical tiling, followed by (2) symbolic multi-level scheduling. The tiling levels scheduled in parallel exploit different levels of parallelism, whereas the sequential one exploits different memory hierarchies. Furthermore, by tuning the tile sizes on the individual levels, a tradeoff between the necessary I/O bandwidth and memory becomes possible, which facilitates obeying resource constraints. The resulting schedules are symbolic with respect to the problem size and the tile sizes; thus, the number of processing elements to map onto need not be known at compile time. At runtime, once the number of available processors becomes known, a simple prologue chooses a schedule that is feasible with respect to the I/O and memory constraints and latency-optimal for the chosen tile size. This approach exploits multiple levels of parallelism, is independent of the problem size of the loop nest, and thereby avoids any expensive recompilation at runtime.
This is particularly important for low-cost, memory-scarce embedded MPSoC platforms that may not afford to host a just-in-time compiler.
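The runtime prologue described above can be sketched as follows. The function name `choose_schedule` and all cost models (the per-PE memory formula, the per-tile I/O formula, and the pipelined latency term) are illustrative assumptions for this sketch, not the chapter's actual symbolic expressions; in the real approach, these formulas are derived symbolically at compile time and merely evaluated here.

```python
# A minimal sketch (illustrative cost models, not the chapter's formulas)
# of the runtime prologue: once the number of available processing
# elements (PEs) is known, pick a tile size that satisfies the memory
# and I/O constraints and minimizes latency.

def choose_schedule(n_iterations, n_pes, mem_capacity, io_bandwidth):
    """Return (tile_size, latency) for the best feasible candidate,
    or None if no candidate obeys the constraints."""
    best = None
    for p in range(1, n_pes + 1):            # candidate PEs working in parallel
        tiles = -(-n_iterations // p)        # ceil division: sequential tile count
        mem_needed = 4 * p                   # toy per-PE memory model (words)
        io_needed = n_iterations // tiles    # toy I/O words per sequential step
        if mem_needed > mem_capacity or io_needed > io_bandwidth:
            continue                         # infeasible on this platform
        latency = tiles + p - 1              # toy pipelined latency model
        if best is None or latency < best[1]:
            best = (p, latency)
    return best

# Example: 1024 loop iterations, 16 PEs discovered at runtime
print(choose_schedule(1024, 16, mem_capacity=128, io_bandwidth=256))
```

Because only closed-form expressions are evaluated, the selection runs in time proportional to the number of candidate tile sizes, which is what makes it cheap enough for memory-scarce platforms without a just-in-time compiler.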


Footnotes
1
We again assume for simplicity that a single iteration of a given loop may be executed in a unit of time. We envision generalizing this as part of future work (see Sect. 6.2).
 
2
Representing the size of a certain feedback register.
 
Metadata
Title
Symbolic Multi-Level Parallelization
Authors
Alexandru-Petru Tanase
Frank Hannig
Jürgen Teich
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-73909-0_4
