2018 | Original Paper | Book Chapter

2. Fundamentals and Compiler Framework

Written by: Alexandru-Petru Tanase, Frank Hannig, Jürgen Teich

Published in: Symbolic Parallelization of Nested Loop Programs

Publisher: Springer International Publishing


Abstract

Heterogeneous systems that include power-efficient hardware accelerators dominate the design of current and future embedded computer architectures, driven by the requirement for energy-efficient system design. In this context, we discuss the main principles of invasive computing and then present the concept and structure of invasive tightly coupled processor arrays (TCPAs), which form the basis for our experiments throughout the book. To utilize an invasive TCPA efficiently through the concrete invasive language InvadeX10, compiler support is paramount. Without such support, programming that leverages the abundant parallelism of such architectures is difficult, tedious, and error-prone. Unfortunately, there is still a lack of compiler frameworks for generating efficient parallel code for massively parallel architectures. In this chapter, we therefore present LoopInvader, the first compiler for mapping nested loop programs onto invasive TCPAs. We furthermore discuss the fundamentals and background of the underlying models for algorithm and application specification.

Footnotes
1
infect is implemented in terms of X10 places; an i-let is represented by an activity in X10, which is a lightweight thread.
 
2
For the sake of better visibility, control registers and control I/O ports are not shown in Figure 2.3.
 
3
In the case of arrays, this means that each array element (indexed variable) is assigned only once.
 
4
Throughout this book we assume w.l.o.g. that we start from a UDA, as any PLA may be systematically transformed into a UDA using localization [Thi89, TR91] (see Section 2.3.5.2) which is automatically performed in PARO.
 
5
We define the rectangular hull \(\mathrm {rectHull}(\bigcup _{i=1}^G \mathcal {I}_i)\) as the space containing all iterations of all equations \(S_i\), with \(1 \le i \le G\). For the sake of simplicity, we assume that the rectangular hull has its origin at 0. This can always be achieved by a simple translation (i.e., the lower bound becomes zero).
 
6
Including single assignment conversion; see Section 2.3.
 
7
For the rest of this book, we assume this functionality.
 
8
Invasive X10 loops will be automatically transformed by the LoopInvader’s front end into PAULA, as described in Section 2.3.
 
9
For example, mapping computations onto a fixed number of processors with limited local memory/register sizes and communication bandwidth.
 
10
For this example, we assume a Locally Sequential Globally Parallel (LSGP) mapping technique (see Section 2.3.5.4), where each tile, with the tile sizes described by a static tiling matrix P = diag(T, 3), corresponds to one processor, which executes the iterations within the tile sequentially.
 
11
Such dimensions (with zero iterations) are automatically removed in PARO through a source-to-source transformation.
 
12
It is assumed in the following that each \(\mathcal {F}_i\) can be mapped to a functional unit of a TCPA as a basic instruction. If \(\mathcal {F}_i\) is a more complex mathematical expression, the corresponding equation must be split into equations of this granularity [Tei93].
 
13
The formula is exact if the iteration space \(\mathcal {I}\) is dense, i.e., does not contain any iteration vectors where no equation is defined.
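
Footnotes 5 and 10 can be made concrete with a small sketch. The Python below is an illustration only and is not part of PARO or LoopInvader: the function names (rect_hull, translate_to_origin, lsgp_tile), the two example iteration spaces, and the concrete value T = 2 are assumptions chosen for this example. It computes the rectangular hull of a union of iteration spaces, translates it so that its lower bound is zero, and then assigns an iteration to a tile (one processor per tile, as in an LSGP mapping) for tile sizes diag(T, 3).

```python
from itertools import product


def rect_hull(iteration_spaces):
    """Return (lower, upper) corners of the smallest axis-aligned box
    that contains every iteration vector of every given space."""
    points = [p for space in iteration_spaces for p in space]
    dims = len(points[0])
    lower = tuple(min(p[d] for p in points) for d in range(dims))
    upper = tuple(max(p[d] for p in points) for d in range(dims))
    return lower, upper


def translate_to_origin(iteration_spaces):
    """Shift all spaces by the hull's lower corner so the hull starts at 0."""
    lower, _ = rect_hull(iteration_spaces)
    return [
        [tuple(c - l for c, l in zip(p, lower)) for p in space]
        for space in iteration_spaces
    ]


def lsgp_tile(point, tile_sizes):
    """Split an iteration into (tile index, offset inside the tile).
    Under LSGP, the tile index identifies the processor, and the offsets
    are the iterations that processor executes sequentially."""
    tile = tuple(c // t for c, t in zip(point, tile_sizes))
    offset = tuple(c % t for c, t in zip(point, tile_sizes))
    return tile, offset


# Hypothetical iteration spaces of two equations S_1 and S_2 (assumed data).
I1 = list(product(range(2, 6), range(1, 7)))  # 2 <= i <= 5, 1 <= j <= 6
I2 = list(product(range(3, 8), range(1, 5)))  # 3 <= i <= 7, 1 <= j <= 4

if __name__ == "__main__":
    shifted = translate_to_origin([I1, I2])
    print(rect_hull([I1, I2]))      # hull before translation
    print(rect_hull(shifted))       # hull after translation, lower bound is 0
    print(lsgp_tile((5, 4), (2, 3)))  # tile sizes diag(T, 3) with assumed T = 2
```

The translation step only subtracts the lower corner of the hull from every iteration vector, so dependencies between iterations are unaffected; the tile index and offset are obtained by integer division and remainder per dimension.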
 
References
[BBH.
Braun, M., Buchwald, S., Hack, S., Leißa, R., Mallon, C., & Zwinkau, A. (2013). Simple and efficient construction of static single assignment form. In R. Jhala & K. Bosschere (Eds.), Compiler construction. Lecture notes in computer science (Vol. 7791, pp. 102–122). Berlin: Springer.
[BBMZ12]
Braun, M., Buchwald, S., Mohr, M., & Zwinkau, A. (2012). An X10 Compiler for Invasive Architectures. Technical Report 9, Karlsruhe Institute of Technology.
[BCG.
Bastoul, C., Cohen, A., Girbal, S., Sharma, S., & Temam, O. (2003). Putting polyhedral loop transformations to work. In Workshop on Languages and Compilers for Parallel Computing (LCPC), College Station, TX, USA, October 2003. Lecture notes in computer science (Vol. 2958, pp. 23–30). Berlin: Springer.
[BHT13]
Boppu, S., Hannig, F., & Teich, J. (2013). Loop program mapping and compact code generation for programmable hardware accelerators. In Proceedings of the 24th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP) (pp. 10–17). New York: IEEE.
[Bop15]
Boppu, S. (2015). Code Generation for Tightly Coupled Processor Arrays. Dissertation, Hardware/Software Co-Design, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany.
[BTS.
Bhadouria, V. S., Tanase, A., Schmid, M., Hannig, F., Teich, J., & Ghoshal, D. (2016). A novel image impulse noise removal algorithm optimized for hardware accelerators. Journal of Signal Processing Systems, 89(2), 225–245.
[CGS.
Charles, P., Grothoff, C., Saraswat, V., Donawa, C., Kielstra, A., Ebcioglu, K., et al. (2005). X10: An object-oriented approach to non-uniform cluster computing. ACM SIGPLAN Notices, 40(10), 519–538.
[Fea91]
Feautrier, P. (1991). Dataflow analysis of array and scalar references. International Journal of Parallel Programming, 20(1), 23–53.
[FL11]
Feautrier, P., & Lengauer, C. (2011). Polyhedron model. In Encyclopedia of parallel computing (pp. 1581–1592).
[GBH17]
Grudnitsky, A., Bauer, L., & Henkel, J. (2017). Efficient partial online synthesis of special instructions for reconfigurable processors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 25(2), 594–607.
[GSL.
Gangadharan, D., Sousa, E., Lari, V., Hannig, F., & Teich, J. (2015). Application-driven reconfiguration of shared resources for timing predictability of MPSoC platforms. In Proceedings of Asilomar Conference on Signals, Systems, and Computers (ASILOMAR) (pp. 398–403). Washington, DC, USA: IEEE Computer Society.
[GTHT14]
Gangadharan, D., Tanase, A., Hannig, F., & Teich, J. (2014). Timing analysis of a heterogeneous architecture with massively parallel processor arrays. In DATE Friday Workshop on Performance, Power and Predictability of Many-Core Embedded Systems (3PMCES). ECSI.
[Han09]
Hannig, F. (2009). Scheduling Techniques for High-throughput Loop Accelerators. Dissertation, University of Erlangen-Nuremberg, Germany, Verlag Dr. Hut, Munich, Germany. ISBN: 978-3-86853-220-3.
[HHB.
Henkel, J., Herkersdorf, A., Bauer, L., Wild, T., Hübner, M., Pujari, R. K., et al. (2012). Invasive manycore architectures. In 17th Asia and South Pacific Design Automation Conference (ASP-DAC) (pp. 193–200). New York: IEEE.
[HLB.
Hannig, F., Lari, V., Boppu, S., Tanase, A., & Reiche, O. (2014). Invasive tightly-coupled processor arrays: A domain-specific architecture/compiler co-design approach. ACM Transactions on Embedded Computing Systems (TECS), 13(4s), 133:1–133:29.
[HRDT08]
Hannig, F., Ruckdeschel, H., Dutta, H., & Teich, J. (2008). PARO: Synthesis of hardware accelerators for multi-dimensional dataflow-intensive applications. In Proceedings of the Fourth International Workshop on Applied Reconfigurable Computing (ARC). Lecture notes in computer science, March 2008 (Vol. 4943, pp. 287–293). London, UK: Springer.
[HRS.
Hannig, F., Roloff, S., Snelting, G., Teich, J., & Zwinkau, A. (2011). Resource-aware programming and simulation of MPSoC architectures through extension of X10. In Proceedings of the 14th International Workshop on Software and Compilers for Embedded Systems (pp. 48–55). New York: ACM.
[HRT08]
Hannig, F., Ruckdeschel, H., & Teich, J. (2008). The PAULA language for designing multi-dimensional dataflow-intensive applications. In Proceedings of the GI/ITG/GMM-Workshop – Methoden und Beschreibungssprachen zur Modellierung und Verifikation von Schaltungen und Systemen (pp. 129–138). Freiburg, Germany: Shaker.
[HSL.
Hannig, F., Schmid, M., Lari, V., Boppu, S., & Teich, J. (2013). System integration of tightly-coupled processor arrays using reconfigurable buffer structures. In Proceedings of the ACM International Conference on Computing Frontiers (CF) (pp. 2:1–2:4). New York: ACM.
[HT04]
Hannig, F., & Teich, J. (2004). Dynamic piecewise linear/regular algorithms. In International Conference on Parallel Computing in Electrical Engineering. PARELEC’04 (pp. 79–84). New York: IEEE.
[HZW.
Heisswolf, J., Zaib, A., Weichslgartner, A., Karle, M., Singh, M., Wild, T., et al. (2014). The invasive network on chip - a multi-objective many-core communication infrastructure. In ARCS’14; Workshop Proceedings on Architecture of Computing Systems (pp. 1–8).
[Jai86]
Jainandunsing, K. (1986). Optimal partitioning scheme for wavefront/systolic array processors. In Proceedings of IEEE Symposium on Circuits and Systems (pp. 940–943).
[KHKT06a]
Kissler, D., Hannig, F., Kupriyanov, A., & Teich, J. (2006). A dynamically reconfigurable weakly programmable processor array architecture template. In Proceedings of the International Workshop on Reconfigurable Communication Centric System-on-Chips (ReCoSoC) (pp. 31–37).
[KHKT06b]
Kissler, D., Hannig, F., Kupriyanov, A., & Teich, J. (2006). A highly parameterizable parallel processor array architecture. In Proceedings of the IEEE International Conference on Field Programmable Technology (FPT) (pp. 105–112). New York: IEEE.
[KMW67]
Karp, R. M., Miller, R. E., & Winograd, S. (1967). The organization of computations for uniform recurrence equations. Journal of the ACM, 14(3), 563–590.
[KRZ.
Klues, K., Rhoden, B., Zhu, Y., Waterman, A., & Brewer, E. (2010). Processes and resource management in a scalable many-core OS. In HotPar10, Berkeley, CA, 2010.
[KSHT09]
Kissler, D., Strawetz, A., Hannig, F., & Teich, J. (2009). Power-efficient reconfiguration control in coarse-grained dynamically reconfigurable architectures. Journal of Low Power Electronics, 5(1), 96–105.
[Kup09]
Kupriyanov, O. (2009). Modeling and Efficient Simulation of Complex System-on-a-Chip Architectures. PhD thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany.
[Lar16]
Lari, V. (2016). Invasive tightly coupled processor arrays. In Springer Book Series on Computer Architecture and Design Methodologies. Berlin: Springer. ISBN: 978-981-10-1058-3.
[LBBG05]
Lindenmaier, G., Beck, M., Boesler, B., & Geiß, R. (2005). FIRM, An Intermediate Language for Compiler Research. Technical Report 2005-8, Fakultät für Informatik, Universität Karlsruhe, Karlsruhe, Germany.
[Len93]
Lengauer, C. (1993). Loop parallelization in the polytope model. In CONCUR (Vol. 715, pp. 398–416).
[Lin06]
Lindenmaier, G. (2006). libFIRM – A Library for Compiler Optimization Research Implementing FIRM. Technical Report 2002-5, Fakultät für Informatik, Universität Karlsruhe, Karlsruhe, Germany.
[LNHT11]
Lari, V., Narovlyanskyy, A., Hannig, F., & Teich, J. (2011). Decentralized dynamic resource management support for massively parallel processor arrays. In Proceedings of the 22nd IEEE International Conference on Application-specific Systems, Architectures, and Processors (ASAP), Santa Monica, CA, USA, September 2011.
[LNOM08]
Lindholm, E., Nickolls, J., Oberman, S., & Montrym, J. (2008). NVIDIA Tesla: A unified graphics and computing architecture. IEEE Micro, 28(2), 39–55.
[LWT.
Lari, V., Weichslgartner, A., Tanase, A., Witterauf, M., Khosravi, F., Teich, J., et al. (2016). Providing fault tolerance through invasive computing. Information Technology, 58(6), 309–328.
[MF86]
Moldovan, D. I., & Fortes, J. A. B. (1986). Partitioning and mapping algorithms into fixed size systolic arrays. IEEE Transactions on Computers, C-35(1), 1–12.
[MJU.
Mehrara, M., Jablin, T. B., Upton, D., August, D. I., Hazelwood, K., & Mahlke, S. (2009). Compilation strategies and challenges for multicore signal processing. IEEE Signal Processing Magazine, 26(6), 55–63.
[Mun12]
Munshi, A. (2012). The OpenCL Specification Version 1.2. Khronos OpenCL Working Group.
[OSK.
Oechslein, B., Schedel, J., Kleinöder, J., Bauer, L., Henkel, J., Lohmann, D., et al. (2011). OctoPOS: A parallel operating system for invasive computing. In R. McIlroy, J. Sventek, T. Harris, & T. Roscoe (Eds.), Proceedings of the International Workshop on Systems for Future Multi-Core Architectures (SFMA). USB Proceedings of Sixth International ACM/EuroSys European Conference on Computer Systems (EuroSys), EuroSys, 2011 (pp. 9–14).
[Rao85]
Rao, S. K. (1985). Regular Iterative Algorithms and Their Implementations on Processor Arrays. PhD thesis, Stanford University.
[Rau94]
Rau, B. R. (1994). Iterative modulo scheduling: An algorithm for software pipelining loops. In Proceedings of the 27th Annual International Symposium on Microarchitecture (MICRO), San Jose, CA, USA, November 1994 (pp. 63–74).
[RWZ88]
Rosen, B. K., Wegman, M. N., & Zadeck, F. K. (1988). Global value numbers and redundant computations. In Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL’88, New York, NY, USA (pp. 12–27).
[SHTT14]
Schmid, M., Hannig, F., Tanase, A., & Teich, J. (2014). High-level synthesis revised – Generation of FPGA accelerators from a domain-specific language using the polyhedral model. In Parallel Computing: Accelerating Computational Science and Engineering (CSE), Advances in Parallel Computing (Vol. 25, pp. 497–506). Amsterdam, The Netherlands: IOS Press.
[STB.
Schmid, M., Tanase, A., Bhadouria, V. S., Hannig, F., Teich, J., & Ghoshal, D. (2014). Domain-specific augmentations for high-level synthesis. In Proceedings of the 25th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP) (pp. 173–177). New York: IEEE.
[STHT13a]
Sousa, E. R., Tanase, A., Hannig, F., & Teich, J. (2013). A prototype of an adaptive computer vision algorithm on MPSoC architecture. In Proceedings of the Conference on Design and Architectures for Signal and Image Processing (DASIP), October 2013 (pp. 361–362). ECSI Media.
[STHT13b]
Sousa, E. R., Tanase, A., Hannig, F., & Teich, J. (2013). Accuracy and performance analysis of Harris corner computation on tightly-coupled processor arrays. In Proceedings of the Conference on Design and Architectures for Signal and Image Processing (DASIP) (pp. 88–95). New York: IEEE.
[STL.
Sousa, E. R., Tanase, A., Lari, V., Hannig, F., Teich, J., Paul, J., et al. (2013). Acceleration of optical flow computations on tightly-coupled processor arrays. In Proceedings of the 25th Workshop on Parallel Systems and Algorithms (PARS), Mitteilungen – Gesellschaft für Informatik e. V., Parallel-Algorithmen und Rechnerstrukturen (Vol. 30, pp. 80–89). Gesellschaft für Informatik e. V.
[Tei93]
Teich, J. (1993). A compiler for application specific processor arrays. Reihe Elektrotechnik. Freiburg, Germany: Shaker. ISBN: 9783861117018.
[Tei08]
Teich, J. (2008). Invasive algorithms and architectures. Information Technology, 50(5), 300–310.
[TGR.
Teich, J., Glaß, M., Roloff, S., Schröder-Preikschat, W., Snelting, G., Weichslgartner, A., et al. (2016). Language and compilation of parallel programs for *-predictable MPSoC execution using invasive computing. In 2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSOC) (pp. 313–320).
[THH.
Teich, J., Henkel, J., Herkersdorf, A., Schmitt-Landsiedel, D., Schröder-Preikschat, W., & Snelting, G. (2011). Multiprocessor System-on-Chip: Hardware Design and Tool Integration. Invasive computing: An overview (Chap. 11, pp. 241–268). Berlin: Springer.
[Thi88]
Thiele, L. (1988). On the hierarchical design of VLSI processor arrays. In IEEE International Symposium on Circuits and Systems, 1988 (pp. 2517–2520). New York: IEEE.
[Thi89]
Thiele, L. (1989). On the design of piecewise regular processor arrays. In IEEE International Symposium on Circuits and Systems (Vol. 3, pp. 2239–2242).
[TLHT13]
Tanase, A., Lari, V., Hannig, F., & Teich, J. (2012). Exploitation of quality/throughput tradeoffs in image processing through invasive computing. In Proceedings of the International Conference on Parallel Computing (ParCo) (pp. 53–62).
[TR91]
Thiele, L., & Roychowdhury, V. P. (1991). Systematic design of local processor arrays for numerical algorithms. In Proceedings of the International Workshop on Algorithms and Parallel VLSI Architectures, Amsterdam, The Netherlands, 1991 (Vol. A: Tutorials, pp. 329–339).
[TT91]
Teich, J., & Thiele, L. (1991). Control generation in the design of processor arrays. Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, 3(1), 77–92.
[TT93]
Teich, J., & Thiele, L. (1993). Partitioning of processor arrays: A piecewise regular approach. Integration, the VLSI Journal, 14(3), 297–332.
[TT96]
Teich, J., & Thiele, L. (1996). A new approach to solving resource-constrained scheduling problems based on a flow-model. Technical Report 17, TIK, Swiss Federal Institute of Technology (ETH) Zürich.
[TT02]
Teich, J., & Thiele, L. (2002). Exact partitioning of affine dependence algorithms. In Embedded Processor Design Challenges. Lecture notes in computer science (Vol. 2268, pp. 135–151). Berlin, Germany: Springer.
[TTZ96]
Teich, J., Thiele, L., & Zhang, L. (1996). Scheduling of partitioned regular algorithms on processor arrays with constrained resources. In Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures, and Processors, ASAP’96 (p. 131). Washington, DC, USA: IEEE Computer Society.
[TTZ97a]
Teich, J., Thiele, L., & Zhang, L. (1997). Scheduling of partitioned regular algorithms on processor arrays with constrained resources. Journal of VLSI Signal Processing, 17(1), 5–20.
[TTZ97b]
Teich, J., Thiele, L., & Zhang, L. (1997). Partitioning processor arrays under resource constraints. Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology, 17, 5–20.
[TWOSP12]
Teich, J., Weichslgartner, A., Oechslein, B., & Schröder-Preikschat, W. (2012). Invasive computing - concepts and overheads. In Proceedings of the 2012 Forum on Specification and Design Languages (pp. 217–224).
[TWS.
Tanase, A., Witterauf, M., Sousa, É. R., Lari, V., Hannig, F., & Teich, J. (2016). LoopInvader: A Compiler for Tightly Coupled Processor Arrays. Tool Presentation at the University Booth at Design, Automation and Test in Europe (DATE), Dresden, Germany.
[Ver10]
Verdoolaege, S. (2010). ISL: An integer set library for the polyhedral model. In Proceedings of the Third International Congress on Mathematical Software (ICMS), Kobe, Japan, 2010 (pp. 299–302). Berlin: Springer.
[VG12]
Verdoolaege, S., & Grosser, T. (2012). Polyhedral extraction tool. In Second International Workshop on Polyhedral Compilation Techniques (IMPACT’12), Paris, France.
[WBB.
Wildermann, S., Bader, M., Bauer, L., Damschen, M., Gabriel, D., Gerndt, M., et al. (2016). Invasive computing for timing-predictable stream processing on MPSoCs. Information Technology, 58(6), 267–280.
[Wol96]
Wolfe, M. J. (1996). High performance compilers for parallel computing. Boston, MA, USA: Addison-Wesley.
[Xue97]
[Xue00]
Xue, J. (2000). Loop tiling for parallelism. Norwell, MA, USA: Kluwer Academic Publishers.
Metadata
Title
Fundamentals and Compiler Framework
Authors
Alexandru-Petru Tanase
Frank Hannig
Jürgen Teich
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-73909-0_2
