Published in: International Journal of Parallel Programming 1/2015

01-02-2015

Evaluation of Speculation in Out-of-Order Execution of Synchronous Dataflow Networks

Authors: Daniel Baudisch, Klaus Schneider

Abstract

Dataflow process networks are a convenient formalism for implementing robust concurrent systems and have been used successfully for both hardware and software systems. However, their strictly stream-based execution limits performance and requires carefully balancing the entire execution to avoid backpressure and idle nodes. Inspired by related techniques used in processor architectures, our previous work introduced out-of-order execution of dataflow process networks. In this paper, we extend that improvement with speculation on the input values of process nodes, which allows otherwise idle processes to start computing with speculated inputs. Outputs based on speculated inputs have to be held back until the speculation is proved right, and have to be withdrawn if the speculation was wrong. In contrast to related work, our approach is implemented purely in software on standard hardware, so that it addresses a broad range of multicore processors. Moreover, a software implementation allows us to adapt parameters dynamically to the needs of the application. In particular, we can enforce a user-defined hit ratio of speculation, which may even switch speculation off. After a detailed description of the approach and a discussion of possible implementations, we show its feasibility using a couple of benchmarks. In these benchmarks, speculation achieved an average speedup of 1.2 compared to non-speculative out-of-order execution.
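
The mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `SpeculativeNode`, the last-value predictor, and the parameters `min_hit_ratio` and `warmup` are all hypothetical names and choices made for illustration. The sketch only shows the three ideas named above: a node fires early on a guessed input, its output is held back until the real input arrives, and speculation is switched off once the observed hit ratio falls below a user-defined threshold.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class SpeculativeNode:
    """Hypothetical process node that may fire on a guessed input."""
    fn: Callable[[int], int]
    min_hit_ratio: float = 0.5        # user-defined threshold (assumed name)
    warmup: int = 4                   # attempts made before the threshold applies
    last_input: Optional[int] = None  # last-value predictor (an assumption)
    hits: int = 0
    attempts: int = 0

    def speculation_enabled(self) -> bool:
        if self.last_input is None:
            return False              # nothing to predict from yet
        if self.attempts < self.warmup:
            return True
        # Once this turns False no further attempts are made,
        # so speculation stays switched off in this simple sketch.
        return self.hits / self.attempts >= self.min_hit_ratio

    def speculate(self) -> Optional[Tuple[int, int]]:
        """Fire early on a guess; the result is held back, not emitted."""
        if not self.speculation_enabled():
            return None
        guess = self.last_input
        return guess, self.fn(guess)

    def commit(self, actual: int, pending: Optional[Tuple[int, int]]) -> int:
        """Called when the real input arrives."""
        self.last_input = actual
        if pending is not None:
            guess, held_output = pending
            self.attempts += 1
            if guess == actual:
                self.hits += 1
                return held_output    # speculation proved right: release
            # speculation was wrong: the held output is withdrawn
        return self.fn(actual)        # normal, non-speculative firing


def run(node: SpeculativeNode, stream: List[int]) -> List[int]:
    outputs = []
    for actual in stream:
        pending = node.speculate()    # would overlap with waiting for input
        outputs.append(node.commit(actual, pending))
    return outputs


if __name__ == "__main__":
    node = SpeculativeNode(fn=lambda x: x * x)
    print(run(node, [3, 3, 3, 5, 5]), node.hits, node.attempts)
```

In this sequential sketch the speculative firing brings no actual speedup; in a real multicore setting the call to `speculate` would run while the node would otherwise be idle waiting for its input. The outputs are identical with or without speculation, which is the property that holding back and withdrawing results is meant to guarantee.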

Metadata

Title: Evaluation of Speculation in Out-of-Order Execution of Synchronous Dataflow Networks
Authors: Daniel Baudisch, Klaus Schneider
Publication date: 01-02-2015
Publisher: Springer US
Published in: International Journal of Parallel Programming, Issue 1/2015
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI: https://doi.org/10.1007/s10766-013-0277-2
