International Journal of Parallel Programming

Publication date: 01-02-2015

Evaluation of Speculation in Out-of-Order Execution of Synchronous Dataflow Networks

Authors: Daniel Baudisch, Klaus Schneider

Published in: International Journal of Parallel Programming | Issue 1/2015

Abstract

Dataflow process networks are a convenient formalism for implementing robust concurrent systems and have been used successfully for both hardware and software systems. However, their strictly stream-based execution limits performance and requires the entire execution to be balanced carefully to avoid backpressure and idle nodes. Inspired by related techniques used in processor architectures, our previous work introduced out-of-order execution of dataflow process networks. In this paper, we extend that improvement with speculation on the input values of process nodes, so that otherwise idle processes can start computing with speculated inputs. Outputs based on speculated inputs have to be held back until the speculation is confirmed, and have to be withdrawn if the speculation turns out to be wrong. In contrast to related work, our approach is implemented purely in software on standard hardware and thus targets a broad range of multicore processors. Moreover, a software implementation allows us to adapt parameters dynamically to the needs of the application. This makes it possible to enforce a user-defined hit ratio for speculation, which may even switch speculation off. After a detailed description of the approach and a discussion of implementation options, we show its feasibility using several benchmarks. In these benchmarks, speculation achieved an average speedup of 1.2 over non-speculative out-of-order execution.
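The mechanism described in the abstract (fire on a guessed input, hold the output back, commit or withdraw it once the real input arrives, and gate speculation by a hit ratio) can be illustrated with a small sequential sketch. This is not the authors' implementation: the class `SpeculativeNode`, the last-value predictor, the sliding `window`, and the choice to keep scoring the predictor while speculation is off are all assumptions made for illustration only.

```python
from collections import deque


class SpeculativeNode:
    """Toy process node that may fire on a guessed input (hypothetical sketch)."""

    def __init__(self, func, min_hit_ratio=0.5, window=8):
        self.func = func                      # the node's pure computation
        self.min_hit_ratio = min_hit_ratio    # user-defined hit-ratio threshold
        self.history = deque(maxlen=window)   # recent predictor hits and misses
        self.last_value = None                # state of the last-value predictor
        self.pending = None                   # (guess, held-back output) or None

    def hit_ratio(self):
        # With no history yet, be optimistic so that speculation can start.
        return sum(self.history) / len(self.history) if self.history else 1.0

    def speculate(self):
        """Call while the node would otherwise idle; returns True if it fired."""
        if self.last_value is None or self.hit_ratio() < self.min_hit_ratio:
            return False                      # speculation is switched off
        guess = self.last_value
        self.pending = (guess, self.func(guess))  # output is held back, not emitted
        return True

    def deliver(self, actual):
        """Call when the real input arrives; returns the committed output."""
        if self.last_value is not None:
            # Assumption: score the predictor even while speculation is off,
            # so that speculation can be switched back on later.
            self.history.append(self.last_value == actual)
        if self.pending is not None:
            guess, held = self.pending
            self.pending = None
            # Commit the held-back output on a hit; withdraw and recompute on a miss.
            out = held if guess == actual else self.func(actual)
        else:
            out = self.func(actual)
        self.last_value = actual
        return out
```

In a scheduler loop, `speculate()` would be called when no input token is available and `deliver(value)` when one arrives. Whatever the predictor guesses, the committed outputs are the same as in non-speculative execution; only the time at which the computation happens changes.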

Title: Evaluation of Speculation in Out-of-Order Execution of Synchronous Dataflow Networks
Authors: Daniel Baudisch, Klaus Schneider
Publication date: 01-02-2015
Publisher: Springer US
Published in: International Journal of Parallel Programming / Issue 1/2015
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI: https://doi.org/10.1007/s10766-013-0277-2
