Bringing Parallel Patterns Out of the Corner: The P³ARSEC Benchmark Suite

Published: 24 October 2017

Abstract

High-level parallel programming is an active research topic aimed at promoting parallel programming methodologies that provide the programmer with high-level abstractions to develop complex parallel software with reduced time to solution. Pattern-based parallel programming is based on a set of composable and customizable parallel patterns used as basic building blocks in parallel applications. In recent years, considerable effort has gone into equipping this programming model with features that overcome the shortcomings of early approaches concerning flexibility and performance. In this article, we demonstrate that the approach is flexible and efficient enough by applying it to 12 of the 13 PARSEC applications. Our analysis, conducted on three different multicore architectures, demonstrates that pattern-based parallel programming has reached a good level of maturity, providing performance comparable to both other parallel programming methodologies based on pragma-based annotations (i.e., OpenMP and OmpSs) and native implementations (i.e., Pthreads). Regarding the programming effort, we also demonstrate a considerable reduction in lines of code and code churn compared to Pthreads and comparable results with respect to other existing implementations.

    • Published in

      ACM Transactions on Architecture and Code Optimization, Volume 14, Issue 4
      December 2017
      600 pages
      ISSN: 1544-3566
      EISSN: 1544-3973
      DOI: 10.1145/3154814

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 24 October 2017
      • Accepted: 1 August 2017
      • Revised: 1 July 2017
      • Received: 1 June 2017

      Qualifiers

      • research-article
      • Research
      • Refereed
