Bringing Parallel Patterns Out of the Corner: The P³ARSEC Benchmark Suite

Authors:
Daniele De Sensi, University of Pisa (ORCID: 0000-0002-7244-639X)
Tiziano De Matteis, University of Pisa
Massimo Torquati, University of Pisa
Gabriele Mencagli, University of Pisa
Marco Danelutto, University of Pisa
ACM Transactions on Architecture and Code Optimization, Volume 14, Issue 4, Article No. 33, pp. 1–26. https://doi.org/10.1145/3132710

Published: 24 October 2017


Abstract

High-level parallel programming is an active research topic aimed at promoting parallel programming methodologies that provide the programmer with high-level abstractions for developing complex parallel software with reduced time to solution. Pattern-based parallel programming relies on a set of composable and customizable parallel patterns used as the basic building blocks of parallel applications. In recent years, considerable effort has been devoted to equipping this programming model with features that overcome the shortcomings of early approaches in terms of flexibility and performance. In this article, we demonstrate that the approach is sufficiently flexible and efficient by applying it to 12 of the 13 PARSEC applications. Our analysis, conducted on three different multicore architectures, demonstrates that pattern-based parallel programming has reached a good level of maturity, delivering performance comparable both to parallel programming methodologies based on pragma annotations (i.e., OpenMP and OmpSs) and to native implementations (i.e., Pthreads). Regarding programming effort, we also demonstrate a considerable reduction in lines of code and code churn compared to Pthreads, and comparable results with respect to the other existing implementations.
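To make the idea of "composable patterns as building blocks" concrete, the sketch below composes two classic patterns: a three-stage pipeline whose middle stage is a farm, that is, a set of replicated workers applying the same function to different stream items. It is a minimal illustration written with plain C++17 threads under our own assumptions. The names `Channel` and `pipeline_with_farm` are hypothetical; this is not the interface of any particular pattern library, nor the authors' P³ARSEC code.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical unbounded blocking channel connecting two stages.
// An empty std::optional acts as the end-of-stream marker.
template <typename T>
class Channel {
 public:
  void push(std::optional<T> item) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(std::move(item));
    }
    ready_.notify_one();
  }

  std::optional<T> pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    ready_.wait(lock, [this] { return !queue_.empty(); });
    std::optional<T> item = std::move(queue_.front());
    queue_.pop();
    return item;
  }

 private:
  std::mutex mutex_;
  std::condition_variable ready_;
  std::queue<std::optional<T>> queue_;
};

// source -> farm(worker_fn, `workers` replicas) -> sink.
// Results are collected in completion order, not input order
// (an ordered farm would need to tag items with sequence numbers).
inline std::vector<long> pipeline_with_farm(const std::vector<long>& input,
                                            std::size_t workers,
                                            const std::function<long(long)>& worker_fn) {
  assert(workers >= 1 && "a farm needs at least one worker");
  Channel<long> to_farm, to_sink;
  std::vector<long> results;

  // Stage 1 (source): emit every item, then one end-of-stream per worker.
  std::thread source([&] {
    for (long x : input) to_farm.push(x);
    for (std::size_t i = 0; i < workers; ++i) to_farm.push(std::nullopt);
  });

  // Stage 2 (farm): each replica processes distinct items until end-of-stream.
  std::vector<std::thread> farm;
  for (std::size_t w = 0; w < workers; ++w) {
    farm.emplace_back([&] {
      while (auto item = to_farm.pop()) to_sink.push(worker_fn(*item));
      to_sink.push(std::nullopt);  // tell the sink this replica is done
    });
  }

  // Stage 3 (sink): gather results until every replica has finished.
  std::thread sink([&] {
    std::size_t finished = 0;
    while (finished < workers) {
      auto item = to_sink.pop();
      if (item) results.push_back(*item); else ++finished;
    }
  });

  source.join();
  for (auto& t : farm) t.join();
  sink.join();
  return results;
}
```

For example, `pipeline_with_farm({1, 2, 3}, 2, [](long x) { return x * x; })` returns the values 1, 4, and 9 in some order. The stage logic (`worker_fn`) is independent of how the stages are composed: turning the farm back into a single sequential stage only means passing `workers = 1`, and the business code is untouched. That separation between application code and parallel structure is the kind of property pattern-based approaches aim to provide.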


Index Terms

1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages



Published in
ACM Transactions on Architecture and Code Optimization, Volume 14, Issue 4, December 2017 (600 pages)
ISSN: 1544-3566
EISSN: 1544-3973
Issue DOI: 10.1145/3154814
Editor: Koen De Bosschere, Ghent University
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher: Association for Computing Machinery, New York, NY, United States
Publication History
- Published: 24 October 2017
- Accepted: 1 August 2017
- Revised: 1 July 2017
- Received: 1 June 2017

Author Tags
Parallel patterns
algorithmic skeletons
benchmarking
multicore programming
parsec
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total citations: 19
- Total downloads: 695
- Downloads (last 12 months): 81
- Downloads (last 6 weeks): 9
