DOI: 10.1145/2541940.2541948

Paraprox: pattern-based approximation for data parallel applications

Published: 24 February 2014

ABSTRACT

Approximate computing is an approach in which reduced accuracy of results is traded for increased speed, throughput, or both. Loss of accuracy is not permissible in all computing domains, but there is a growing number of data-intensive domains where the output of programs need not be perfectly correct to provide useful results, and small errors are often not even noticeable to the end user. These soft domains include multimedia processing, machine learning, and data mining/analysis. An important challenge with approximate computing is transparency: insulating both software and hardware developers from the time, cost, and difficulty of using approximation. This paper proposes a software-only system, Paraprox, for realizing transparent approximation of data-parallel programs that operates on commodity hardware systems. Paraprox starts with a data-parallel kernel implemented in OpenCL or CUDA and creates a parameterized approximate kernel that is tuned at runtime to maximize performance subject to a target output quality (TOQ) supplied by the user. Approximate kernels are created by recognizing common computation idioms found in data-parallel programs (e.g., Map, Scatter/Gather, Reduction, Scan, Stencil, and Partition) and substituting approximate implementations in their place. Across a set of 13 soft data-parallel applications with at most 10% quality degradation, Paraprox yields an average performance gain of 2.7x on an NVIDIA GTX 560 GPU and 2.5x on an Intel Core i7 quad-core processor compared to accurate execution on each platform.
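
To make the abstract's notion of a parameterized approximate kernel concrete, the sketch below (written for this summary, not taken from the paper or its artifact) contrasts an exact 1-D mean-filter stencil in CUDA with an approximate variant whose stride parameter skips neighbor loads. The kernel names, the stride knob, and the mean-absolute-error metric are illustrative assumptions; Paraprox's actual stencil transformation and runtime tuning are more involved.

#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

#define N (1 << 20)
#define RADIUS 4

// Exact 1-D mean filter: each output averages 2*RADIUS+1 input neighbors.
__global__ void stencil_exact(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float sum = 0.0f;
    for (int k = -RADIUS; k <= RADIUS; ++k) {
        int j = min(max(i + k, 0), n - 1);   // clamp at the array borders
        sum += in[j];
    }
    out[i] = sum / (2 * RADIUS + 1);
}

// Approximate variant: read only every `stride`-th neighbor and average the
// samples. `stride` is the runtime-tunable knob; stride == 1 is exact.
__global__ void stencil_approx(const float* in, float* out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float sum = 0.0f;
    int taps = 0;
    for (int k = -RADIUS; k <= RADIUS; k += stride) {
        int j = min(max(i + k, 0), n - 1);
        sum += in[j];
        ++taps;
    }
    out[i] = sum / taps;   // fewer memory reads per thread, slightly different result
}

int main() {
    float *h_in = new float[N], *h_exact = new float[N], *h_approx = new float[N];
    for (int i = 0; i < N; ++i) h_in[i] = 0.001f * (i % 100);

    float *d_in, *d_out;
    cudaMalloc(&d_in, N * sizeof(float));
    cudaMalloc(&d_out, N * sizeof(float));
    cudaMemcpy(d_in, h_in, N * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(256), grid((N + 255) / 256);
    stencil_exact<<<grid, block>>>(d_in, d_out, N);
    cudaMemcpy(h_exact, d_out, N * sizeof(float), cudaMemcpyDeviceToHost);

    stencil_approx<<<grid, block>>>(d_in, d_out, N, /*stride=*/2);
    cudaMemcpy(h_approx, d_out, N * sizeof(float), cudaMemcpyDeviceToHost);

    // Crude quality metric: mean absolute error versus the exact output,
    // the kind of signal a tuner could compare against a user-supplied TOQ.
    double err = 0.0;
    for (int i = 0; i < N; ++i) err += std::fabs(h_exact[i] - h_approx[i]);
    printf("mean abs error = %g\n", err / N);

    cudaFree(d_in); cudaFree(d_out);
    delete[] h_in; delete[] h_exact; delete[] h_approx;
    return 0;
}

A tuner in the spirit of Paraprox would explore increasing stride values on sample inputs, measure the resulting quality loss, and keep the fastest configuration that still meets the user's TOQ.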


Published in

ASPLOS '14: Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems
February 2014
780 pages
ISBN: 9781450323055
DOI: 10.1145/2541940

Copyright © 2014 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 24 February 2014


      Qualifiers

      • research-article

      Acceptance Rates

ASPLOS '14 Paper Acceptance Rate: 49 of 217 submissions, 23%
Overall Acceptance Rate: 535 of 2,713 submissions, 20%
