nach oben

International Journal of Parallel Programming

Erschienen in:

21.04.2016

Adaptive Optimization \(l_1\)-Minimization Solvers on GPU

verfasst von: Jiaquan Gao, Zejie Li, Ronghua Liang, Guixia He

Erschienen in: International Journal of Parallel Programming | Ausgabe 3/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

\(l_1\)-minimization (\(l_1\)-min) algorithms for the \(l_1\)-min problem have been widely developed. For most \(l_1\)-min algorithms, their main components include dense matrix-vector multiplications such as Ax and \(A^Tx\), and vector operations. We propose a novel warp-based implementation of the matrix-vector multiplication (Ax) on the graphics processing unit (GPU), called the GEMV kernel, and a novel thread-based implementation of the matrix-vector multiplication (\(A^Tx\)) on the GPU, called the GEMV-T kernel. For the GEMV kernel, a self-adaptive warp allocation strategy is used to assign the optimal warp number for each matrix row. Similar to the GEMV kernel, we design a self-adaptive thread allocation strategy to assign the optimal thread number to each matrix row for the GEMV-T kernel. Two popular \(l_1\)-min algorithms, fast iterative shrinkage-thresholding algorithm and augmented Lagrangian multiplier method, are taken for example. Based on the GEMV and GEMV-T kernels, we present two highly parallel \(l_1\)-min solvers on the GPU utilizing the technique of merging kernels and the sparsity of the solution of the \(l_1\)-min algorithms. Furthermore, we design a concurrent multiple \(l_1\)-min solver on the GPU, and optimize its performance by using new features of GPU such as the shuffle instruction and read-only data cache. The experimental results have validated high efficiency and good performance of our methods.

Vorheriger Artikel Restart Optimization for Transactional Memory with Lazy Conflict Detection

Nächster Artikel Adaptive Runtime-Assisted Block Prefetching on Chip-Multiprocessors

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Bruckstein, A., Donoho, D., Elad, M.: From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Review 51(1), 34–81 (2009)MathSciNetCrossRefMATH

Mallat, S.: A Wavelet Tour of Signal Processing—The Sparse Way, 3rd edn. Academic, Cambridge (2009)MATH

Donoho, D.L., Elad, M.: Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization. Proc.Natl. Acad. Sci. 100(5), 2197–2202 (2003)MathSciNetCrossRefMATH

Donoho, D.L., Elad, M.: On the stability of the basis pursuit in the presence of noise. Signal Process. 86(3), 511–532 (2006)CrossRefMATH

Tropp, J.A.: Greed is good: algorithmic results for sparse approximation. IEEE Trans. Inf. Theory 50(10), 2231–2242 (2004)MathSciNetCrossRefMATH

Tropp, J.A.: Just relax: convex programming methods for subset selection and sparse approximation. IEEE Trans. Inf. Theory 52(3), 1030–1051 (2006)CrossRefMATH

Chen, S.S., Donoho, D.L., Saunders, M.A.: Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20(1), 33–61 (1998)MathSciNetCrossRefMATH

Candès, E., Romberg, J., Tao, T.: Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. 59(8), 1207–1223 (2006)MathSciNetCrossRefMATH

Wright, J., Yang, A., Ganesh, A., Sastry, S., Ma, Y.: Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. 31(2), 210–227 (2009)CrossRef

10.

Elhamifar, E., Vidal, R.: Sparse subspace clustering: algorithm, theory, and applications. IEEE Trans. Pattern Anal. 35(11), 2765–2781 (2013)CrossRef

11.

Elhamifar, E., Vidal, R.: Sparse subspace clustering: computer vision and pattern recognition. In: IEEE Conference on CVPR 2009, pp. 2790–2797 (2009)

12.

Wright, J., Ma, Y., Mairal, J., et al.: Sparse representation for computer vision and pattern recognition. Proc. IEEE 98(6), 1031–1044 (2010)CrossRef

13.

Zibulevsky, M., Elad, M.: L1–L2 optimization in signal and image processing. IEEE Signal Proc. Mag. 27(3), 76–88 (2010)CrossRef

14.

Figueiredo, M.A.T., Nowak, R.D., Wright, S.J.: Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems. IEEE J. STSP 1(4), 586–597 (2007)

15.

Kim, S.J., Koh, K., Lustig, M., Boyd, S., Gorinevsky, D.: An interior-point method for large-scale 1-regularized least squares. IEEE J. STSP 1(4), 606–617 (2007)

16.

Donoho, D.L., Tsaig. Y.: Fast solution of L1-norm minimization problems when the solution may be sparse. Stanford University, Technical Report (2006)

17.

Nesterov, Y.: Gradient methods for minimizing composite objective function. Gen. Inf. 38(3), 768–785 (2007)MathSciNet

18.

Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)MathSciNetCrossRefMATH

19.

Bertsekas, D.: Constrained Optimization and Lagrange Multiplier Methods. Athena Scientific, Belmont (1982)MATH

20.

Yang, A.Y., Zhou, Z., Balasubramanian, A.G., Sastry, S.S., Ma, Y.: Fast \(l1\)-minimization algorithms for robust face recognition. IEEE Trans. Image Process. 22(8), 3234–3246 (2013)CrossRef

21.

Stephen, B., et al.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)MathSciNet

22.

Yang, A.Y., Sastry, S.S., Ganesh, A., Ma, Y.: Fast \(l1\)-minimization algorithms and an application in robust face recognition: a review. In: 17th IEEE International Conference on Image Processing (ICIP), pp.1849–1852 (2010)

23.

NVIDIA: CUDA C Programming Guide 6.5. http://docs.nvidia.com/cuda/cuda-c-programming-guide/ (2014)

24.

NVIDIA: CUBLAS Library 6.5. http://docs.nvidia.com/cuda/cublas-library/ (2014)

25.

Nagesh, P., Gowda, R., Li, B.: Fast GPU implementation of large scale dictionary and sparse representation based vision problems. In: 2010 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), pp.1570–1573 (2010)

26.

Shia, V., Yang, A.Y., Sastry, S.S.: Fast \(l1\)-minimization and parallelization for face recognition. In: 2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), pp.1199–1203 (2011)

27.

Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Series B 58, 267–288 (1996)MathSciNetMATH

28.

Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (2003)MATH

29.

Gao, J., Liang, R., Wang, J.: Research on the conjugate gradient algorithm with a modified incomplete Cholesky preconditioner on GPU. J. Parallel Distrib. Comput. 74(2), 2088–2098 (2014)CrossRef

30.

Bell, N., Garland, M.: Implementing sparse matrix-vector multiplication on throughput-oriented processors. In: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (SC09). Portland, Oregon, USA: ACM, November, pp.14–19 (2009)

Titel: Adaptive Optimization -Minimization Solvers on GPU
verfasst von: Jiaquan Gao
Zejie Li
Ronghua Liang
Guixia He
Publikationsdatum: 21.04.2016
Verlag: Springer US
Erschienen in: International Journal of Parallel Programming / Ausgabe 3/2017
Print ISSN: 0885-7458
Elektronische ISSN: 1573-7640
DOI: https://doi.org/10.1007/s10766-016-0430-9

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Weitere Artikel der Ausgabe 3/2017

Restart Optimization for Transactional Memory with Lazy Conflict Detection

RT-CUDA: A Software Tool for CUDA Code Restructuring

Scalable Loop Self-Scheduling Schemes for Large-Scale Clusters and Cloud Systems

A Wait-Free Hash Map

Priority Based Yield of Shared Cache to Provide Cache QoS in Multicore Systems

Adaptive Runtime-Assisted Block Prefetching on Chip-Multiprocessors