Published in: The Journal of Supercomputing 9/2015

01-09-2015

Fast filter bank convolution for three-dimensional wavelet transform by shared memory on mobile GPU computing

Author: Di Zhao

Abstract

Mobile GPU applications are usually constrained by real-time requirements. However, the FLOPS of a mobile GPU are limited by the size and power supply of the SoC system. Like desktop GPUs, the mobile GPU has an on-chip memory hierarchy, and proper use of this hierarchy can accelerate mobile GPU applications such as the Discrete Wavelet Transform (DWT) enough to satisfy the real-time requirement. In this paper, by taking advantage of GPU shared memory in Tegra K1, a mobile GPU from Nvidia, we develop Bank Conflict Free Shared Memory Parallel DWT for mobile GPU applications. Computational results show that, at a display resolution of \(640 \times 350\) (EGA), Bank Conflict Free Shared Memory Parallel DWT is significantly faster than SoC CPU-based DWT. Computational results also show that, at display resolutions of \(320\times 200\) (CGA), \(640\times 480\) (VGA), \(800\times 600\) (SVGA) and \(1024\times 768\) (XGA), Bank Conflict Free Shared Memory Parallel DWT can generally satisfy the real-time requirement.
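
The abstract does not say how bank conflicts are avoided, so the following is only an illustrative sketch of the general idea, not the paper's method. It models shared memory as 32 banks of 4-byte words (an assumed configuration) and counts how many distinct words a 32-thread warp sends to the same bank when it reads one column of a tile. Row padding, a common technique assumed here for illustration, changes the row stride from 32 to 33 words. The names `conflict_degree` and `column_access` are hypothetical helpers.

```python
from collections import defaultdict

NUM_BANKS = 32   # assumption: 32 shared-memory banks of 4-byte words
WARP_SIZE = 32   # assumption: 32 threads issue the access together


def conflict_degree(word_indices, num_banks=NUM_BANKS):
    """Worst-case number of distinct words mapped to a single bank.

    A value of 1 means the access is conflict free; a value of k means
    the access is serialized into k steps under this simple model.
    Threads reading the same word count once (broadcast).
    """
    banks = defaultdict(set)
    for word in word_indices:
        banks[word % num_banks].add(word)
    return max(len(words) for words in banks.values())


def column_access(row_stride, column=0, warp_size=WARP_SIZE):
    """Word indices touched when thread t reads tile[t][column]."""
    return [t * row_stride + column for t in range(warp_size)]


# Unpadded 32-word rows: every thread hits the same bank.
unpadded = conflict_degree(column_access(row_stride=32))
# Rows padded to 33 words: each thread hits a different bank.
padded = conflict_degree(column_access(row_stride=33))

print("unpadded stride 32:", unpadded)
print("padded stride 33:", padded)
```

In this model the unpadded column read is serialized 32 ways, while the padded layout is conflict free at the cost of one unused word per row. Whether the paper uses padding or a different access pattern is not stated in the abstract.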

Metadata
Title
Fast filter bank convolution for three-dimensional wavelet transform by shared memory on mobile GPU computing
Author
Di Zhao
Publication date
01-09-2015
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 9/2015
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-015-1443-7
