
A high performance parallel DCT with OpenCL on heterogeneous computing environment

Multimedia Tools and Applications

Abstract

Desktop PCs offer a notable opportunity to speed up multimedia data processing by exploiting task- and data-parallelism on multi-core CPUs and many-core GPUs. This paper presents a high-performance parallel implementation of the 2D DCT in this heterogeneous computing environment. Intel TBB (Threading Building Blocks) and OpenCL (Open Computing Language) are used for task- and data-parallelism, respectively. The simulation results show that the parallel DCT implementations far outperform the serial ones in processing speed. In particular, the OpenCL implementation shows a linear speedup as the number of 2D data sets increases, which is a typical SIMD characteristic.
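To make the data-parallel structure concrete, the following is a minimal serial sketch of a block-wise 2D DCT. It is not the authors' kernel, which is not shown in this preview. It assumes the standard orthonormal DCT-II and an 8 × 8 block size, and the names `dct_1d`, `dct_2d`, `split_blocks`, and `dct_image` are hypothetical. The point it illustrates is that every block is transformed independently, which is the property a TBB task or an OpenCL work-group could exploit.

```python
import math

N = 8  # assumed block size (as in JPEG); the preview does not state the paper's choice


def dct_1d(v):
    """Orthonormal 1D DCT-II of the sequence v."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        s = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        out.append(scale * s)
    return out


def dct_2d(block):
    """Separable 2D DCT: 1D DCT along every row, then along every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]  # one entry per column
    return [list(r) for r in zip(*cols)]          # transpose back to row-major


def split_blocks(image, n=N):
    """Cut the image into n-by-n tiles; assumes both dimensions are multiples of n."""
    h, w = len(image), len(image[0])
    return [[row[x:x + n] for row in image[y:y + n]]
            for y in range(0, h, n) for x in range(0, w, n)]


def dct_image(image, n=N, mapper=map):
    """Transform every tile independently; mapper may be replaced by a parallel map."""
    return list(mapper(dct_2d, split_blocks(image, n)))
```

Because `dct_image` only needs a map over independent tiles, the built-in `map` could be swapped for a parallel one, for example the `map` method of a `concurrent.futures.ProcessPoolExecutor`, without changing `dct_2d`. This mirrors, in a simplified way, the separation between the per-block computation and the parallel scheduling that the abstract describes.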



Acknowledgements

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (KRF 2011-0027264).

Author information

Corresponding author

Correspondence to Yong Soo Choi.

About this article

Cite this article

Kim, C.G., Choi, Y.S. A high performance parallel DCT with OpenCL on heterogeneous computing environment. Multimed Tools Appl 64, 475–489 (2013). https://doi.org/10.1007/s11042-012-1028-x

  • DOI: https://doi.org/10.1007/s11042-012-1028-x
