Abstract
A notable feature of desktop PCs is that they offer an opportunity to speed up multimedia data processing by exploiting task- and data-parallelism on multi-core CPUs and many-core GPUs. This paper presents a high-performance parallel implementation of the 2D DCT in this heterogeneous computing environment. For this purpose, Intel TBB (Threading Building Blocks) and OpenCL (Open Computing Language) are used for task- and data-parallelism, respectively. The simulation results show that the parallel DCT implementations far outperform the serial ones in processing speed. In particular, the OpenCL implementation shows a linear speedup, a typical SIMD characteristic, as the 2D data sets grow.
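To make the decomposition concrete, the following is a minimal sketch, not the authors' TBB or OpenCL code. It assumes the input is split into independent 8x8 blocks (the block size `N = 8` is an assumption), computes an orthonormal 2D DCT-II per block by row-column decomposition, and hands the blocks to a thread pool as a stand-in for block-level task parallelism. The function names are hypothetical, and Python threads will not reproduce the speedups reported in the paper; the sketch only illustrates why the work parallelizes cleanly.

```python
import math
from concurrent.futures import ThreadPoolExecutor

N = 8  # block size (assumption: 8x8 blocks, as in JPEG-style coding)

# Orthonormal DCT-II basis: C[u][x] = a(u) * cos((2x + 1) * u * pi / (2N))
C = [[(math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
      for x in range(N)]
     for u in range(N)]


def dct_2d_block(block):
    """2D DCT-II of one N x N block, computed as C * block * C^T."""
    # Row pass: tmp[i][v] = sum_y block[i][y] * C[v][y]
    tmp = [[sum(block[i][y] * C[v][y] for y in range(N)) for v in range(N)]
           for i in range(N)]
    # Column pass: out[u][v] = sum_x C[u][x] * tmp[x][v]
    return [[sum(C[u][x] * tmp[x][v] for x in range(N)) for v in range(N)]
            for u in range(N)]


def dct_2d_blocks_serial(blocks):
    """Transform the blocks one after another."""
    return [dct_2d_block(b) for b in blocks]


def dct_2d_blocks_parallel(blocks, workers=4):
    """Transform the blocks concurrently; each block is an independent task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(dct_2d_block, blocks))
```

Because no block depends on any other, the same per-block function could in principle be mapped onto a task scheduler on the CPU or onto one work-group per block on a GPU, which is the property the paper's two implementations exploit.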
Acknowledgements
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (KRF 2011-0027264).
Cite this article
Kim, C.G., Choi, Y.S. A high performance parallel DCT with OpenCL on heterogeneous computing environment. Multimed Tools Appl 64, 475–489 (2013). https://doi.org/10.1007/s11042-012-1028-x
DOI: https://doi.org/10.1007/s11042-012-1028-x