A high performance parallel DCT with OpenCL on heterogeneous computing environment

Kim, Cheong Ghil; Choi, Yong Soo

doi:10.1007/s11042-012-1028-x

Published: 28 February 2012

Volume 64, pages 475–489, (2013)
Multimedia Tools and Applications

Cheong Ghil Kim¹ &
Yong Soo Choi²

Abstract

Desktop PCs now offer a significant opportunity to speed up multimedia data processing by exploiting task- and data-parallelism on multi-core CPUs and many-core GPUs. This paper presents a high-performance parallel implementation of the 2D DCT in this heterogeneous computing environment. For this purpose, Intel TBB (Threading Building Blocks) and OpenCL (Open Computing Language) are used for task- and data-parallelism, respectively. The simulation results show that the parallel DCT implementations far outperform the serial ones in processing speed. In particular, the OpenCL implementation shows a linear speedup as the size of the 2D data sets increases, a typical SIMD characteristic.
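
The data-parallel structure mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' TBB or OpenCL code: it assumes 8×8 blocks (a common choice in image codecs, not stated in the abstract), uses the standard orthonormal DCT-II, and the names `dct_matrix`, `dct2_block`, and `dct2_blocks` are hypothetical. It only shows why the 2D DCT parallelizes well: each block is transformed independently as C·X·Cᵀ.

```python
import math
from concurrent.futures import ThreadPoolExecutor

N = 8  # assumed block size; the abstract does not specify one


def dct_matrix(n=N):
    """Orthonormal DCT-II basis matrix C (row k holds the k-th cosine basis)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


C = dct_matrix()
CT = transpose(C)


def dct2_block(block):
    """2D DCT-II of one N x N block by row-column decomposition: C * X * C^T."""
    return matmul(matmul(C, block), CT)


def dct2_blocks(blocks, workers=4):
    """Transform independent blocks; every block is a separate work item."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(dct2_block, blocks))
```

Calling `dct2_blocks(list_of_blocks)` returns the transformed blocks in input order. Threads in CPython will not give a real speedup for this arithmetic; the thread pool only stands in for the independent work items that a TBB task or an OpenCL work-group would process.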

Acknowledgements

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (KRF 2011-0027264).

Author information

Authors and Affiliations

Department of Computer Science, Namseoul University, 91 Seongwhaneub, Seobukgu, Cheonan, Chungnam, South Korea
Cheong Ghil Kim
Graduate School of Information Security, Korea University, 15 Anamro Seongbukgu, Seoul, South Korea
Yong Soo Choi

Corresponding author

Correspondence to Yong Soo Choi.

About this article

Cite this article

Kim, C.G., Choi, Y.S. A high performance parallel DCT with OpenCL on heterogeneous computing environment. Multimed Tools Appl 64, 475–489 (2013). https://doi.org/10.1007/s11042-012-1028-x

Published: 28 February 2012
Issue Date: May 2013
DOI: https://doi.org/10.1007/s11042-012-1028-x
