High Throughput Parallel-Pipeline 2-D DCT/IDCT Processor Chip

Ruiz, G. A.; Michell, J. A.; Burón, A.

doi:10.1007/s11265-006-9764-7

Abstract

This paper presents a 2-D DCT/IDCT processor chip for high-data-rate image processing and video coding. It uses a fully pipelined row–column decomposition method built from two 1-D DCT processors and a transpose buffer made of D-type flip-flops with a double serial input/output data flow. The proposed architecture allows the main processing elements and arithmetic units to operate in parallel at half the frequency of the data input rate. Its main characteristics are high throughput, parallel processing, reduced internal storage, and maximum efficiency in the computational elements. The processor has been implemented using a standard-cell design methodology in 0.35 μm CMOS technology. It measures 6.25 mm² (the core is 3 mm²) and contains a total of 11.7 k gates. The maximum frequency is 300 MHz, with a latency of 172 cycles for the 2-D DCT and 178 cycles for the 2-D IDCT. The computing time of a block is close to 580 ns. It has been designed to meet the requirements of IEEE Std. 1180-1990, which is used in different video codecs. Its computing speed and hardware cost indicate that this processor is suitable for HDTV applications.
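The row–column decomposition named in the abstract can be illustrated with a small software model. The sketch below is not the chip's architecture: it assumes an 8 × 8 block, an orthonormal DCT-II, and floating-point arithmetic, and the function names (`dct_1d`, `idct_1d`, `transpose`, `dct_2d`, `idct_2d`) are hypothetical. It only shows how two 1-D passes separated by a transpose produce the 2-D transform.

```python
import math

N = 8  # assumed block size; the abstract does not state it explicitly


def _scale(k, n):
    # Orthonormal DCT-II scaling factor (an assumption of this sketch).
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_1d(x):
    """Direct O(n^2) orthonormal 1-D DCT-II of a sequence."""
    n = len(x)
    return [
        _scale(k, n)
        * sum(x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def idct_1d(coeffs):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(coeffs)
    return [
        sum(
            _scale(k, n) * coeffs[k] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
            for k in range(n)
        )
        for i in range(n)
    ]


def transpose(m):
    """Software stand-in for the transpose buffer between the two 1-D passes."""
    return [list(col) for col in zip(*m)]


def _row_column(block, transform):
    rows_done = [transform(row) for row in block]                # first 1-D pass (rows)
    cols_done = [transform(col) for col in transpose(rows_done)]  # second 1-D pass (columns)
    return transpose(cols_done)                                   # restore row-major order


def dct_2d(block):
    return _row_column(block, dct_1d)


def idct_2d(coeffs):
    return _row_column(coeffs, idct_1d)


if __name__ == "__main__":
    flat = [[1.0] * N for _ in range(N)]
    coeffs = dct_2d(flat)
    # A constant block concentrates all its energy in the DC coefficient.
    print(round(coeffs[0][0], 6))
```

In this model the two passes run one after the other on a complete intermediate block. The processor described in the abstract instead pipelines them through a flip-flop transpose buffer so that both 1-D units work in parallel.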

Author information

Authors and Affiliations

Departamento de Electrónica y Computadores, Facultad de Ciencias, Universidad de Cantabria, Avda. de Los Castros s/n, 39005, Santander, Spain
G. A. Ruiz, J. A. Michell & A. Burón

Additional information

This work was supported by the Spanish Ministry of Science and Technology (TIC2000-1289).

About this article

Cite this article

Ruiz, G.A., Michell, J.A. & Burón, A. High Throughput Parallel-Pipeline 2-D DCT/IDCT Processor Chip. J VLSI Sign Process Syst Sign Image Video Technol 45, 161–175 (2006). https://doi.org/10.1007/s11265-006-9764-7

Received: 27 June 2005
Revised: 27 June 2005
Accepted: 02 May 2006
Published: 14 December 2006
Issue Date: December 2006
DOI: https://doi.org/10.1007/s11265-006-9764-7
