A Multi-Shared Register File Structure for VLIW Processors

Journal of Signal Processing Systems

Abstract

The instruction-level parallelism that current register file organizations make available is not always fully exploited by media processors running multimedia applications. This paper introduces a novel register file organization, called the multi-shared register file, that eliminates this superfluous instruction scheduling flexibility by reducing the number of read and write ports and partitioning the register file into a special ring structure. A parameterized generic VLIW architecture is used to explore different configurations of the proposed register file structure in terms of estimated silicon area, minimum clock period, estimated power consumption, and multimedia task processing performance. Moreover, a metric closely tied to multimedia applications is introduced to study trade-offs between hardware cost and performance. The results show that substituting a monolithic register file with an equivalent multi-shared register file considerably reduces the estimated area and power consumption at the cost of a negligible performance degradation.
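
To give a rough feel for why reducing ports and partitioning can shrink a register file, the sketch below uses a common first-order approximation in which each port adds one wordline and one bitline track to every storage cell, so cell area grows roughly with the square of the port count. This is not the cost model used in the paper, it does not model the ring structure, and all names and figures (`cell_area`, `register_file_area`, 64 registers, 12 and 6 ports, 4 partitions) are hypothetical values chosen only for illustration.

```python
def cell_area(ports: int, base: float = 1.0) -> float:
    """Relative area of one storage cell.

    Assumption: every port adds one track in each dimension of the cell,
    so the area scales as (base + ports) squared.
    """
    return (base + ports) ** 2


def register_file_area(registers: int, bits: int, ports: int) -> float:
    """Relative area of a register file: number of cells times cell area."""
    return registers * bits * cell_area(ports)


# Hypothetical monolithic file: 64 registers of 32 bits, 12 ports in total.
monolithic = register_file_area(registers=64, bits=32, ports=12)

# Hypothetical partitioned file: 4 banks of 16 registers, 6 ports per bank.
partitions = 4
partitioned = partitions * register_file_area(registers=16, bits=32, ports=6)

ratio = partitioned / monolithic
print(f"partitioned / monolithic area: {ratio:.2f}")
```

Under these assumptions the total storage capacity is unchanged, and the saving comes entirely from the smaller per-cell port count. Interconnect, decoders, and any logic needed to share partitions are ignored here.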

Acknowledgements

The authors thank Prof. Dr.-Ing. Holger Blume for his comments during the review process.

Author information

Corresponding author

Correspondence to Guillermo Payá-Vayá.

About this article

Cite this article

Payá-Vayá, G., Martín-Langerwerf, J. & Pirsch, P. A Multi-Shared Register File Structure for VLIW Processors. J Sign Process Syst Sign Image Video Technol 58, 215–231 (2010). https://doi.org/10.1007/s11265-009-0355-2

  • DOI: https://doi.org/10.1007/s11265-009-0355-2