Published in: International Journal of Parallel Programming 6/2014

01-12-2014

Parallelizing Complex Streaming Applications on Distributed Scratchpad Memory Multicore Architecture

Authors: Shin-Kai Chen, Cheng-Yu Hung, Ching-Chih Chen, Chih-Wei Liu

Abstract

Multicore processors can provide sufficient computing power and flexibility for complex streaming applications, such as high-definition video processing. To reduce hardware complexity and power consumption, a distributed scratchpad memory architecture is considered instead of a cache memory architecture. However, the distributed design poses new programming challenges: the combined complexity of inter-processor communication, synchronization, and workload balancing makes it difficult to exploit all available capabilities and achieve maximal throughput. In this study, we developed an efficient design flow for parallelizing multimedia applications on a distributed scratchpad memory multicore architecture. An application is first partitioned into streaming components, which are then mapped onto the multicore processors. Various hardware-dependent factors and application-specific characteristics are taken into account to generate efficient task partitions and allocate resources appropriately. To test and verify the proposed design flow, three popular multimedia applications were implemented: a full-HD motion JPEG decoder, an object detector, and a full-HD H.264/AVC decoder. For demonstration purposes, the SONY PlayStation®3 (PS3) was selected as the target platform. Simulation results show that, on the PS3, the full-HD motion JPEG decoder built with the proposed design flow can decode about 108.9 frames per second (fps) in the 1080p format. The object detection application runs at 2.84 fps at 1280 × 960 resolution, 11.75 fps at 640 × 480 resolution, and 62.52 fps at 320 × 240 resolution. The full-HD H.264/AVC decoder application can achieve nearly 50 fps.
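The partition-then-map step described in the abstract can be illustrated with a minimal sketch. It is not the authors' design flow: it substitutes a plain greedy longest-processing-time heuristic, ignores data dependencies, communication cost, and scratchpad capacity, and uses hypothetical component names and per-frame costs chosen only for illustration.

```python
import heapq


def map_components(costs, num_cores):
    """Greedily map streaming components to cores (longest-processing-time first).

    costs: dict of component name -> estimated cost per frame (hypothetical units).
    Returns (assignment, loads), both keyed by core index.
    """
    # Min-heap of (current load, core index): the least-loaded core pops first.
    heap = [(0.0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    assignment = {core: [] for core in range(num_cores)}
    # Place the most expensive components first to keep the loads balanced.
    for name, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        load, core = heapq.heappop(heap)
        assignment[core].append(name)
        heapq.heappush(heap, (load + cost, core))
    loads = {core: load for load, core in heap}
    return assignment, loads


def estimated_throughput(loads):
    """In a software pipeline the busiest core bounds the rate: 1 / max load."""
    return 1.0 / max(loads.values())


# Hypothetical per-frame costs in milliseconds for a JPEG-like decoder.
costs = {
    "entropy_decode": 6.0,
    "dequantize": 2.0,
    "idct": 5.0,
    "color_convert": 3.0,
}
assignment, loads = map_components(costs, num_cores=2)
throughput = estimated_throughput(loads)  # frames per millisecond
```

The sketch only captures the workload-balancing aspect: the core with the largest accumulated cost determines the achievable frame rate, so a mapping that evens out the loads raises throughput. A realistic flow would also have to weigh the inter-processor communication and synchronization costs that the abstract names.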

Metadata
Title
Parallelizing Complex Streaming Applications on Distributed Scratchpad Memory Multicore Architecture
Authors
Shin-Kai Chen
Cheng-Yu Hung
Ching-Chih Chen
Chih-Wei Liu
Publication date
01-12-2014
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 6/2014
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-013-0256-7
