Top

International Journal of Parallel Programming

Published in:

01-12-2015

List Scheduling in Embedded Systems Under Memory Constraints

Authors: Paul-Antoine Arras, Didier Fuin, Emmanuel Jeannot, Arthur Stoutchinin, Samuel Thibault

Published in: International Journal of Parallel Programming | Issue 6/2015

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

Video decoding and image processing in embedded systems are subject to strong resource constraints, particularly in terms of memory. List-scheduling heuristics with static priorities (HEFT, SDC, etc.) being the oft-cited solutions due to both their good performance and their low complexity, we propose a method aimed at introducing the notion of memory into them. Moreover, we show that through adequate adjustment of task priorities and judicious resort to insertion-based policy, speedups up to 20 % can be achieved. We also show that our technique allows to prevent deadlock and to substantially reduce the required memory footprint compared to classic list-scheduling heuristics. Lastly, we propose a methodology to assess the appropriateness of dynamic scheduling in this context.

previous article BPLG: A Tuned Butterfly Processing Library for GPU Architectures

next article A Hardware/Software Approach for Database Query Acceleration with FPGAs

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

The bottom level is also sometimes referred to as the upward rank.

The orders of magnitude of the latency for local and external memories are respectively 1 cycle and 100 cyles.

A token is the smallest unit of data that can be processed by a task. It is application specific; e.g. for an image-processing algorithm, it can be a line of pixels.

Thus, from a processing viewpoint, pixel lines are independent.

If a node can be pebbled more than once, then the problem is PSPACE-complete (and hence NP-Hard), but probably not in NP [10].

As a reminder, the bigger its priority value, the earlier a task is scheduled.

The sole purpose of this reference schedule is to serve as a basis to construct the self-timed schedule. Thus, the makespans resulting from it are not meaningful for our study and, as such, are not represented.

To keep the model simple, interprediction is not considered.

Adam, T.L., Chandy, K., Dickson, J.: Comparison of list schedules for parallel processing systems. Commun. ACM 17(12), 685–690 (1974)MATHCrossRef

Baker, T.P.: Stack-based scheduling for realtime processes. Real-Time Syst. 3(1), 84–88 (1991)

Batat, A., Feitelson, D.: Gang scheduling with memory considerations. In: Parallel and Distributed Processing Symposium, 2000. IPDPS 2000. Proceedings. 14th International, pp. 109–114. IEEE (2000)

Benini, L., Flamand, E., Fuin, D., Melpignano, D.: P2012: building an ecosystem for a scalable, modular and high-efficiency embedded computing accelerator. In: Design, Automation Test in Europe Conference Exhibition (DATE), 2012, pp. 983–987 (2012)

Buck, J., Lee, E.: Scheduling dynamic dataflow graphs with bounded memory using the token flow model. In: 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-93), vol. 1, pp. 429–432. IEEE (1993)

Canon, L.C., Jeannot, E., Sakellariou, R., Zheng, W.: Comparative evaluation of the robustness of dag scheduling heuristics. In: Gorlatch, S., Fragopoulou, P., Priol, T. (eds.) Grid Computing, pp. 73–84. Springer, US (2008). doi:10.1007/978-0-387-09457-1_7

Chaitin, G.J., Auslander, M.A., Chandra, A.K., Cocke, J., Hopkins, M.E., Markstein, P.W.: Register allocation via coloring. Comput. Lang. 6, 47–57 (1981)CrossRef

Fradet, P., Girault, A., Poplavkoy, P.: SPDF: a schedulable parametric data-flow MoC. In: Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012, p. 769774 (2012)

Geng, T., et al.: Parallelization of computing-intensive tasks of the h.264 high profile decoding algorithm on a reconfigurable multimedia system. IEICE Trans. Inf. Syst. E93–D(12), 3223–3231 (2010)CrossRef

10.

Gilbert, J., Lengauer, T., Tarjan, R.: The pebbling problem is complete in polynomial space. SIAM J. Comput. 9(3), 513–524 (1980)MATHMathSciNetCrossRef

11.

Guermouche, A., L’Excellent, J.Y.: Memory-based scheduling for a parallel multifrontal solver. In: Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International, p. 71. IEEE (2004)

12.

Herrmann, J., Marchal, L., Robert, Y.: Model and complexity results for tree traversals on hybrid platforms. In: Euro-Par 2013 Parallel Processing, pp. 647–658. Springer (2013)

13.

Herrmann, J., Marchal, L., Robert, Y.: Memory-aware list scheduling for hybrid platforms. Rapport de recherche RR-8461, INRIA (2014). http://hal.inria.fr/hal-00944336

14.

Jian, G.A., Chu, J.C., Huang, T.Y., Chang, T.C., Guo, J.I.: A system architecture exploration on the configurable HW/SW co-design for h.264 video decoder. In: Circuits and Systems, 2009. ISCAS 2009. IEEE International Symposium on, pp. 2237–2240 (2009). doi:10.1109/ISCAS.2009.5118243

15.

Kwok, Y.K., Ahmad, I.: Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Comput. Surv. 31(4), 406–471 (1999). doi:10.1145/344588.344618 CrossRef

16.

Lawler, E.L., Lenstra, J.K., Kan, A.R., Shmoys, D.B.: Sequencing and scheduling: algorithms and complexity. Handb. Oper. Res. Manag. Sci. 4, 445–522 (1993)CrossRef

17.

Lee, E., Messerschmitt, D.: Synchronous data flow. Proc. IEEE 75(9), 1235–1245 (1987)CrossRef

18.

Lee, E.A., Ha, S.: Scheduling strategies for multiprocessor real-time DSP. IEEE Global Telecommunications inproceedings and Exhibition 2 (1989)

19.

Liu, J.W.: On the storage requirement in the out-of-core multifrontal method for sparse factorization. ACM Trans. Math. Softw. TOMS 12(3), 249–264 (1986)MATHCrossRef

20.

Marchal, L., Sinnen, O., Vivien, F.: Scheduling tree-shaped task graphs to minimize memory and makespan. In: Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on, pp. 839–850. IEEE (2013)

21.

Melpignano, D., Benini, L., Flamand, E., Jego, B., Lepley, T., Haugou, G., Clermidy, F., Dutoit, D.: Platform 2012, a many-core computing accelerator for embedded socs: performance evaluation of visual analytics applications. In: Design Automation Conference (DAC), 2012 49th ACM/EDAC/IEEE, pp. 1137–1142 (2012)

22.

Saponara, S., et al.: Performance and complexity co-evaluation of the advanced video coding standard for cost-effective multimedia communications. EURASIP J. Appl. Signal Process. 2004(2), 220–235 (2004)CrossRef

23.

Sethi, R.: Complete register allocation problems. SIAM J. Comput. 4(3), 226–248 (1975)MATHMathSciNetCrossRef

24.

Shi, Z., Dongarra, J.J.: Scheduling workflow applications on processors with different capabilities. Futur. Gener. Comput. Syst. 22(6), 665–675 (2006). doi:10.1016/j.future.2005.11.002 CrossRef

25.

Sih, G.C., Lee, E.A.: Compile-time scheduling heuristic for interconnection-constrained heterogeneous processor architectures. IEEE Trans. Parallel Distrib. Syst. 4(2), 175–187 (1993)CrossRef

26.

Sullivan, G., Ohm, J.R.: Recent developments in standardization of high efficiency video coding (HEVC). In: Proceedings of SPIE: The International Society for Optical Engineering, vol. 7798 (2010)

27.

Topcuoglu, H., Hariri, S., Wu, M.Y.: Task scheduling algorithms for heterogeneous processors. In: 8th IEEE Heterogeneous Computing Workshop (HCW’99), pp. 3–14. San Juan, Puerto Rico (1999)

28.

Wang, S.H., et al.: A software-hardware co-implementation of mpeg-4 advanced video coding (AVC) decoder with block level pipelining. J. VLSI Signal Process. Syst. Signal Image Video Technol. 41(1), 93–110 (2005)CrossRef

29.

Wiegand, T., et al.: Overview of the h.264/AVC video coding standard. IEEE Trans. Circuits Syst. Video Technol. 13(7), 560–576 (2003)CrossRef

Title: List Scheduling in Embedded Systems Under Memory Constraints
Authors: Paul-Antoine Arras
Didier Fuin
Emmanuel Jeannot
Arthur Stoutchinin
Samuel Thibault
Publication date: 01-12-2015
Publisher: Springer US
Published in: International Journal of Parallel Programming / Issue 6/2015
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI: https://doi.org/10.1007/s10766-014-0338-1

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Other articles of this Issue 6/2015

Invasive Compute Balancing for Applications with Shared and Hybrid Parallelization

A Hardware/Software Approach for Database Query Acceleration with FPGAs

Cluster Cache Monitor: Leveraging the Proximity Data in CMP

Fully Optimized Code Block Segmentation Algorithm for LTE-Advanced

A Decomposition-Based Approach for Scalable Many-Field Packet Classification on Multi-core Processors

An Autotuning Engine for the 3D Fast Wavelet Transform on Clusters with Hybrid CPU + GPU Platforms

Premium Partner