Abstract
Loop pipelining is an essential technique in high-level synthesis to increase the throughput and resource utilisation of field-programmable gate array--based accelerators. It relies on modulo schedulers to compute an operator schedule that allows subsequent loop iterations to overlap partially when executed while still honouring all precedence and resource constraints. Modulo schedulers face a bi-criteria problem: minimise the initiation interval (II; i.e., the number of timesteps after which new iterations are started) and minimise the schedule length.
We present Moovac, a novel exact formulation that models all aspects (including the II minimisation) of the modulo scheduling problem as a single integer linear program, and discuss simple measures to prevent excessive runtimes, to challenge the old preconception that exact modulo scheduling is impractical.
We substantiate this claim by conducting an experimental study covering 188 loops from two established high-level synthesis benchmark suites, four different time limits, and three bounds for the schedule length, to compare our approach against a highly tuned exact formulation and a state-of-the-art heuristic algorithm. In the fastest configuration, an accumulated runtime of under 16 minutes is spent on scheduling all loops, and proven optimal IIs are found for 179 test instances.
- Erik R. Altman and Guang R. Gao. 1998. Optimal modulo scheduling through enumeration. International Journal of Parallel Programming 26, 2 (1998), 313--344. Google ScholarDigital Library
- Erik R. Altman, Ramaswamy Govindarajan, and Guang R. Gao. 1995. Scheduling and mapping: Software pipelining in the presence of structural hazards. In Proceedings of the ACM SIGPLAN’95 Conference on Programming Language Design and Implementation (PLDI’95). ACM, New York, NY, 139--150. Google ScholarDigital Library
- Maria Ayala and Christian Artigues. 2010. On Integer Linear Programming Formulations for the Resource-Constrained Modulo Scheduling Problem. Technical Report LAAS No. 10393. Archive ouverte HAL. https://hal.archives-ouvertes.fr/hal-00538821.Google Scholar
- Alessio Bonfietti, Michele Lombardi, Luca Benini, and Michela Milano. 2014. CROSS cyclic resource-constrained scheduling solver. Artificial Intelligence 206 (2014), 25--52. Google ScholarDigital Library
- Andrew Canis, Stephen Dean Brown, and Jason Helge Anderson. 2014. Modulo SDC scheduling with recurrence minimization in high-level synthesis. In Proceedings of the 24th International Conference on Field Programmable Logic and Applications (FPL’14). IEEE, Los Alamitos, CA, 1--8.Google ScholarCross Ref
- Josep M. Codina, Josep Llosa, and Antonio González. 2002. A comparative study of modulo scheduling techniques. In Proceedings of the 16th International Conference on Supercomputing (ICS’02). ACM, New York, NY, 97--106. Google ScholarDigital Library
- Jason Cong and Zhiru Zhang. 2006. An efficient and versatile scheduling algorithm based on SDC formulation. In Proceedings of the 43rd Design Automation Conference (DAC’06). ACM, New York, NY, 433--438. Google ScholarDigital Library
- Benoît Dupont de Dinechin. 1994. Simplex Scheduling: More Than Lifetime-Sensitive Instruction Scheduling. Technical Report PRISM 1994.22.Google Scholar
- B. D. de Dinechin. 2007. Time-indexed formulations and a large neighborhood search for the resource-constrained modulo scheduling problem. In Proceedings of the 3rd Multidisciplinary International Conference on Scheduling: Theory and Applications (MISTA’07). 144--151.Google Scholar
- Elizabeth D. Dolan and Jorge J. Moré. 2002. Benchmarking optimization software with performance profiles. Mathematical Programming 91, 2 (2002), 201--213.Google ScholarCross Ref
- Łukasz Domagała. 2012. Application of CLP to Instruction Modulo Scheduling for VLIW Processors. Ph.D. Dissertation. Silesian University of Technology.Google Scholar
- Alexandre E. Eichenberger and Edward S. Davidson. 1997. Efficient formulation for optimal modulo schedulers. In Proceedings of the ACM SIGPLAN’97 Conference on Programming Language Design and Implementation (PLDI’97). ACM, New York, NY, 194--205. Google ScholarDigital Library
- Alexandre E. Eichenberger, Edward S. Davidson, and Santosh G. Abraham. 1995. Optimum modulo schedules for minimum register requirements. In Proceedings of the 9th International Conference on Supercomputing (ICS’95). ACM, New York, NY, 31--40. Google ScholarDigital Library
- Alexandre E. Eichenberger, Edward S. Davidson, and Santosh G. Abraham. 2014. Author retrospective for optimum modulo schedules for minimum register requirements. In Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume. ACM, New York, NY, 35--36. Google ScholarDigital Library
- Dirk Fimmel and Jan Müller. 2002. Optimal software pipelining with rational initiation interval. In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’02). 638--643. Google ScholarDigital Library
- Yuko Hara, Hiroyuki Tomiyama, Shinya Honda, and Hiroaki Takada. 2009. Proposal and quantitative analysis of the CHStone benchmark program suite for practical C-based high-level synthesis. Journal of Information Processing 17 (2009), 242--254.Google ScholarCross Ref
- Richard A. Huff. 1993. Lifetime-sensitive modulo scheduling. In Proceedings of the ACM SIGPLAN ’93 Conference on Programming Language Design and Implementation (PLDI’93). ACM, New York, NY, 258--267. Google ScholarDigital Library
- Jens Huthmann, Björn Liebig, Julian Oppermann, and Andreas Koch. 2013. Hardware/software co-compilation with the Nymble system. In Proceedings of the 2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC’13). IEEE, Los Alamitos, CA, 1--8.Google ScholarCross Ref
- Ed Klotz and Alexandra M. Newman. 2013. Practical guidelines for solving difficult mixed integer linear programs. Surveys in Operations Research and Management Science 18, 1 (2013), 18--32.Google ScholarCross Ref
- Jens Korinth, David de la Chevallerie, and Andreas Koch. 2015. An open-source tool flow for the composition of reconfigurable hardware thread pool architectures. In Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM’15). IEEE, Los Alamitos, CA, 195--198. Google ScholarDigital Library
- Monica S. Lam. 1988. Software pipelining: An effective scheduling technique for VLIW machines. In Proceedings of the ACM SIGPLAN’88 Conference on Programming Language Design and Implementation (PLDI’88). ACM, New York, NY, 318--328. Google ScholarDigital Library
- Chris Lattner and Vikram S. Adve. 2004. LLVM: A compilation framework for lifelong program analysis and transformation. In Proceedings of the 2nd IEE/ACM International Symposium on Code Generation and Optimization (CGO’04). IEEE, Los Alamitos, CA, 75--88. Google ScholarDigital Library
- Josep Llosa, Eduard Ayguadé, Antonio González, Mateo Valero, and Jason Eckhardt. 2001. Lifetime-sensitive modulo scheduling in a production environment. IEEE Transactions on Computers 50, 3 (2001), 234--249. Google ScholarDigital Library
- Julian Oppermann, Andreas Koch, Melanie Reuter-Oppermann, and Oliver Sinnen. 2016. ILP-based modulo scheduling for high-level synthesis. In Proceedings of the 2016 International Conference on Compilers, Architectures, and Synthesis for Embedded Systems (CASES’16). ACM, New York, NY, Article 1, 10 pages. Google ScholarDigital Library
- B. Ramakrishna Rau. 1996. Iterative modulo scheduling. International Journal of Parallel Programming 24, 1 (1996), 3--65. Google ScholarDigital Library
- Brandon Reagen, Robert Adolf, Yakun Sophia Shao, Gu-Yeon Wei, and David M. Brooks. 2014. MachSuite: Benchmarks for accelerator design and customized architectures. In Proceedings of the 2014 IEEE International Symposium on Workload Characterication (IISWC’14). IEEE, Los Alamitos, CA, 110--119.Google Scholar
- Přemysl Šůcha and Zdeněk Hanzálek. 2011. A cyclic scheduling problem with an undetermined number of parallel identical processors. Computational Optimization and Applications 48, 1 (2011), 71--90. Google ScholarDigital Library
- Sarad Venugopalan and Oliver Sinnen. 2015. ILP formulations for optimal task scheduling with communication delays on parallel systems. IEEE Transactions on Parallel and Distributed Systems 26, 1 (2015), 142--151.Google ScholarCross Ref
- Zhiru Zhang and Bin Liu. 2013. SDC-based modulo scheduling for pipeline synthesis. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD’13). IEEE, Los Alamitos, CA, 211--218. Google ScholarDigital Library
Index Terms
Exact and Practical Modulo Scheduling for High-Level Synthesis
Recommendations
Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27: Proceedings of the 27th annual international symposium on MicroarchitectureModulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with ...
Iterative Modulo Scheduling
Modulo scheduling is a framework within which algorithms for software pipelining innermost loops may be defined. The framework specifies a set of constraints that must be met in order to achieve a legal modulo schedule. A wide variety of algorithms and ...
Non-iterative SDC modulo scheduling for high-level synthesis
AbstractHigh-level synthesis is a powerful tool for increasing productivity in digital hardware design. However, as digital systems become larger and more complex, designers have to consider an increased number of optimizations and directives ...
Comments