skip to main content
research-article

Exact and Practical Modulo Scheduling for High-Level Synthesis

Published:06 May 2019Publication History
Skip Abstract Section

Abstract

Loop pipelining is an essential technique in high-level synthesis to increase the throughput and resource utilisation of field-programmable gate array--based accelerators. It relies on modulo schedulers to compute an operator schedule that allows subsequent loop iterations to overlap partially when executed while still honouring all precedence and resource constraints. Modulo schedulers face a bi-criteria problem: minimise the initiation interval (II; i.e., the number of timesteps after which new iterations are started) and minimise the schedule length.

We present Moovac, a novel exact formulation that models all aspects (including the II minimisation) of the modulo scheduling problem as a single integer linear program, and discuss simple measures to prevent excessive runtimes, to challenge the old preconception that exact modulo scheduling is impractical.

We substantiate this claim by conducting an experimental study covering 188 loops from two established high-level synthesis benchmark suites, four different time limits, and three bounds for the schedule length, to compare our approach against a highly tuned exact formulation and a state-of-the-art heuristic algorithm. In the fastest configuration, an accumulated runtime of under 16 minutes is spent on scheduling all loops, and proven optimal IIs are found for 179 test instances.

References

  1. Erik R. Altman and Guang R. Gao. 1998. Optimal modulo scheduling through enumeration. International Journal of Parallel Programming 26, 2 (1998), 313--344. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Erik R. Altman, Ramaswamy Govindarajan, and Guang R. Gao. 1995. Scheduling and mapping: Software pipelining in the presence of structural hazards. In Proceedings of the ACM SIGPLAN’95 Conference on Programming Language Design and Implementation (PLDI’95). ACM, New York, NY, 139--150. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Maria Ayala and Christian Artigues. 2010. On Integer Linear Programming Formulations for the Resource-Constrained Modulo Scheduling Problem. Technical Report LAAS No. 10393. Archive ouverte HAL. https://hal.archives-ouvertes.fr/hal-00538821.Google ScholarGoogle Scholar
  4. Alessio Bonfietti, Michele Lombardi, Luca Benini, and Michela Milano. 2014. CROSS cyclic resource-constrained scheduling solver. Artificial Intelligence 206 (2014), 25--52. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Andrew Canis, Stephen Dean Brown, and Jason Helge Anderson. 2014. Modulo SDC scheduling with recurrence minimization in high-level synthesis. In Proceedings of the 24th International Conference on Field Programmable Logic and Applications (FPL’14). IEEE, Los Alamitos, CA, 1--8.Google ScholarGoogle ScholarCross RefCross Ref
  6. Josep M. Codina, Josep Llosa, and Antonio González. 2002. A comparative study of modulo scheduling techniques. In Proceedings of the 16th International Conference on Supercomputing (ICS’02). ACM, New York, NY, 97--106. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Jason Cong and Zhiru Zhang. 2006. An efficient and versatile scheduling algorithm based on SDC formulation. In Proceedings of the 43rd Design Automation Conference (DAC’06). ACM, New York, NY, 433--438. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Benoît Dupont de Dinechin. 1994. Simplex Scheduling: More Than Lifetime-Sensitive Instruction Scheduling. Technical Report PRISM 1994.22.Google ScholarGoogle Scholar
  9. B. D. de Dinechin. 2007. Time-indexed formulations and a large neighborhood search for the resource-constrained modulo scheduling problem. In Proceedings of the 3rd Multidisciplinary International Conference on Scheduling: Theory and Applications (MISTA’07). 144--151.Google ScholarGoogle Scholar
  10. Elizabeth D. Dolan and Jorge J. Moré. 2002. Benchmarking optimization software with performance profiles. Mathematical Programming 91, 2 (2002), 201--213.Google ScholarGoogle ScholarCross RefCross Ref
  11. Łukasz Domagała. 2012. Application of CLP to Instruction Modulo Scheduling for VLIW Processors. Ph.D. Dissertation. Silesian University of Technology.Google ScholarGoogle Scholar
  12. Alexandre E. Eichenberger and Edward S. Davidson. 1997. Efficient formulation for optimal modulo schedulers. In Proceedings of the ACM SIGPLAN’97 Conference on Programming Language Design and Implementation (PLDI’97). ACM, New York, NY, 194--205. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Alexandre E. Eichenberger, Edward S. Davidson, and Santosh G. Abraham. 1995. Optimum modulo schedules for minimum register requirements. In Proceedings of the 9th International Conference on Supercomputing (ICS’95). ACM, New York, NY, 31--40. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Alexandre E. Eichenberger, Edward S. Davidson, and Santosh G. Abraham. 2014. Author retrospective for optimum modulo schedules for minimum register requirements. In Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume. ACM, New York, NY, 35--36. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Dirk Fimmel and Jan Müller. 2002. Optimal software pipelining with rational initiation interval. In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’02). 638--643. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Yuko Hara, Hiroyuki Tomiyama, Shinya Honda, and Hiroaki Takada. 2009. Proposal and quantitative analysis of the CHStone benchmark program suite for practical C-based high-level synthesis. Journal of Information Processing 17 (2009), 242--254.Google ScholarGoogle ScholarCross RefCross Ref
  17. Richard A. Huff. 1993. Lifetime-sensitive modulo scheduling. In Proceedings of the ACM SIGPLAN ’93 Conference on Programming Language Design and Implementation (PLDI’93). ACM, New York, NY, 258--267. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Jens Huthmann, Björn Liebig, Julian Oppermann, and Andreas Koch. 2013. Hardware/software co-compilation with the Nymble system. In Proceedings of the 2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC’13). IEEE, Los Alamitos, CA, 1--8.Google ScholarGoogle ScholarCross RefCross Ref
  19. Ed Klotz and Alexandra M. Newman. 2013. Practical guidelines for solving difficult mixed integer linear programs. Surveys in Operations Research and Management Science 18, 1 (2013), 18--32.Google ScholarGoogle ScholarCross RefCross Ref
  20. Jens Korinth, David de la Chevallerie, and Andreas Koch. 2015. An open-source tool flow for the composition of reconfigurable hardware thread pool architectures. In Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM’15). IEEE, Los Alamitos, CA, 195--198. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Monica S. Lam. 1988. Software pipelining: An effective scheduling technique for VLIW machines. In Proceedings of the ACM SIGPLAN’88 Conference on Programming Language Design and Implementation (PLDI’88). ACM, New York, NY, 318--328. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Chris Lattner and Vikram S. Adve. 2004. LLVM: A compilation framework for lifelong program analysis and transformation. In Proceedings of the 2nd IEE/ACM International Symposium on Code Generation and Optimization (CGO’04). IEEE, Los Alamitos, CA, 75--88. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Josep Llosa, Eduard Ayguadé, Antonio González, Mateo Valero, and Jason Eckhardt. 2001. Lifetime-sensitive modulo scheduling in a production environment. IEEE Transactions on Computers 50, 3 (2001), 234--249. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Julian Oppermann, Andreas Koch, Melanie Reuter-Oppermann, and Oliver Sinnen. 2016. ILP-based modulo scheduling for high-level synthesis. In Proceedings of the 2016 International Conference on Compilers, Architectures, and Synthesis for Embedded Systems (CASES’16). ACM, New York, NY, Article 1, 10 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. B. Ramakrishna Rau. 1996. Iterative modulo scheduling. International Journal of Parallel Programming 24, 1 (1996), 3--65. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Brandon Reagen, Robert Adolf, Yakun Sophia Shao, Gu-Yeon Wei, and David M. Brooks. 2014. MachSuite: Benchmarks for accelerator design and customized architectures. In Proceedings of the 2014 IEEE International Symposium on Workload Characterication (IISWC’14). IEEE, Los Alamitos, CA, 110--119.Google ScholarGoogle Scholar
  27. Přemysl Šůcha and Zdeněk Hanzálek. 2011. A cyclic scheduling problem with an undetermined number of parallel identical processors. Computational Optimization and Applications 48, 1 (2011), 71--90. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Sarad Venugopalan and Oliver Sinnen. 2015. ILP formulations for optimal task scheduling with communication delays on parallel systems. IEEE Transactions on Parallel and Distributed Systems 26, 1 (2015), 142--151.Google ScholarGoogle ScholarCross RefCross Ref
  29. Zhiru Zhang and Bin Liu. 2013. SDC-based modulo scheduling for pipeline synthesis. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD’13). IEEE, Los Alamitos, CA, 211--218. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Exact and Practical Modulo Scheduling for High-Level Synthesis

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Transactions on Reconfigurable Technology and Systems
        ACM Transactions on Reconfigurable Technology and Systems  Volume 12, Issue 2
        June 2019
        117 pages
        ISSN:1936-7406
        EISSN:1936-7414
        DOI:10.1145/3322884
        • Editor:
        • Deming Chen
        Issue’s Table of Contents

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 6 May 2019
        • Accepted: 1 February 2019
        • Revised: 1 November 2018
        • Received: 1 February 2018
        Published in trets Volume 12, Issue 2

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      HTML Format

      View this article in HTML Format .

      View HTML Format