skip to main content
research-article
Open Access

Fault-tolerant algorithms for tick-generation in asynchronous logic: Robust pulse generation

Published:08 September 2014Publication History
Skip Abstract Section

Abstract

Today’s hardware technology presents a new challenge in designing robust systems. Deep submicron VLSI technology introduces transient and permanent faults that were never considered in low-level system designs in the past. Still, robustness of that part of the system is crucial and needs to be guaranteed for any successful product. Distributed systems, on the other hand, have been dealing with similar issues for decades. However, neither the basic abstractions nor the complexity of contemporary fault-tolerant distributed algorithms match the peculiarities of hardware implementations.

This article is intended to be part of an attempt striving to bridge over this gap between theory and practice for the clock synchronization problem. Solving this task sufficiently well will allow to build an ultra-robust high-precision clocking system for hardware designs like systems-on-chips in critical applications. As our first building block, we describe and prove correct a novel distributed, Byzantine fault-tolerant, probabilistically self-stabilizing pulse synchronization protocol, called FATAL, that can be implemented using standard asynchronous digital logic: Correct FATAL nodes are guaranteed to generate pulses (i.e., unnumbered clock ticks) in a synchronized way, despite a certain fraction of nodes being faulty. FATAL uses randomization only during stabilization and, despite the strict limitations introduced by hardware designs, offers optimal resilience and smaller complexity than all existing protocols. Finally, we show how to leverage FATAL to efficiently generate synchronized, self-stabilizing, high-frequency clocks.

References

  1. Ajtai, M., Komlós, J., and Szemerédi, E. 1983. An O(n log n) sorting network. In Proceedings of the 15th Symposium on Theory of Computing (STOC). 1--9. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Attiya, H. and Censor, K. 2008. Lower bounds for randomized consensus under a weak adversary. In Proceedings of the 27th Symposium on Principles of Distributed Computing (PODC). 315--324. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Ben-Or, M., Dolev, D., and Hoch, E. N. 2008. Fast self-stabilizing byzantine tolerant digital clock synchronization. In Proceedings of the 27th Symposium on Principles of Distributed Computing (PODC). 385--394. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Bhamidipati, R., Zaidi, A., Makineni, S., Low, K., Chen, R., Liu, K.-Y., and Dalgrehn, J. 2002. Challenges and methodologies for implementing high-performance network processors. Intel Technol. J. 6, 3, 83--92.Google ScholarGoogle Scholar
  5. Biaz, S. and Welch, J. 2001. Closed form bounds for clock synchronization under simple uncertainty assumptions. Inf. Process. Lett. 80, 3, 151--157. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Chapiro, D. M. 1984. Globally-asynchronous locally-synchronous systems. Ph.D. thesis, Stanford University. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Constantinescu, C. 2003. Trends and challenges in VLSI circuit reliability. IEEE Micro 23, 4, 14--19. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Daliot, A. and Dolev, D. 2006. Self-stabilizing byzantine pulse synchronization. Comput. Res. Repository abs/cs/0608092.Google ScholarGoogle Scholar
  9. Daliot, A., Dolev, D., and Parnas, H. 2003. Self-stabilizing pulse synchronization inspired by biological pacemaker networks. In Proceedings of the SSS. 32--48. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Dike, C. and Burton, E. 1999. Miller and noise effects in a synchronizing flip-flop. IEEE J. Solid-State Circ. SC-34, 6, 849--855.Google ScholarGoogle Scholar
  11. Dolev, D. 1982. The Byzantine generals strike again. J. Algor. 3, 14--30.Google ScholarGoogle ScholarCross RefCross Ref
  12. Dolev, D., Fuegger, M., Lenzen, C., Posch, M., Schmid, U., and Steininger, A. 2014. Rigorously modeling self-stabilizing fault-tolerant circuits: An ultra-robust clocking scheme for systems-on-chip. J. Comput. Syst. Sci. 80, 4, 860--900. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Dolev, D. and Hoch, E. 2007. Byzantine self-stabilizing pulse in a bounded-delay model. In Proceedings of the 9th Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS). Vol. 4280, 350--362. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Dolev, D., Lynch, N. A., Pinter, S. S., Stark, E. W., and Weihl, W. E. 1986. Reaching approximate agreement in the presence of faults. J. ACM 33, 499--516. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Dolev, S. and Welch, J. L. 2004. Self-stabilizing clock synchronization in the presence of Byzantine faults. J. ACM 51, 5, 780--799. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Fischer, M. J. and Lynch, N. A. 1982. A lower bound for the time to assure interactive consistency. Inf. Process. Lett. 14, 183--186.Google ScholarGoogle ScholarCross RefCross Ref
  17. Fischer, M. J., Lynch, N. A., and Merritt, M. 1985. Easy impossibility proofs for distributed consensus problems. In Proceedings of the 4th Conference of Principles of Distributed Computing (PODC). 59--70. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Friedman, E. G. 2001. Clock distribution networks in synchronous digital integrated circuits. Proc. IEEE 89, 5, 665--692.Google ScholarGoogle ScholarCross RefCross Ref
  19. Fuchs, G., Függer, M., and Steininger, A. 2009. On the threat of metastability in an asynchronous fault-tolerant clock generation scheme. In Proceedings of the 15th Symposium on Asynchronous Circuits and Systems (ASYNC) (Chapel Hill, N. Carolina). 127--136. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Függer, M. 2010. Analysis of on-chip fault-tolerant distributed algorithms. Ph.D. thesis, Technische Universität Wien, Institut für Technische Informatik.Google ScholarGoogle Scholar
  21. Függer, M., Dielacher, A., and Schmid, U. 2010. How to speed-up fault-tolerant clock generation in VLSI systems-on-chip via pipelining. In Proceedings of the 8th European Dependable Computing Conference (EDCC). 230--239. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Függer, M. and Schmid, U. 2012. Reconciling fault-tolerant distributed computing and systems-on-chip. Distrib. Comput. 24, 6, 323--355.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Függer, M., Schmid, U., Fuchs, G., and Kempf, G. 2006. Fault-tolerant distributed clock generation in VLSI systems-on-chip. In Proceedings of the 6th European Dependable Computing Conference (EDCC). 87--96. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Gadlage, M. J., Eaton, P. H., Benedetto, J. M., Carts, M., Zhu, V., and Turflinger, T. L. 2006. Digital device error rate trends in advanced CMOS technologies. IEEE Trans. Nuclear Sci. 53, 6, 3466--3471.Google ScholarGoogle ScholarCross RefCross Ref
  25. Hoch, E., Dolev, D., and Daliot, A. 2006. Self-stabilizing byzantine digital clock synchronization. In Proceedings of the 8th Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS’06). Vol. 4280, 350--362. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. International Technology Roadmap for Semiconductors 2012. International Technology Roadmap for Semiconductors. http://www.itrs.net.Google ScholarGoogle Scholar
  27. Kinniment, D. J., Bystrov, A., and Yakovlev, A. V. 2002. Synchronization circuit performance. IEEE J. Solid-State Circ. SC-37, 2, 202--209.Google ScholarGoogle Scholar
  28. Kuhn, F., Lenzen, C., Locher, T., and Oshman, R. 2010. Optimal gradient clock synchronization in dynamic networks. In Proceedings of the 29th Symposium on Principles of Distributed Computing (PODC). Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Lenzen, C., Locher, T., and Wattenhofer, R. 2010. Tight bounds for clock synchronization. In J. ACM 57, 2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Lundelius, J. and Lynch, N. 1984. An upper and lower bound for clock synchronization. Inf. Control 62, 2--3, 190--204.Google ScholarGoogle ScholarCross RefCross Ref
  31. Malekpour, M. 2006. A Byzantine-fault tolerant self-stabilizing protocol for distributed clock synchronization systems. In Proceedings of the 9th Conference on Stabilization, Safety, and Security of Distributed Systems (SSS). 411--427. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Malekpour, M. 2009. A self-stabilizing Byzantine-fault-tolerant clock synchronization protocol. Tech. rep., TM-2009-215758, NASA.Google ScholarGoogle Scholar
  33. Marino, L. 1981. General theory of metastable operation. IEEE Trans. Comput. C-30, 2, 107--115. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Metra, C., Francescantonio, S., and Mak, T. 2004. Implications of clock distribution faults and issues with screening them during manufacturing testing. IEEE Trans. Comput. 53, 5, 531--546. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Pease, M., Shostak, R., and Lamport, L. 1980. Reaching agreement in the presence of faults. J. ACM 27, 228--234. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Polzer, T., Handl, T., and Steininger, A. 2009. A metastability-free multi-synchronous communication scheme for SoCs. In Proceedings of the 11th Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS). 578--592. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Portmann, C. L. and Meng, T. H. Y. 1995. Supply noise and CMOS synchronization errors. IEEE J. Solid-State Circ. SC-30, 9, 1015--1017.Google ScholarGoogle Scholar
  38. Restle, P., McNamara, T., Webber, D., Camporese, P., Eng, K., Jenkins, K., Allen, D., Rohn, M., Quaranta, M., Boerstler, D., Alpert, C., Carter, C., Bailey, R., Petrovick, J., Krauter, B., and McCredie, B. 2001. A clock distribution network for microprocessors. IEEE J. Solid-State Circ. 36, 5, 792--799.Google ScholarGoogle ScholarCross RefCross Ref
  39. Semiat, Y. and Ginosar, R. 2003. Timing measurements of synchronization circuits. In Proceedings of the 9th Symposium on Asynchronous Circuits and Systems (ASYNC). 68--77. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Srikanth, T. K. and Toueg, S. 1987. Optimal clock synchronization. J. ACM 34, 3, 626--645. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Sundaresan, K., Allen, P., and Ayazi, F. 2006. Process and temperature compensation in a 7-MHz CMOS clock oscillator. IEEE J. Solid-State Circ. 41, 2, 433--442.Google ScholarGoogle ScholarCross RefCross Ref
  42. Teehan, P., Greenstreet, M., and Lemieux, G. 2007. A survey and taxonomy of GALS design styles. IEEE Des. Test Comput. 24, 5, 418--428. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Widder, J. and Schmid, U. 2009. The theta-model: Achieving synchrony without clocks. Distrib. Comput. 22, 1, 29--47.Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Fault-tolerant algorithms for tick-generation in asynchronous logic: Robust pulse generation

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image Journal of the ACM
          Journal of the ACM  Volume 61, Issue 5
          August 2014
          171 pages
          ISSN:0004-5411
          EISSN:1557-735X
          DOI:10.1145/2668245
          Issue’s Table of Contents

          Copyright © 2014 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 8 September 2014
          • Accepted: 1 December 2013
          • Revised: 1 May 2013
          • Received: 1 March 2012
          Published in jacm Volume 61, Issue 5

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article
          • Research
          • Refereed

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader