skip to main content
article
Free Access

Dynamic fault-tolerant clock synchronization

Published:03 January 1995Publication History
Skip Abstract Section

Abstract

This paper gives two simple efficient distributed algorithms: one for keeping clocks in a network synchronized and one for allowing new processors to join the network with their clocks synchronized. Assuming a fault-tolerant authentication protocol, the algorithms tolerate both link and processor failures of any type. The algorithm for maintaining synchronization works for arbitrary networks (rather than just completely connected networks) and tolerates any number of processor or communication link faults as long as the correct processors remain connected by fault-free paths. It thus represents an improvement over other clock synchronization algorithms such as those of Lamport and Melliar Smith and Welch and Lynch, although, unlike them, it does require an authentication protocol to handle Byzantine faults. Our algorithm for allowing new processors to join requires that more than half the processors be correct, a requirement that is provably necessary.

References

  1. ~CRISTIAN, F. 1989. Probabilistic clock synchronization. Dzst. Comput. 3, 3 (July), 146-158.Google ScholarGoogle Scholar
  2. ~CRISTIAN, F., AGHILI, H., STRONG, H. R., AND DOLEV, D. 1986. Atomic broadcast: from simple ~message diffusion to Byzantine agreement. IBM Tech. Rep. RJ 5244. IBM, San Jose, Calif.Google ScholarGoogle Scholar
  3. ~DOLEV, D., HALPERN, J. Y., SIMONS, m. B., AND STRONG, $. R. 1987. A new look at fault ~tolerant network routing. Inf. Comput. 72, 180-196. Google ScholarGoogle Scholar
  4. ~DOLEV, D., HALPERN, J. Y., AND STRONG, H.R. 1986. On the possibility and impossibility of ~achieving clock synchronization. J. Comput. Syst. Scz. 32, 2 (Apr.), 230-250. Google ScholarGoogle Scholar
  5. ~DOLEV, D. AND STRONG, n. R. 1983. Authenticated algorithms for Byzantine agreement. ~SiAM J. Conzput. 12, 4 (Nov.), 656-666.Google ScholarGoogle Scholar
  6. ~GRmFER, A. D., AND STRONG, H.R. 1988. DCF: Distributed communication with fault toler- ~ance. In Proceedings of the 7th Annual ACM Symposium on Principles of Distributed Computing ~(Toronto, Ont., Canada, Aug. 15-17). ACM, New York, pp. 18-27. Google ScholarGoogle Scholar
  7. ~I4ALPERN. J. Is/., MEGIDDO, N., AND MUNSHI, A. 1985. Optimal precision in the presence of ~uncertainty. J. Complexl~ 1, 2 (June), 170-196.Google ScholarGoogle Scholar
  8. ~HALPERN, J. Y., SIMONS, B. g., STRONG, m. R., AND DOLEV, D. 1984. Fault-tolerant clock ~synchronization. In Proceedings of' the 3rd Annual ACM S. vmposium on Principles of Distributed ~Computing (Vancouver, B.C., Canada, Aug. 27-29). ACM, New York, pp. 89-102. Google ScholarGoogle Scholar
  9. ~KRISHNA, C. M., SHIN, K. a., AND BUTLER, R.W. 1985. Ensuring fault tolerance of phase-locked ~clocks. IEEE Trans. Comput. C-34, 8, 752-756. Google ScholarGoogle Scholar
  10. ~LAMPORT, m. 1978a. Time, clocks and the ordering of events in a distributed system. Commun. ~ACM 2l, 7, (July), 558-565. Google ScholarGoogle Scholar
  11. ~LAMPORT, m. 1978b. The implementation of reliable distributed multiprocess systems. Comput. ~Netw. 2, 2 (May), 95-114.Google ScholarGoogle Scholar
  12. ~L?,MPORT, L. 1984. Using time instead of timeout for fault-tolerant distributed systems. ACM ~Trans. Prog. Lang. Syst. 6, 2 (Apr.) 254-280. Google ScholarGoogle Scholar
  13. ~LAMPORT. L. AND MELLIAR-SMITH, P.M. 1985. Synchronizing clocks in the presence of faults. ~J. ACM 32, 1 (Jan.), 52-78. Google ScholarGoogle Scholar
  14. ~LUNDELIUS, J., AND LYNCH, N. 1984. An upper and lower bound for clock synchronization. Inf. ~Control 62, 21 (Aug./Sept.), 190-204.Google ScholarGoogle Scholar
  15. ~MARZULLO, K. 1983. Loosely-coupled distributed services: A distributed time system. Ph.D. ~dissertation. Stanford Univ., Stanford, Calif.Google ScholarGoogle Scholar
  16. ~PEASE, M., 8HOSTAK, R., AND LAMPORT, L. 1980. Reaching agreement in the presence of faults. ~J. ACM 27, 2, (Apr.), 228-234. Google ScholarGoogle Scholar
  17. ~RIVEST, R. L., SHAMIR, A., AND ADELMAN, L. 1978. A method for obtaining digital signatures ~and public-key cryptosystems. Communications of the ACM 21, 2 (Feb.), 120-126. Google ScholarGoogle Scholar
  18. ~RAMANATHAN, P., SHIN, K. O., AND BUTLER, R.W. 1990. Fault4olerant clock synchronization ~in distributed systems. IEEE Comput. (Oct.), 33-42. Google ScholarGoogle Scholar
  19. ~SCHNEIDER, F.B. 1987. Understanding protocols for byzantine clock synchronization. Tech. ~Rep. Dept. Computer Science, Cornell University, Ithaca, N.Y. Google ScholarGoogle Scholar
  20. ~SRIKANTH, T. K., AND TOUEG, S. 1987. Optimal clock synchronization. J. ACM 34, 3 (July), ~626-645. Google ScholarGoogle Scholar
  21. ~WELCH, J. LUNDELIUS, AND LYNCH, N. 1988. A new fault-tolerant algorithm for clock synchro- ~nization. Inf. Comput. 77, 1, 1-36. Google ScholarGoogle Scholar

Index Terms

  1. Dynamic fault-tolerant clock synchronization

                  Recommendations

                  Reviews

                  Charles N. Schroeder

                  First impressions are often important in reading research reports containing lots of proofs and theorems. This research report makes an excellent first impression, and rightfully so. It is extremely well written, with above-average explanations of the material presented, its background, its theory, and related proofs. The report develops the theory behind, and proofs of, two distributed algorithms: one for keeping network clocks synchronized, and one for allowing new processors to join the network with their clocks synchronized. The problems are formalized, formal definitions are given, and underlying assumptions are described. Clear discussions of related issues are developed, and the algorithms developed are carefully analyzed. The synchronization algorithm presented differs significantly from others developed to date, in that it does not require a minimum number of processors to handle f processor faults, as long as the subnetwork containing the nonfaulty processors remains connected. The role of the synchronizer is distributed. In addition, it does not use averaging, and requires only n 2 messages per synchronization. The join algorithm allows processors to join a short time after requesting to do so. Link failures are tolerated, provided that nonfaulty processors remain connected. Fewer than half of the processors may be faulty during the join process, however. This research is a significant contribution to networking and distributed computing. It should interest researchers in this area, as well as those in the applications-oriented environment. Furthermore, it is organized so readers with different levels of interest can choose the parts that interest them.

                  Access critical reviews of Computing literature here

                  Become a reviewer for Computing Reviews.

                  Comments

                  Login options

                  Check if you have access through your login credentials or your institution to get full access on this article.

                  Sign in

                  Full Access

                  PDF Format

                  View or Download as a PDF file.

                  PDF

                  eReader

                  View online with eReader.

                  eReader