Multiview access protocols for large-scale replication

Published: 01 June 1998

Abstract

The article proposes a scalable protocol for replication management in large-scale replicated systems. The protocol organizes sites and data replicas into a tree-structured, hierarchical cluster architecture. Its basic idea is to accomplish the complex task of updating replicated data with a very large number of replicas through a set of related but independently committed transactions. Each transaction is responsible for updating the replicas in exactly one cluster and for invoking additional transactions for member clusters. Primary copies (one from each cluster) are updated by a cross-cluster transaction; each cluster is then independently updated by a separate transaction. This decoupled update propagation can result in multiple views of the replicated data in a cluster. Compared to other replicated data management protocols, the proposed protocol has several unique advantages. First, because each transaction must atomically update only the relatively small number of replicas within a single cluster, the protocol significantly reduces the transaction abort rate, which tends to soar in large transactional systems. Second, the protocol improves user-level transaction response time, since top-level update transactions are allowed to commit before all replicas have been updated. Third, read-only queries have the flexibility to see database views of different degrees of consistency and data currency, ranging from global views that are most up to date and consistent, to local views that are consistent but potentially old, to local views that are nearest to users but potentially inconsistent. Fourth, the protocol maintains its scalability by allowing dynamic system reconfiguration: as the system grows, a cluster can be split into two or more smaller ones. Fifth, the autonomy of the clusters is preserved, since no specific protocol is required to update replicas within the same cluster; clusters are therefore free to use any valid replication or concurrency control protocols.
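
As a rough, hypothetical sketch of this decoupled update propagation (the Cluster class, the update_cluster function, and the synchronous recursion below are illustrative simplifications, not the article's actual protocol), each cluster-level transaction writes its own replicas plus the primary copies of its child clusters, commits, and only then invokes separate transactions for the children:

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class Cluster:
        """One node of the hierarchical cluster tree (hypothetical model)."""
        name: str
        replicas: Dict[str, int]   # site name -> replica value
        primary: str               # site holding this cluster's primary copy
        children: List["Cluster"] = field(default_factory=list)

    def update_cluster(cluster: Cluster, value: int) -> None:
        """One independently committed transaction for a single cluster."""
        for site in cluster.replicas:              # all replicas in this cluster
            cluster.replicas[site] = value
        for child in cluster.children:             # each child's primary copy
            child.replicas[child.primary] = value
        # -- commit point: only this cluster's replicas had to be updated
        #    atomically; everything below runs as separate transactions --
        for child in cluster.children:
            update_cluster(child, value)

    # Example with a two-level tree. This sketch recurses synchronously; in
    # the protocol the recursive step would be asynchronous, so the root
    # transaction commits once s1, s2, and the child primary s3 hold the new
    # value, while s4 is refreshed later by the child cluster's own,
    # separately committed transaction.
    leaf = Cluster("leaf", {"s3": 0, "s4": 0}, primary="s3")
    root = Cluster("root", {"s1": 0, "s2": 0}, primary="s1", children=[leaf])
    update_cluster(root, 42)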

Reviews

Alan Raymond Hevner

The authors propose a scalable protocol for managing replicated data in large-scale systems. Extensions to previous replication management protocols are predicated on the need to handle data copies at hundreds and even thousands of locations. The multiview access (MVA) protocol organizes sites and data copies into a tree-structured, hierarchical cluster architecture.

The key to the MVA protocol is the ability to decompose a large update transaction into a related set of independently committed update transactions. Each update transaction is responsible for updating all replicas in one cluster and the primary copy in all of its child clusters. Then new update transactions are invoked in all the child clusters. Beginning at the root cluster, this protocol continues until all data copies are updated. Thus, at any time, multiple views of replicated data can appear in the system. Correctness proofs are provided for the basic protocol and for an extended protocol with backup primary and coordinator sites.

Five unique advantages are claimed for the MVA protocol:

  1. Due to the smaller number of copies to be updated, the transaction abort rate is reduced.
  2. Transaction response times are improved, as top-level update transactions are allowed to commit before all copies have been updated.
  3. Read-only queries have the flexibility to request a desired level of data consistency and currency from the multiple database views (see the sketch after this review).
  4. The MVA protocol is scalable, since clusters can be dynamically split or combined.
  5. Autonomy of the clusters is preserved, since no specific protocol is required to update copies within the cluster.

The performance claims are validated via simulation experiments. The MVA protocol is an interesting and seemingly practical approach for providing replication management in large-scale systems, such as large organizational intranets or even the Internet. More research and evaluation will be needed before actual implementation, but developers of large distributed systems with significant data replication requirements are encouraged to read this paper and understand the basic concepts of the proposed approach.
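
To make the reviewer's third point concrete, here is a hypothetical continuation of the Cluster sketch given after the abstract (the ViewLevel enum, the read_value function, and the choice of the root's primary copy as the global view are all illustrative assumptions, not details from the paper): a read-only query names the degree of consistency and currency it will accept, trading freshness for proximity.

    from enum import Enum

    class ViewLevel(Enum):
        GLOBAL = 1            # most up to date and consistent
        LOCAL_CONSISTENT = 2  # consistent within the cluster, possibly stale
        NEAREST = 3           # closest replica, possibly inconsistent

    def read_value(root: Cluster, local: Cluster,
                   nearest_site: str, level: ViewLevel) -> int:
        """Pick a replica according to the view level the query requested."""
        if level is ViewLevel.GLOBAL:
            return root.replicas[root.primary]    # global, current view
        if level is ViewLevel.LOCAL_CONSISTENT:
            return local.replicas[local.primary]  # local consistent view
        return local.replicas[nearest_site]       # nearest, maybe inconsistent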
