Replicated abstract data types: Building blocks for collaborative applications

https://doi.org/10.1016/j.jpdc.2010.12.006Get rights and content

Abstract

For distributed applications requiring collaboration, responsive and transparent interactivity is highly desired. Though such interactivity can be achieved with optimistic replication, maintaining replica consistency is difficult. To support efficient implementations of collaborative applications, this paper extends a few representative abstract data types (ADTs), such as arrays, hash tables, and growable arrays (or linked lists), into replicated abstract data types (RADTs). In RADTs, a shared ADT is replicated and modified with optimistic operations. Operation commutativity and precedence transitivity are two principles enabling RADTs to maintain consistency despite different execution orders. Especially, replicated growable arrays (RGAs) support insertion/deletion/update operations. Over previous approaches to the optimistic insertion and deletion, RGAs show significant improvement in performance, scalability, and reliability.

Research highlights

► We propose three RADTs as building blocks of collaborative applications. ► Precedence transitivity can achieve consistency of RADTs by ensuring commutativity. ► RGAs provide insertion, deletion, and update operations working in O(1) at remote sites. ► RGAs show significant improvement in complexity, scalability, and reliability.

Introduction

Optimistic replication is an essential technique for interactive collaborative applications [8], [30]. To illustrate replication issues in collaboration, consider the following scenario in an editorial office publishing a daily newspaper.

A number of pressmen are editing a newspaper using computerized collaboration tools. Each of them is browsing and editing pages consisting of news items, such as text, pictures, and tables. When a writer collaborates on editing the same article with others, his local interaction is never blocked, but interactions of the others are shown to him as soon as possible. After all interactions cease, all the copies of the newspaper become consistent.

Human users, the subjects of these applications, prefer high responsiveness and transparent interactivity to strict consistency [8], [30], [13]. Responsiveness means how quickly the effect of an operation is delivered to users, and interactivity is how freely operations can be performed. Optimistic operations that are executed first at each local site enable to achieve these properties, but consistency should be maintained as sites execute operations in different orders.

Optimistic replication contrasts with pessimistic concurrency control protocols [30], such as serialization [5], [14] or locking [3], [12]. Even if a global locking protocol allows optimistic operations [13], it not only requires a state rollback mechanism, but also damages interactivity due to the nature of the locking protocol. There has been research on genuine optimistic replication oriented to specific services, such as a replicated databases [36], [37], Usenet [21], [9], [6], and a collaborative textual or graphical editor [8], [27], [34], [33]. However, these service-oriented techniques are inflexible for various complex functions of modern interactive applications; e.g., electronic blackboards, games, CAD tools, and office tools such as Microsoft Office and Google Docs, all of which can be extended for collaboration.

Interactive applications, e.g., CAD tools designing for skyscrapers or spaceships, have demanded managing of indeterminate data; one data may consist of limited elements, another data may need quick access to unlimited elements, and the other data may contain ordered elements frequently inserted and deleted. Sensible developers would make use of various abstract data types (ADTs) to reflect such a demand. When those applications are extended for collaboration, however, developers may abandon to use ADTs for shared data owing to inconsistency. Hence, we suggest replicated abstract data types (RADTs), a novel class of ADTs that can be used as building blocks for collaborative applications.

RADTs are multiple copies of a shared ADT replicated over distributed sites. RADTs provide a set of primitive operations corresponding to that of normal ADTs, concealing the details of consistency maintenance. RADTs ensure eventual consistency [30], a weak consistency model for achieving responsiveness and interactivity. By imposing no constraint on operation delivery except causal dependency, we accommodate RADT deployment in general environments. This allows a site to execute operations in any causal order. We model such executions and explore principles to achieve eventual consistency.

This paper suggests two principles that lead to successful designs of non-trivial RADTs. First, operation commutativity (OC) requires that every pair of concurrent operations commutes. Though the concept of commutativity was discussed in many distributed systems [39], [1], [27], it was not fully assimilated. We formally prove that OC guarantees eventual consistency for all possible execution orders; so, we mandate RADT operations to satisfy OC. Second, precedence transitivity (PT) requires that all precedence rules are transitive. RADTs require precedence rules to reconcile conflicting intentions. PT is a guideline on how to design remote operations so that RADT operations satisfy OC and preserve their intentions. In short, OC is a sufficient condition to ensure eventual consistency, while PT is a principle for exploiting OC.

We present efficient implementations of three RADTs: replicated fixed-size arrays (RFAs), replicated hash tables (RHTs), and replicated growable arrays (RGAs). Although some key ideas for RFAs and RHTs were already present in the literature [36], [37], [21], [9], [6], we introduce them again because they exemplify the concepts of RADTs, and above all because their problems and ideas are inherited by RGAs.

RGAs are another main contribution of this paper, which solves the problem of optimistic insertions and deletions into a replicated ordered set. As these operations have been highly desired in collaborative applications [8], [13], the operational transformation (OT) framework is the classic approach for these operations. Various OT methods have been introduced [7], [27], [35], [34], [19], [22], and one of them is adopted by a web collaboration tool Google Wave [11]. However, the OT framework has difficulty in verifying correctness, and an evaluation study on recent OT methods reports that their performance and scalability are poor and non-negligible [18].

Thanks to OC and PT, RGAs provide full correctness verification not only for insertions and deletions, but also for updates [29]. In addition, RGAs are superior to most of the previous works in complexity, scalability, and reliability. Whereas remote operations of OT methods generally have quadratic time-complexity, remote RGA operations can perform in O(1) time by proposing the s4vector index (SVI) scheme. Our evaluation shows that operations needing about hundreds of ms in OT methods take only dozens of μs in RGAs. Due to the optimal remote operations and the fixed-size s4vectors, RGAs scale. Additionally, RGAs have a chance to enhance reliability by autonomous causality validations of the SVI scheme. RGAs, therefore, can be a better alternative of OT methods.

Section 2 describes three RADTs and their inconsistency problems. Sections 3 Operation commutativity, 4 Precedence transitivity formalize OC and PT, respectively. Concrete algorithms of RADTs are proposed in Section 5. We survey the related work in Section 6, and contrast RGAs with previous work in Section 7. Section 8 presents the performance evaluation, and we conclude this paper in Section 9.

Section snippets

Preliminary: causality preservation among operations

The replication system discussed in this paper is characterized by a set of distributed sites and operations as shown in the time–space diagram of Fig. 1 which describes the propagations and executions of operations. Lamport presented two definitions for causality [15]: happened-before relation (‘’) and concurrent relation (‘’). Given a time–space diagram consisting of n operations, all n(n1)2 relations are obtained; every pair of distinct operations is in either of the two relations.

While

Operation commutativity

RADTs allow sites to execute operations in different orders. To denote an execution order of two operations, we use ‘’; e.g., OaOb if Oa is executed before Ob. In addition, we use ‘’ to express changes of replica states caused by the execution of an operation or a sequence of operations; e.g., RS0OaRS1ObRS2 means that Oa and Ob change a replica state RS0 into RS1 and then into RS2 in order. We abbreviate this as RS0OaObRS2. Though time–space diagrams, such as Fig. 1, Fig. 3, are

Precedence transitivity

In RADTs, operations relate to object containers, i.e., elements of RFAs, slots of RHTs, and nodes of RGAs. This relationship would be clarified by causal object (cobject) and effective operation (eoperation).

  • cobject: For a local operation O, its cobject is the object container indicated by the index of O. If O is an Insert, it has two cobjects: one is called as left cobject which is indicated by the index of O (say i), and the other is right cobject which is the one of i+1 when O is generated.

The S4Vector

For optimization purpose, we define a quadruple vector type.

Let vO be the vector clock of an operation issued at site i. Then, an S4VectorsO can be derived from vO as follows: (1) sO[ssn] is a global session number that increases monotonically, (2) sO[sid] is the site ID unique to the site, (3) sO[sum] is (vO)ivO[i], and (4) sO[seq] is vO[i] reserved for purging tombstones (see Section 5.6). To illustrate, suppose that vO=[1,2,3] is the vector clock of an operation that is

Related work

The concept of commutativity was first introduced in distributed database systems [1], [39]. It was, however, applied to concurrency control over centralized resources, but not to consistency maintenance among replicas. In other words, to grant more concurrency to some transactions in a locking protocol, transaction schedulers allow only innately commutative operations, e.g., Writes on different objects, to be executed concurrently while noncommutative operations still have to be locked. For

Complexity, scalability, and reliability

As the building blocks, the time complexity of RADTs is decisive for the performance and quality of collaborative applications. The time complexity of the local RADT operations is the same as that of the operations of the corresponding normal ADTs. In RFAs and RHTs, the remote operations perform in the same complexity as the corresponding local ADTs based on the same data structures; thus, Write, Put, and Remove work optimally in O(1). Only when the hash functions malfunction, is the

Performance evaluation

We perform some experiments on RGAs to verify if the RGA operations actually work as the analysis of Section 7 and to compare with some previous approaches. To our knowledge, however, no previous approaches have presented any performance evaluation yet, except SDT and ABT [18]. Comparably to the experiments in [18], RGAs are implemented in C++ language and compiled by GNU g++ v4.2.4 on Linux kernel 2.6.24. We automatically generate intensive workloads modeling real-time collaborations with

Conclusions

When developing applications, programmers are used to using various ADTs. Providing the same semantics of ADTs to programmers, RADTs can support efficient implementations of collaborative applications. Operation commutativity and precedence transitivity make it possible to design the complicated optimistic RGA operations without serialization/locking protocols/state rollback scheme/undo-do-redo scheme/OT methods. Especially, in performance, RGAs provide the remote operations of O(1) with the

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2010-0000829). We would like to give warm thanks to all the anonymous reviewers and special thanks to Dr. Marc Shapiro at INRIA.

Hyun-Gul Roh received his B.S. degree in computer science from Yonsei University, Korea, in 2002, and is due to receive his Ph.D. degree in computer science from KAIST (Korea Advanced Institute of Science and Technology), in 2011. Currently, he is working as a research intern at INRIA from September, 2010. His research interests include distributed and replication systems, especially, collaboration and version vectors.

References (41)

  • B. Lushman et al.

    Proof of correctness of Ressel’s adOPTed algorithm

    Information Processing Letters

    (2003)
  • R. Prakash et al.

    An adaptive causal ordering algorithm suited to mobile computing environments

    Journal of Parallel and Distributed Computing

    (1997)
  • B.R. Badrinath et al.

    Semantics-based concurrency control: beyond commutativity

    ACM Transactions on Database Systems

    (1992)
  • V. Balakrishnan

    Graph Theory

    (1997)
  • P.A. Bernstein et al.

    An algorithm for concurrency control and recovery in replicated distributed databases

    ACM Transactions on Database Systems

    (1984)
  • K. Birman et al.

    The ISIS project: real experience with a fault tolerant programming system

    SIGOPS Operating Systems Review

    (1991)
  • K.P. Birman et al.

    Lightweight causal and atomic group multicast

    ACM Transactions on Computer Systems

    (1991)
  • A. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. Sturgis, D. Swinehart, D. Terry, Epidemic...
  • C.A. Ellis, S.J. Gibbs, Concurrency control in groupware systems, in: Proceedings of ACM International Conference on...
  • C.A. Ellis et al.

    Groupware: some issues and experiences

    Communications of the ACM

    (1991)
  • M.J. Fischer, A. Michael, Sacrificing serializability to attain availability of data in an unreliable network, in:...
  • R.A. Golding, Weak-consistency group communication and membership, Ph.D. Thesis, University of California, Santa Cruz,...
  • Google Inc., Google wave protocols, 2009....
  • J. Gray, P. Helland, P. O’Neil, D. Shasha, The dangers of replication and a solution, in: Proceedings of ACM...
  • S. Greenberg, D. Marwood, Real time groupware as a distributed system: concurrency control and its effect on the...
  • P.A. Jensen, N.R. Soparkar, A.G. Mathur, Characterizing multicast orderings using concurrency control theory, in:...
  • L. Lamport

    Time, clocks, and the ordering of events in a distributed system

    Communications of the ACM

    (1978)
  • D. Li, R. Li, Preserving operation effects relation in group editors, in: Proceedings of ACM Conference on Computer...
  • R. Li, D. Li, Commutativity-based concurrency control in groupware, in: International Conference on Collaborative...
  • D. Li et al.

    A performance study of group editing algorithms

  • Cited by (105)

    • On the correctness of highly available systems in the presence of failures

      2023, Journal of Parallel and Distributed Computing
    • Specification and space complexity of collaborative text editing

      2021, Theoretical Computer Science
      Citation Excerpt :

      The proof of the lemma is analogous to that of Lemma 3. Our lower bound results hold for push-based protocols, a class of protocols that contains the protocols of several collaborative editing systems [20,23,26,29], including the RGA protocol of Section 4. Informally, a replica in a push-based protocol propagates list updates to its peers as soon as possible and merges remote updates into its state as soon as they arrive (as opposed to using a more sophisticated mechanism, such as a consensus protocol).

    • Riffle: Reactive Relational State for Local-First Applications

      2023, UIST 2023 - Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology
    View all citing articles on Scopus

    Hyun-Gul Roh received his B.S. degree in computer science from Yonsei University, Korea, in 2002, and is due to receive his Ph.D. degree in computer science from KAIST (Korea Advanced Institute of Science and Technology), in 2011. Currently, he is working as a research intern at INRIA from September, 2010. His research interests include distributed and replication systems, especially, collaboration and version vectors.

    Myeongjae Jeon is currently a Ph.D. student in computer science at Rice University. He received his M.S. degree in computer science from Korea Advanced Institute of Science and Technology (KAIST) in 2009 and his B.E. degree in computer engineering from Kwangwoon University in 2005. His research interests include machine virtualization, distributed systems, and storage systems.

    Jin-Soo Kim received his B.S., M.S., and Ph.D. degrees in Computer Engineering from Seoul National University, Korea, in 1991, 1993, and 1999, respectively. He was with the IBM T.J. Watson Research Center as an academic visitor from 1998 to 1999. He was the faculty in computer science department at KAIST from 2002 to 2008. Currently, he is the faculty of SungKyunKwan University. His research interests include operating systems, distributed file systems, and grid computing.

    Joonwon Lee received his B.S. degree from Seoul National University in 1983 and his Ph.D. degree from the Georgia Institute of Technology in 1991. From 1991 to 1992, he was with IBM T.J. Watson Research Center. After working for IBM, he was the faculty of KAIST from 1992 to 2008. Currently, he is the faculty of SungKyunKwan University. His research interests include operating systems, virtual machines, and parallel processing.

    View full text