Article

Tashkent: uniting durability with transaction ordering for high-performance scalable database replication

Authors:
Sameh Elnikety, School of Computer and Communication Sciences, EPFL, Switzerland
Steven Dropsho, School of Computer and Communication Sciences, EPFL, Switzerland
Fernando Pedone, Università della Svizzera Italiana, USI, Switzerland

EuroSys '06: Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006, April 2006, Pages 117–130. https://doi.org/10.1145/1217935.1217947

Published: 18 April 2006

ABSTRACT

In stand-alone databases, the functions of ordering the transaction commits and making the effects of transactions durable are performed in a single action, namely the writing of the commit record to disk. For efficiency, many of these writes are grouped into a single disk operation. In replicated databases in which all replicas agree on the commit order of update transactions, these two functions are typically separated. Specifically, the replication middleware determines the global commit order, while the database replicas make the transactions durable.

The contribution of this paper is to demonstrate that this separation causes a significant scalability bottleneck. It forces some of the commit records to be written to disk serially, where in a stand-alone system they could have been grouped together in a single disk write. Two solutions are possible: (1) move durability from the database to the replication middleware, or (2) keep durability in the database and pass the global commit order from the replication middleware to the database.

We implement these two solutions. Tashkent-MW is a pure middleware solution that combines durability and ordering in the middleware and treats an unmodified database as a black box. In Tashkent-API, we modify the database API so that the middleware can specify the commit order to the database, thus combining ordering and durability inside the database. We compare both Tashkent systems to an otherwise identical replicated system, called Base, in which ordering and durability remain separated. Under high update transaction loads, both Tashkent systems greatly outperform Base in throughput and response time.
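The cost difference the abstract describes can be illustrated with a toy count of disk writes. The sketch below is not the paper's implementation: the function names and the `batch_size` parameter are hypothetical, and it assumes that a serialized commit path issues one disk write per commit record while a grouped path shares one write among several records.

```python
from typing import List


def grouped_flushes(commit_records: List[str], batch_size: int) -> List[List[str]]:
    """Group commit: up to batch_size pending commit records share one disk write.

    batch_size is an illustrative assumption, not a parameter from the paper.
    """
    return [commit_records[i:i + batch_size]
            for i in range(0, len(commit_records), batch_size)]


def serial_flushes(commit_records: List[str]) -> List[List[str]]:
    """Serialized commits: every commit record gets its own disk write."""
    return [[record] for record in commit_records]


# Eight update transactions, already in the agreed global commit order.
records = [f"T{i}" for i in range(1, 9)]

grouped = grouped_flushes(records, batch_size=4)
serial = serial_flushes(records)

print("grouped disk writes:", len(grouped))
print("serial disk writes: ", len(serial))
```

Both paths commit the records in the same order; only the number of disk writes differs, which is the quantity that the abstract identifies as the bottleneck when ordering and durability are separated.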

Index Terms

Tashkent: uniting durability with transaction ordering for high-performance scalable database replication
1. Information systems
  1. Data management systems
    1. Database management system engines

Published in
EuroSys '06: Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
April 2006, 420 pages
ISBN: 1595933220
DOI: 10.1145/1217935
Conference Chair: Yolande Berbers, K. U. Leuven, Belgium
Program Chair: Willy Zwaenepoel, EPFL

ACM SIGOPS Operating Systems Review, Volume 40, Issue 4
Proceedings of the 2006 EuroSys conference
October 2006, 383 pages
ISSN: 0163-5980
DOI: 10.1145/1218063

Copyright © 2006 Authors
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
database replication
generalized snapshot isolation
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 241 of 1,308 submissions, 18%