Published in: Cluster Computing 4/2017

24 August 2017

Design and implementation of an efficient flushing scheme for cloud key-value storage

Authors: Yongseok Son, Heon Young Yeom, Hyuck Han


Abstract

A key-value store is an essential component of many scale-out environments, including social networks, online retail environments, and cloud services, and demand for it keeps growing. Modern key-value storage engines provide many features, including transactions, versioning, and replication. In these engines, transaction processing provides atomicity and durability through write-ahead logging (WAL): under synchronous commit, log data is flushed to persistent storage before the corresponding data page is written. However, our observations show that WAL is a performance bottleneck in key-value storage engines. Flushing log data to persistent storage incurs significant overhead from lock contention and fsync() calls, even with the various optimizations in the existing scheme. In this article, we propose an approach that improves the performance of key-value storage by optimizing the existing flushing scheme with group commit and a consolidation array. Our scheme aggregates multiple log-flush requests into a single large request on the fly and completes that request early. It is an efficient form of group commit. It reduces the number of frequent lock acquisitions and fsync() calls under synchronous commit while supporting the same transaction level as the existing scheme. Furthermore, we integrate our flushing scheme into the replication system and evaluate it on multiple nodes. We implement our scheme in the WiredTiger storage engine. The experimental results show that our scheme improves the performance of key-value workloads compared with the existing scheme.
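The group-commit idea summarized above can be illustrated with a small model. The Python sketch below is not WiredTiger's implementation. The class `GroupCommitLog`, its methods, and the single-file log layout are hypothetical.

Each committing thread appends its record and receives an LSN under a lock. The first thread that finds no flush in progress becomes the leader. The leader releases the lock, writes every pending record with one `write()` loop and one `fsync()`, and then wakes the waiting threads. As a result, N concurrent commits can cost fewer than N `fsync()` calls, while each `commit()` still returns only after its own record is durable.

```python
import os
import tempfile
import threading


class GroupCommitLog:
    """Hypothetical toy WAL: concurrent commits share one write + fsync()."""

    def __init__(self, path):
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self._lock = threading.Lock()
        self._cond = threading.Condition(self._lock)
        self._pending = []        # records not yet taken by a leader
        self._next_lsn = 0        # LSN assigned to the next appended record
        self.durable_lsn = -1     # highest LSN known to be on stable storage
        self._flushing = False    # True while a leader performs I/O
        self.fsync_calls = 0

    def _write_and_sync(self, data):
        while data:               # os.write may write fewer bytes than asked
            data = data[os.write(self._fd, data):]
        os.fsync(self._fd)

    def commit(self, record):
        """Append a record and return its LSN once the record is durable."""
        with self._cond:
            lsn = self._next_lsn
            self._next_lsn += 1
            self._pending.append(record)
            while self.durable_lsn < lsn:
                if self._flushing:        # another leader is busy: wait
                    self._cond.wait()
                    continue
                # Become the leader: take every pending record as one batch.
                self._flushing = True
                batch, self._pending = self._pending, []
                batch_last = self._next_lsn - 1
                self._lock.release()      # do the slow I/O without the lock
                ok = False
                try:
                    self._write_and_sync(b"".join(batch))
                    ok = True
                finally:
                    self._lock.acquire()
                    if ok:
                        self.fsync_calls += 1
                        self.durable_lsn = batch_last
                    else:                 # requeue the batch so it is retried
                        self._pending[:0] = batch
                    self._flushing = False
                    self._cond.notify_all()
            return lsn

    def close(self):
        os.close(self._fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = GroupCommitLog(os.path.join(d, "wal.log"))
        threads = [threading.Thread(target=log.commit, args=(b"rec%02d\n" % i,))
                   for i in range(32)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        print("commits: 32, fsync calls:", log.fsync_calls)
        log.close()
```

The exact fsync count depends on thread scheduling. It can range from 1 up to the number of commits. The design choice being illustrated is that the leader does I/O without holding the lock, so later committers can keep queuing records for the next batch. That reduction in lock hold time and in fsync() calls is the overhead the abstract targets.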


Footnotes

1. A leader is the first thread to acquire a log buffer.

2. The offset serves as the record's own LSN (log sequence number).

3. The group size is the total size of the log records that have joined the slot.

4. The completion flag indicates whether the LSN has been completed.

5. Replication can be broadly classified into two categories: synchronous and asynchronous. In synchronous replication, a transaction is committed on all nodes simultaneously. The master and the slave therefore always remain synchronized, and the data is guaranteed to be consistent on all nodes once the transaction commits. In asynchronous replication, a transaction is committed on the master first and then replicated to the slave, so the master and the slave may be inconsistent. Asynchronous replication is faster and scales better than synchronous replication, but it does not guarantee that the data is consistent on all nodes.
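Footnotes 1 to 4 name the per-slot fields of the consolidation scheme: the leader, the offset used as the LSN, the group size, and the completion flag. The Python model below is a single-threaded sketch built on our own assumptions. The `Slot` class, its field names, and the `flush` callback are hypothetical and are not WiredTiger structures. A real engine would let many threads join a slot concurrently using atomic operations.

The sketch shows three things:
- Each record's LSN is simply its offset inside the slot's log region.
- The group size grows as records join.
- The leader issues one flush for the whole group and then sets the completion flag.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Slot:
    """Hypothetical consolidation-array slot covering one region of the log."""

    start_lsn: int                    # log offset where this slot's region begins
    group_size: int = 0               # total bytes of the records joined so far
    leader: Optional[str] = None      # first thread to join the slot
    members: List[Tuple[str, int, int]] = field(default_factory=list)
    completed: bool = False           # completion flag: region has been flushed

    def join(self, thread: str, nbytes: int) -> int:
        """Reserve nbytes in the slot; the record's offset doubles as its LSN."""
        if self.completed:
            raise RuntimeError("slot is already flushed; join a new slot")
        if self.leader is None:
            self.leader = thread      # the first joiner becomes the leader
        lsn = self.start_lsn + self.group_size
        self.group_size += nbytes
        self.members.append((thread, lsn, nbytes))
        return lsn

    def close_and_flush(self, thread: str, flush: Callable[[int, int], None]) -> int:
        """The leader issues one flush for the whole group, then sets the flag."""
        if thread != self.leader:
            raise PermissionError("only the slot leader may flush the slot")
        flush(self.start_lsn, self.group_size)
        self.completed = True
        return self.start_lsn + self.group_size  # where the next slot begins


if __name__ == "__main__":
    flushes = []
    slot = Slot(start_lsn=4096)
    print(slot.join("T1", 100))  # 4096 (T1 becomes the leader)
    print(slot.join("T2", 50))   # 4196
    print(slot.join("T3", 30))   # 4246
    nxt = slot.close_and_flush("T1", lambda off, size: flushes.append((off, size)))
    print(flushes, nxt)          # [(4096, 180)] 4276
```

Three joins produce a single flush request of 180 bytes instead of three separate ones. In a multithreaded engine, the join step would typically reduce to an atomic fetch-and-add on the group size. Under that assumption, followers that join while the leader is still gathering records avoid both an extra lock acquisition and an extra fsync().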
 
Metadata
Title
Design and implementation of an efficient flushing scheme for cloud key-value storage
Authors
Yongseok Son
Heon Young Yeom
Hyuck Han
Publication date
24 August 2017
Publisher
Springer US
Published in
Cluster Computing / Issue 4/2017
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-017-1101-3
