Published in: The VLDB Journal 3/2021

18-02-2021 | Regular Paper

Better database cost/performance via batched I/O on programmable SSD

Authors: Jaeyoung Do, Ivan Luiz Picoli, David Lomet, Philippe Bonnet

Abstract

Data should be placed at the most cost- and performance-effective tier in the storage hierarchy. While performance and cost decrease with distance from the CPU, the cost/performance trade-off depends on how efficiently data can be moved across tiers. Log structuring improves this cost/performance by writing batches of pages from main memory to secondary storage using a conventional block-at-a-time I/O interface. However, log structuring incurs overhead in the form of recovery and garbage collection. With computational Solid-State Drives, it is now possible to design a storage interface that minimizes this overhead. In this paper, we offload log structuring from the CPU to the SSD. We define a new batch I/O storage interface and we design a Flash Translation Layer that takes care of log structuring on the SSD side. This removes the CPU computational and I/O load associated with recovery and garbage collection. We compare the performance of the Bw-tree key-value store with its LLAMA host-based log structuring to the same key-value software stack executing on a computational SSD equipped with a batch I/O interface. Our experimental results show the benefits of eliminating redundancies, minimizing interactions across storage layers, and avoiding the CPU cost of providing log structuring.
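The batching idea behind log structuring can be illustrated with a small, self-contained sketch. This is not the paper's batch I/O interface or its FTL: the class `BatchLogStore`, its fields, and the batch size are hypothetical names chosen for illustration, and the "log" is an in-memory list standing in for secondary storage. It only shows how buffered page writes are appended in one batch, how a mapping table tracks the latest version of each page, and why stale versions accumulate and eventually need garbage collection.

```python
from dataclasses import dataclass, field


@dataclass
class BatchLogStore:
    """Toy log-structured store (hypothetical): pages are buffered, then appended as one batch."""
    batch_pages: int = 4                          # assumed batch size, in pages
    log: list = field(default_factory=list)       # stands in for secondary storage
    mapping: dict = field(default_factory=dict)   # page id -> log offset of the latest version
    buffer: list = field(default_factory=list)    # pages waiting for the next batch
    io_count: int = 0                             # number of simulated batch writes issued

    def write(self, page_id, data):
        """Buffer a page; flush automatically once the batch is full."""
        self.buffer.append((page_id, data))
        if len(self.buffer) >= self.batch_pages:
            self.flush()

    def flush(self):
        """Append all buffered pages to the log with a single simulated I/O."""
        if not self.buffer:
            return
        base = len(self.log)
        self.log.extend(self.buffer)
        for i, (page_id, _) in enumerate(self.buffer):
            self.mapping[page_id] = base + i      # newer versions supersede older ones
        self.buffer.clear()
        self.io_count += 1

    def read(self, page_id):
        """Return the latest version of a page, checking the unflushed buffer first."""
        for pid, data in reversed(self.buffer):
            if pid == page_id:
                return data
        return self.log[self.mapping[page_id]][1]

    def garbage(self):
        """Count stale log entries that garbage collection would have to reclaim."""
        return len(self.log) - len(self.mapping)


store = BatchLogStore(batch_pages=2)
store.write(1, "a")
store.write(2, "b")      # batch is full: one simulated I/O writes both pages
store.write(1, "c")      # new version of page 1 waits in the buffer
store.flush()            # second simulated I/O; the old version of page 1 is now stale
```

In this toy model, three page writes cost two simulated I/Os, and one log entry becomes garbage. The mapping table and the garbage are the bookkeeping that the paper proposes to move from the host to the SSD.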

Footnotes
1
Recently, many robust open-source FTLs have been released. Pblk [2], for example, implements a full-fledged, host-based FTL exposing a traditional block I/O interface, and is released as part of the Linux Kernel 4.12. Intel released a user-space FTL in the context of SPDK [41]. Those FTLs, however, must remain generic. They are not meant to be modified to support application-specific code.
 
2
The source code of OXBlock is available at https://github.com/DFC-OpenSource/ox-ctrl.
 
3
Note that the maximum size of an IP datagram, the basic transfer unit of a packet-switched network, is 65,535 bytes, including a 20-byte header followed by a data area.
 
4
Due to the host-based checkpoint for persisting BwTree mapping table entries, BwTree_Block was required to write an additional 3.5 MB of data during the checkpoint.
 
5
In some cases where the entire dataset fits into memory, no performance drops were observed because SSD usage during the experiment was not large enough to meet the given GC condition.
 
6
Note that we made similar observations when running the read-heavy workload.
 
7
Note that the GC process on the SSD controller moves both User- and Meta-type data (Sect. 8.3).
 
Literature
1.
Bae, D.-H., Jo, I., Choi, Y.A., Hwang, J.-Y., Cho, S., Lee, D.-G., Jeong, J.: 2B-SSD: the case for dual, byte- and block-addressable solid-state drives. In: 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), pp. 425–438. IEEE (2018)
2.
Bjørling, M., González, J., Bonnet, P.: LightNVM: the Linux open-channel SSD subsystem. In: 15th USENIX Conference on File and Storage Technologies (FAST 17), pp. 359–374 (2017)
3.
Bonnet, P.: What’s up with the storage hierarchy? In: CIDR (2017)
4.
Chung, T.-S., Park, D.-J., Park, S., Lee, D.-H., Lee, S.-W., Song, H.-J.: A survey of flash translation layer. J. Syst. Archit. 55(5–6), 332–343 (2009)
5.
Cooper, B.F., Silberstein, A., Tam, E., Ramakrishnan, R., Sears, R.: Benchmarking cloud serving systems with YCSB. In: Proceedings of the 1st ACM Symposium on Cloud Computing, pp. 143–154. ACM (2010)
6.
Cornwell, M.: Anatomy of a solid-state drive. Commun. ACM 55(12), 59–63 (2012)
7.
Diaconu, C., Freedman, C., Ismert, E., Larson, P.-A., Mittal, P., Stonecipher, R., Verma, N., Zwilling, M.: Hekaton: SQL server’s memory-optimized OLTP engine. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 1243–1254 (2013)
8.
Do, J., Kee, Y.-S., Patel, J.M., Park, C., Park, K., DeWitt, D.J.: Query processing on smart SSDs: opportunities and challenges. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 1221–1230. ACM (2013)
9.
Do, J., Lomet, D., Picoli, I.L.: Improving CPU I/O performance via SSD controller FTL support for batched writes. In: Proceedings of the 15th International Workshop on Data Management on New Hardware, pp. 1–8 (2019)
10.
Do, J., Sengupta, S., Swanson, S.: Programmable solid-state storage in future cloud datacenters. Commun. ACM 62(6), 54–62 (2019)
12.
González, J., Bjørling, M.: Multi-tenant I/O isolation with open-channel SSDs. In: Nonvolatile Memory Workshop (NVMW) (2017)
13.
Gray, J.: Put everything in future (disk) controller. NASD Workshop (1998)
14.
Gu, B., Yoon, A.S., Bae, D.-H., Jo, I., Lee, J., Yoon, J., Kang, J.-U., Kwon, M., Yoon, C., Cho, S. et al.: Biscuit: a framework for near-data processing of big data workloads. In: ACM SIGARCH Computer Architecture News, vol. 44, pp. 153–165. IEEE Press (2016)
15.
Guo, C., Wu, H., Deng, Z., Soni, G., Ye, J., Padhye, J., Lipshteyn, M.: RDMA over commodity ethernet at scale. In: Proceedings of the 2016 ACM SIGCOMM Conference, pp. 202–215 (2016)
16.
Hao, M., Soundararajan, G., Kenchammana-Hosekote, D., Chien, A.A., Gunawi, H.S.: The tail at store: a revelation from millions of hours of disk and SSD deployments. In: 14th USENIX Conference on File and Storage Technologies (FAST 16), pp. 263–276 (2016)
17.
Hu, X.-Y., Eleftheriou, E., Haas, R., Iliadis, I., Pletka, R.: Write amplification analysis in flash-based solid state drives. In: Proceedings of SYSTOR 2009: the Israeli Experimental Systems Conference, pp. 1–9 (2009)
18.
Huang, J., Badam, A., Caulfield, L., Nath, S., Sengupta, S., Sharma, B., Qureshi, M.K.: Flashblox: achieving both performance isolation and uniform lifetime for virtualized SSDs. In: 15th USENIX Conference on File and Storage Technologies (FAST 17), pp. 375–390 (2017)
19.
Jin, Y., Tseng, H.-W., Papakonstantinou, Y., Swanson, S.: KAML: a flexible, high-performance key-value SSD. In: 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 373–384. IEEE (2017)
20.
Jo, I., Bae, D.-H., Yoon, A.S., Kang, J.-U., Cho, S., Lee, D.D., Jeong, J.: YourSQL: a high-performance database system leveraging in-storage computing. Proc. VLDB Endow. 9(12), 924–935 (2016)
21.
Kim, J., Lee, D., Noh, S.H.: Towards SLO complying SSDs through OPS isolation. In: 13th USENIX Conference on File and Storage Technologies (FAST 15), pp. 183–189 (2015)
22.
Lee, C., Sim, D., Hwang, J., Cho, S.: F2FS: a new file system for flash storage. In: 13th USENIX Conference on File and Storage Technologies (FAST 15), pp. 273–286 (2015)
23.
Leis, V., Haubenschild, M., Neumann, T.: Optimistic lock coupling: a scalable and efficient general-purpose synchronization method. IEEE Data Eng. Bull. 42(1), 73–84 (2019)
24.
Levandoski, J.J., Lomet, D.B., Sengupta, S.: The BW-tree: a B-tree for new hardware platforms. In: 2013 IEEE 29th International Conference on Data Engineering (ICDE), pp. 302–313. IEEE (2013)
25.
Levandoski, J., Lomet, D., Zhao, K.K.: Deuteronomy: transaction support for cloud data (2011)
26.
Levandoski, J., Lomet, D., Sengupta, S.: LLAMA: a cache/storage subsystem for modern hardware. Proc. VLDB Endow. 6(10), 877–888 (2013)
27.
Lomet, D.: Cost/performance in modern data stores: how data caching systems succeed. In: Proceedings of the 14th International Workshop on Data Management on New Hardware, pp. 1–10 (2018)
28.
Lu, Y., Shu, J., Zheng, W.: Extending the lifetime of flash-based storage through reducing write amplification from file systems. In: Presented as part of the 11th USENIX Conference on File and Storage Technologies (FAST 13), pp. 257–270 (2013)
31.
Mohan, C., Haderle, D., Lindsay, B., Pirahesh, H., Schwarz, P.: ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Trans. Datab. Syst. TODS 17(1), 94–162 (1992)
34.
Park, K., Kee, Y.-S., Patel, J.M., Do, J., Park, C., Dewitt, D.J.: Query processing on smart SSDs. IEEE Data Eng. Bull. 37(2), 19–26 (2014)
35.
Picoli, I.L., Hedam, N., Tözün, P., Bonnet, P.: Open-channel SSD (what is it good for). In: CIDR (2020)
36.
Rosenblum, M., Ousterhout, J.K.: The design and implementation of a log-structured file system. ACM Trans. Comput. Syst. TOCS 10(1), 26–52 (1992)
39.
Seshadri, S., Gahagan, M., Bhaskaran, S., Bunker, T., De, A., Jin, Y., Liu, Y., Swanson, S.: Willow: a user-programmable SSD. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pp. 67–80 (2014)
42.
Wang, P., Sun, G., Jiang, S., Ouyang, J., Lin, S., Zhang, C., Cong, J.: An efficient design and implementation of LSM-tree based key-value store on open-channel SSD. In: Proceedings of the Ninth European Conference on Computer Systems, p. 16. ACM (2014)
43.
Xu, J., Swanson, S.: NOVA: a log-structured file system for hybrid volatile/non-volatile main memories. In: 14th USENIX Conference on File and Storage Technologies (FAST 16), pp. 323–338 (2016)
44.
Zhang, J., Lu, Y., Shu, J., Qin, X.: FlashKV: accelerating KV performance with open-channel SSDs. ACM Trans. Embed. Comput. Syst. TECS 16(5s), 139 (2017)
45.
Zhu, F.: Toward the large deployment of open channel SSD. Flash Memory Summit (2019)
46.
Zhu, Y., Eran, H., Firestone, D., Guo, C., Lipshteyn, M., Liron, Y., Padhye, J., Raindel, S., Yahia, M.H., Zhang, M.: Congestion control for large-scale RDMA deployments. ACM SIGCOMM Comput. Commun. Rev. 45(4), 523–536 (2015)
Metadata
Title
Better database cost/performance via batched I/O on programmable SSD
Authors
Jaeyoung Do
Ivan Luiz Picoli
David Lomet
Philippe Bonnet
Publication date
18-02-2021
Publisher
Springer Berlin Heidelberg
Published in
The VLDB Journal / Issue 3/2021
Print ISSN: 1066-8888
Electronic ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-020-00648-z