Published in: Cluster Computing 4/2021

30.06.2021

montage: NVM-based scalable synchronization framework for crash-consistent file systems

Authors: Woong Sul, Heon Y. Yeom, Hyuck Han


Abstract

In file systems, a single write system call can make multiple modifications to data and metadata, but these changes are not flushed atomically. To keep file systems consistent, conventional approaches guarantee crash consistency at the cost of system performance. To mitigate this performance penalty, non-volatile memory (NVM) technologies are considered good candidates owing to their low latency and byte-addressability. However, none of the prior proposals that exploit NVM provide both scalability and strict ordering of modifications. In this article, we propose montage, a crash consistency framework for file systems that consists of two parts. First, montage splits the NVM space into multiple staging areas and synchronizes the flushing of the staged modifications to the storage device. Second, montage uses a pipelined architecture to speed up data flushing to storage. We apply montage to two journaling file systems (ext4 and JFS) and evaluate them on a multicore server with high-performance storage. The evaluation results demonstrate that our design outperforms recent NVM-based journaling file systems by a wide margin.
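The abstract names two mechanisms: NVM space split into several staging areas whose flushes are synchronized to preserve strict ordering, and a pipelined path that flushes staged data to storage. The following is a toy Python model of those two ideas under explicit assumptions, not the authors' implementation. `PartitionedStaging`, `stage`, `drain`, and `pipelined_flush` are hypothetical names; plain lists stand in for NVM staging areas and a dict stands in for the storage device. The model also assumes that strict ordering is obtained by tagging every staged modification with a global sequence number; the paper may enforce ordering differently.

```python
import heapq
import threading
from queue import Queue


class PartitionedStaging:
    """Hypothetical toy model: NVM staging space split into per-partition logs.

    Lists stand in for NVM regions; nothing here is actually persistent.
    """

    def __init__(self, partitions):
        self._logs = [[] for _ in range(partitions)]
        self._locks = [threading.Lock() for _ in range(partitions)]
        self._seq_lock = threading.Lock()  # guards the global sequence counter
        self._next_seq = 0

    def stage(self, part, block, data):
        """Stage one modification in partition `part`; return its sequence number."""
        with self._locks[part]:
            # Drawing the sequence number while holding the partition lock keeps
            # each partition log sorted, and guarantees that a drainer holding
            # every partition lock never sees a gap in the sequence.
            with self._seq_lock:
                seq = self._next_seq
                self._next_seq += 1
            self._logs[part].append((seq, block, data))
            return seq

    def drain(self):
        """Detach everything staged so far and return it in global sequence order."""
        for lock in self._locks:  # fixed acquisition order avoids deadlock
            lock.acquire()
        try:
            logs = self._logs
            self._logs = [[] for _ in logs]
        finally:
            for lock in self._locks:
                lock.release()
        # Each partition log is already sorted by sequence number, so a k-way
        # merge restores the single global order of modifications.
        return list(heapq.merge(*logs))


def pipelined_flush(staging, storage, rounds):
    """Two-stage pipeline: a drainer orders batches while a writer applies the
    previous batch to `storage` (a dict standing in for the block device)."""
    batches = Queue(maxsize=2)  # bounded hand-off between the two stages

    def drainer():
        for _ in range(rounds):
            batches.put(staging.drain())
        batches.put(None)  # end-of-stream marker

    def writer():
        while (batch := batches.get()) is not None:
            for _seq, block, data in batch:
                storage[block] = data  # applied in global order: newest write wins

    stages = [threading.Thread(target=drainer), threading.Thread(target=writer)]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
```

Writers on different partitions append to separate logs, so they contend only on the short sequence-counter critical section; the drainer and the writer overlap, which is the point of pipelining. The shared counter is a simplification for this sketch, and a scalable design would likely avoid even that small shared section.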

Footnotes
1
In software deployment, the staging phase denotes a pre-production phase that is separated from the production environment but eventually transitions into it. We borrow this term to denote file system states that capture modifications durably and consistently but are not yet reflected in the file system's storage.
 
2
The ext4 file system uses this routine for both metadata and data journaling.
 
3
Since commodity storage devices offer less I/O parallelism and lower I/O bandwidth than high-end storage devices, using more than 16 threads on them does not improve file system performance in any of the cases. Due to space limitations, we compare the file systems only on high-end storage devices.
 
4
Under the current experimental settings, montage gains no further performance benefit from more than eight partitions.
 
Metadata
Publisher
Springer US
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-021-03329-w
