Skip to main content
Erschienen in: Design Automation for Embedded Systems 3-4/2013

01.09.2013

Loop transformations for flash memory: cost models and performance effects

verfasst von: Joon-Young Paik, Tae-Sun Chung, Eun-Sun Cho

Erschienen in: Design Automation for Embedded Systems | Ausgabe 3-4/2013

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Loop optimization, made of a sequence of loop transformations, plays an important role in performance improvement in data centric applications. Programs using flash memory are no exception to this, but, under certain conditions careless applications of specific loop transformations might cause unexpected results, due to the characteristics of flash memory and underlying management systems. In this article, we analyze how loop transformations affect the performance in flash translation layers (FTLs). First, we choose four loop structures which have distinct reference patterns and propose a cost model for each structure, reflecting the properties of flash memory. Then, using these cost models, we investigate how loop transformations affect the block associative sector translation (BAST)’s and fully associative sector translation (FAST)’s internal operations and analyze the performance effect of loop transformations experimentally. As a result, we find that some of the major loop transformations cause unexpected performance effects in those major FTLs under certain conditions.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
To simplify the analysis and cost models, our work does not consider any layers interleaved between applications and FTL. We believe that such an assumption is reasonable starting point of complex models analysis. In addition, this assumption is immediately applicable for explicit I/O operations [19, 20]. In Fig. 1, iowrite() indicates a function to request explicit write operations [19].
 
2
lbn is calculated by dividing the lpn by the number of pages in a block. In Fig. 3, the spare area has lpn, and the number of pages per block is 4. Thus, the lbn within pbn, 10, is 1 (4 div 4).
 
3
The immature state means that a log block has available space.
 
Literatur
1.
Zurück zum Zitat Intel Corporation (1998) Understanding the flash translation layer (FTL) specification. Intel Technical Report AP-684 Intel Corporation (1998) Understanding the flash translation layer (FTL) specification. Intel Technical Report AP-684
2.
Zurück zum Zitat Chung TS, Park DJ, Park S, Lee DH, Lee SW, Song HJ (2009) A survey of flash translation layer. J Syst Archit 55:332–343CrossRef Chung TS, Park DJ, Park S, Lee DH, Lee SW, Song HJ (2009) A survey of flash translation layer. J Syst Archit 55:332–343CrossRef
3.
Zurück zum Zitat Kim JS, Kim JM, Noh SH, Min SL, Cho YK (2002) A space-efficient flash translation layer for compactflash systems. IEEE Trans Consum Electron 48:366–375CrossRef Kim JS, Kim JM, Noh SH, Min SL, Cho YK (2002) A space-efficient flash translation layer for compactflash systems. IEEE Trans Consum Electron 48:366–375CrossRef
4.
Zurück zum Zitat Lee SW, Park DJ, Chung TS, Lee DH, Park S, Song HJ (2007) A log buffer-based flash translation layer using fully-associative sector translation. ACM Trans Embed Comput Syst doi:10.1145/1275986.1275990 Lee SW, Park DJ, Chung TS, Lee DH, Park S, Song HJ (2007) A log buffer-based flash translation layer using fully-associative sector translation. ACM Trans Embed Comput Syst doi:10.​1145/​1275986.​1275990
5.
Zurück zum Zitat Jung D, Kang J, Jo H, Kim JS, Lee J (2010) Superblock FTL: a superblock-based flash translation layer with a hybrid address translation scheme. ACM Trans Embed Comput Syst 9:1–41CrossRef Jung D, Kang J, Jo H, Kim JS, Lee J (2010) Superblock FTL: a superblock-based flash translation layer with a hybrid address translation scheme. ACM Trans Embed Comput Syst 9:1–41CrossRef
6.
Zurück zum Zitat Park C, Cheon W, Kang J, Roh K, Cho W, Kim JS (2008) A re-configurable FTL architecture for NAND flash-based applications. ACM Trans Embed Comput Syst 7:1–23 Park C, Cheon W, Kang J, Roh K, Cho W, Kim JS (2008) A re-configurable FTL architecture for NAND flash-based applications. ACM Trans Embed Comput Syst 7:1–23
7.
Zurück zum Zitat Kim H, Ahn S (2008) BPLRU: a buffer management scheme for improving random writes in flash storage. In: Proceeding of USENIX conference on file and storage technologies (FAST), pp 1–14 Kim H, Ahn S (2008) BPLRU: a buffer management scheme for improving random writes in flash storage. In: Proceeding of USENIX conference on file and storage technologies (FAST), pp 1–14
8.
Zurück zum Zitat Jo H, Kang JU, Park SY, Kim JS, Lee J (2006) FAB: flash-aware buffer management policy for portable media players. IEEE Trans Consum Electron 52:485–493CrossRef Jo H, Kang JU, Park SY, Kim JS, Lee J (2006) FAB: flash-aware buffer management policy for portable media players. IEEE Trans Consum Electron 52:485–493CrossRef
9.
Zurück zum Zitat Kang S, Park S, Jung H, Shim H, Cha J (2009) Performance trade-offs in using nvram write buffer for flash memory-based storage devices. IEEE Trans Comput 58:744–758CrossRefMathSciNet Kang S, Park S, Jung H, Shim H, Cha J (2009) Performance trade-offs in using nvram write buffer for flash memory-based storage devices. IEEE Trans Comput 58:744–758CrossRefMathSciNet
10.
Zurück zum Zitat Shi L, Li J, Xue CJ, Yang C, Zhou X (2011) ExLRU: a unified write buffer cache management for flash memory. In: Proceeding of ACM international conference on embedded software (EMSOFT), pp 339–348 Shi L, Li J, Xue CJ, Yang C, Zhou X (2011) ExLRU: a unified write buffer cache management for flash memory. In: Proceeding of ACM international conference on embedded software (EMSOFT), pp 339–348
11.
Zurück zum Zitat Lee SW, Moon B (2007) Design of flash-based DBMS: an in-page logging approach. In: Proceeding of SIGMOD conference, pp 55–66 Lee SW, Moon B (2007) Design of flash-based DBMS: an in-page logging approach. In: Proceeding of SIGMOD conference, pp 55–66
12.
Zurück zum Zitat Kang WH, Lee SW, Moon B, Oh GH, Min C (2013) X-FTL: transactional FTL for SQLite databases. In: Proceeding of SIGMOD conference, pp 97–108 Kang WH, Lee SW, Moon B, Oh GH, Min C (2013) X-FTL: transactional FTL for SQLite databases. In: Proceeding of SIGMOD conference, pp 97–108
13.
Zurück zum Zitat Chen F, Luo T, Zhang X (2011) CAFTL: A content-aware flash translation layer enhancing the lifespan of flash memory based solid state drives. In: Proceeding of USENIX Conference file and storage technologies (FAST), pp 77–90 Chen F, Luo T, Zhang X (2011) CAFTL: A content-aware flash translation layer enhancing the lifespan of flash memory based solid state drives. In: Proceeding of USENIX Conference file and storage technologies (FAST), pp 77–90
14.
Zurück zum Zitat Kim J, Lee C, Lee S, Son S, Choi J, Yoon S, Lee HU, Kang S, Won Y, Cha J (2012) Deduplication in SSDs: model and quantitative analysis. In: Proceeding of IEEE conference massive data storage (MSST), pp 1–12 Kim J, Lee C, Lee S, Son S, Choi J, Yoon S, Lee HU, Kang S, Won Y, Cha J (2012) Deduplication in SSDs: model and quantitative analysis. In: Proceeding of IEEE conference massive data storage (MSST), pp 1–12
15.
Zurück zum Zitat Gupta A, Pisolkar R, Urgaonkar B, Sivasubramaniam A (2011) Leveraging value locality in optimizing NAND flash-based SSDs. In: Proceeding of USENIX conference on file and storage technologies (FAST), pp 91–103 Gupta A, Pisolkar R, Urgaonkar B, Sivasubramaniam A (2011) Leveraging value locality in optimizing NAND flash-based SSDs. In: Proceeding of USENIX conference on file and storage technologies (FAST), pp 91–103
16.
Zurück zum Zitat Debnath B, Sengupta S, Li J (2010) ChunkStash: speeding up inline storage deduplication using flash memory. In: Proceeding of USENIX Conference on file and storage technologies (FAST) Debnath B, Sengupta S, Li J (2010) ChunkStash: speeding up inline storage deduplication using flash memory. In: Proceeding of USENIX Conference on file and storage technologies (FAST)
17.
Zurück zum Zitat Muchnick SS (1998) Advanced compiler design and implementation. Morgan Kaufmann Publishers Inc., San Francisco Muchnick SS (1998) Advanced compiler design and implementation. Morgan Kaufmann Publishers Inc., San Francisco
18.
Zurück zum Zitat Mckinley KS, Carr S, Tseng CW (1996) Improving data locality with loop transformations. ACM Trans Program Lang Syst 18(4):424–453CrossRef Mckinley KS, Carr S, Tseng CW (1996) Improving data locality with loop transformations. ACM Trans Program Lang Syst 18(4):424–453CrossRef
19.
Zurück zum Zitat Kandemir M, Choudhary A, Ramanujam J (2002) An I/O-conscious tiling strategy for disk-resident data sets. J Supercomput 21(3):257–284CrossRefMATH Kandemir M, Choudhary A, Ramanujam J (2002) An I/O-conscious tiling strategy for disk-resident data sets. J Supercomput 21(3):257–284CrossRefMATH
20.
Zurück zum Zitat Mowry TC, Demke AK, Krieger O (1996) Automatic compiler-inserted I/O prefetching for out-of-core applications. In: Proceeding of USENIX symposium on operating systems design and implementation (OSDI), pp 3–17 Mowry TC, Demke AK, Krieger O (1996) Automatic compiler-inserted I/O prefetching for out-of-core applications. In: Proceeding of USENIX symposium on operating systems design and implementation (OSDI), pp 3–17
21.
Zurück zum Zitat Samsung Electronics (2004) NAND flash memory & smartmedia data book Samsung Electronics (2004) NAND flash memory & smartmedia data book
22.
Zurück zum Zitat Shin D (2010) Workload-driven adaptive log buffer-based FTL. IEICE Electron Express 7(11):804–809CrossRef Shin D (2010) Workload-driven adaptive log buffer-based FTL. IEICE Electron Express 7(11):804–809CrossRef
23.
Zurück zum Zitat Lee D, Shin F, Kim Y, Kim J (2008) LAST: locality-aware sector translation for NAND flash memory-based storage systems. Proc ACM SIGOPS Oper Syst Rev 42(6):36–42CrossRef Lee D, Shin F, Kim Y, Kim J (2008) LAST: locality-aware sector translation for NAND flash memory-based storage systems. Proc ACM SIGOPS Oper Syst Rev 42(6):36–42CrossRef
24.
Zurück zum Zitat Shin H, Jung D, Kim J, Kim J, Maeng S (2010) Co-optimization of buffer layer and FTL in high-performance flash-based storage systems. Des Autom Embed Syst 14:415–443CrossRef Shin H, Jung D, Kim J, Kim J, Maeng S (2010) Co-optimization of buffer layer and FTL in high-performance flash-based storage systems. Des Autom Embed Syst 14:415–443CrossRef
25.
Zurück zum Zitat Bouganim L, Jonsson B, Bonnet P (2009) uFlip: understanding flash IO patterns. In: Proceeding of conference on innovative data systems research (CIDR) Bouganim L, Jonsson B, Bonnet P (2009) uFlip: understanding flash IO patterns. In: Proceeding of conference on innovative data systems research (CIDR)
26.
Zurück zum Zitat Paik JY, Chung TS, Cho ES (2013) Cost model based analyses on performance effects of loop transformations in block associative sector translation. In: Proceeding of IEEE/IFIP international conference on embedded and ubiquitous computing (EUC), pp 1998–2005 Paik JY, Chung TS, Cho ES (2013) Cost model based analyses on performance effects of loop transformations in block associative sector translation. In: Proceeding of IEEE/IFIP international conference on embedded and ubiquitous computing (EUC), pp 1998–2005
27.
Zurück zum Zitat Zhang W, Leiss EL (2001) A compiler driven out-of-core programming approach for optimizing data locality in loop nests. In: Proceeding of international conference on parallel and distributed processing techniques and applications (PDPTA), pp 25–28 Zhang W, Leiss EL (2001) A compiler driven out-of-core programming approach for optimizing data locality in loop nests. In: Proceeding of international conference on parallel and distributed processing techniques and applications (PDPTA), pp 25–28
28.
Zurück zum Zitat Johnson T and Shasha D (1994) 2Q: A low overhead high performance buffer management replacement algorithm. In: Proceeding of international conference on very large data bases (VLDB), pp 439–450 Zhang W, pp 25–28 Johnson T and Shasha D (1994) 2Q: A low overhead high performance buffer management replacement algorithm. In: Proceeding of international conference on very large data bases (VLDB), pp 439–450 Zhang W, pp 25–28
29.
Zurück zum Zitat Megiddo N and Modha DS (2003) ARC: a self-tuning, low overhead replacement cache. In: Proceeding of USENIX conference on file and storage technogies (FAST), pp 115–130 Megiddo N and Modha DS (2003) ARC: a self-tuning, low overhead replacement cache. In: Proceeding of USENIX conference on file and storage technogies (FAST), pp 115–130
30.
Zurück zum Zitat Jo H, Kang K, Park S, Kim J, Lee J (2006) FAB: flash-aware buffer management replacment algorithm. IEEE Trans Consum Electron 52(2):485–493CrossRef Jo H, Kang K, Park S, Kim J, Lee J (2006) FAB: flash-aware buffer management replacment algorithm. IEEE Trans Consum Electron 52(2):485–493CrossRef
31.
Zurück zum Zitat Kim H and Ahn S (2008) BPLRU: a buffer management scheme for improving random writes in flash storage. In: Proceeding of USENIX conference on file and storage technogies (FAST), pp 1–14 Kim H and Ahn S (2008) BPLRU: a buffer management scheme for improving random writes in flash storage. In: Proceeding of USENIX conference on file and storage technogies (FAST), pp 1–14
32.
Zurück zum Zitat Laforest E (2010) Survey of loop transformation techniques. Technical Report. University of Toronto, Toronto Laforest E (2010) Survey of loop transformation techniques. Technical Report. University of Toronto, Toronto
33.
Zurück zum Zitat Qiu M, Wu J, Xue C.J., Hu JA, Tseng W-C, and Sha E H-M (2008) Loop scheduling and assignment to minimize energy while hiding latency for heterogeneous multi-bank memory. In: Proceedings of IEEE international conference on field programmable logic and applications (FPL), pp 459–462 Qiu M, Wu J, Xue C.J., Hu JA, Tseng W-C, and Sha E H-M (2008) Loop scheduling and assignment to minimize energy while hiding latency for heterogeneous multi-bank memory. In: Proceedings of IEEE international conference on field programmable logic and applications (FPL), pp 459–462
34.
Zurück zum Zitat Xue C, Shao Z, Chen Y, and Sha E H-M (2005) Optimizing DSP scheduling via address assignment with array and loop transformation. In: Proceedings of IEEE international conference on acoustics, speech, and signal processing (ICASSP), pp 85–88 Xue C, Shao Z, Chen Y, and Sha E H-M (2005) Optimizing DSP scheduling via address assignment with array and loop transformation. In: Proceedings of IEEE international conference on acoustics, speech, and signal processing (ICASSP), pp 85–88
35.
Zurück zum Zitat Xue CJ, Sha EH-M, Shao Z, Qiu M (2008) Effective loop partitioning and scheduling under memory and register dual constraints. In: Proceedings of IEEE/ACM design, automation and test in Europe (DATE), pp 1202–1207 Xue CJ, Sha EH-M, Shao Z, Qiu M (2008) Effective loop partitioning and scheduling under memory and register dual constraints. In: Proceedings of IEEE/ACM design, automation and test in Europe (DATE), pp 1202–1207
36.
Zurück zum Zitat Hosomi M, Yamagishi H, Yamamoto Y, Bessho K, Higo Y, Yamane K, Yamada H, Shoji M, Hachino H, Fukumoto C, Nagao H, Kano H (2005) A novel nonvolatile memory with spin torque transfer magnetization switching: spin-RAM. In: Proceeding of international electron devices meeting, pp 459–462 Hosomi M, Yamagishi H, Yamamoto Y, Bessho K, Higo Y, Yamane K, Yamada H, Shoji M, Hachino H, Fukumoto C, Nagao H, Kano H (2005) A novel nonvolatile memory with spin torque transfer magnetization switching: spin-RAM. In: Proceeding of international electron devices meeting, pp 459–462
37.
Zurück zum Zitat Qiu K, Zhao M, Fu C, Shi L, Xue CJ (2013) Migration-aware loop retiming for STT-RAM based hybrid cache for embedded systems. Proceeding of IEEE international conference on application-specific systems, architectures and processors (ASAP), pp 83–89 Qiu K, Zhao M, Fu C, Shi L, Xue CJ (2013) Migration-aware loop retiming for STT-RAM based hybrid cache for embedded systems. Proceeding of IEEE international conference on application-specific systems, architectures and processors (ASAP), pp 83–89
38.
Zurück zum Zitat Paik JY, Cho ES, Chung TS (2009) Performance improvement for flash memories using loop optimization. In: Proceeding of IEEE international conference on computational science and engineering (CSE), pp 508–513 Paik JY, Cho ES, Chung TS (2009) Performance improvement for flash memories using loop optimization. In: Proceeding of IEEE international conference on computational science and engineering (CSE), pp 508–513
39.
Zurück zum Zitat Kim S, Kwon K, Kim C, Jang C, Lee J, Min SL (2010) Demand paging techniques for flash memory using compiler post-pass optimizations. ACM Trans Embed Comput Syst 10(4):40 Kim S, Kwon K, Kim C, Jang C, Lee J, Min SL (2010) Demand paging techniques for flash memory using compiler post-pass optimizations. ACM Trans Embed Comput Syst 10(4):40
40.
Zurück zum Zitat Lin CC, Chen CL, and Tseng CH (2007) Source code arrangement of embedded java virtual machine for NAND flash memory. In: Proceeding of international conference on symposium on communications and information technologies (ISCIT), pp 152–157 Lin CC, Chen CL, and Tseng CH (2007) Source code arrangement of embedded java virtual machine for NAND flash memory. In: Proceeding of international conference on symposium on communications and information technologies (ISCIT), pp 152–157
41.
Zurück zum Zitat Gupta A, Kim Y, Urgaonkar B (2009) DFTL: a flash translation layer employing demand-based selective caching of page-level address mappings. In: Proceeding of internationl conference on architectural support for programming languages and operating systems (ASPLOS) pp 229–240 Gupta A, Kim Y, Urgaonkar B (2009) DFTL: a flash translation layer employing demand-based selective caching of page-level address mappings. In: Proceeding of internationl conference on architectural support for programming languages and operating systems (ASPLOS) pp 229–240
42.
Zurück zum Zitat Qin Z, Wang Y, Liu D, Shao Z (2010) Demand-based block-level address mapping in large-scale NAND flash storage systems. In: Proceeding of CODES + ISSS, pp 173–182 Qin Z, Wang Y, Liu D, Shao Z (2010) Demand-based block-level address mapping in large-scale NAND flash storage systems. In: Proceeding of CODES + ISSS, pp 173–182
43.
Zurück zum Zitat Qin Z, Wang Y, Liu D, Shao Z (2011) A two-level caching mechanism for demand-based page-level address mapping in NAND flash memory storage systems. In: Proceeding of international conference on IEEE real-time and embedded technology and applications symposium, pp 157–166 Qin Z, Wang Y, Liu D, Shao Z (2011) A two-level caching mechanism for demand-based page-level address mapping in NAND flash memory storage systems. In: Proceeding of international conference on IEEE real-time and embedded technology and applications symposium, pp 157–166
44.
Zurück zum Zitat Gniady C, Butt AR, Hu YC (2004) Program-counter-based pattern classification in buffer caching. In: Proceeding of USENIX conference, symposium on opearting systems design and implementation (OSDI), pp 27–27 Gniady C, Butt AR, Hu YC (2004) Program-counter-based pattern classification in buffer caching. In: Proceeding of USENIX conference, symposium on opearting systems design and implementation (OSDI), pp 27–27
45.
Zurück zum Zitat Brock J, Gu X, Bao B, Ding C (2013) Pacman: Program-assisted cache management. In: Proceeding of international symposium on memory management (ISMM), pp 39–50 Brock J, Gu X, Bao B, Ding C (2013) Pacman: Program-assisted cache management. In: Proceeding of international symposium on memory management (ISMM), pp 39–50
Metadaten
Titel
Loop transformations for flash memory: cost models and performance effects
verfasst von
Joon-Young Paik
Tae-Sun Chung
Eun-Sun Cho
Publikationsdatum
01.09.2013
Verlag
Springer US
Erschienen in
Design Automation for Embedded Systems / Ausgabe 3-4/2013
Print ISSN: 0929-5585
Elektronische ISSN: 1572-8080
DOI
https://doi.org/10.1007/s10617-014-9144-7

Weitere Artikel der Ausgabe 3-4/2013

Design Automation for Embedded Systems 3-4/2013 Zur Ausgabe

Premium Partner