ABSTRACT
Parity-based RAID suffers from poor small-write performance because every small write incurs heavy parity-update overhead. The recently proposed EPLOG scheme constructs a new stripe from updated data chunks without updating the old parity chunks. However, because data accesses are skewed, old versions of updated data chunks must often be kept to protect the other data chunks of the same stripe. This seriously hurts recovery efficiency after a device failure, because the preserved old data chunks on the failed device must also be reconstructed.
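To make the small-write overhead concrete, the following minimal sketch shows the standard read-modify-write parity update for a single-parity (RAID-5 style) stripe, where the new parity is the XOR of the old parity, the old data chunk, and the new data chunk. The helper names (`xor_chunks`, `rmw_parity`) and the two-byte chunks are illustrative assumptions, not code from the paper.

```python
from functools import reduce


def xor_chunks(*chunks: bytes) -> bytes:
    """Byte-wise XOR of equal-length chunks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def rmw_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Read-modify-write: the old data and old parity must be read first,
    then the new data and new parity are written back."""
    return xor_chunks(old_parity, old_data, new_data)


# Toy stripe with three data chunks and one parity chunk (assumed sizes).
data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity = xor_chunks(*data)

# Updating one data chunk costs two reads (old data, old parity)
# and two writes (new data, new parity).
new_chunk = b"\xaa\xbb"
new_parity = rmw_parity(parity, data[1], new_chunk)

# The incremental update matches a full recomputation over the stripe.
assert new_parity == xor_chunks(data[0], new_chunk, data[2])
```

In this model a one-chunk update turns into four device I/Os, which is the overhead that parity logging schemes try to avoid.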
In this paper, we propose a Recovery Friendly Parity Logging scheme, called RFPL, which minimizes the small-write penalty and provides high recovery performance for SSD RAID. The key idea of RFPL is to reduce the mixture of old and new data chunks in a stripe by exploiting the skewness of data accesses. RFPL constructs a new stripe from updated data chunks that belong to the same old stripe. Since cold data chunks of the old stripe are rarely updated, the data chunks written to the new stripe are likely to all be hot and to become old together within a short time span. This "co-old" behavior of the data chunks in a stripe effectively reduces the total number of old data chunks that need to be preserved. We have implemented RFPL on a RAID-5 SSD array in Linux 4.3. Experimental results show that, compared with Linux software RAID, RFPL reduces user I/O response time by 83.1% in the normal state and 81.6% in the reconstruction state. Compared with the state-of-the-art scheme EPLOG, RFPL reduces user I/O response time by 46.8% in the normal state and 40.9% in the reconstruction state. Our reliability analysis shows that RFPL improves the mean time to data loss (MTTDL) by 9.36X and 1.44X compared with Linux software RAID and EPLOG, respectively.
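The grouping idea can be illustrated with a small sketch. It assumes a toy model in which logical chunk numbers map to old stripes by integer division and each updated chunk appears once; the function names and the constant `DATA_CHUNKS_PER_STRIPE` are hypothetical, and the code is not the authors' implementation. It compares packing updates into new stripes in arrival order with packing them by their old stripe, and counts how many old stripes each new stripe mixes.

```python
from collections import defaultdict

# Assumed layout: 4-device RAID-5, so 3 data chunks per stripe.
DATA_CHUNKS_PER_STRIPE = 3


def old_stripe_of(chunk_no: int) -> int:
    """Old stripe that a logical chunk belongs to (toy mapping)."""
    return chunk_no // DATA_CHUNKS_PER_STRIPE


def pack_in_arrival_order(updated: list[int]) -> list[list[int]]:
    """Baseline: fill each new stripe with updates as they arrive."""
    k = DATA_CHUNKS_PER_STRIPE
    return [updated[i:i + k] for i in range(0, len(updated), k)]


def pack_by_old_stripe(updated: list[int]) -> list[list[int]]:
    """Grouping sketch: a new stripe holds updates from one old stripe only.
    New stripes may be partial in this toy model."""
    groups: dict[int, list[int]] = defaultdict(list)
    for chunk_no in updated:
        groups[old_stripe_of(chunk_no)].append(chunk_no)
    return list(groups.values())


def old_stripes_mixed(new_stripe: list[int]) -> int:
    """Number of distinct old stripes represented in one new stripe."""
    return len({old_stripe_of(c) for c in new_stripe})


updated = [0, 7, 1, 6, 2, 8]  # interleaved updates to old stripes 0 and 2

baseline = pack_in_arrival_order(updated)
grouped = pack_by_old_stripe(updated)

print([old_stripes_mixed(s) for s in baseline])  # each new stripe mixes two old stripes
print([old_stripes_mixed(s) for s in grouped])   # each new stripe maps to one old stripe
```

With grouping, every chunk in a new stripe comes from the same old stripe, so in this simplified model hot chunks that are updated together also become old together.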