Abstract
Understanding workload characteristics is essential to storage systems design and performance optimization. With the emergence of flash memory as a new viable storage medium, the new design concern of flash endurance arises, necessitating a revisit of workload characteristics, in particular, of the write behavior. Inspired by Web caching studies where a Zipf-like access pattern is commonly found, we hypothesize that write count distribution at the block level may also follow Zipf’s Law. To validate this hypothesis, we study 48 block I/O traces collected from a wide variety of real and benchmark applications. Through extensive analysis, we demonstrate that the Zipf-like pattern indeed widely exists in write traffic provided its disguises are removed by statistical processing. This finding implies that write skew in a large class of applications could be analytically expressed and, thus, facilitates design tradeoff explorations adaptive to workload characteristics.
- Nitin Agrawal, Vijayan Prabhakaran, Ted Wobber, John D. Davis, Mark Manasse, and Rina Panigrahy. 2008. Design tradeoffs for SSD performance. In Proceedings of the USENIX 2008 Annual Technical Conference on Annual Technical Conference (ATC’08). USENIX Association, 57--70. Google ScholarDigital Library
- Martin F. Arlitt and Carey L. Williamson. 1997. Internet web servers: Workload characterization and performance implications. In IEEE/ACM Transactions on Networking. IEEE Press, 631--645. DOI:http://dx.doi.org/10.1109/90.649565 Google ScholarDigital Library
- J. Axboe. 2014. FIO (Flexible IO Tester). Retrieved from http://git.kernel.dk/?p=fio.git;a=summary.Google Scholar
- Bernd Blasius and Ralf Tönjes. 2009. Zipf’s law in the popularity distribution of chess openings. Physics Review Letters 103, 21 (Nov. 2009), 218701. DOI:http://dx.doi.org/10.1103/PhysRevLett.103.218701Google ScholarCross Ref
- L. Breslau, Pei Cao, Li Fan, G. Phillips, and S. Shenker. 1999. Web caching and Zipf-like distributions: Evidence and implications. In Proceedings of the 18th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings (INFOCOM’99), Vol. 1. IEEE, 126--134. DOI:http://dx.doi.org/10.1109/INFCOM.1999.749260Google Scholar
- Werner Bux and Ilias Iliadis. 2010. Performance of greedy garbage collection in flash-based solid-state drives. In Performance Evaluation, Vol. 67. Elsevier Science Publishers B. V., 1172--1186. DOI:http://dx.doi.org/10.1016/j.peva.2010.07.003 Google ScholarDigital Library
- Meeyoung Cha, Haewoon Kwak, Pablo Rodriguez, Yong-Yeol Ahn, and Sue Moon. 2007. I tube, you tube, everybody tubes: Analyzing the world’s largest user generated content video system. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement (IMC’07). ACM, 1--14. DOI:http://dx.doi.org/10.1145/1298306.1298309 Google ScholarDigital Library
- Li-Pin Chang and Chun-Da Du. 2009. Design and implementation of an efficient wear-leveling algorithm for solid-state-disk microcontrollers. ACM Transactions on Design Automation of Electronic Systems 15, 1, Article 6 (Dec. 2009), 36 pages. DOI:http://dx.doi.org/10.1145/1640457.1640463 Google ScholarDigital Library
- Feng Chen, Tian Luo, and Xiaodong Zhang. 2011. CAFTL: A content-aware flash translation layer enhancing the lifespan of flash memory based solid state drives. In Proceedings of the 9th USENIX Conference on File and Stroage Technologies (FAST’11). USENIX Association, 1. Google ScholarDigital Library
- Maureen Chesire, Alec Wolman, Geoffrey M. Voelker, and Henry M. Levy. 2001. Measurement and analysis of a streaming-media workload. In Proceedings of the 3rd Conference on USENIX Symposium on Internet Technologies and Systems - Volume 3 (USITS’01). USENIX Association, 1. Google ScholarDigital Library
- Edward Chlebus. 2009. An approximate formula for a partial sum of the divergent p-series. In Applied Mathematics Letters. Elsevier Ltd, 732--737. DOI:http://dx.doi.org/10.1016/j.aml.2008.07.007Google Scholar
- Pasquale Cirillo. 2013. Are your data really Pareto distributed? Physica A: Statistical Mechanics and Its Applications (2013). DOI:http://dx.doi.org/10.1016/j.physa.2013.07.061Google Scholar
- Aaron Clauset, Cosma Rohilla Shalizi, and M. E. J. Newman. 2009. Power-law distributions in empirical data. In SIAM Review. Society for Industrial and Applied Mathematics, 661--703. DOI:http://dx.doi.org/10.1137/070710111 Google ScholarDigital Library
- Carlos R. Cunha, Azer Bestavros, and Mark E. Crovella. 1995. Characteristics of WWW Client-based Traces. Technical Report BU-CS-95-010. Computer Science Department, Boston University, oston, MA 02215. Google Scholar
- Peter Desnoyers. 2012. Analytic modeling of SSD write performance. In Proceedings of the 5th Annual International Systems and Storage Conference (SYSTOR’12). ACM, Article 12, 10 pages. DOI:http://dx.doi.org/10.1145/2367589.2367603 Google ScholarDigital Library
- Peter Desnoyers. 2014. Analytic models of SSD write performance. Transactions on Storage 10, Article 8 (March 2014), 25 pages. DOI:http://dx.doi.org/10.1145/2577384 Google ScholarDigital Library
- ETW. 2012. ETW: Event Tracing for Windows. Retrieved from http://msdn.microsoft.com/en-us/library/bb968803%28VS.85%29.aspx.Google Scholar
- Phillipa Gill, Martin Arlitt, Zongpeng Li, and Anirban Mahanti. 2007. Youtube traffic characterization: A view from the edge. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement (IMC’07). ACM, 15--28. DOI:http://dx.doi.org/10.1145/1298306.1298310 Google ScholarDigital Library
- Lei Guo, Enhua Tan, Songqing Chen, Zhen Xiao, and Xiaodong Zhang. 2008. The stretched exponential distribution of internet media access patterns. In Proceedings of the 27th ACM Symposium on Principles of Distributed Computing (PODC’08). ACM, 283--294. DOI:http://dx.doi.org/10.1145/1400751.1400789 Google ScholarDigital Library
- Aayush Gupta, Raghav Pisolkar, Bhuvan Urgaonkar, and Anand Sivasubramaniam. 2011. Leveraging value locality in optimizing NAND flash-based SSDs. In Proceedings of the 9th USENIX Conference on File and Stroage Technologies (FAST’11). USENIX Association, 1. Google ScholarDigital Library
- Jen-Wei Hsieh, Tei-Wei Kuo, and Li-Pin Chang. 2006. Efficient identification of hot data for flash memory storage systems. In Transactions on Storage. ACM, 22--40. DOI:http://dx.doi.org/10.1145/1138041.1138043 Google ScholarDigital Library
- S. Kavalanekar, V. Sharda, B. L. Worthington, and Q. Zhang. 2008. Characterization of storage workload traces from production windows servers. In Proceedings of the 4th International Symposium on Workload Characterization (IISWC’08). IEEE, New York, NY.Google Scholar
- Isao Kotera, Ryusuke Egawa, Hiroyuki Takizawa, and Hiroaki Kobayashi. 2008. Modeling of cache access behavior based on Zipf’s law. In Proceedings of the 9th Workshop on Memory Performance: Dealing with Applications, Systems and Architecture (MEDEA’08). ACM, 9--15. DOI:http://dx.doi.org/10.1145/1509084.1509086 Google ScholarDigital Library
- Jongsung Lee and Jin-Soo Kim. 2013. An empirical study of hot/cold data separation policies in solid state drives (SSDs). In Proceedings of the 6th International Systems and Storage Conference (SYSTOR’13). ACM, Article 12, 6 pages. DOI:http://dx.doi.org/10.1145/2485732.2485745 Google ScholarDigital Library
- Microsoft News Centre. 2013. The Big Bang: How the Big Data Explosion Is Changing the World. Retrieved from http://www.microsoft.com/en-us/news/features/2013/feb13/02-11bigdata.aspx.Google Scholar
- Dushyanth Narayanan, Austin Donnelly, and Antony Rowstron. 2008. Write off-loading: Practical power management for enterprise storage. In Transactions on Storage. ACM, Article 10, 23 pages. DOI:http://dx.doi.org/10.1145/1416944.1416949 Google ScholarDigital Library
- Storage Networking Industry Association. 2011. IOTTA Repository. Retrieved from http://iotta.snia.org/.Google Scholar
- Mark E. J. Newman. 2005. Power laws, pareto distributions and Zipf’s law. Contemporary Physics 46 (2005), 323--351. http://arxiv.org/abs/cond-mat/0412004.Google ScholarCross Ref
- Chanik Park, Wonmoon Cheon, Jeonguk Kang, Kangho Roh, Wonhee Cho, and Jin-Soo Kim. 2008. A reconfigurable FTL (flash translation layer) architecture for NAND flash-based applications. ACM Transactions in Embedded Computing Systems 7, 4, Article 38 (Aug. 2008), 23 pages. DOI:http://dx.doi.org/10.1145/1376804.1376806 Google ScholarDigital Library
- Dongchul Park and David H. C. Du. 2011. Hot data identification for flash-based storage systems using multiple bloom filters. In Proceedings of the 2011 IEEE 27th Symposium on Mass Storage Systems and Technologies (MSST’11). IEEE Computer Society, 1--11. DOI:http://dx.doi.org/10.1109/MSST.2011.5937216 Google ScholarDigital Library
- Alma Riska and Erik Riedel. 2006. Disk drive level workload characterization. In Proceedings of the Annual Conference on USENIX’06 Annual Technical Conference (ATEC’06). USENIX Association, 9. Google ScholarDigital Library
- Storage Performance Council. 2002. OLTP Application I/O. Retrieved from http://traces.cs.umass.edu/index.php/Storage/Storage.Google Scholar
- Storage Performance Council. 2009. SPC benchmark 2C™, (SPC-2C) official specification. Retrieved from http://www.storageperformance.org/specs/spc2c_v1.2.pdf.Google Scholar
- Wenting Tang, Yun Fu, Ludmila Cherkasova, and Amin Vahdat. 2003. MediSyn: A synthetic streaming media service workload generator. In Proceedings of the 13th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV’03). ACM, 12--21. DOI:http://dx.doi.org/10.1145/776322.776327 Google ScholarDigital Library
- Vernon Turner, Stephen Minton, Vernon Turner, and David Reinsel. 2014. The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things. Retrieved from http://idcdocserv.com/1678.Google Scholar
- Benny Van Houdt. 2013. A mean field model for a class of garbage collection algorithms in flash-based solid state drives. In Proceedings of the ACM SIGMETRICS/International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’13). ACM, 191--202. DOI:http://dx.doi.org/10.1145/2465529.2465543 Google ScholarDigital Library
- Adepele Williams, Arlitt Martin, Carey Williamson, and Barker Ken. 2005. Web server workload characterization: Ten years later. In Web Information Systems Engineering and Internet Technologies. Springer, 3--21.Google Scholar
- Yue Yang and Jianwen Zhu. 2014. Analytical modeling of garbage collection algorithms in hotness-aware flash-based solid state drives. In Proceedings of the 30th International Conference on Massive Storage Systems and Technology (MSST’14). IEEE.Google ScholarCross Ref
- Hongliang Yu, Dongdong Zheng, Ben Y. Zhao, and Weimin Zheng. 2006. Understanding user behavior in large-scale video-on-demand systems. In Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006 (EuroSys’06). ACM, 333--344. DOI:http://dx.doi.org/10.1145/1217935.1217968 Google ScholarDigital Library
- George Kingsley Zipf. 1950. Human behavior and the principle of least effort. cambridge, (mass.): Addison-Wesley, 1949, pp. 573. Journal of Clinical Psychology 6, 3 (1950), 394--401. DOI:http://dx.doi.org/10.1002/1097-4679(195007)6:3<306::AID-JCLP2270060331>3.0.CO;2-7Google Scholar
Index Terms
- Write Skew and Zipf Distribution: Evidence and Implications
Recommendations
Write Activity Minimization for Nonvolatile Main Memory Via Scheduling and Recomputation
Nonvolatile memories such as Flash memory, phase change memory (PCM), and magnetic random access memory (MRAM) have many desirable characteristics for embedded systems to employ them as main memory. However, there are two common challenges we need to ...
Performance Trade-Offs in Using NVRAM Write Buffer for Flash Memory-Based Storage Devices
While NAND flash memory is used in a variety of end-user devices, it has a few disadvantages, such as asymmetric speed of read and write operations, inability to in-place updates, among others. To overcome these problems, various flash-aware strategies ...
Write amplification analysis in flash-based solid state drives
SYSTOR '09: Proceedings of SYSTOR 2009: The Israeli Experimental Systems ConferenceWrite amplification is a critical factor limiting the random write performance and write endurance in storage devices based on NAND-flash memories such as solid-state drives (SSD). The impact of garbage collection on write amplification is influenced by ...
Comments