A five-year study of file-system metadata

Published: 01 October 2007

Abstract

For five years, we collected annual snapshots of file-system metadata from over 60,000 Windows PC file systems in a large corporation. In this article, we use these snapshots to study temporal changes in file size, file age, file-type frequency, directory size, namespace structure, file-system population, storage capacity and consumption, and degree of file modification. We present a generative model that explains the namespace structure and the distribution of directory sizes. We find significant temporal trends relating to the popularity of certain file types, the origin of file content, the way the namespace is used, and the degree of variation among file systems, as well as more pedestrian changes in size and capacities. We give examples of consequent lessons for designers of file systems and related software.

Published in

ACM Transactions on Storage, Volume 3, Issue 3
October 2007
183 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/1288783

Copyright © 2007 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
