DOI: 10.1145/2159352.2159359
research-article

Easing the burdens of HPC file management

Published: 13 November 2011

ABSTRACT

While the amount of data we can process and store grows, our ability to find that data more often than not still depends on our own memories. Manual metadata management is common among scientific users; it consumes their time while making no use of the computing resources at hand. Our system design aims to give users more powerful data-finding tools, such as unified search spaces, provenance, and ranked file system search. By returning the responsibility of file management to the file system, we enable scientists to focus on their science without needing a customized file organization scheme for their work.
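To make "unified search spaces" and "ranked file system search" more concrete, here is a minimal sketch under stated assumptions. It is not the design described in this paper. It places metadata from several storage tiers into one inverted index and ranks matches by the number of query terms they hit. Ties are broken by how many files were derived from each match, which serves as a crude provenance signal. The `FileRecord` fields, the tier names, and the scoring rule are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    # Hypothetical metadata record; field names are illustrative, not from the paper.
    path: str
    tier: str                      # assumed storage tier label, e.g. "scratch" or "archive"
    terms: set[str]                # keywords extracted from names or attributes
    derived_from: list[str] = field(default_factory=list)  # provenance parents (paths)


class UnifiedIndex:
    """One inverted index spanning every storage tier (illustrative sketch only)."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)  # term -> paths
        self.records: dict[str, FileRecord] = {}
        self.children: dict[str, int] = defaultdict(int)       # path -> number of derived files

    def add(self, rec: FileRecord) -> None:
        self.records[rec.path] = rec
        for term in rec.terms:
            self.postings[term.lower()].add(rec.path)
        # Record provenance fan-out so heavily reused inputs can rank higher.
        for parent in rec.derived_from:
            self.children[parent] += 1

    def search(self, query: str) -> list[str]:
        # Score = number of query terms matched. Ties go to files with more
        # provenance descendants, then to alphabetical path order for stability.
        scores: dict[str, int] = defaultdict(int)
        for term in query.lower().split():
            for path in self.postings.get(term, ()):
                scores[path] += 1
        return sorted(scores, key=lambda p: (-scores[p], -self.children[p], p))


idx = UnifiedIndex()
idx.add(FileRecord("/scratch/run42/out.h5", "scratch", {"climate", "run42"}))
idx.add(FileRecord("/archive/2011/mesh.dat", "archive", {"climate", "mesh"}))
idx.add(FileRecord("/scratch/run42/plot.png", "scratch", {"climate", "plot"},
                   derived_from=["/scratch/run42/out.h5"]))
print(idx.search("climate"))
```

A query for "climate" matches all three files equally. The HDF5 output ranks first because another file was derived from it, even though it lives on a different tier than the archived mesh. A real system would need richer scoring and scalable metadata storage. This sketch only shows the shape of the idea.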


      Published in

        ACM Conferences
        PDSW '11: Proceedings of the sixth workshop on Parallel Data Storage
        November 2011
        62 pages
        ISBN:9781450311038
        DOI:10.1145/2159352

        Copyright © 2011 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 17 of 41 submissions, 41%
