Poster
DOI: 10.1145/2148600.2148660

ACM SRC poster: SpotMPI: auction-based high performance cloud computing

Published: 12 November 2011

ABSTRACT

Cloud computing relies heavily on economies of scale to deliver cost-effective computing. Recently, reliability has been introduced as a tradeoff point: providers deliver compute resources at an even lower price in exchange for weaker availability guarantees. Fair market conditions create an environment in which both sellers and buyers of compute resources benefit from trading them, which can in turn improve resource-use efficiency. Despite the many advantages of auction-based infrastructure, there are currently no practical computing platforms that can harness such volatile environments effectively. This work presents a methodology and a toolkit designed to address the challenges of running MPI applications on volatile, auction-based cloud resources.

Specifically, we emphasize the use of dynamically adjusted optimal checkpoint-restart (CPR) intervals. We discuss an initial analytical model for processing price histories and selecting optimal checkpoint intervals. We also describe the SpotMPI toolkit, which enables practical execution of MPI applications on volatile, auction-based cloud platforms. This exploration synthesizes the intrinsic dependencies of MPI-based parallel applications with the publicly available price histories of HPC cloud resources on the Amazon cloud. We study algorithms with different computation vs. communication complexities. Our results reveal counter-intuitive insights into optimal bidding and application scaling strategies.
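To make the link between price histories and checkpoint intervals concrete, the sketch below treats an out-of-bid event (the spot price rising above the user's bid, at which point the instance is reclaimed) as a failure. It estimates the mean time between such events from a price trace, then plugs that estimate into Young's (1974) first-order and Daly's (2006) higher-order formulas for the optimum checkpoint interval. This is a minimal illustration under stated assumptions, not the SpotMPI model itself: the hourly price trace, the bid, the checkpoint cost, and the helper names (`mean_time_to_out_of_bid`, `young_interval`, `daly_interval`) are all hypothetical.

```python
import math
from typing import Sequence


def mean_time_to_out_of_bid(prices: Sequence[float], bid: float,
                            step_hours: float = 1.0) -> float:
    """Estimate the mean time (hours) between out-of-bid events.

    Hypothetical failure model: the instance is lost whenever the spot
    price rises above the bid. Total in-bid time is divided by the number
    of in-bid -> out-of-bid transitions. This simple estimator ignores
    censoring (a trailing in-bid segment with no observed failure).
    """
    in_bid_steps = sum(1 for p in prices if p <= bid)
    failures = sum(
        1 for prev, cur in zip(prices, prices[1:]) if prev <= bid < cur
    )
    if failures == 0:
        return math.inf  # no out-of-bid event observed in this trace
    return in_bid_steps * step_hours / failures


def young_interval(delta: float, mtbf: float) -> float:
    """Young's first-order optimum checkpoint interval: sqrt(2 * delta * M)."""
    return math.sqrt(2.0 * delta * mtbf)


def daly_interval(delta: float, mtbf: float) -> float:
    """Daly's higher-order estimate of the optimum checkpoint interval.

    For delta < 2M: sqrt(2 delta M) * [1 + (1/3) sqrt(delta / 2M)
    + (1/9) (delta / 2M)] - delta; otherwise the interval is M.
    """
    if delta >= 2.0 * mtbf:
        return mtbf
    r = delta / (2.0 * mtbf)
    return math.sqrt(2.0 * delta * mtbf) * (1.0 + math.sqrt(r) / 3.0 + r / 9.0) - delta


if __name__ == "__main__":
    # Hypothetical hourly spot prices ($/h) and bid; not real Amazon data.
    prices = [0.30, 0.32, 0.35, 0.90, 0.31, 0.30, 0.33, 1.20, 0.29, 0.30]
    bid = 0.50
    delta = 0.1  # assumed checkpoint cost: 6 minutes, in hours

    mtbf = mean_time_to_out_of_bid(prices, bid)
    print(f"MTBF estimate: {mtbf:.2f} h")
    print(f"Young interval: {young_interval(delta, mtbf):.3f} h")
    print(f"Daly interval: {daly_interval(delta, mtbf):.3f} h")
```

With this toy trace, eight in-bid hours and two out-of-bid events give an MTBF estimate of 4.00 h, and the script prints intervals of about 0.894 h (Young) and 0.829 h (Daly). Raising the bid lengthens the estimated MTBF and therefore the checkpoint interval, which is one way bidding and checkpointing decisions interact; a real model would also account for price, restart cost, and application scaling.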

Published in

ACM Conferences
SC '11 Companion: Proceedings of the 2011 companion on High Performance Computing Networking, Storage and Analysis Companion
November 2011
166 pages
ISBN: 9781450310307
DOI: 10.1145/2148600

Copyright © 2011 Author

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%
