DOI: 10.1145/2371536.2371547
research-article

AROMA: automated resource allocation and configuration of mapreduce environment in the cloud

Published: 18 September 2012

ABSTRACT

The distributed data processing framework MapReduce is increasingly deployed in Clouds to leverage the pay-per-usage cloud computing model. The popular Hadoop MapReduce environment expects end users to determine the type and amount of Cloud resources to reserve as well as the configuration of Hadoop parameters. However, such resource reservation and job provisioning decisions require in-depth knowledge of system internals and laborious but often ineffective parameter tuning. We propose and develop AROMA, a system that automates the allocation of heterogeneous Cloud resources and the configuration of Hadoop parameters to achieve quality of service goals while minimizing the incurred cost. It addresses the significant challenge of provisioning ad-hoc jobs that have performance deadlines in Clouds through a novel two-phase machine learning and optimization framework. Its technical core is a support vector machine based performance model that enables the integration of various aspects of resource provisioning and auto-configuration of Hadoop jobs. It adapts to ad-hoc jobs by robustly matching their resource utilization signature with previously executed jobs and making provisioning decisions accordingly. We implement AROMA as an automated job provisioning system for Hadoop MapReduce hosted on virtualized HP ProLiant blade servers. Experimental results show AROMA's effectiveness in providing performance guarantees for diverse Hadoop benchmark jobs while minimizing the cost of Cloud resource usage.
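
The two-phase idea described in the abstract can be illustrated with a minimal Python sketch. It is not AROMA's implementation: the abstract gives no algorithmic details, so the nearest-centroid signature matching, the hand-written stand-in formulas (used here in place of the SVM-based performance model), the exhaustive search, the price constant, and every name below are assumptions made purely for illustration.

```python
import math
from itertools import product

# Hypothetical history: resource-utilization signatures (CPU, disk, network,
# as fractions) of previously executed jobs, grouped by workload class.
HISTORY = {
    "cpu_bound": [(0.90, 0.20, 0.10), (0.85, 0.25, 0.15)],
    "io_bound": [(0.30, 0.80, 0.40), (0.25, 0.90, 0.35)],
}


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def match_signature(signature, history=HISTORY):
    """Phase 1 (sketch): pick the class whose centroid is nearest to the signature."""
    return min(history, key=lambda name: math.dist(signature, centroid(history[name])))


# Stand-in performance models, one per class: predicted completion time in
# seconds from input size (GB), VM count, and reduce-task count. These made-up
# formulas replace a trained SVM regression model for illustration only.
def _cpu_model(gb, vms, reducers):
    return 60 + 400 * gb / vms + 2 * reducers


def _io_model(gb, vms, reducers):
    return 90 + 600 * gb / vms + 100 * gb / reducers


MODELS = {"cpu_bound": _cpu_model, "io_bound": _io_model}
PRICE_PER_VM_SECOND = 0.0001  # hypothetical price


def provision(signature, gb, deadline_s,
              vm_options=range(1, 17), reducer_options=(2, 4, 8, 16)):
    """Phase 2 (sketch): cheapest (vms, reducers) whose predicted time meets the deadline."""
    model = MODELS[match_signature(signature)]
    best = None
    for vms, reducers in product(vm_options, reducer_options):
        predicted = model(gb, vms, reducers)
        if predicted > deadline_s:
            continue  # this candidate would miss the deadline
        cost = vms * predicted * PRICE_PER_VM_SECOND
        if best is None or cost < best["cost"]:
            best = {"vms": vms, "reducers": reducers, "time_s": predicted, "cost": cost}
    return best  # None when no candidate meets the deadline
```

Calling `provision((0.88, 0.22, 0.12), gb=10, deadline_s=600)` first matches the job to the hypothetical `cpu_bound` class and then searches the small configuration grid. A real system would learn the performance model from profiled runs and would likely use a smarter search than brute-force enumeration.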

Published in

ICAC '12: Proceedings of the 9th international conference on Autonomic computing
September 2012
222 pages
ISBN: 9781450315203
DOI: 10.1145/2371536

Copyright © 2012 ACM

Publisher

Association for Computing Machinery, New York, NY, United States
