Research article, Open Access
DOI: 10.1145/2749469.2749475

Heracles: improving resource efficiency at scale

Published: 13 June 2015

ABSTRACT

User-facing, latency-sensitive services, such as websearch, underutilize their computing resources during daily periods of low traffic. Reusing those resources for other tasks is rarely done in production services since the contention for shared resources can cause latency spikes that violate the service-level objectives of latency-sensitive tasks. The resulting under-utilization hurts both the affordability and energy-efficiency of large-scale datacenters. With technology scaling slowing down, it becomes important to address this opportunity.

We present Heracles, a feedback-based controller that enables the safe colocation of best-effort tasks alongside a latency-critical service. Heracles dynamically manages multiple hardware and software isolation mechanisms, such as CPU, memory, and network isolation, to ensure that the latency-sensitive job meets latency targets while maximizing the resources given to best-effort tasks. We evaluate Heracles using production latency-critical and batch workloads from Google and demonstrate average server utilizations of 90% without latency violations across all the load and colocation scenarios that we evaluated.
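As a rough illustration of what a feedback-based colocation controller can look like, the sketch below adjusts the number of cores granted to best-effort tasks according to how far the measured tail latency sits from its target. It is a minimal sketch under stated assumptions, not the controller described in the paper: the names `ControllerState` and `control_step`, the 10% slack threshold, and the one-core step size are all hypothetical, and only CPU cores are modeled, whereas the abstract also mentions memory and network isolation.

```python
from dataclasses import dataclass


@dataclass
class ControllerState:
    """Resources currently granted to best-effort (BE) tasks (hypothetical model)."""
    be_cores: int = 0
    be_enabled: bool = True


def control_step(state, tail_latency_ms, slo_ms, total_cores,
                 grow_slack=0.10, min_lc_cores=1):
    """Run one feedback iteration and return the next state.

    Assumed policy (illustrative only, not taken from the paper):
      slack = (SLO - measured tail latency) / SLO
      slack < 0           -> target violated: reclaim every BE core
      0 <= slack < grow   -> close to the target: reclaim one BE core
      slack >= grow       -> comfortable margin: grant BE one more core
    """
    slack = (slo_ms - tail_latency_ms) / slo_ms
    if slack < 0:
        # Latency target missed: stop best-effort work entirely.
        return ControllerState(be_cores=0, be_enabled=False)
    if slack < grow_slack:
        # Little headroom left: back off by one core, never below zero.
        return ControllerState(be_cores=max(state.be_cores - 1, 0))
    # Plenty of headroom: grow, but always leave cores for the
    # latency-critical service.
    limit = total_cores - min_lc_cores
    return ControllerState(be_cores=min(state.be_cores + 1, limit))


if __name__ == "__main__":
    state = ControllerState(be_cores=2)
    # Hypothetical latency samples (ms) against a 100 ms target on 8 cores.
    for latency in (50.0, 60.0, 95.0, 120.0):
        state = control_step(state, latency, slo_ms=100.0, total_cores=8)
        print(latency, state)
```

The asymmetry is deliberate: the loop grows the best-effort allocation one step at a time but reclaims everything on a violation, which reflects the abstract's priority of meeting latency targets first and maximizing best-effort resources second.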


Published in

ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture
June 2015
768 pages
ISBN: 9781450334020
DOI: 10.1145/2749469

              Copyright © 2015 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 543 of 3,203 submissions, 17%
