Heracles: improving resource efficiency at scale

Authors:
David Lo

Stanford University

Stanford University
View Profile

,
Liqun Cheng

Google, Inc.

Google, Inc.
View Profile

,
Rama Govindaraju

Google, Inc.

Google, Inc.
View Profile

,
Parthasarathy Ranganathan

Google, Inc.

Google, Inc.
View Profile

,
Christos Kozyrakis

Stanford University

Stanford University
View Profile

ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer ArchitectureJune 2015Pages 450–462https://doi.org/10.1145/2749469.2749475

Published:13 June 2015Publication History

ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture

Pages 450–462

ABSTRACT

User-facing, latency-sensitive services, such as websearch, underutilize their computing resources during daily periods of low traffic. Reusing those resources for other tasks is rarely done in production services since the contention for shared resources can cause latency spikes that violate the service-level objectives of latency-sensitive tasks. The resulting under-utilization hurts both the affordability and energy-efficiency of large-scale datacenters. With technology scaling slowing down, it becomes important to address this opportunity.

We present Heracles, a feedback-based controller that enables the safe colocation of best-effort tasks alongside a latency-critical service. Heracles dynamically manages multiple hardware and software isolation mechanisms, such as CPU, memory, and network isolation, to ensure that the latency-sensitive job meets latency targets while maximizing the resources given to best-effort tasks. We evaluate Heracles using production latency-critical and batch workloads from Google and demonstrate average server utilizations of 90% without latency violations across all the load and colocation scenarios that we evaluated.

References

"Iperf - The TCP/UDP Bandwidth Measurement Tool," https://iperf.fr/.Google Scholar
"memcached," http://memcached.org/.Google Scholar
"Intel® 64 and IA-32 Architectures Software Developer's Manual," vol. 3B: System Programming Guide, Part 2, Sep 2014.Google Scholar
Mohammad Al-Fares et al., "A Scalable, Commodity Data Center Network Architecture," in Proc. of the ACM SIGCOMM 2008 Conference on Data Communication, ser. SIGCOMM '08. New York, NY: ACM, 2008. Google ScholarDigital Library
Mohammad Alizadeh et al., "Data Center TCP (DCTCP)," in Proc. of the ACM SIGCOMM 2010 Conference, ser. SIGCOMM '10. New York, NY: ACM, 2010. Google ScholarDigital Library
Luiz Barroso et al., "The Case for Energy-Proportional Computing," Computer, vol. 40, no. 12, Dec. 2007. Google ScholarDigital Library
Luiz André Barroso et al., The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, 2nd ed. Morgan & Claypool Publishers, 2013. Google ScholarDigital Library
Adam Belay et al., "IX: A Protected Dataplane Operating System for High Throughput and Low Latency," in 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14). Broomfield, CO: USENIX Association, Oct. 2014. Google ScholarDigital Library
Sergey Blagodurov et al., "A Case for NUMA-aware Contention Management on Multicore Systems," in Proc. of the 2011 USENIX Conference on USENIX Annual Technical Conference, ser. USENIXATC'11. Berkeley, CA: USENIX Association, 2011. Google ScholarDigital Library
Eric Boutin et al., "Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing," in 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14). Broomfield, CO: USENIX Association, 2014. Google ScholarDigital Library
Bob Briscoe, "Flow Rate Fairness: Dismantling a Religion," SIGCOMM Comput. Commun. Rev., vol. 37, no. 2, Mar. 2007. Google ScholarDigital Library
Martin A. Brown, "Traffic Control HOWTO," http://linux-ip.net/articles/Traffic-Control-HOWTO/.Google Scholar
Marcus Carvalho et al., "Long-term SLOs for Reclaimed Cloud Computing Resources," in Proc. of SOCC, Seattle, WA, Dec. 2014. Google ScholarDigital Library
McKinsey & Company, "Revolutionizing data center efficiency," Uptime Institute Symp., 2008.Google Scholar
Henry Cook et al., "A Hardware Evaluation of Cache Partitioning to Improve Utilization and Energy-efficiency While Preserving Responsiveness," in Proc. of the 40th Annual International Symposium on Computer Architecture, ser. ISCA '13. New York, NY: ACM, 2013. Google ScholarDigital Library
Carlo Curino et al., "Reservation-based Scheduling: If You're Late Don't Blame Us!" in Proc. of the 5th annual Symposium on Cloud Computing, 2014. Google ScholarDigital Library
Jeffrey Dean et al. "The tail at scale," Commun. ACM, vol. 56, no. 2, Feb. 2013. Google ScholarDigital Library
Christina Delimitrou et al. "Paragon: QoS-Aware Scheduling for Heterogeneous Datacenters," in Proc. of the 18th Intl. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Houston, TX, 2013. Google ScholarDigital Library
Christina Delimitrou et al. "Quasar: Resource-Efficient and QoS-Aware Cluster Management," in Proc. of the Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Salt Lake City, UT, 2014. Google ScholarDigital Library
Eiman Ebrahimi et al. "Fairness via Source Throttling: A Configurable and High-performance Fairness Substrate for Multi-core Memory Systems," in Proc. of the Fifteenth Edition of ASPLOS on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS XV. New York, NY: ACM, 2010. Google ScholarDigital Library
H. Esmaeilzadeh et al. "Dark silicon and the end of multicore scaling," in Computer Architecture (ISCA), 2011 38th Annual International Symposium on, June 2011. Google ScholarDigital Library
Sriram Govindan et al. "Cuanta: quantifying effects of shared on-chip resource interference for consolidated virtual machines," in Proc. of the 2nd ACM Symposium on Cloud Computing, 2011. Google ScholarDigital Library
Fei Guo et al. "From Chaos to QoS: Case Studies in CMP Resource Management," SIGARCH Comput. Archit. News, vol. 35, no. 1, Mar. 2007. Google ScholarDigital Library
Fei Guo et al. "A Framework for Providing Quality of Service in Chip Multi-Processors," in Proc. of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, ser. MICRO 40. Washington, DC: IEEE Computer Society, 2007. Google ScholarDigital Library
Nikos Hardavellas et al. "Toward Dark Silicon in Servers," IEEE Micro, vol. 31, no. 4, 2011. Google ScholarDigital Library
Lisa R. Hsu et al. "Communist, Utilitarian, and Capitalist Cache Policies on CMPs: Caches As a Shared Resource," in Proc. of the 15th International Conference on Parallel Architectures and Compilation Techniques, ser. PACT '06. New York, NY: ACM, 2006. Google ScholarDigital Library
Intel, "Serial ATA II Native Command Queuing Overview," http://download.intel.com/support/chipsets/imsm/sb/sata2_ncq_overview.pdf, 2003.Google Scholar
Teerawat Issariyakul et al. Introduction to Network Simulator NS2, 1st ed. Springer Publishing Company, Incorporated, 2010. Google ScholarDigital Library
Ravi Iyer, "CQoS: A Framework for Enabling QoS in Shared Caches of CMP Platforms," in Proc. of the 18th Annual International Conference on Supercomputing, ser. ICS '04. New York, NY: ACM, 2004. Google ScholarDigital Library
Ravi Iyer et al. "QoS Policies and Architecture for Cache/Memory in CMP Platforms," in Proc. of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, ser. SIGMETRICS '07. New York, NY: ACM, 2007. Google ScholarDigital Library
Vijay Janapa Reddi et al. "Web Search Using Mobile Cores: Quantifying and Mitigating the Price of Efficiency," SIGARCH Comput. Archit. News, vol. 38, no. 3, Jun. 2010. Google ScholarDigital Library
Min Kyu Jeong et al. "A QoS-aware Memory Controller for Dynamically Balancing GPU and CPU Bandwidth Use in an MPSoC," in Proc. of the 49th Annual Design Automation Conference, ser. DAC '12. New York, NY: ACM, 2012. Google ScholarDigital Library
Vimalkumar Jeyakumar et al. "EyeQ: Practical Network Performance Isolation at the Edge," in Proc. of the 10th USENIX Conference on Networked Systems Design and Implementation, ser. nsdi'13. Berkeley, CA: USENIX Association, 2013. Google ScholarDigital Library
Svilen Kanev et al. "Tradeoffs between Power Management and Tail Latency in Warehouse-Scale Applications," in IISWC, 2014.Google Scholar
Rishi Kapoor et al. "Chronos: Predictable Low Latency for Data Center Applications," in Proc. of the Third ACM Symposium on Cloud Computing, ser. SoCC '12. New York, NY: ACM, 2012. Google ScholarDigital Library
Harshad Kasture et al. "Ubik: Efficient Cache Sharing with Strict QoS for Latency-Critical Workloads," in Proc. of the 19th international conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-XIX), March 2014. Google ScholarDigital Library
Wonyoung Kim et al. "System level analysis of fast, per-core DVFS using on-chip switching regulators," in High Performance Computer Architecture, 2008. HPCA 2008. IEEE 14th International Symposium on, Feb 2008.Google Scholar
Quoc Le et al. "Building high-level features using large scale unsupervised learning," in International Conference in Machine Learning, 2012.Google Scholar
Jacob Leverich et al. "Reconciling High Server Utilization and Sub-millisecond Quality-of-Service," in SIGOPS European Conf. on Computer Systems (EuroSys), 2014. Google ScholarDigital Library
Bin Li et al. "CoQoS: Coordinating QoS-aware Shared Resources in NoC-based SoCs," J. Parallel Distrib. Comput., vol. 71, no. 5, May 2011. Google ScholarDigital Library
Kevin Lim et al. "Thin Servers with Smart Pipes: Designing SoC Accelerators for Memcached," in Proc. of the 40th Annual International Symposium on Computer Architecture, 2013. Google ScholarDigital Library
Kevin Lim et al. "System-level Implications of Disaggregated Memory," in Proc. of the 2012 IEEE 18th International Symposium on High-Performance Computer Architecture, ser. HPCA '12. Washington, DC: IEEE Computer Society, 2012. Google ScholarDigital Library
Jiang Lin et al. "Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems," in High Performance Computer Architecture, 2008. HPCA 2008. IEEE 14th International Symposium on, Feb 2008.Google Scholar
Huan Liu, "A Measurement Study of Server Utilization in Public Clouds," in Dependable, Autonomic and Secure Computing (DASC), 2011 IEEE Ninth Intl. Conf. on, 2011. Google ScholarDigital Library
Rose Liu et al. "Tessellation: Space-time Partitioning in a Manycore Client OS," in Proc. of the First USENIX Conference on Hot Topics in Parallelism, ser. HotPar'09. Berkeley, CA: USENIX Association, 2009. Google ScholarDigital Library
Yanpei Liu et al. "SleepScale: Runtime Joint Speed Scaling and Sleep States Management for Power Efficient Data Centers," in Proceeding of the 41st Annual International Symposium on Computer Architecuture, ser. ISCA '14. Piscataway, NJ: IEEE Press, 2014. Google ScholarDigital Library
David Lo et al. "Towards Energy Proportionality for Large-scale Latency-critical Workloads," in Proceeding of the 41st Annual International Symposium on Computer Architecuture, ser. ISCA '14. Piscataway, NJ: IEEE Press, 2014. Google ScholarDigital Library
Krishna T. Malladi et al. "Towards Energy-proportional Datacenter Memory with Mobile DRAM," SIGARCH Comput. Archit. News, vol. 40, no. 3, Jun. 2012. Google ScholarDigital Library
R Manikantan et al. "Probabilistic Shared Cache Management (PriSM)," in Proc. of the 39th Annual International Symposium on Computer Architecture, ser. ISCA '12. Washington, DC: IEEE Computer Society, 2012. Google ScholarDigital Library
J. Mars et al. "Increasing Utilization in Modern Warehouse-Scale Computers Using Bubble-Up," Micro, IEEE, vol. 32, no. 3, May 2012. Google ScholarDigital Library
Jason Mars et al. "Bubble-Up: Increasing Utilization in Modern Warehouse Scale Computers via Sensible Co-locations," in Proc. of the 44th Annual IEEE/ACM Intl. Symp. on Microarchitecture, ser. MICRO-44 '11, 2011. Google ScholarDigital Library
Paul Marshall et al. "Improving Utilization of Infrastructure Clouds," in Proc. of the 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2011. Google ScholarDigital Library
David Meisner et al. "PowerNap: Eliminating Server Idle Power," in Proc. of the 14th Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS XIV, 2009. Google ScholarDigital Library
David Meisner et al. "Power Management of Online Data-Intensive Services," in Proc. of the 38th ACM Intl. Symp. on Computer Architecture, 2011. Google ScholarDigital Library
Paul Menage, "CGROUPS," https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt.Google Scholar
Sai Prashanth Muralidhara et al. "Reducing Memory Interference in Multicore Systems via Application-aware Memory Channel Partitioning," in Proc. of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, ser. MICRO-44. New York, NY: ACM, 2011. Google ScholarDigital Library
Vijay Nagarajan et al. "ECMon: Exposing Cache Events for Monitoring," in Proc. of the 36th Annual International Symposium on Computer Architecture, ser. ISCA '09. New York, NY: ACM, 2009. Google ScholarDigital Library
R. Nathuji et al. "Q-Clouds: Managing Performance Interference Effects for QoS-Aware Clouds," in Proc. of EuroSys, France, 2010. Google ScholarDigital Library
K. J. Nesbit et al. "Fair Queuing Memory Systems," in Microarchitecture, 2006. MICRO-39. 39th Annual IEEE/ACM International Symposium on, Dec 2006. Google ScholarDigital Library
Dejan Novakovic et al. "DeepDive: Transparently Identifying and Managing Performance Interference in Virtualized Environments," in Proc. of the USENIX Annual Technical Conference (ATC'13), San Jose, CA, 2013. Google ScholarDigital Library
W. Pattara-Aukom et al. "Starvation prevention and quality of service in wireless LANs," in Wireless Personal Multimedia Communications, 2002. The 5th International Symposium on, vol. 3, Oct 2002.Google Scholar
M. Podlesny et al. "Solving the TCP-Incast Problem with Application-Level Scheduling," in Modeling, Analysis Simulation of Computer and Telecommunication Systems (MASCOTS), 2012 IEEE 20th International Symposium on, Aug 2012. Google ScholarDigital Library
Andrew Putnam et al. "A Reconfigurable Fabric for Accelerating Large-scale Datacenter Services," in Proceeding of the 41st Annual International Symposium on Computer Architecuture, ser. ISCA '14. Piscataway, NJ: IEEE Press, 2014. Google ScholarDigital Library
M. K. Qureshi et al. "Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches," in Microarchitecture, 2006. MICRO-39. 39th Annual IEEE/ACM International Symposium on, Dec 2006. Google ScholarDigital Library
Parthasarathy Ranganathan et al. "Reconfigurable Caches and Their Application to Media Processing," in Proc. of the 27th Annual International Symposium on Computer Architecture, ser. ISCA '00. New York, NY: ACM, 2000. Google ScholarDigital Library
Charles Reiss et al. "Heterogeneity and Dynamicity of Clouds at Scale: Google Trace Analysis," in ACM Symp. on Cloud Computing (SoCC), Oct. 2012. Google ScholarDigital Library
Chuck Rosenberg, "Improving Photo Search: A Step Across the Semantic Gap," http://googleresearch.blogspot.com/2013/06/improving-photo-search-step-across.html.Google Scholar
Daniel Sanchez et al. "Vantage: Scalable and Efficient Fine-grain Cache Partitioning," SIGARCH Comput. Archit. News, vol. 39, no. 3, Jun. 2011. Google ScholarDigital Library
Yoon Jae Seong et al. "Hydra: A Block-Mapped Parallel Flash Memory Solid-State Disk Architecture," Computers, IEEE Transactions on, vol. 59, no. 7, July 2010. Google ScholarDigital Library
Akbar Sharifi et al. "METE: Meeting End-to-end QoS in Multicores Through System-wide Resource Management," in Proc. of the ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems, ser. SIGMETRICS '11. New York, NY: ACM, 2011. Google ScholarDigital Library
Shekhar Srikantaiah et al. "SHARP Control: Controlled Shared Cache Management in Chip Multiprocessors," in Proc. of the 42Nd Annual IEEE/ACM International Symposium on Microarchitecture, ser. MICRO 42. New York, NY: ACM, 2009. Google ScholarDigital Library
Shingo Tanaka et al. "High Performance Hardware-Accelerated Flash Key-Value Store," in The 2014 Non-volatile Memories Workshop (NVMW), 2014.Google Scholar
Lingjia Tang et al. "The impact of memory subsystem resource sharing on datacenter applications," in Computer Architecture (ISCA), 2011 38th Annual International Symposium on, June 2011. Google ScholarDigital Library
Arunchandar Vasan et al. "Worth their watts? - an empirical study of datacenter servers," in Intl. Symp. on High-Performance Computer Architecture, 2010.Google Scholar
Nedeljko Vasić et al. "DejaVu: accelerating resource allocation in virtualized environments," in Proc. of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), London, UK, 2012. Google ScholarDigital Library
Christo Wilson et al. "Better Never Than Late: Meeting Deadlines in Datacenter Networks," in Proc. of the ACM SIGCOMM 2011 Conference, ser. SIGCOMM '11. New York, NY: ACM, 2011. Google ScholarDigital Library
Carole-Jean Wu et al. "A Comparison of Capacity Management Schemes for Shared CMP Caches," in Proc. of the 7th Workshop on Duplicating, Deconstructing, and Debunking, vol. 15. Citeseer, 2008.Google Scholar
Yuejian Xie et al. "PIPP: Promotion/Insertion Pseudo-partitioning of Multi-core Shared Caches," in Proc. of the 36th Annual International Symposium on Computer Architecture, ser. ISCA '09. New York, NY: ACM, 2009. Google ScholarDigital Library
Hailong Yang et al. "Bubble-flux: Precise Online QoS Management for Increased Utilization in Warehouse Scale Computers," in Proc. of the 40th Annual Intl. Symp. on Computer Architecture, ser. ISCA '13, 2013. Google ScholarDigital Library
Xiao Zhang et al. "CPI2: CPU performance isolation for shared compute clusters," in Proc. of the 8th ACM European Conference on Computer Systems (EuroSys), Prague, Czech Republic, 2013. Google ScholarDigital Library
Yunqi Zhang et al. "SMiTe: Precise QoS Prediction on Real-System SMT Processors to Improve Utilization in Warehouse Scale Computers," in International Symposium on Microarchitecture (MICRO), 2014. Google ScholarDigital Library

Index Terms

Heracles: improving resource efficiency at scale
1. Computer systems organization
  1. Embedded and cyber-physical systems
  2. Real-time systems
2. Hardware

Recommendations

Heracles: improving resource efficiency at scale
ISCA'15

User-facing, latency-sensitive services, such as websearch, underutilize their computing resources during daily periods of low traffic. Reusing those resources for other tasks is rarely done in production services since the contention for shared ...
Read More
Improving Resource Efficiency at Scale with Heracles

User-facing, latency-sensitive services, such as websearch, underutilize their computing resources during daily periods of low traffic. Reusing those resources for other tasks is rarely done in production services since the contention for shared ...
Read More
Heracles: a tool for fast RTL-based design space exploration of multicore processors
FPGA '13: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture
June 2015
768 pages
ISBN:9781450334020
DOI:10.1145/2749469
General Chair:
Debbie Marr
Intel
,
Program Chair:
David Albonesi
Cornell
ACM SIGARCH Computer Architecture News Volume 43, Issue 3S
ISCA'15
June 2015
745 pages
ISSN:0163-5964
DOI:10.1145/2872887
Editor:
Doug DeGroot
acm dot org
Issue’s Table of Contents
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 13 June 2015
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate543of3,203submissions,17%
Upcoming Conference
ISCA '24

Sponsor:

sigarch

ISCA '24: The 51st Annual International Symposium on Computer Architecture

June 29 - July 3, 2024

Buenos Aires , Argentina
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 376
  Total Citations
  View Citations
- 5,232
  Total Downloads
- Downloads (Last 12 months)692
- Downloads (Last 6 weeks)139
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Heracles: improving resource efficiency at scale

ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture

ABSTRACT

References

Cited By

Index Terms

Recommendations

Heracles: improving resource efficiency at scale

Improving Resource Efficiency at Scale with Heracles

Heracles: a tool for fast RTL-based design space exploration of multicore processors