ABSTRACT
User-facing, latency-sensitive services, such as web search, underutilize their computing resources during daily periods of low traffic. Production services rarely reuse those resources for other tasks, because contention for shared resources can cause latency spikes that violate the service-level objectives of latency-sensitive tasks. The resulting underutilization hurts both the affordability and the energy efficiency of large-scale datacenters. As technology scaling slows down, recovering this idle capacity becomes increasingly important.
We present Heracles, a feedback-based controller that enables the safe colocation of best-effort tasks alongside a latency-critical service. Heracles dynamically manages multiple hardware and software isolation mechanisms, such as CPU, memory, and network isolation, to ensure that the latency-critical service meets its latency targets while maximizing the resources given to best-effort tasks. We evaluate Heracles using production latency-critical and batch workloads from Google and demonstrate an average server utilization of 90% without latency violations across all the load and colocation scenarios we evaluated.
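The abstract describes Heracles only at a high level, as a feedback-based controller. The sketch below illustrates the general shape of such a loop, assuming the controller periodically samples the latency-critical service's tail latency and load and compares them with its latency target. The slack definition, the `Action` names, the thresholds, and the one-core-per-step adjustment are all hypothetical illustrations, not Heracles's actual policy or parameters; a real controller would also drive the CPU, memory, and network isolation mechanisms mentioned above.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Hypothetical decisions a colocation controller could make each period."""
    DISABLE_BE = "disable best-effort tasks"
    SHRINK_BE = "shrink best-effort allocation"
    HOLD = "keep best-effort allocation unchanged"
    GROW_BE = "allow best-effort allocation to grow"


@dataclass
class ControllerConfig:
    slo_ms: float  # tail-latency target of the latency-critical service
    # The thresholds below are hypothetical values for illustration only.
    max_load_for_growth: float = 0.85  # above this load, stop growing best-effort work
    grow_slack: float = 0.10  # minimum latency slack required before growing
    shrink_slack: float = 0.05  # below this slack, take resources back


def decide(cfg: ControllerConfig, tail_latency_ms: float, load: float) -> Action:
    """One iteration of a slack-based feedback loop.

    slack = (SLO - measured tail latency) / SLO, so negative slack
    means the latency-critical service is currently violating its SLO.
    `load` is the latency-critical load as a fraction of its peak (assumed).
    """
    slack = (cfg.slo_ms - tail_latency_ms) / cfg.slo_ms
    if slack < 0:
        return Action.DISABLE_BE  # SLO violated: evict best-effort work immediately
    if slack < cfg.shrink_slack:
        return Action.SHRINK_BE  # close to the target: reclaim some resources
    if load > cfg.max_load_for_growth or slack < cfg.grow_slack:
        return Action.HOLD  # safe for now, but not enough headroom to grow
    return Action.GROW_BE  # ample slack and moderate load: give best-effort more


def next_be_cores(action: Action, be_cores: int, max_be_cores: int) -> int:
    """Translate a decision into a best-effort core count (one core per step, assumed)."""
    if action is Action.DISABLE_BE:
        return 0
    if action is Action.SHRINK_BE:
        return max(be_cores - 1, 0)
    if action is Action.GROW_BE:
        return min(be_cores + 1, max_be_cores)
    return be_cores


if __name__ == "__main__":
    cfg = ControllerConfig(slo_ms=10.0)
    be_cores = 4
    # Hypothetical (tail latency in ms, load fraction) samples, one per control period.
    for latency, load in [(6.0, 0.5), (9.7, 0.5), (10.5, 0.6), (8.0, 0.9)]:
        action = decide(cfg, latency, load)
        be_cores = next_be_cores(action, be_cores, max_be_cores=8)
        print(f"tail={latency} ms load={load:.2f} -> {action.name}, BE cores={be_cores}")
```

With these samples the loop first grows the best-effort allocation, shrinks it when slack drops below 5%, disables best-effort work on an SLO violation, and then holds when load is high despite adequate slack. Checking conditions from most to least urgent keeps the latency-critical service's safety ahead of best-effort throughput.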