
Cluster Computing

Published in:

02-08-2018

OMBM: optimized memory bandwidth management for ensuring QoS and high server utilization

Authors: Hanul Sung, Jeesoo Min, Sujin Ha, Hyeonsang Eom

Published in: Cluster Computing | Issue 1/2019


Abstract

Latency-critical workloads such as web search engines, social networks, and financial market applications must keep tail latency low to meet their service level objectives (SLOs). Because sharing hardware resources with co-executing workloads causes unexpected tail latencies, a service provider runs the latency-critical workload alone. As a result, data centers that host latency-critical workloads have exceedingly low hardware resource utilization. To improve utilization, the service provider has to co-locate latency-critical workloads with batch processing workloads. However, unlike cores and cache memory, memory bandwidth cannot be allocated in isolation, so latency-critical workloads experience poor performance isolation even when cores and cache memory are allocated to them in isolation. To solve this problem, we propose an optimized memory bandwidth management approach for ensuring quality of service (QoS) and high server utilization. Our approach provides isolated shared resources, including memory bandwidth, to the latency-critical workload and to the co-executing batch processing workloads. First, it performs a small number of pre-profiling runs, chosen with a divide-and-conquer method, under the assumption of worst-case memory bandwidth contention. Second, it predicts the memory bandwidth needed to meet the SLO at every queries-per-second (QPS) level from the pre-profiling results. Third, it allocates to the latency-critical workload the amount of isolated memory bandwidth that guarantees the SLO and gives the rest of the memory bandwidth to the co-executing batch processing workloads. Experiments show that the proposed approach achieves up to 99% SLO assurance and improves server utilization by up to 6.5×.
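The three steps in the abstract (profile, predict, allocate) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the divide-and-conquer profiling or the prediction works, so the binary search over bandwidth levels, the linear interpolation between profiled QPS points, the toy latency model `simulated_tail_latency`, and the constants `TOTAL_BW` and `SLO_MS` are all assumptions made for illustration.

```python
import math

TOTAL_BW = 100   # hypothetical: total memory bandwidth, in percent units
SLO_MS = 10.0    # hypothetical: tail-latency target in milliseconds


def simulated_tail_latency(qps, bw):
    """Toy stand-in for one profiling run under worst-case contention.

    A real system would measure tail latency; this formula is invented.
    """
    return 2.0 + qps / bw


def min_bandwidth_for_slo(qps, measure, slo_ms=SLO_MS, total_bw=TOTAL_BW):
    """Binary-search the smallest bandwidth share that meets the SLO.

    Assumes latency never increases when bandwidth increases.
    Returns None if even the full bandwidth misses the SLO.
    """
    if measure(qps, total_bw) > slo_ms:
        return None
    lo, hi = 1, total_bw
    while lo < hi:
        mid = (lo + hi) // 2
        if measure(qps, mid) <= slo_ms:
            hi = mid          # mid is enough; try less
        else:
            lo = mid + 1      # mid is too little; need more
    return lo


def build_profile(qps_points, measure):
    """Profile a few QPS levels; keep only those whose SLO is reachable."""
    profile = []
    for qps in sorted(qps_points):
        bw = min_bandwidth_for_slo(qps, measure)
        if bw is not None:
            profile.append((qps, bw))
    return profile


def predict_bandwidth(qps, profile, total_bw=TOTAL_BW):
    """Interpolate linearly between profiled points, rounding up."""
    if qps <= profile[0][0]:
        return profile[0][1]      # conservative below the lowest point
    if qps > profile[-1][0]:
        return total_bw           # unknown territory: give everything
    for (q0, b0), (q1, b1) in zip(profile, profile[1:]):
        if q0 <= qps <= q1:
            return math.ceil(b0 + (qps - q0) / (q1 - q0) * (b1 - b0))


def allocate(qps, profile, total_bw=TOTAL_BW):
    """Return (latency-critical share, batch share) of memory bandwidth."""
    lc_bw = predict_bandwidth(qps, profile, total_bw)
    return lc_bw, total_bw - lc_bw


if __name__ == "__main__":
    profile = build_profile([100, 400, 800], simulated_tail_latency)
    print(profile)
    print(allocate(250, profile))
```

In this sketch only three QPS levels are profiled, and every other load level is served from the interpolated table. Rounding up errs toward giving the latency-critical workload slightly more bandwidth than the estimate, which trades a little batch throughput for a safer SLO margin.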



Title: OMBM: optimized memory bandwidth management for ensuring QoS and high server utilization
Authors: Hanul Sung, Jeesoo Min, Sujin Ha, Hyeonsang Eom
Publication date: 02-08-2018
Publisher: Springer US
Published in: Cluster Computing / Issue 1/2019
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI: https://doi.org/10.1007/s10586-018-2828-1
