Published in: Cluster Computing 1/2019

02-08-2018

OMBM: optimized memory bandwidth management for ensuring QoS and high server utilization

Authors: Hanul Sung, Jeesoo Min, Sujin Ha, Hyeonsang Eom

Abstract

Latency-critical workloads such as web search engines, social networks and financial market applications are sensitive to tail latency because they must meet service level objectives (SLOs). Since unexpected tail latencies are caused by sharing hardware resources with co-executing workloads, service providers typically run a latency-critical workload alone. As a result, data centers serving latency-critical workloads have exceedingly low hardware resource utilization. To improve utilization, the service provider has to co-locate latency-critical workloads with batch processing ones. However, unlike cores and cache memory, memory bandwidth cannot be provided in isolation, so latency-critical workloads experience poor performance isolation even when cores and cache memory are allocated to them in isolation. To solve this problem, we propose an optimized memory bandwidth management approach for ensuring quality of service (QoS) and high server utilization. The approach provides isolated shared resources, including memory bandwidth, to the latency-critical workload and the co-executing batch processing workloads. First, it performs a small number of pre-profiling runs, chosen with a divide-and-conquer method, under the assumption of worst-case memory bandwidth contention. Second, it predicts the memory bandwidth needed to meet the SLO at every queries-per-second (QPS) level based on the pre-profiling results. Then, it allocates to the latency-critical workload the amount of isolated memory bandwidth that guarantees the SLO, and gives the rest of the memory bandwidth to the co-executing batch processing workloads. Experiments show that our proposed approach can achieve up to 99% SLO assurance and improve server utilization by up to 6.5×.
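The abstract names three steps (divide-and-conquer pre-profiling, prediction across QPS levels, and allocation) without giving their details. The Python sketch below is one possible reading, not the authors' implementation. All function names are hypothetical, `synthetic_measure` is a made-up stand-in for a real profiling run under worst-case contention, and the use of piecewise-linear interpolation for prediction is an assumption.

```python
from bisect import bisect_left


def synthetic_measure(qps):
    """Hypothetical stand-in for one pre-profiling run: returns the memory
    bandwidth (GB/s) the latency-critical workload needs to meet its SLO
    at this QPS. The numbers are invented for illustration only."""
    if qps <= 500:
        return 1.0 + 0.01 * qps
    return 6.0 + 0.03 * (qps - 500)


def profile_points(measure, lo, hi, tol, out=None):
    """Divide and conquer over the QPS range [lo, hi]: profile the midpoint
    and recurse only where linear interpolation between the endpoints is
    off by more than tol, so near-linear regions need few runs."""
    if out is None:
        out = {lo: measure(lo), hi: measure(hi)}
    mid = (lo + hi) // 2
    if mid in (lo, hi):
        return out
    out[mid] = measure(mid)
    interpolated = out[lo] + (out[hi] - out[lo]) * (mid - lo) / (hi - lo)
    if abs(out[mid] - interpolated) > tol:
        profile_points(measure, lo, mid, tol, out)
        profile_points(measure, mid, hi, tol, out)
    return out


def predict_bandwidth(points, qps):
    """Predict the required bandwidth at any QPS by piecewise-linear
    interpolation between profiled points (an assumed prediction model)."""
    keys = sorted(points)
    if qps <= keys[0]:
        return points[keys[0]]
    if qps >= keys[-1]:
        return points[keys[-1]]
    i = bisect_left(keys, qps)
    if keys[i] == qps:
        return points[qps]
    left, right = keys[i - 1], keys[i]
    frac = (qps - left) / (right - left)
    return points[left] + (points[right] - points[left]) * frac


def allocate(points, qps, total_bw):
    """Give the latency-critical workload its predicted bandwidth and hand
    the remainder to the batch workloads."""
    lc_bw = min(total_bw, predict_bandwidth(points, qps))
    return lc_bw, total_bw - lc_bw


# Example: profile QPS 0..1000 with a 0.1 GB/s tolerance, then split an
# assumed 30 GB/s of total memory bandwidth at 600 QPS.
points = profile_points(synthetic_measure, 0, 1000, tol=0.1)
lc_bw, batch_bw = allocate(points, 600, total_bw=30.0)
print(sorted(points), lc_bw, batch_bw)
```

In this toy setting the recursion stops as soon as a sub-range looks linear, so only a handful of QPS levels are profiled instead of all of them, which is the kind of saving the abstract attributes to its pre-profiling step.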

Metadata
Title
OMBM: optimized memory bandwidth management for ensuring QoS and high server utilization
Authors
Hanul Sung
Jeesoo Min
Sujin Ha
Hyeonsang Eom
Publication date
02-08-2018
Publisher
Springer US
Published in
Cluster Computing / Issue 1/2019
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-018-2828-1
