The Journal of Supercomputing

Publication date: 1 March 2016

Lowering the volatility: a practical cache allocation prediction and stability-oriented co-runner scheduling algorithms

Authors: Fei Wang, Xiaofeng Gao, Guihai Chen

Published in: The Journal of Supercomputing | Issue 3/2016

Abstract

Accurate, quantitative analysis of cache behavior on a chip multiprocessor (CMP) machine has long been a challenge. So far there has been no practical way to predict the cache allocation, i.e., the allocated cache size, of a running program. For many applications, especially those that interact heavily with users, cache allocation should be estimated with high accuracy, because its variation is closely tied to the stability of system performance, which is important to the efficient operation of servers and strongly influences user experience. To this end, this paper proposes an accurate prediction model for the last-level cache (LLC) allocation of co-runners. With a precise cache allocation predicted, we further implement a performance-stability-oriented co-runner scheduling algorithm that aims to maximize the number of co-runners running in a performance-stable state and to minimize the performance variation of the unstable ones. We demonstrate that the proposed prediction algorithm is highly accurate, with an average error of 5.7 %, and that the co-runner scheduling algorithm can find the optimal solution under the specified target with a time complexity of O(n).
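The abstract does not define how the "average error" is computed. As a minimal sketch, assuming it is the mean absolute relative error between predicted and measured LLC allocations, it could be evaluated as follows. The function name and the allocation values are hypothetical and serve only as an illustration; they are not taken from the paper.

```python
def mean_relative_error(predicted, measured):
    """Mean absolute relative error, as a fraction of the measured values.

    Assumption: this is one plausible reading of "average error";
    the paper may use a different definition.
    """
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - m) / m for p, m in zip(predicted, measured))
    return total / len(measured)


# Hypothetical LLC allocations in MB for three co-runners (illustrative only).
predicted_mb = [2.0, 3.0, 1.0]
measured_mb = [2.5, 3.0, 0.8]

error = mean_relative_error(predicted_mb, measured_mb)
print(f"average error: {error:.1%}")
```

Normalizing by the measured allocation puts co-runners with small and large cache shares on the same scale; an absolute error in MB would be an equally valid choice if the shares are of similar size.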

Title: Lowering the volatility: a practical cache allocation prediction and stability-oriented co-runner scheduling algorithms
Authors: Fei Wang, Xiaofeng Gao, Guihai Chen
Publication date: 1 March 2016
Publisher: Springer US
Published in: The Journal of Supercomputing / Issue 3/2016
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-016-1645-7
