2019 | OriginalPaper | Book chapter

4. Tuning Parallel Applications

Written by: Arthur Francisco Lorenzon, Antonio Carlos Schneider Beck Filho

Published in: Parallel Computing Hits the Power Wall

Publisher: Springer International Publishing

Abstract

This chapter presents a comprehensive study of techniques for improving the performance, energy consumption, or energy-delay product (EDP) of parallel applications. The techniques are discussed along two dimensions:

Adaptability: when the number of threads and the processor operating frequency are adapted, and whether the adaptation is continuous.
Transparency: whether tuning the application requires special tools or compilers, programmer intervention, and/or changes to the source or binary code.

Sect. 4.1 first discusses the design space of how approaches that optimize parallel applications can achieve adaptability and transparency. Sect. 4.2 describes works that improve the execution of parallel applications by tuning the number of threads. Sect. 4.3 presents approaches that change the processor's voltage and frequency levels to improve the behavior of parallel applications. Finally, Sect. 4.4 discusses approaches that exploit both dynamic concurrency throttling (DCT) and dynamic voltage and frequency scaling (DVFS) to improve the execution of parallel applications.
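
As a rough illustration of the EDP metric named in the abstract, the sketch below computes EDP as energy multiplied by execution time and picks the thread count and frequency with the lowest value. It is not taken from the chapter: the `Measurement` type, the function names, and all sample numbers are hypothetical, and a real tuner would obtain time and energy from actual runs or hardware counters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One profiled run of an application (all values are made up)."""
    threads: int      # number of threads used for the run
    freq_ghz: float   # processor operating frequency
    time_s: float     # execution time in seconds
    energy_j: float   # energy consumed in joules


def edp(m: Measurement) -> float:
    """Energy-delay product: energy multiplied by execution time."""
    return m.energy_j * m.time_s


def best_configuration(measurements: list[Measurement]) -> Measurement:
    """Return the measured configuration with the lowest EDP."""
    return min(measurements, key=edp)


# Hypothetical sample data, not results from the chapter.
SAMPLES = [
    Measurement(threads=1, freq_ghz=2.0, time_s=10.0, energy_j=100.0),
    Measurement(threads=2, freq_ghz=2.0, time_s=6.0, energy_j=90.0),
    Measurement(threads=4, freq_ghz=2.0, time_s=4.0, energy_j=100.0),
    Measurement(threads=4, freq_ghz=1.5, time_s=5.0, energy_j=70.0),
    Measurement(threads=8, freq_ghz=2.0, time_s=3.8, energy_j=140.0),
]

best = best_configuration(SAMPLES)
print(f"lowest EDP: {best.threads} threads at {best.freq_ghz} GHz")
```

With these invented numbers, the fastest configuration (8 threads) is not the one with the lowest EDP, which is the kind of trade-off that tuning the thread count and frequency together is meant to expose.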

Title: Tuning Parallel Applications
Written by: Arthur Francisco Lorenzon, Antonio Carlos Schneider Beck Filho
Publisher: Springer International Publishing
Book: Parallel Computing Hits the Power Wall
Print ISBN: 978-3-030-28718-4

Electronic ISBN: 978-3-030-28719-1

Copyright year: 2019
DOI: https://doi.org/10.1007/978-3-030-28719-1_4
