Skip to main content
Erschienen in: The Journal of Supercomputing 11/2017

06.04.2017

Runtime prediction of parallel applications with workload-aware clustering

verfasst von: Ju-Won Park, Eunhye Kim

Erschienen in: The Journal of Supercomputing | Ausgabe 11/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Traditionally, many science fields require great support for a massive workflow, which utilizes multiple cores simultaneously. In order to support such large-scale scientific workflows, high-capacity parallel systems such as supercomputers are widely used. To increase the utilization of these systems, most schedulers use backfilling policy based on user’s estimated runtime. However, it is found to be extremely inaccurate because users overestimate their jobs. Therefore, in this paper, an efficient machine learning approach is present to predict the runtime of parallel application. The proposed method is divided into three phases. First is to analyze important feature of the history log data by factor analysis. Second is to carry out clustering for the parallel program based on the important features. Third is to build a prediction models by pattern similarity of parallel program log data and estimate runtime. In the experiments, we use workload logs on parallel systems (i.e., NASA-iPSC, LANL-CM5, SDSC-Par95, SDSC-Par96, and CTC-SP2) to evaluate the effectiveness of our approach. Comparing root-mean-square error with other techniques, experimental results show that the proposed method improves the accuracy up to 69.56%.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Fußnoten
1
In [12], the authors present that if a factor has four or more loadings \({>}0.6\), then it is reliable regardless of sample size. Therefore, in this paper, the threshold of factor analysis is determined by 0.6.
 
Literatur
1.
Zurück zum Zitat Agarwal B, Mittal N (2014) Text classification using machine learning methods—a survey. In: Proceedings of the Second International Conference on Soft Computing for Problem Solving (SocProS 2012), 28–30 Dec 2012. Springer, India, pp 701–709 Agarwal B, Mittal N (2014) Text classification using machine learning methods—a survey. In: Proceedings of the Second International Conference on Soft Computing for Problem Solving (SocProS 2012), 28–30 Dec 2012. Springer, India, pp 701–709
2.
Zurück zum Zitat Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3):15CrossRef Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3):15CrossRef
3.
Zurück zum Zitat Chang F, Guo CY, Lin XR, Lu CJ (2010) Tree decomposition for large-scale svm problems. J Mach Learn Res 11:2935–2972MATHMathSciNet Chang F, Guo CY, Lin XR, Lu CJ (2010) Tree decomposition for large-scale svm problems. J Mach Learn Res 11:2935–2972MATHMathSciNet
4.
Zurück zum Zitat Deelman E, Gannon D, Shields M, Taylor I (2009) Workflows and e-science: an overview of workflow system features and capabilities. Future Gener Comput Syst 25(5):528–540CrossRef Deelman E, Gannon D, Shields M, Taylor I (2009) Workflows and e-science: an overview of workflow system features and capabilities. Future Gener Comput Syst 25(5):528–540CrossRef
5.
Zurück zum Zitat Downey AB (1997) Using queue time predictions for processor allocation. In: Workshop on job scheduling strategies for parallel processing. Springer, pp 35–57 Downey AB (1997) Using queue time predictions for processor allocation. In: Workshop on job scheduling strategies for parallel processing. Springer, pp 35–57
6.
Zurück zum Zitat Drucker H, Burges CJ, Kaufman L, Smola A, Vapnik V (1997) Support vector regression machines. Adv Neural Inf Process Syst 9:155–161 Drucker H, Burges CJ, Kaufman L, Smola A, Vapnik V (1997) Support vector regression machines. Adv Neural Inf Process Syst 9:155–161
7.
Zurück zum Zitat Feitelson DG, Tsafrir D, Krakov D (2012) Experience with the parallel workloads archive. Technical report Feitelson DG, Tsafrir D, Krakov D (2012) Experience with the parallel workloads archive. Technical report
8.
Zurück zum Zitat Gaussier E, Glesser D, Reis V, Trystram D (2015) Improving backfilling by using machine learning to predict running times. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. ACM, p 64 Gaussier E, Glesser D, Reis V, Trystram D (2015) Improving backfilling by using machine learning to predict running times. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. ACM, p 64
9.
Zurück zum Zitat Gibbons R (1997) A historical application profiler for use by parallel schedulers. In: Workshop on job scheduling strategies for parallel processing. Springer, pp 58–77 Gibbons R (1997) A historical application profiler for use by parallel schedulers. In: Workshop on job scheduling strategies for parallel processing. Springer, pp 58–77
10.
Zurück zum Zitat Gil Y, Deelman E, Ellisman M, Fahringer T, Fox G, Gannon D, Goble C, Livny M, Moreau L, Myers J (2007) Examining the challenges of scientific workflows. IEEE Comput 40(12):24–32. doi:10.1109/MC.2007.421 CrossRef Gil Y, Deelman E, Ellisman M, Fahringer T, Fox G, Gannon D, Goble C, Livny M, Moreau L, Myers J (2007) Examining the challenges of scientific workflows. IEEE Comput 40(12):24–32. doi:10.​1109/​MC.​2007.​421 CrossRef
11.
Zurück zum Zitat Godbole S, Sarawagi S (2004) Discriminative methods for multi-labeled classification. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, pp 22–30 Godbole S, Sarawagi S (2004) Discriminative methods for multi-labeled classification. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, pp 22–30
12.
Zurück zum Zitat Guadagnoli E, Velicer WF (1988) Relation to sample size to the stability of component patterns. Psycholl Bull 103(2):265CrossRef Guadagnoli E, Velicer WF (1988) Relation to sample size to the stability of component patterns. Psycholl Bull 103(2):265CrossRef
13.
Zurück zum Zitat Härdle WK, Simar L (2012) Applied multivariate statistical analysis. Springer, BerlinCrossRefMATH Härdle WK, Simar L (2012) Applied multivariate statistical analysis. Springer, BerlinCrossRefMATH
14.
Zurück zum Zitat Jones JP, Nitzberg B (1999) Scheduling for parallel supercomputing: a historical perspective of achievable utilization. In: Workshop on job scheduling strategies for parallel processing. Springer, pp 1–16 Jones JP, Nitzberg B (1999) Scheduling for parallel supercomputing: a historical perspective of achievable utilization. In: Workshop on job scheduling strategies for parallel processing. Springer, pp 1–16
16.
Zurück zum Zitat Kohonen T (2013) Essentials of the self-organizing map. Neural Netw 37:52–65CrossRef Kohonen T (2013) Essentials of the self-organizing map. Neural Netw 37:52–65CrossRef
17.
Zurück zum Zitat Liang F, Liu Y, Liu H, Ma S, Schnor B (2015) A parallel job execution time estimation approach based on user submission patterns within computational grids. Int J Parallel Program 43(3):440–454CrossRef Liang F, Liu Y, Liu H, Ma S, Schnor B (2015) A parallel job execution time estimation approach based on user submission patterns within computational grids. Int J Parallel Program 43(3):440–454CrossRef
18.
Zurück zum Zitat Lifka DA (1995) The anl/ibm sp scheduling system. In: Workshop on job scheduling strategies for parallel processing. Springer, pp 295–303 Lifka DA (1995) The anl/ibm sp scheduling system. In: Workshop on job scheduling strategies for parallel processing. Springer, pp 295–303
19.
Zurück zum Zitat Minh TN, Wolters L (2010) Using historical data to predict application runtimes on backfilling parallel systems. In: Proceedings of 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp 246–252 Minh TN, Wolters L (2010) Using historical data to predict application runtimes on backfilling parallel systems. In: Proceedings of 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp 246–252
20.
Zurück zum Zitat Mu’alem A, Feitelson D (2001) Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling. IEEE Trans Parallel Distrib Syst 12(6):529–543. doi:10.1109/71.932708 CrossRef Mu’alem A, Feitelson D (2001) Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling. IEEE Trans Parallel Distrib Syst 12(6):529–543. doi:10.​1109/​71.​932708 CrossRef
22.
Zurück zum Zitat Senger LJ, Santana MJ, Santana RC (2004) An instance-based learning approach for predicting execution times of parallel applications. In: Proceedings of international information and telecommunication technologies symposium, pp 9–15 Senger LJ, Santana MJ, Santana RC (2004) An instance-based learning approach for predicting execution times of parallel applications. In: Proceedings of international information and telecommunication technologies symposium, pp 9–15
23.
Zurück zum Zitat Smith W, Foster I, Taylor V (2004) Predicting application run times with historical information. J Parallel Distrib Comput 64(9):1007–1016CrossRefMATH Smith W, Foster I, Taylor V (2004) Predicting application run times with historical information. J Parallel Distrib Comput 64(9):1007–1016CrossRefMATH
24.
Zurück zum Zitat Song H, Brandt-Pearce M (2013) Range of influence and impact of physical impairments in long-haul DWDM systems. J Lightwave Technol 31(6):846–854CrossRef Song H, Brandt-Pearce M (2013) Range of influence and impact of physical impairments in long-haul DWDM systems. J Lightwave Technol 31(6):846–854CrossRef
25.
26.
Zurück zum Zitat Vapnik V (2013) The nature of statistical learning theory. Springer, BerlinMATH Vapnik V (2013) The nature of statistical learning theory. Springer, BerlinMATH
27.
Zurück zum Zitat Wei W, Yang XL, Shen PY, Zhou B (2012) Holes detection in anisotropic sensornets: topological methods. Int J Distrib Sens Netw 8(10):135054. doi:10.1155/2012/135054 Wei W, Yang XL, Shen PY, Zhou B (2012) Holes detection in anisotropic sensornets: topological methods. Int J Distrib Sens Netw 8(10):135054. doi:10.​1155/​2012/​135054
28.
Zurück zum Zitat Weinberger KQ, Saul LK (2009) Distance metric learning for large margin nearest neighbor classification. J Mach Learn Res 10(Feb):207–244MATH Weinberger KQ, Saul LK (2009) Distance metric learning for large margin nearest neighbor classification. J Mach Learn Res 10(Feb):207–244MATH
29.
Zurück zum Zitat Weinland D, Ronfard R, Boyer E (2011) A survey of vision-based methods for action representation, segmentation and recognition. Comput Vis Image Underst 115(2):224–241CrossRef Weinland D, Ronfard R, Boyer E (2011) A survey of vision-based methods for action representation, segmentation and recognition. Comput Vis Image Underst 115(2):224–241CrossRef
30.
Zurück zum Zitat Zhang Y, Franke H, Moreira J, Sivasubramaniam A (2003) An integrated approach to parallel scheduling using gang-scheduling, backfilling, and migration. IEEE Trans Parallel Distrib Syst 14(3):236–247CrossRefMATH Zhang Y, Franke H, Moreira J, Sivasubramaniam A (2003) An integrated approach to parallel scheduling using gang-scheduling, backfilling, and migration. IEEE Trans Parallel Distrib Syst 14(3):236–247CrossRefMATH
Metadaten
Titel
Runtime prediction of parallel applications with workload-aware clustering
verfasst von
Ju-Won Park
Eunhye Kim
Publikationsdatum
06.04.2017
Verlag
Springer US
Erschienen in
The Journal of Supercomputing / Ausgabe 11/2017
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-017-2038-2

Weitere Artikel der Ausgabe 11/2017

The Journal of Supercomputing 11/2017 Zur Ausgabe