Skip to main content
Top
Published in: The Journal of Supercomputing 11/2017

06-04-2017

Runtime prediction of parallel applications with workload-aware clustering

Authors: Ju-Won Park, Eunhye Kim

Published in: The Journal of Supercomputing | Issue 11/2017

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Traditionally, many science fields require great support for a massive workflow, which utilizes multiple cores simultaneously. In order to support such large-scale scientific workflows, high-capacity parallel systems such as supercomputers are widely used. To increase the utilization of these systems, most schedulers use backfilling policy based on user’s estimated runtime. However, it is found to be extremely inaccurate because users overestimate their jobs. Therefore, in this paper, an efficient machine learning approach is present to predict the runtime of parallel application. The proposed method is divided into three phases. First is to analyze important feature of the history log data by factor analysis. Second is to carry out clustering for the parallel program based on the important features. Third is to build a prediction models by pattern similarity of parallel program log data and estimate runtime. In the experiments, we use workload logs on parallel systems (i.e., NASA-iPSC, LANL-CM5, SDSC-Par95, SDSC-Par96, and CTC-SP2) to evaluate the effectiveness of our approach. Comparing root-mean-square error with other techniques, experimental results show that the proposed method improves the accuracy up to 69.56%.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Footnotes
1
In [12], the authors present that if a factor has four or more loadings \({>}0.6\), then it is reliable regardless of sample size. Therefore, in this paper, the threshold of factor analysis is determined by 0.6.
 
Literature
1.
go back to reference Agarwal B, Mittal N (2014) Text classification using machine learning methods—a survey. In: Proceedings of the Second International Conference on Soft Computing for Problem Solving (SocProS 2012), 28–30 Dec 2012. Springer, India, pp 701–709 Agarwal B, Mittal N (2014) Text classification using machine learning methods—a survey. In: Proceedings of the Second International Conference on Soft Computing for Problem Solving (SocProS 2012), 28–30 Dec 2012. Springer, India, pp 701–709
2.
go back to reference Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3):15CrossRef Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3):15CrossRef
3.
go back to reference Chang F, Guo CY, Lin XR, Lu CJ (2010) Tree decomposition for large-scale svm problems. J Mach Learn Res 11:2935–2972MATHMathSciNet Chang F, Guo CY, Lin XR, Lu CJ (2010) Tree decomposition for large-scale svm problems. J Mach Learn Res 11:2935–2972MATHMathSciNet
4.
go back to reference Deelman E, Gannon D, Shields M, Taylor I (2009) Workflows and e-science: an overview of workflow system features and capabilities. Future Gener Comput Syst 25(5):528–540CrossRef Deelman E, Gannon D, Shields M, Taylor I (2009) Workflows and e-science: an overview of workflow system features and capabilities. Future Gener Comput Syst 25(5):528–540CrossRef
5.
go back to reference Downey AB (1997) Using queue time predictions for processor allocation. In: Workshop on job scheduling strategies for parallel processing. Springer, pp 35–57 Downey AB (1997) Using queue time predictions for processor allocation. In: Workshop on job scheduling strategies for parallel processing. Springer, pp 35–57
6.
go back to reference Drucker H, Burges CJ, Kaufman L, Smola A, Vapnik V (1997) Support vector regression machines. Adv Neural Inf Process Syst 9:155–161 Drucker H, Burges CJ, Kaufman L, Smola A, Vapnik V (1997) Support vector regression machines. Adv Neural Inf Process Syst 9:155–161
7.
go back to reference Feitelson DG, Tsafrir D, Krakov D (2012) Experience with the parallel workloads archive. Technical report Feitelson DG, Tsafrir D, Krakov D (2012) Experience with the parallel workloads archive. Technical report
8.
go back to reference Gaussier E, Glesser D, Reis V, Trystram D (2015) Improving backfilling by using machine learning to predict running times. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. ACM, p 64 Gaussier E, Glesser D, Reis V, Trystram D (2015) Improving backfilling by using machine learning to predict running times. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. ACM, p 64
9.
go back to reference Gibbons R (1997) A historical application profiler for use by parallel schedulers. In: Workshop on job scheduling strategies for parallel processing. Springer, pp 58–77 Gibbons R (1997) A historical application profiler for use by parallel schedulers. In: Workshop on job scheduling strategies for parallel processing. Springer, pp 58–77
10.
go back to reference Gil Y, Deelman E, Ellisman M, Fahringer T, Fox G, Gannon D, Goble C, Livny M, Moreau L, Myers J (2007) Examining the challenges of scientific workflows. IEEE Comput 40(12):24–32. doi:10.1109/MC.2007.421 CrossRef Gil Y, Deelman E, Ellisman M, Fahringer T, Fox G, Gannon D, Goble C, Livny M, Moreau L, Myers J (2007) Examining the challenges of scientific workflows. IEEE Comput 40(12):24–32. doi:10.​1109/​MC.​2007.​421 CrossRef
11.
go back to reference Godbole S, Sarawagi S (2004) Discriminative methods for multi-labeled classification. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, pp 22–30 Godbole S, Sarawagi S (2004) Discriminative methods for multi-labeled classification. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, pp 22–30
12.
go back to reference Guadagnoli E, Velicer WF (1988) Relation to sample size to the stability of component patterns. Psycholl Bull 103(2):265CrossRef Guadagnoli E, Velicer WF (1988) Relation to sample size to the stability of component patterns. Psycholl Bull 103(2):265CrossRef
13.
14.
go back to reference Jones JP, Nitzberg B (1999) Scheduling for parallel supercomputing: a historical perspective of achievable utilization. In: Workshop on job scheduling strategies for parallel processing. Springer, pp 1–16 Jones JP, Nitzberg B (1999) Scheduling for parallel supercomputing: a historical perspective of achievable utilization. In: Workshop on job scheduling strategies for parallel processing. Springer, pp 1–16
16.
go back to reference Kohonen T (2013) Essentials of the self-organizing map. Neural Netw 37:52–65CrossRef Kohonen T (2013) Essentials of the self-organizing map. Neural Netw 37:52–65CrossRef
17.
go back to reference Liang F, Liu Y, Liu H, Ma S, Schnor B (2015) A parallel job execution time estimation approach based on user submission patterns within computational grids. Int J Parallel Program 43(3):440–454CrossRef Liang F, Liu Y, Liu H, Ma S, Schnor B (2015) A parallel job execution time estimation approach based on user submission patterns within computational grids. Int J Parallel Program 43(3):440–454CrossRef
18.
go back to reference Lifka DA (1995) The anl/ibm sp scheduling system. In: Workshop on job scheduling strategies for parallel processing. Springer, pp 295–303 Lifka DA (1995) The anl/ibm sp scheduling system. In: Workshop on job scheduling strategies for parallel processing. Springer, pp 295–303
19.
go back to reference Minh TN, Wolters L (2010) Using historical data to predict application runtimes on backfilling parallel systems. In: Proceedings of 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp 246–252 Minh TN, Wolters L (2010) Using historical data to predict application runtimes on backfilling parallel systems. In: Proceedings of 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp 246–252
20.
go back to reference Mu’alem A, Feitelson D (2001) Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling. IEEE Trans Parallel Distrib Syst 12(6):529–543. doi:10.1109/71.932708 CrossRef Mu’alem A, Feitelson D (2001) Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling. IEEE Trans Parallel Distrib Syst 12(6):529–543. doi:10.​1109/​71.​932708 CrossRef
22.
go back to reference Senger LJ, Santana MJ, Santana RC (2004) An instance-based learning approach for predicting execution times of parallel applications. In: Proceedings of international information and telecommunication technologies symposium, pp 9–15 Senger LJ, Santana MJ, Santana RC (2004) An instance-based learning approach for predicting execution times of parallel applications. In: Proceedings of international information and telecommunication technologies symposium, pp 9–15
23.
go back to reference Smith W, Foster I, Taylor V (2004) Predicting application run times with historical information. J Parallel Distrib Comput 64(9):1007–1016CrossRefMATH Smith W, Foster I, Taylor V (2004) Predicting application run times with historical information. J Parallel Distrib Comput 64(9):1007–1016CrossRefMATH
24.
go back to reference Song H, Brandt-Pearce M (2013) Range of influence and impact of physical impairments in long-haul DWDM systems. J Lightwave Technol 31(6):846–854CrossRef Song H, Brandt-Pearce M (2013) Range of influence and impact of physical impairments in long-haul DWDM systems. J Lightwave Technol 31(6):846–854CrossRef
26.
go back to reference Vapnik V (2013) The nature of statistical learning theory. Springer, BerlinMATH Vapnik V (2013) The nature of statistical learning theory. Springer, BerlinMATH
27.
go back to reference Wei W, Yang XL, Shen PY, Zhou B (2012) Holes detection in anisotropic sensornets: topological methods. Int J Distrib Sens Netw 8(10):135054. doi:10.1155/2012/135054 Wei W, Yang XL, Shen PY, Zhou B (2012) Holes detection in anisotropic sensornets: topological methods. Int J Distrib Sens Netw 8(10):135054. doi:10.​1155/​2012/​135054
28.
go back to reference Weinberger KQ, Saul LK (2009) Distance metric learning for large margin nearest neighbor classification. J Mach Learn Res 10(Feb):207–244MATH Weinberger KQ, Saul LK (2009) Distance metric learning for large margin nearest neighbor classification. J Mach Learn Res 10(Feb):207–244MATH
29.
go back to reference Weinland D, Ronfard R, Boyer E (2011) A survey of vision-based methods for action representation, segmentation and recognition. Comput Vis Image Underst 115(2):224–241CrossRef Weinland D, Ronfard R, Boyer E (2011) A survey of vision-based methods for action representation, segmentation and recognition. Comput Vis Image Underst 115(2):224–241CrossRef
30.
go back to reference Zhang Y, Franke H, Moreira J, Sivasubramaniam A (2003) An integrated approach to parallel scheduling using gang-scheduling, backfilling, and migration. IEEE Trans Parallel Distrib Syst 14(3):236–247CrossRefMATH Zhang Y, Franke H, Moreira J, Sivasubramaniam A (2003) An integrated approach to parallel scheduling using gang-scheduling, backfilling, and migration. IEEE Trans Parallel Distrib Syst 14(3):236–247CrossRefMATH
Metadata
Title
Runtime prediction of parallel applications with workload-aware clustering
Authors
Ju-Won Park
Eunhye Kim
Publication date
06-04-2017
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 11/2017
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-017-2038-2

Other articles of this Issue 11/2017

The Journal of Supercomputing 11/2017 Go to the issue

Premium Partner