Published in: Cluster Computing 4/2015

1 December 2015

Ensemble learning of runtime prediction models for gene-expression analysis workflows

Written by: David A. Monge, Matěj Holec, Filip Železný, Carlos García Garino


Abstract

The adequate management of scientific workflow applications depends strongly on the availability of accurate performance models of their sub-tasks. Numerous approaches use machine learning to generate such models autonomously, thus reducing the human effort associated with this process. However, these standalone models may lack robustness, which degrades the quality of the information provided to the workflow systems built on top of them. This paper presents a novel approach for learning ensemble prediction models of task runtime. The ensemble-learning method known as bootstrap aggregating (bagging) is used to produce robust ensembles of M5P regression trees with better predictive performance than standalone models can achieve. Our approach has been tested on gene-expression analysis workflows. The results show that the ensemble method leads to significant reductions in prediction error compared with learned standalone models. This is the first initiative using ensemble learning for generating performance prediction models. These promising results encourage further research in this direction.
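
The bagging idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it replaces M5P model trees with one-split regression stumps so that it runs on the Python standard library alone, and the function names (`fit_stump`, `fit_bagging`), the single input feature, and the synthetic "input size versus runtime" data are all assumptions made for illustration.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump (a stand-in for an M5P tree).

    The stump predicts the mean target on each side of the threshold
    that minimises the sum of squared errors on the training data.
    """
    best = None
    for t in sorted(set(xs))[1:]:  # every candidate leaves both sides non-empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all inputs identical: fall back to a constant model
        m = mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm


def fit_bagging(xs, ys, n_models=25, seed=0):
    """Bootstrap aggregating: train one base model per bootstrap sample
    and predict with the average of the individual predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n points with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(m(x) for m in models)


if __name__ == "__main__":
    # Hypothetical data: runtime grows roughly linearly with input size.
    rng = random.Random(1)
    sizes = [rng.uniform(1, 100) for _ in range(200)]
    runtimes = [2.0 * s + rng.gauss(0, 5) for s in sizes]

    single = fit_stump(sizes, runtimes)
    ensemble = fit_bagging(sizes, runtimes)
    print("standalone stump:", single(50.0))
    print("bagged ensemble: ", ensemble(50.0))
```

Each base model sees a different bootstrap sample, so the split thresholds vary between models; averaging them smooths the prediction, which is the robustness argument the abstract makes for ensembles over standalone models.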


Footnotes
1
This task of classification learning should not be confused with the learning task of runtime prediction.
5
This sub-sampling process should not be confused with the sub-sampling carried out in bagging.
Metadata
Title
Ensemble learning of runtime prediction models for gene-expression analysis workflows
Written by
David A. Monge
Matěj Holec
Filip Železný
Carlos García Garino
Publication date
1 December 2015
Publisher
Springer US
Published in
Cluster Computing / Issue 4/2015
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-015-0481-5
