Skip to main content
Erschienen in: Automated Software Engineering 2/2018

30.08.2017

Faster discovery of faster system configurations with spectral learning

verfasst von: Vivek Nair, Tim Menzies, Norbert Siegmund, Sven Apel

Erschienen in: Automated Software Engineering | Ausgabe 2/2018

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Despite the huge spread and economical importance of configurable software systems, there is unsatisfactory support in utilizing the full potential of these systems with respect to finding performance-optimal configurations. Prior work on predicting the performance of software configurations suffered from either (a) requiring far too many sample configurations or (b) large variances in their predictions. Both these problems can be avoided using the WHAT spectral learner. WHAT’s innovation is the use of the spectrum (eigenvalues) of the distance matrix between the configurations of a configurable software system, to perform dimensionality reduction. Within that reduced configuration space, many closely associated configurations can be studied by executing only a few sample configurations. For the subject systems studied here, a few dozen samples yield accurate and stable predictors—less than 10% prediction error, with a standard deviation of less than 2%. When compared to the state of the art, WHAT (a) requires 2–10 times fewer samples to achieve similar prediction accuracies, and (b) its predictions are more stable (i.e., have lower standard deviation). Furthermore, we demonstrate that predictive models generated by WHAT can be used by optimizers to discover system configurations that closely approach the optimal performance.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
In this paper, we concentrate on Boolean options, as they make up the majority of all options; see Siegmund et al. (2015), for how to incorporate numeric options.
 
2
Though, in practice, this can be very difficult. For example, in models like the Linux Kernel such an enumeration is practically impossible (Sayyad et al. 2013).
 
3
The projection of \(N_i\) can be calculated in the following way:\(a = dist ( East , N_i);\quad b = dist ( West , N_i);\quad x_i = \sqrt{\frac{a^2 - b^2 + c ^2}{2 c }}\).
 
4
In our study, \( dist \) accepts pair of configuration (\(\mathbf {C}\)) and returns the distance between them. If \(x_i\) and \(y_i\) \(\in \mathbb {R}^n\), then the distance function would be same as the standard Euclidean distance.
 
5
Also known as response surface methods, meta models, or emulators.
 
7
Just to clarify one frequently asked question about this work, we note that our rig “studies” 40% of the data. We do not mean that our predictive models require accessing the performance scores from the 40% of the data. Rather, by “study” we mean reflect on a sample of configurations to determine what minimal subset of that sample deserves to be compiled and executed.
 
8
WHERE is an approximation of the first principal component
 
9
Another challenge of having high dimensional search space is the amount of noise induced by irrelevant dimensions.
 
Literatur
Zurück zum Zitat Bettenburg, N., Nagappan, M., Hassan, A.E.: Think locally, act globally: improving defect and effort prediction models. In: Proceedings of IEEE Working Conference on Mining Software Repositories, pp. 60–69. IEEE (2012) Bettenburg, N., Nagappan, M., Hassan, A.E.: Think locally, act globally: improving defect and effort prediction models. In: Proceedings of IEEE Working Conference on Mining Software Repositories, pp. 60–69. IEEE (2012)
Zurück zum Zitat Bettenburg, N., Nagappan, M., Hassan, A.E.: Towards improving statistical modeling of software engineering data: think locally, act globally!. Empir. Softw. Eng. 20(2), 294–335 (2015)CrossRef Bettenburg, N., Nagappan, M., Hassan, A.E.: Towards improving statistical modeling of software engineering data: think locally, act globally!. Empir. Softw. Eng. 20(2), 294–335 (2015)CrossRef
Zurück zum Zitat Bingham, E., Mannila, H.: Random projection in dimensionality reduction: applications to image and text data. In: Proceedings of SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 245–250. ACM (2001) Bingham, E., Mannila, H.: Random projection in dimensionality reduction: applications to image and text data. In: Proceedings of SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 245–250. ACM (2001)
Zurück zum Zitat Boley, Daniel: Principal direction divisive partitioning. Data Min. Knowl. Discov. 2(4), 325–344 (1998)CrossRef Boley, Daniel: Principal direction divisive partitioning. Data Min. Knowl. Discov. 2(4), 325–344 (1998)CrossRef
Zurück zum Zitat Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A.: Classification and Regression Trees. CRC, Boca Raton (1984) Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A.: Classification and Regression Trees. CRC, Boca Raton (1984)
Zurück zum Zitat Burges, C., Christopher, J.: Dimension reduction: a guided tour. Foundations and trends. Mach. Learn. 2(4), 275–365 (2010)CrossRefMATH Burges, C., Christopher, J.: Dimension reduction: a guided tour. Foundations and trends. Mach. Learn. 2(4), 275–365 (2010)CrossRefMATH
Zurück zum Zitat Chen, J., Nair, V., Krishna, R., Menzies, T.: Is “sampling” better than “evolution” for search-based software engineering? arXiv:1608.07617 (2016) Chen, J., Nair, V., Krishna, R., Menzies, T.: Is “sampling” better than “evolution” for search-based software engineering? arXiv:​1608.​07617 (2016)
Zurück zum Zitat Dasgupta, S.: Experiments with random projection. In: Proceedings of conference on Uncertainty in Artificial Intelligence, pp. 143–151. Morgan Kaufmann Publishers Inc. (2000) Dasgupta, S.: Experiments with random projection. In: Proceedings of conference on Uncertainty in Artificial Intelligence, pp. 143–151. Morgan Kaufmann Publishers Inc. (2000)
Zurück zum Zitat Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.A.M.T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. Trans. Evol. Comput. 6(2), 182–197 (2002)CrossRef Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.A.M.T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. Trans. Evol. Comput. 6(2), 182–197 (2002)CrossRef
Zurück zum Zitat Deiters, C., Rausch, A., Schindler, M.: Using spectral clustering to automate identification and optimization of component structures. In: Proceedings of International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering, pp. 14–20. IEEE (2013) Deiters, C., Rausch, A., Schindler, M.: Using spectral clustering to automate identification and optimization of component structures. In: Proceedings of International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering, pp. 14–20. IEEE (2013)
Zurück zum Zitat Domingos, P.: A few useful things to know about machine learning. Commun. ACM 55(10), 78–87 (2012)CrossRef Domingos, P.: A few useful things to know about machine learning. Commun. ACM 55(10), 78–87 (2012)CrossRef
Zurück zum Zitat Du, Qian, James, E.Fowler: Low-complexity principal component analysis for hyperspectral image compression. Int. J. High Perform. Comput. Appl. 22(4), 438–448 (2008)CrossRef Du, Qian, James, E.Fowler: Low-complexity principal component analysis for hyperspectral image compression. Int. J. High Perform. Comput. Appl. 22(4), 438–448 (2008)CrossRef
Zurück zum Zitat Efron, Bradley, Tibshirani, Robert J.: An Introduction to the Bootstrap. CRC, Boca Raton (1994)MATH Efron, Bradley, Tibshirani, Robert J.: An Introduction to the Bootstrap. CRC, Boca Raton (1994)MATH
Zurück zum Zitat Faloutsos, C., Lin, K.I.: Fastmap.: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. In: Proceedings of International Conference on Management of Data, pp. 163–174. ACM (1995) Faloutsos, C., Lin, K.I.: Fastmap.: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. In: Proceedings of International Conference on Management of Data, pp. 163–174. ACM (1995)
Zurück zum Zitat Fletcher, R.: Practical Methods of Optimization. Wiley, New York (2013)MATH Fletcher, R.: Practical Methods of Optimization. Wiley, New York (2013)MATH
Zurück zum Zitat Ghotra, B., McIntosh, S., Hassan, A.E.: Revisiting the impact of classification techniques on the performance of defect prediction models. In: Proceedings of International Conference on Software Engineering, pp. 789–800. IEEE (2015) Ghotra, B., McIntosh, S., Hassan, A.E.: Revisiting the impact of classification techniques on the performance of defect prediction models. In: Proceedings of International Conference on Software Engineering, pp. 789–800. IEEE (2015)
Zurück zum Zitat Grassberger, P., Procaccia, I.: Measuring the strangeness of strange attractors. Physica D Nonlinear Phenom. 9(1–2), 189–208 (1983)MathSciNetCrossRefMATH Grassberger, P., Procaccia, I.: Measuring the strangeness of strange attractors. Physica D Nonlinear Phenom. 9(1–2), 189–208 (1983)MathSciNetCrossRefMATH
Zurück zum Zitat Guo, J., Czarnecki, K., Apel, S., Siegmund, N., Wasowski, A.: Variability-aware performance prediction: a statistical learning approach. In: Proceedings of International Conference on Automated Software Engineering, pp. 301–311. IEEE (2013) Guo, J., Czarnecki, K., Apel, S., Siegmund, N., Wasowski, A.: Variability-aware performance prediction: a statistical learning approach. In: Proceedings of International Conference on Automated Software Engineering, pp. 301–311. IEEE (2013)
Zurück zum Zitat Hamerly, G.: Making k-means even faster. In: Proceedings of International Conference on Data Mining, pp. 130–140. SIAM (2010) Hamerly, G.: Making k-means even faster. In: Proceedings of International Conference on Data Mining, pp. 130–140. SIAM (2010)
Zurück zum Zitat Harman, M., Mansouri, S.A., Zhang, Y.: Search-based software engineering: trends, techniques and applications. ACM Comput. Surv. 45(1), 11 (2012)CrossRef Harman, M., Mansouri, S.A., Zhang, Y.: Search-based software engineering: trends, techniques and applications. ACM Comput. Surv. 45(1), 11 (2012)CrossRef
Zurück zum Zitat Houle, M.E., Kashima, H., Nett, M.: Generalized expansion dimension. In: Proceedings of International Conference on Data Mining Workshops (ICDMW), pp. 587–594. IEEE (2012) Houle, M.E., Kashima, H., Nett, M.: Generalized expansion dimension. In: Proceedings of International Conference on Data Mining Workshops (ICDMW), pp. 587–594. IEEE (2012)
Zurück zum Zitat Ilin, A., Raiko, T.: Practical approaches to principal component analysis in the presence of missing values. J. Mach. Learn. Res. 11(Jul), 1957–2000 (2010)MathSciNetMATH Ilin, A., Raiko, T.: Practical approaches to principal component analysis in the presence of missing values. J. Mach. Learn. Res. 11(Jul), 1957–2000 (2010)MathSciNetMATH
Zurück zum Zitat Jolliffe, Ian: Principal Component Analysis. Wiley, New York (2002)MATH Jolliffe, Ian: Principal Component Analysis. Wiley, New York (2002)MATH
Zurück zum Zitat Kamvar, K., Sepandar, S., Klein, K., Dan, D., Manning, M., Christopher, C.: Spectral learning. In: Proceedings of International Joint Conference of Artificial Intelligence. Stanford InfoLab (2003) Kamvar, K., Sepandar, S., Klein, K., Dan, D., Manning, M., Christopher, C.: Spectral learning. In: Proceedings of International Joint Conference of Artificial Intelligence. Stanford InfoLab (2003)
Zurück zum Zitat Krall, J., Menzies, T., Davies, M.: Gale: geometric active learning for search-based software engineering. Trans. Softw. Eng. 41(10), 1001–1018 (2015) Krall, J., Menzies, T., Davies, M.: Gale: geometric active learning for search-based software engineering. Trans. Softw. Eng. 41(10), 1001–1018 (2015)
Zurück zum Zitat Kuhn, D.R., Kacker, R.N., Lei, Y.: Introduction to Combinatorial Testing. CRC, Boca Raton (2013)MATH Kuhn, D.R., Kacker, R.N., Lei, Y.: Introduction to Combinatorial Testing. CRC, Boca Raton (2013)MATH
Zurück zum Zitat Loshchilov, I.G.: Surrogate-assisted evolutionary algorithms. PhD thesis, Citeseer (2013) Loshchilov, I.G.: Surrogate-assisted evolutionary algorithms. PhD thesis, Citeseer (2013)
Zurück zum Zitat Menzies, T., Butcher, A., Cok, D., Marcus, A., Layman, L., Shull, F., Turhan, B., Zimmermann, T.: Local versus global lessons for defect prediction and effort estimation. Trans. Softw. Eng. 39(6), 822–834 (2013)CrossRef Menzies, T., Butcher, A., Cok, D., Marcus, A., Layman, L., Shull, F., Turhan, B., Zimmermann, T.: Local versus global lessons for defect prediction and effort estimation. Trans. Softw. Eng. 39(6), 822–834 (2013)CrossRef
Zurück zum Zitat Mittas, N., Angelis, L.: Ranking and clustering software cost estimation models through a multiple comparisons algorithm. Trans. Softw. Eng. 39(4), 537–551 (2013)CrossRef Mittas, N., Angelis, L.: Ranking and clustering software cost estimation models through a multiple comparisons algorithm. Trans. Softw. Eng. 39(4), 537–551 (2013)CrossRef
Zurück zum Zitat Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12(Oct), 2825–2830 (2011) Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12(Oct), 2825–2830 (2011)
Zurück zum Zitat Platt, J.: Fastmap, metricmap, and landmark mds are all nystrom algorithms. In: Proceedings of International Conference on Artificial Intelligence and Statistics (2005) Platt, J.: Fastmap, metricmap, and landmark mds are all nystrom algorithms. In: Proceedings of International Conference on Artificial Intelligence and Statistics (2005)
Zurück zum Zitat Sarkar, A., Guo, J., Siegmund, N., Apel, S., Czarnecki, K.: Cost-efficient sampling for performance prediction of configurable systems. In: Proceedings of International Conference on Automated Software Engineering, pp. 342–352. IEEE (2015) Sarkar, A., Guo, J., Siegmund, N., Apel, S., Czarnecki, K.: Cost-efficient sampling for performance prediction of configurable systems. In: Proceedings of International Conference on Automated Software Engineering, pp. 342–352. IEEE (2015)
Zurück zum Zitat Sayyad, A.S., Ingram, J., Menzies, T., Ammar, H.: Scalable product line configuration: a straw to break the camel’s back. In: Proceedings of International Conference on Automated Software Engineering, pp. 465–474. IEEE (2013) Sayyad, A.S., Ingram, J., Menzies, T., Ammar, H.: Scalable product line configuration: a straw to break the camel’s back. In: Proceedings of International Conference on Automated Software Engineering, pp. 465–474. IEEE (2013)
Zurück zum Zitat Shi, Jianbo, Malik, Jitendra: Trans. Pattern Anal. Mach. Intell. Normalized cuts and image segmentation 22(8), 888–905 (2000) Shi, Jianbo, Malik, Jitendra: Trans. Pattern Anal. Mach. Intell. Normalized cuts and image segmentation 22(8), 888–905 (2000)
Zurück zum Zitat Siegmund, N., Kolesnikov, S.S., Kästner, C., Apel, S., Batory, D., Rosenmüller, M., Gunter, S.: Predicting performance via automated feature-interaction detection. In: Proceedings of International Conference on Software Engineering, pp. 167–177. IEEE (2012) Siegmund, N., Kolesnikov, S.S., Kästner, C., Apel, S., Batory, D., Rosenmüller, M., Gunter, S.: Predicting performance via automated feature-interaction detection. In: Proceedings of International Conference on Software Engineering, pp. 167–177. IEEE (2012)
Zurück zum Zitat Siegmund, J., Siegmund, N., Apel, S.: Views on internal and external validity in empirical software engineering. In: Proceedings of International Conference on Software Engineering, vol. 1, pp. 9–19. IEEE (2015) Siegmund, J., Siegmund, N., Apel, S.: Views on internal and external validity in empirical software engineering. In: Proceedings of International Conference on Software Engineering, vol. 1, pp. 9–19. IEEE (2015)
Zurück zum Zitat Siegmund, N., Grebhahn, A., Apel, S., Kästner, C.: Performance-influence models for highly configurable systems. In: Proceedings of International Conference on Foundations of Software Engineering, pp. 284–294. ACM (2015) Siegmund, N., Grebhahn, A., Apel, S., Kästner, C.: Performance-influence models for highly configurable systems. In: Proceedings of International Conference on Foundations of Software Engineering, pp. 284–294. ACM (2015)
Zurück zum Zitat Storn, R., Price, K.: Differential evolution-a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optim. 11(4), 341–359 (1997)MathSciNetCrossRefMATH Storn, R., Price, K.: Differential evolution-a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optim. 11(4), 341–359 (1997)MathSciNetCrossRefMATH
Zurück zum Zitat Theisen, C., Herzig, K., Morrison, P., Murphy, B., Williams, L.: Approximating attack surfaces with stack traces. In: Proceedings of International Conference on Software Engineering, pp. 199–208. IEEE (2015) Theisen, C., Herzig, K., Morrison, P., Murphy, B., Williams, L.: Approximating attack surfaces with stack traces. In: Proceedings of International Conference on Software Engineering, pp. 199–208. IEEE (2015)
Zurück zum Zitat Vargha, A., Delaney, H.D.: A critique and improvement of the CL common language effect size statistics of McGraw and Wong. J. Educ. Behav. Stat. 25(2), 101–132 (2000) Vargha, A., Delaney, H.D.: A critique and improvement of the CL common language effect size statistics of McGraw and Wong. J. Educ. Behav. Stat. 25(2), 101–132 (2000)
Zurück zum Zitat Weiss, G.M., Tian, Y.: Maximizing classifier utility when there are data acquisition and modeling costs. Data Min. Knowl. Discov. 17(2), 253–282 (2008)MathSciNetCrossRef Weiss, G.M., Tian, Y.: Maximizing classifier utility when there are data acquisition and modeling costs. Data Min. Knowl. Discov. 17(2), 253–282 (2008)MathSciNetCrossRef
Zurück zum Zitat Xu, T., Jin, L., Fan, X., Zhou, Y., Pasupathy, S., Talwadker, R.: Hey, you have given me too many knobs!: Understanding and dealing with over-designed configuration in system software. In: Proceedings of International Conference on Foundations of Software Engineering, pp. 307–319. ACM (2015) Xu, T., Jin, L., Fan, X., Zhou, Y., Pasupathy, S., Talwadker, R.: Hey, you have given me too many knobs!: Understanding and dealing with over-designed configuration in system software. In: Proceedings of International Conference on Foundations of Software Engineering, pp. 307–319. ACM (2015)
Zurück zum Zitat Zhang, F., Zheng, Q., Zou, Y., Hassan, A.E.: Cross-project defect prediction using a connectivity-based unsupervised classifier. In Proceedings of International Conference on Software Engineering, pp. 309–320. ACM (2016) Zhang, F., Zheng, Q., Zou, Y., Hassan, A.E.: Cross-project defect prediction using a connectivity-based unsupervised classifier. In Proceedings of International Conference on Software Engineering, pp. 309–320. ACM (2016)
Zurück zum Zitat Zhang, Y., Guo, J., Blais, E., Czarnecki, K.: Performance prediction of configurable software systems by fourier learning. In: Proceedings of International Conference on Automated Software Engineering, pp. 365–373. IEEE (2015) Zhang, Y., Guo, J., Blais, E., Czarnecki, K.: Performance prediction of configurable software systems by fourier learning. In: Proceedings of International Conference on Automated Software Engineering, pp. 365–373. IEEE (2015)
Zurück zum Zitat Zuluaga, M., Sergent, G., Krause, A., Püschel, M.: Active learning for multi-objective optimization. In: Proceedings of International Conference in Machine Learning, vol. 28, pp. 462–470. ICML (2013) Zuluaga, M., Sergent, G., Krause, A., Püschel, M.: Active learning for multi-objective optimization. In: Proceedings of International Conference in Machine Learning, vol. 28, pp. 462–470. ICML (2013)
Metadaten
Titel
Faster discovery of faster system configurations with spectral learning
verfasst von
Vivek Nair
Tim Menzies
Norbert Siegmund
Sven Apel
Publikationsdatum
30.08.2017
Verlag
Springer US
Erschienen in
Automated Software Engineering / Ausgabe 2/2018
Print ISSN: 0928-8910
Elektronische ISSN: 1573-7535
DOI
https://doi.org/10.1007/s10515-017-0225-2

Weitere Artikel der Ausgabe 2/2018

Automated Software Engineering 2/2018 Zur Ausgabe