2018 | Original Paper | Book Chapter

On Scalability of Predictive Ensembles and Tradeoff Between Their Training Time and Accuracy

Authors: Pavel Kordík, Tomáš Frýda

Published in: Advances in Intelligent Systems and Computing II

Publisher: Springer International Publishing


Abstract

Scalability of predictive models is often achieved by data subsampling. Generalization performance is not the only criterion to consider during algorithm selection. For many real-world applications, predictive models have to be scalable, and their training time should be balanced against their predictive performance. For many tasks it is reasonable to save computational resources by selecting an algorithm with slightly lower performance but significantly lower training time. In this contribution we performed extensive benchmarks of the scalability of predictive algorithms and examined how well they can trade accuracy for lower training time. We demonstrate how one particular template (a simple ensemble of fast sigmoidal regression models) outperforms state-of-the-art approaches on the Airline data set.
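The core idea of the abstract, trading accuracy for training time by fitting an ensemble of fast sigmoidal base models on data subsamples, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn 1.2+ and uses a synthetic data set in place of Airline, with a bagged logistic regression standing in for the "fast sigmoidal regression models"; the subsample fraction is the knob that trades training time against accuracy.

```python
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a large tabular data set such as Airline.
X, y = make_classification(n_samples=100_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

for fraction in (0.01, 0.1, 1.0):  # smaller subsample -> cheaper training
    ensemble = BaggingClassifier(
        estimator=LogisticRegression(max_iter=200),  # fast sigmoidal base learner
        n_estimators=10,
        max_samples=fraction,  # each base model is fit on a data subsample
        random_state=0,
    )
    start = time.perf_counter()
    ensemble.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    acc = accuracy_score(y_test, ensemble.predict(X_test))
    print(f"subsample={fraction:.2f}  train_time={elapsed:.2f}s  accuracy={acc:.3f}")
```

Printing the training time alongside the test accuracy for each subsample fraction makes the tradeoff discussed above directly visible: accuracy typically degrades only slightly while training cost drops by an order of magnitude or more.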


Metadata
Title
On Scalability of Predictive Ensembles and Tradeoff Between Their Training Time and Accuracy
Authors
Pavel Kordík
Tomáš Frýda
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-70581-1_18
