
03 January 2020 | Original Article

Stochastic trust region inexact Newton method for large-scale machine learning

Authors: Vinod Kumar Chauhan, Anuj Sharma, Kalpana Dahiya

Published in: International Journal of Machine Learning and Cybernetics | Issue 7/2020


Abstract

Stochastic approximation methods are currently one of the major research directions for dealing with large-scale machine learning problems. Within this area, the focus is shifting from stochastic first-order methods to stochastic second-order methods because of their faster convergence and the growing availability of computing resources. In this paper, we propose a novel stochastic trust region inexact Newton method, called STRON, for solving large-scale learning problems; it uses the conjugate gradient method to solve the trust region subproblem inexactly. The method uses progressive subsampling in the calculation of gradient and Hessian values to take advantage of both the stochastic and the full-batch regimes. We extend STRON with existing variance reduction techniques to deal with noisy gradients, and with preconditioned conjugate gradient as the subproblem solver, and show empirically that these extensions do not work as expected for large-scale learning problems. Finally, our empirical results demonstrate the efficacy of the proposed method against existing methods on benchmark datasets.
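To make the core idea concrete, the sketch below shows one trust-region inexact Newton step with subsampled derivatives: the gradient and Hessian-vector products are evaluated on a subsample of the data, and the trust-region subproblem is solved inexactly with Steihaug's conjugate gradient method [38]. This is an illustrative sketch only, not the authors' implementation (which is available in LIBS2ML [14]); the logistic loss, the function names, the subsample size, the trust-region radius, and all constants are assumptions chosen for the example.

    import numpy as np

    def logistic_loss_grad(w, X, y, lam=1e-4):
        """Gradient of the l2-regularized logistic loss on the subsample (X, y), y in {-1, +1}."""
        p = 1.0 / (1.0 + np.exp(-y * (X @ w)))           # probability of the correct label
        return -(X.T @ (y * (1.0 - p))) / X.shape[0] + lam * w

    def hessian_vec(w, X, v, lam=1e-4):
        """Hessian-vector product of the same loss, computed without forming the Hessian."""
        s = 1.0 / (1.0 + np.exp(-(X @ w)))               # sigmoid of the margins
        d = s * (1.0 - s)                                # diagonal curvature weights
        return (X.T @ (d * (X @ v))) / X.shape[0] + lam * v

    def _to_boundary(s, d, radius):
        """Positive tau such that ||s + tau * d|| = radius."""
        a, b, c = d @ d, 2.0 * (s @ d), s @ s - radius ** 2
        return (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

    def steihaug_cg(grad, hvp, radius, tol=1e-4, max_iter=50):
        """Approximately minimize g^T s + 0.5 s^T H s subject to ||s|| <= radius (Steihaug-CG)."""
        s = np.zeros_like(grad)
        r = -grad.copy()
        d = r.copy()
        rr = r @ r
        for _ in range(max_iter):
            Hd = hvp(d)
            dHd = d @ Hd
            if dHd <= 0.0:                               # negative curvature: move to the boundary
                return s + _to_boundary(s, d, radius) * d
            alpha = rr / dHd
            if np.linalg.norm(s + alpha * d) >= radius:  # step would leave the region: truncate
                return s + _to_boundary(s, d, radius) * d
            s, r = s + alpha * d, r - alpha * Hd
            rr_new = r @ r
            if np.sqrt(rr_new) <= tol:                   # inexact (truncated) Newton stopping rule
                break
            d = r + (rr_new / rr) * d
            rr = rr_new
        return s

    # One subsampled trust-region step on synthetic data (illustration only).
    rng = np.random.default_rng(0)
    X_full = rng.standard_normal((10_000, 50))
    y_full = np.sign(X_full @ rng.standard_normal(50) + 0.1 * rng.standard_normal(10_000))
    w = np.zeros(50)
    idx = rng.choice(10_000, size=2_000, replace=False)  # current subsample
    Xs, ys = X_full[idx], y_full[idx]
    g = logistic_loss_grad(w, Xs, ys)
    step = steihaug_cg(g, lambda v: hessian_vec(w, Xs, v), radius=1.0)
    w = w + step                                          # accept/reject and radius update omitted

An outer loop in the spirit of the paper would grow the subsample size across iterations (progressive subsampling), recompute the gradient and Hessian-vector products on the current subsample, call the subproblem solver, and then accept or reject the step and adjust the trust-region radius using the standard ratio of actual to predicted reduction.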

Footnotes
2. Experimental results can be reproduced using the library LIBS2ML [14].
References
1. Agarwal N, Bullins B, Hazan E (2017) Second-order stochastic optimization for machine learning in linear time. J Mach Learn Res 18(116):1–40
2. Allen-Zhu Z (2017) Katyusha: the first direct acceleration of stochastic gradient methods. J Mach Learn Res (to appear), full version. arXiv:1603.05953
3. Bellavia S, Krejic N, Jerinkic NK (2018) Subsampled inexact Newton methods for minimizing large sums of convex functions. Optimization Online. arXiv:1811.05730
4. Berahas AS, Nocedal J, Takac M (2016) A multi-batch L-BFGS method for machine learning. Adv Neural Inf Process Syst 29:1055–1063
6. Bollapragada R, Nocedal J, Mudigere D, Shi HJ, Tang PTP (2018) A progressive batching L-BFGS method for machine learning. Proc 35th Int Conf Mach Learn, PMLR, Proc Mach Learn Res 80:620–629
7. Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, New York
9. Byrd RH, Hansen SL, Nocedal J, Singer Y (2016) A stochastic quasi-Newton method for large-scale optimization. SIAM J Optim 26(2):1008–1031
10. Cauchy AL (1847) Méthode générale pour la résolution des systèmes d'équations simultanées. Comptes Rendus des Séances de l'Académie des Sciences, Série A 25:536–538
14. Chauhan VK, Sharma A, Dahiya K (2019) LIBS2ML: a library for scalable second order machine learning algorithms. arXiv:1904.09448
17. Defazio A, Bach F, Lacoste-Julien S (2014) SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives. In: Proceedings of the 27th international conference on neural information processing systems, NIPS'14, MIT Press, Cambridge, MA, USA, pp 1646–1654
18. Fan R, Chang K, Hsieh C, Wang X, Lin C (2008) LIBLINEAR: a library for large linear classification. J Mach Learn Res 9:1871–1874
19. Fanhua S, Zhou K, Cheng J, Tsang IW, Zhang L, Tao D (2018) VR-SGD: a simple stochastic variance reduction method for machine learning. arXiv:1802.09932
20. Fletcher R (1980) Practical methods of optimization, vol 1: unconstrained optimization. John Wiley & Sons
21. Hsia CY, Zhu Y, Lin CJ (2017) A study on trust region update rules in Newton methods for large-scale linear classification. Proc Ninth Asian Conf Mach Learn, PMLR, Proc Mach Learn Res 77:33–48
22. Hsia CY, Chiang WL, Lin CJ (2018) Preconditioned conjugate gradient methods in truncated Newton frameworks for large-scale linear classification. In: Proceedings of the tenth Asian conference on machine learning, PMLR, Proceedings of Machine Learning Research
23. Johnson R, Zhang T (2013) Accelerating stochastic gradient descent using predictive variance reduction. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ (eds) Advances in neural information processing systems, vol 26. Curran Associates Inc., New York, pp 315–323
24. Kolte R, Erdogdu M, Ozgur A (2015) Accelerating SVRG via second-order information. In: NIPS workshop on optimization for machine learning
25. Le Roux N, Schmidt M, Bach F (2012) A stochastic gradient method with an exponential convergence rate for strongly-convex optimization with finite training sets. Technical report, INRIA
27. Lin CJ, Weng RC, Keerthi SS (2008) Trust region Newton method for logistic regression. J Mach Learn Res 9:627–650
28. Liu DC, Nocedal J (1989) On the limited memory BFGS method for large scale optimization. Math Program 45(1):503–528
30. Mokhtari A, Ribeiro A (2014) RES: regularized stochastic BFGS algorithm. IEEE Trans Signal Process 62(23):6089–6104
31. Moritz P, Nishihara R, Jordan MI (2016) A linearly-convergent stochastic L-BFGS algorithm. In: AISTATS
34. Schmidt M, Le Roux N, Bach F (2017) Minimizing finite sums with the stochastic average gradient. Math Program 162:83–112
35. Schraudolph NN, Yu J, Günter S (2007) A stochastic quasi-Newton method for online convex optimization. In: Meila M, Shen X (eds) Proceedings of the eleventh international conference on artificial intelligence and statistics, PMLR, Proc Mach Learn Res, vol 2, pp 436–443
36. Shalev-Shwartz S, Zhang T (2013) Stochastic dual coordinate ascent methods for regularized loss. J Mach Learn Res 14(1):567–599
37. Shalev-Shwartz S, Singer Y, Srebro N (2007) Pegasos: primal estimated sub-gradient solver for SVM. In: Proceedings of the 24th international conference on machine learning, ICML'07, ACM, New York, NY, USA, pp 807–814
38. Steihaug T (1983) The conjugate gradient method and trust regions in large scale optimization. SIAM J Numer Anal 20(3):626–637
39. Zhang Y, Xiao L (2015) Stochastic primal-dual coordinate method for regularized empirical risk minimization. Proc 32nd Int Conf Mach Learn, ICML'15 37:353–361
Metadata
Title
Stochastic trust region inexact Newton method for large-scale machine learning
Authors
Vinod Kumar Chauhan
Anuj Sharma
Kalpana Dahiya
Publication date
03 January 2020
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 7/2020
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-019-01055-9
