
2021 | Original Paper | Book Chapter

Orthant Based Proximal Stochastic Gradient Method for \(\ell _1\)-Regularized Optimization

Authors: Tianyi Chen, Tianyu Ding, Bo Ji, Guanyi Wang, Yixin Shi, Jing Tian, Sheng Yi, Xiao Tu, Zhihui Zhu

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing


Abstract

Sparsity-inducing regularization problems are ubiquitous in machine learning applications, ranging from feature selection to model compression. In this paper, we present a novel stochastic method, the Orthant Based Proximal Stochastic Gradient Method (OBProx-SG), to solve perhaps the most popular instance, the \(\ell _1\)-regularized problem. The OBProx-SG method consists of two steps: (i) a proximal stochastic gradient step to predict a support cover of the solution; and (ii) an orthant step to aggressively enhance the sparsity level via orthant face projection. Compared to state-of-the-art methods such as Prox-SG, RDA and Prox-SVRG, OBProx-SG not only converges comparably in both convex and non-convex scenarios, but also promotes the sparsity of the solutions substantially. In particular, on a large number of convex problems, OBProx-SG comprehensively outperforms the existing methods in terms of sparsity exploration and objective values. Moreover, experiments on non-convex deep neural networks, e.g., MobileNetV1 and ResNet18, further demonstrate its superiority by producing solutions of much higher sparsity without sacrificing generalization accuracy, which implies that OBProx-SG may achieve significant memory and energy savings. The source code is available at https://github.com/tianyic/obproxsg.
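The two-step structure described in the abstract can be illustrated with a short sketch. The following Python snippet is only a minimal illustration of the predict-then-project idea, assuming a generic stochastic gradient oracle `grad_oracle`, a step size `lr`, a regularization weight `lam`, and a fixed switching schedule; these names, the schedule, and all hyper-parameters are placeholders for exposition, not the authors' implementation.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def prox_sg_step(x, g, lr, lam):
    """Step (i): proximal stochastic gradient step for the l1-regularized problem."""
    return soft_threshold(x - lr * g, lr * lam)

def orthant_step(x, g, lr, lam):
    """Step (ii): take a step on the orthant face selected by sign(x),
    then project back onto that face (entries that cross zero are truncated)."""
    sign = np.sign(x)
    pseudo_grad = g + lam * sign          # smooth surrogate of the objective on this face
    trial = x - lr * pseudo_grad
    trial[np.sign(trial) != sign] = 0.0   # orthant face projection; zeros stay zero
    return trial

def obprox_sg_sketch(x0, grad_oracle, lam, lr=0.1,
                     n_prox_epochs=5, n_orthant_epochs=5, steps_per_epoch=100):
    """Minimal sketch: Prox-SG steps first to predict a support cover,
    then orthant steps to aggressively enhance sparsity."""
    x = x0.copy()
    for epoch in range(n_prox_epochs + n_orthant_epochs):
        for _ in range(steps_per_epoch):
            g = grad_oracle(x)            # stochastic gradient of the smooth loss at x
            if epoch < n_prox_epochs:
                x = prox_sg_step(x, g, lr, lam)
            else:
                x = orthant_step(x, g, lr, lam)
    return x
```

In this sketch, the orthant step never reactivates a zero coordinate, which is why alternating it with the proximal stochastic gradient step (which can both zero out and revive coordinates) is what drives the reported sparsity gains.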


Metadata
Title
Orthant Based Proximal Stochastic Gradient Method for \(\ell _1\)-Regularized Optimization
Authors
Tianyi Chen
Tianyu Ding
Bo Ji
Guanyi Wang
Yixin Shi
Jing Tian
Sheng Yi
Xiao Tu
Zhihui Zhu
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-67664-3_4