
2021 | Original Paper | Book Chapter

Orthant Based Proximal Stochastic Gradient Method for \(\ell _1\)-Regularized Optimization

Authors: Tianyi Chen, Tianyu Ding, Bo Ji, Guanyi Wang, Yixin Shi, Jing Tian, Sheng Yi, Xiao Tu, Zhihui Zhu

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing


Abstract

Sparsity-inducing regularization problems are ubiquitous in machine learning applications, ranging from feature selection to model compression. In this paper, we present a novel stochastic method, the Orthant Based Proximal Stochastic Gradient Method (OBProx-SG), to solve perhaps the most popular instance, the \(\ell _1\)-regularized problem. The OBProx-SG method consists of two steps: (i) a proximal stochastic gradient step to predict a support cover of the solution; and (ii) an orthant step to aggressively enhance the sparsity level via orthant face projection. Compared to state-of-the-art methods such as Prox-SG, RDA and Prox-SVRG, OBProx-SG not only converges comparably in both convex and non-convex scenarios, but also promotes the sparsity of the solutions substantially. In particular, on a large number of convex problems, OBProx-SG comprehensively outperforms the existing methods in terms of sparsity exploration and objective values. Moreover, experiments on non-convex deep neural networks, e.g., MobileNetV1 and ResNet18, further demonstrate its superiority by producing solutions of much higher sparsity without sacrificing generalization accuracy, which implies that OBProx-SG may achieve significant memory and energy savings. The source code is available at https://github.com/tianyic/obproxsg.
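The two-step structure described in the abstract can be illustrated with a short sketch. The following Python snippet is only a minimal illustration of the predict-then-project idea, assuming a generic stochastic gradient oracle `grad_oracle`, a step size `lr`, a regularization weight `lam`, and a fixed switching schedule; these names, the schedule, and all hyper-parameters are placeholders for exposition, not the authors' implementation.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def prox_sg_step(x, g, lr, lam):
    """Step (i): proximal stochastic gradient step for the l1-regularized problem."""
    return soft_threshold(x - lr * g, lr * lam)

def orthant_step(x, g, lr, lam):
    """Step (ii): take a step on the orthant face selected by sign(x),
    then project back onto that face (entries that cross zero are truncated)."""
    sign = np.sign(x)
    pseudo_grad = g + lam * sign          # smooth surrogate of the objective on this face
    trial = x - lr * pseudo_grad
    trial[np.sign(trial) != sign] = 0.0   # orthant face projection; zeros stay zero
    return trial

def obprox_sg_sketch(x0, grad_oracle, lam, lr=0.1,
                     n_prox_epochs=5, n_orthant_epochs=5, steps_per_epoch=100):
    """Minimal sketch: Prox-SG steps first to predict a support cover,
    then orthant steps to aggressively enhance sparsity."""
    x = x0.copy()
    for epoch in range(n_prox_epochs + n_orthant_epochs):
        for _ in range(steps_per_epoch):
            g = grad_oracle(x)            # stochastic gradient of the smooth loss at x
            if epoch < n_prox_epochs:
                x = prox_sg_step(x, g, lr, lam)
            else:
                x = orthant_step(x, g, lr, lam)
    return x
```

In this sketch, the orthant step never reactivates a zero coordinate, which is why alternating it with the proximal stochastic gradient step (which can both zero out and revive coordinates) is what drives the reported sparsity gains.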


Metadata
Title
Orthant Based Proximal Stochastic Gradient Method for \(\ell _1\)-Regularized Optimization
Authors
Tianyi Chen
Tianyu Ding
Bo Ji
Guanyi Wang
Yixin Shi
Jing Tian
Sheng Yi
Xiao Tu
Zhihui Zhu
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-67664-3_4