2021 | OriginalPaper | Chapter

Orthant Based Proximal Stochastic Gradient Method for \(\ell _1\)-Regularized Optimization

Authors: Tianyi Chen, Tianyu Ding, Bo Ji, Guanyi Wang, Yixin Shi, Jing Tian, Sheng Yi, Xiao Tu, Zhihui Zhu

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing

Abstract

Sparsity-inducing regularization problems are ubiquitous in machine learning applications, ranging from feature selection to model compression. In this paper, we present a novel stochastic method, the Orthant Based Proximal Stochastic Gradient Method (OBProx-SG), to solve perhaps the most popular instance, the \(\ell _1\)-regularized problem. The OBProx-SG method consists of two steps: (i) a proximal stochastic gradient step to predict a support cover of the solution; and (ii) an orthant step to aggressively enhance sparsity via orthant face projection. Compared with state-of-the-art methods such as Prox-SG, RDA and Prox-SVRG, OBProx-SG not only converges comparably in both convex and non-convex settings, but also promotes the sparsity of the solutions substantially. In particular, on a large collection of convex problems, OBProx-SG outperforms the existing methods in both sparsity exploration and objective values. Moreover, experiments on non-convex deep neural networks such as MobileNetV1 and ResNet18 further demonstrate its superiority: it produces solutions of much higher sparsity without sacrificing generalization accuracy, which implies that OBProx-SG may achieve significant memory and energy savings. The source code is available at https://github.com/tianyic/obproxsg.
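
The abstract describes OBProx-SG as alternating between a proximal stochastic gradient step and an orthant step. The following is a minimal NumPy sketch of that two-step idea, not the authors' implementation (which is available at the repository linked above): the stochastic gradient oracle `stoch_grad`, the step size `alpha`, the regularization weight `lam`, and the simple epoch-based switching schedule are all illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, tau):
    """Elementwise soft-thresholding: the proximal operator of tau * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def prox_sg_step(x, stoch_grad, alpha, lam):
    """Step (i): proximal stochastic gradient update (Prox-SG-style)."""
    return soft_threshold(x - alpha * stoch_grad(x), alpha * lam)

def orthant_step(x, stoch_grad, alpha, lam):
    """Step (ii): take a stochastic gradient step on the orthant face selected by
    sign(x), then project back onto that face: entries that flip sign, and entries
    that are already zero, are set (or kept) at zero."""
    sign = np.sign(x)
    pseudo_grad = stoch_grad(x) + lam * sign   # gradient of f + lam*||x||_1 restricted to the face
    x_new = x - alpha * pseudo_grad
    x_new[np.sign(x_new) != sign] = 0.0        # orthant face projection
    return x_new

def obprox_sg_sketch(x0, stoch_grad, alpha=0.1, lam=1e-3,
                     n_prox_epochs=5, n_orthant_epochs=5, steps_per_epoch=100):
    """Illustrative alternation: a few epochs of Prox-SG to predict the support,
    then orthant steps to aggressively enhance sparsity on that support."""
    x = x0.copy()
    for _ in range(n_prox_epochs):
        for _ in range(steps_per_epoch):
            x = prox_sg_step(x, stoch_grad, alpha, lam)
    for _ in range(n_orthant_epochs):
        for _ in range(steps_per_epoch):
            x = orthant_step(x, stoch_grad, alpha, lam)
    return x
```

For example, with a least-squares loss one could let `stoch_grad` return the gradient on a random mini-batch of rows of the data matrix. Entries zeroed during the orthant phase stay zero under this projection, which is what drives the additional sparsity relative to running Prox-SG alone.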


Metadata
Title
Orthant Based Proximal Stochastic Gradient Method for \(\ell _1\)-Regularized Optimization
Authors
Tianyi Chen
Tianyu Ding
Bo Ji
Guanyi Wang
Yixin Shi
Jing Tian
Sheng Yi
Xiao Tu
Zhihui Zhu
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-67664-3_4
