Published in: Advances in Data Analysis and Classification 4/2023

21-12-2022 | Regular Article

Proximal methods for sparse optimal scoring and discriminant analysis

Authors: Summer Atkins, Gudmundur Einarsson, Line Clemmensen, Brendan Ames

Abstract

Linear discriminant analysis (LDA) is a classical method for dimensionality reduction, where discriminant vectors are sought to project data to a lower-dimensional space for optimal separability of classes. Several recent papers have outlined strategies, based on exploiting sparsity of the discriminant vectors, for performing LDA in the high-dimensional setting where the number of features exceeds the number of observations in the data. However, many of these proposals lack scalable algorithms for solving the underlying optimization problems. We consider an optimization scheme for solving the sparse optimal scoring formulation of LDA based on block coordinate descent. Each iteration of this algorithm requires an update of a scoring vector, which admits an analytic formula, and an update of the corresponding discriminant vector, which requires solution of a convex subproblem; we propose several variants of this algorithm in which the proximal gradient method or the alternating direction method of multipliers is used to solve this subproblem. We show that the per-iteration cost of these methods scales linearly in the dimension of the data provided restricted regularization terms are employed, and cubically in the dimension of the data in the worst case. Furthermore, we establish that when this block coordinate descent framework generates convergent subsequences of iterates, those subsequences converge to stationary points of the sparse optimal scoring problem. We demonstrate the effectiveness of our new methods with empirical results for classification of Gaussian data and data sets drawn from benchmarking repositories, including time-series and multispectral X-ray data, and provide Matlab and R implementations of our optimization schemes.
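To make the alternating scheme concrete, the following Python sketch illustrates the block coordinate descent loop for a single discriminant direction, using the proximal gradient (ISTA) variant for the discriminant-vector subproblem. This is a minimal illustration under our own simplifying assumptions, not the authors' Matlab or R implementation: we assume an elastic-net penalty \(\gamma \Vert \beta \Vert _2^2 + \lambda \Vert \beta \Vert _1\), a fixed step size from a global Lipschitz constant, and fixed iteration counts, and we omit the orthogonalization against previously computed score vectors needed for subsequent directions; the function and parameter names are ours.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_optimal_scoring(X, Y, lam=0.1, gam=1e-3, n_bcd=50, n_prox=100):
    """One sparse discriminant direction via block coordinate descent.

    X : (n, p) centered data matrix; Y : (n, K) class indicator matrix.
    Minimizes (1/n)||Y theta - X beta||^2 + gam*||beta||^2 + lam*||beta||_1
    subject to (1/n) theta' Y'Y theta = 1 (penalty choices are ours).
    """
    n, p = X.shape
    K = Y.shape[1]
    D = Y.T @ Y / n                    # diagonal matrix of class proportions
    ones = np.ones(K)                  # the trivial constant score vector
    beta = np.zeros(p)
    # Initialize theta as a feasible, non-trivial score vector.
    theta = np.zeros(K)
    theta[0] = 1.0
    theta -= (ones @ D @ theta) * ones
    theta /= np.sqrt(theta @ D @ theta)
    # Step size from the Lipschitz constant of the smooth part's gradient.
    L = 2.0 * (np.linalg.norm(X, 2) ** 2 / n + gam)
    for _ in range(n_bcd):
        # Scoring-vector update: analytic formula (rescaled least squares).
        w = np.linalg.solve(D, Y.T @ (X @ beta)) / n
        w -= (ones @ D @ w) * ones     # project out the trivial score
        nrm = np.sqrt(w @ D @ w)
        if nrm > 1e-12:
            theta = w / nrm
        # Discriminant-vector update: proximal gradient (ISTA) iterations
        # on the convex elastic-net subproblem with theta held fixed.
        target = Y @ theta
        for _ in range(n_prox):
            grad = 2.0 * (X.T @ (X @ beta - target) / n + gam * beta)
            beta = soft_threshold(beta - grad / L, lam / L)
    return beta, theta
```

The ADMM variant mentioned in the abstract would replace the inner proximal gradient loop with ADMM iterations on the same convex subproblem; the analytic scoring-vector update is unchanged.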

Footnotes
2
Citation count from scholar.google.com, accessed November 11, 2021.
 
3
Download count from cranlogs.r-pkg.org/badges/grand-total/sparseLDA, accessed November 11, 2021.
 
4
This method of tuning \(\lambda \) differs from that used in earlier versions of this manuscript, where we chose \(\lambda \) via cross-validation or based on out-of-sample classification rate. The results of this analysis largely agree with those of the earlier manuscripts, except for significantly decreased run-times for LARS compared with the cross-validation approach, and modestly increased misclassification error compared with models tuned using out-of-sample accuracy.
 
Metadata
Title
Proximal methods for sparse optimal scoring and discriminant analysis
Authors
Summer Atkins
Gudmundur Einarsson
Line Clemmensen
Brendan Ames
Publication date
21-12-2022
Publisher
Springer Berlin Heidelberg
Published in
Advances in Data Analysis and Classification / Issue 4/2023
Print ISSN: 1862-5347
Electronic ISSN: 1862-5355
DOI
https://doi.org/10.1007/s11634-022-00530-6
