Published in: Advances in Data Analysis and Classification 4/2023

21-12-2022 | Regular Article

Proximal methods for sparse optimal scoring and discriminant analysis

Authors: Summer Atkins, Gudmundur Einarsson, Line Clemmensen, Brendan Ames

Abstract

Linear discriminant analysis (LDA) is a classical method for dimensionality reduction, where discriminant vectors are sought to project data to a lower-dimensional space for optimal separability of classes. Several recent papers have outlined strategies, based on exploiting sparsity of the discriminant vectors, for performing LDA in the high-dimensional setting where the number of features exceeds the number of observations in the data. However, many of these proposals lack scalable algorithms for solving the underlying optimization problems. We consider an optimization scheme for solving the sparse optimal scoring formulation of LDA based on block coordinate descent. Each iteration of this algorithm requires an update of a scoring vector, which admits an analytic formula, and an update of the corresponding discriminant vector, which requires solution of a convex subproblem; we propose several variants of this algorithm in which the proximal gradient method or the alternating direction method of multipliers is used to solve this subproblem. We show that the per-iteration cost of these methods scales linearly in the dimension of the data provided restricted regularization terms are employed, and cubically in the dimension of the data in the worst case. Furthermore, we establish that when this block coordinate descent framework generates convergent subsequences of iterates, those subsequences converge to stationary points of the sparse optimal scoring problem. We demonstrate the effectiveness of our new methods with empirical results for classification of Gaussian data and data sets drawn from benchmarking repositories, including time-series and multispectral X-ray data, and provide Matlab and R implementations of our optimization schemes.
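To make the alternating scheme concrete, the following Python sketch illustrates the block coordinate descent loop for a single discriminant direction, using the proximal gradient (ISTA) variant for the discriminant-vector subproblem. This is a minimal illustration under our own simplifying assumptions, not the authors' Matlab or R implementation: we assume an elastic-net penalty \(\gamma \Vert \beta \Vert _2^2 + \lambda \Vert \beta \Vert _1\), a fixed step size from a global Lipschitz constant, and fixed iteration counts, and we omit the orthogonalization against previously computed score vectors needed for subsequent directions; the function and parameter names are ours.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_optimal_scoring(X, Y, lam=0.1, gam=1e-3, n_bcd=50, n_prox=100):
    """One sparse discriminant direction via block coordinate descent.

    X : (n, p) centered data matrix; Y : (n, K) class indicator matrix.
    Minimizes (1/n)||Y theta - X beta||^2 + gam*||beta||^2 + lam*||beta||_1
    subject to (1/n) theta' Y'Y theta = 1 (penalty choices are ours).
    """
    n, p = X.shape
    K = Y.shape[1]
    D = Y.T @ Y / n                    # diagonal matrix of class proportions
    ones = np.ones(K)                  # the trivial constant score vector
    beta = np.zeros(p)
    # Initialize theta as a feasible, non-trivial score vector.
    theta = np.zeros(K)
    theta[0] = 1.0
    theta -= (ones @ D @ theta) * ones
    theta /= np.sqrt(theta @ D @ theta)
    # Step size from the Lipschitz constant of the smooth part's gradient.
    L = 2.0 * (np.linalg.norm(X, 2) ** 2 / n + gam)
    for _ in range(n_bcd):
        # Scoring-vector update: analytic formula (rescaled least squares).
        w = np.linalg.solve(D, Y.T @ (X @ beta)) / n
        w -= (ones @ D @ w) * ones     # project out the trivial score
        nrm = np.sqrt(w @ D @ w)
        if nrm > 1e-12:
            theta = w / nrm
        # Discriminant-vector update: proximal gradient (ISTA) iterations
        # on the convex elastic-net subproblem with theta held fixed.
        target = Y @ theta
        for _ in range(n_prox):
            grad = 2.0 * (X.T @ (X @ beta - target) / n + gam * beta)
            beta = soft_threshold(beta - grad / L, lam / L)
    return beta, theta
```

The ADMM variant mentioned in the abstract would replace the inner proximal gradient loop with ADMM iterations on the same convex subproblem; the analytic scoring-vector update is unchanged.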

Footnotes
2
Citation count from scholar.google.com, accessed November 11, 2021.
 
3
Download count from cranlogs.r-pkg.org/badges/grand-total/sparseLDA, accessed November 11, 2021.
 
4
This method of tuning \(\lambda \) differs from that used in earlier versions of this manuscript, where we chose \(\lambda \) via cross-validation or based on out-of-sample classification rate. The results of this analysis largely agree with those of the earlier manuscripts, except for significantly decreased run-times for LARS compared with the cross-validation approach, and modestly increased misclassification error compared with models tuned using out-of-sample accuracy.
 
Metadata
Title
Proximal methods for sparse optimal scoring and discriminant analysis
Authors
Summer Atkins
Gudmundur Einarsson
Line Clemmensen
Brendan Ames
Publication date
21-12-2022
Publisher
Springer Berlin Heidelberg
Published in
Advances in Data Analysis and Classification / Issue 4/2023
Print ISSN: 1862-5347
Electronic ISSN: 1862-5355
DOI
https://doi.org/10.1007/s11634-022-00530-6
