
Alternating direction method of multipliers for penalized zero-variance discriminant analysis

Computational Optimization and Applications

Abstract

We consider the task of classification in the high dimensional setting where the number of features of the given data is significantly greater than the number of observations. To accomplish this task, we propose a heuristic, called sparse zero-variance discriminant analysis, for simultaneously performing linear discriminant analysis and feature selection on high dimensional data. This method combines classical zero-variance discriminant analysis, where discriminant vectors are identified in the null space of the sample within-class covariance matrix, with penalization applied to induce sparse structures in the resulting vectors. To approximately solve the resulting nonconvex problem, we develop a simple algorithm based on the alternating direction method of multipliers. Further, we show that this algorithm is applicable to a larger class of penalized generalized eigenvalue problems, including a particular relaxation of the sparse principal component analysis problem. Finally, we establish theoretical guarantees for convergence of our algorithm to stationary points of the original nonconvex problem, and empirically demonstrate the effectiveness of our heuristic for classifying simulated data and data drawn from applications in time-series classification.
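
To make the preceding description concrete, the following is a minimal sketch of the kind of problem and ADMM splitting the abstract refers to. The notation is assumed for illustration and is not necessarily the authors' exact formulation: W and B denote the sample within-class and between-class covariance matrices, N is a matrix whose columns span the null space of W, D is a penalty-weighting matrix (the identity for a plain l1 penalty), gamma > 0 controls the degree of sparsity, and rho > 0 is the ADMM penalty parameter.

% Illustrative penalized zero-variance problem (assumed notation, not the paper's verbatim statement):
\[
  \min_{\|x\|_2 \le 1} \; -\tfrac{1}{2}\, x^{\top} N^{\top} B N x \;+\; \gamma \, \| D N x \|_{1}.
\]
% A standard ADMM splitting introduces the auxiliary variable y = D N x and iterates
\begin{align*}
  x^{k+1} &= \operatorname*{arg\,min}_{\|x\|_2 \le 1} \; -\tfrac{1}{2}\, x^{\top} N^{\top} B N x \;+\; \tfrac{\rho}{2}\, \| D N x - y^{k} + u^{k} \|_2^{2}, \\
  y^{k+1} &= S_{\gamma/\rho}\bigl( D N x^{k+1} + u^{k} \bigr), \\
  u^{k+1} &= u^{k} + D N x^{k+1} - y^{k+1},
\end{align*}
% where S_tau denotes elementwise soft-thresholding and u is the scaled dual variable.

Under this splitting, the x-subproblem is a trust-region-type quadratic minimization over the unit ball, the y-update has a closed-form soft-thresholding solution, and the u-update is the usual dual ascent step; separating the nonconvex quadratic term from the nonsmooth penalty in this way is what makes an ADMM approach natural for this class of problems.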


Acknowledgments

The work presented in this paper was partially carried out while B.P.W. Ames was a postdoctoral fellow at the Institute for Mathematics and its Applications during the IMA’s annual program on Mathematics of Information (supported by National Science Foundation (NSF) award DMS-0931945), and while B.P.W. Ames was a von Kármán instructor at the California Institute of Technology supported by Joel Tropp under Office of Naval Research (ONR) award N00014-11-1002. This research was also supported by University of Alabama Research Grant RG14678. We are grateful to Fadil Santosa, Krystal Taylor, Zhi-Quan Luo, Meisam Razaviyayn, and Line Clemmensen for their insights and helpful suggestions. Finally, we are grateful for the contributions of two anonymous reviewers whose suggestions have greatly improved this manuscript.

Author information

Correspondence to Brendan P. W. Ames.


Cite this article

Ames, B.P.W., Hong, M. Alternating direction method of multipliers for penalized zero-variance discriminant analysis. Comput Optim Appl 64, 725–754 (2016). https://doi.org/10.1007/s10589-016-9828-y

