
On the Robust PCA and Weiszfeld’s Algorithm

Applied Mathematics & Optimization 82, 1017–1048 (2020)

Abstract

Principal component analysis (PCA) is a powerful standard tool for reducing the dimensionality of data. Unfortunately, it is sensitive to outliers, so various robust PCA variants have been proposed in the literature. This paper addresses robust PCA by successively determining the directions of lines having minimal Euclidean distances from the data points. The corresponding energy functional is non-differentiable at a finite number of directions, which we call anchor directions. We derive a Weiszfeld-like algorithm for minimizing the energy functional which has several advantages over existing algorithms. Special attention is paid to carefully handling the anchor directions, where the relation between local minima and one-sided derivatives of Lipschitz continuous functions on submanifolds of \(\mathbb {R}^d\) is taken into account. Using ideas for stabilizing the classical Weiszfeld algorithm at anchor points and the Kurdyka–Łojasiewicz property of the energy functional, we prove global convergence of the whole sequence of iterates generated by the algorithm to a critical point of the energy functional. Numerical examples demonstrate the very good performance of our algorithm.
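For background, the classical Weiszfeld algorithm referred to in the title computes the geometric median of a point set by iteratively re-weighted averaging. The following minimal Python sketch shows that classical iteration only; it is not the projected, anchor-aware algorithm developed in the paper, and the function name and toy data are our own.

```python
import numpy as np

def weiszfeld(points, n_iter=100, eps=1e-12):
    # Classical Weiszfeld iteration for the geometric median of the rows
    # of `points`: x <- sum_i w_i p_i / sum_i w_i with w_i = 1/||p_i - x||.
    x = points.mean(axis=0)                    # start at the centroid
    for _ in range(n_iter):
        d = np.linalg.norm(points - x, axis=1)
        if np.any(d < eps):                    # iterate hit a data point
            break                              # (an "anchor point"); stop here
        w = 1.0 / d
        x = (w[:, None] * points).sum(axis=0) / w.sum()
    return x

# toy usage: one far-away outlier barely moves the geometric median
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [100.0, 100.0]])
print(weiszfeld(pts))
```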



References

1. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)

2. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1–2, Ser. A), 91–129 (2013)

3. Beck, A., Sabach, S.: Weiszfeld's method: old and new results. J. Optim. Theory Appl. 164(1), 1–40 (2015)

4. Ben-Tal, A., Zowe, J.: Directional derivatives in nonsmooth optimization. J. Optim. Theory Appl. 47(4), 483–490 (1985)

5. Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. ACM 58(3), Art. 11 (2011)

6. Chouzenoux, E., Idier, J., Moussaoui, S.: A majorize–minimize strategy for subspace optimization applied to image restoration. IEEE Trans. Image Process. 20(6), 1517–1528 (2011)

7. Ding, C., Zhou, D., He, X., Zha, H.: \(R_1\)-PCA: rotational invariant \(L_1\)-norm principal component analysis for robust subspace factorization. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 281–288. ACM (2006)

8. Epstein, R., Hallinan, P., Yuille, A.: \(5 \pm 2\) eigenimages suffice: an empirical investigation of low-dimensional lighting models. In: IEEE Workshop on Physics-Based Vision, pp. 108–116 (1995)

9. Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. In: Readings in Computer Vision, pp. 726–740. Elsevier (1987)

10. Fletcher, P.T., Joshi, S.: Principal geodesic analysis on symmetric spaces: statistics of diffusion tensors. In: Computer Vision and Mathematical Methods in Medical and Biomedical Image Analysis, pp. 87–98. Springer (2004)

11. Fletcher, P.T., Lu, C., Pizer, S.M., Joshi, S.: Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE Trans. Med. Imaging 23(8), 995–1005 (2004)

12. Hager, W.W., Phan, D.T., Zhu, J.: Projection algorithms for nonconvex minimization with application to sparse principal component analysis. J. Glob. Optim. 65(4), 657–676 (2016)

13. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Springer, New York (2001)

14. Hauberg, S., Feragen, A., Black, M.J.: Grassmann averages for scalable robust PCA. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3810–3817 (2014)

15. Huber, P.J.: Robust Statistics. Wiley Series in Probability and Mathematical Statistics. Wiley, New York (1981)

16. Huber, P.J., Ronchetti, E.M.: Robust Statistics, 2nd edn. Wiley Series in Probability and Statistics. Wiley, New York (2009)

17. Huckemann, S., Ziezold, H.: Principal component analysis for Riemannian manifolds with an application to triangular shape spaces. Adv. Appl. Probab. 38(2), 299–319 (2006)

18. Katz, I.N.: Local convergence in Fermat's problem. Math. Program. 6(1), 89–104 (1974)

19. Ke, Q., Kanade, T.: Robust \(\ell_1\) norm factorization in the presence of outliers and missing data by alternative convex programming. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 1, pp. 739–746. IEEE (2005)

20. Keeling, S.L., Kunisch, K.: Robust \(\ell_1\) approaches to computing the geometric median and principal and independent components. J. Math. Imaging Vis. 56(1), 99–124 (2016)

21. Kriegel, H.P., Kröger, P., Schubert, E., Zimek, A.: A general framework for increasing the robustness of PCA-based correlation clustering algorithms. In: Scientific and Statistical Database Management. Lecture Notes in Computer Science, vol. 5069, pp. 418–435 (2008)

22. Kuhn, H.W.: A note on Fermat's problem. Math. Program. 4, 98–107 (1973)

23. Kuhn, H.W., Kuenne, R.E.: An efficient algorithm for the numerical solution of the generalized Weber problem in spatial economics. J. Reg. Sci. 4(2), 21–33 (1962)

24. Kurdyka, K.: On gradients of functions definable in o-minimal structures. Annales de l'Institut Fourier 48, 769–783 (1998)

25. Kwak, N.: Principal component analysis based on \(\ell_1\)-norm maximization. IEEE Trans. Pattern Anal. Mach. Intell. 30(9), 1672–1680 (2008)

26. Lee, K.-C., Ho, J., Kriegman, D.J.: Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans. Pattern Anal. Mach. Intell. 27(5), 684–698 (2005)

27. Lerman, G., Maunu, T.: Fast, robust and non-convex subspace recovery. Inf. Inference 7(2), 277–336 (2017)

28. Lerman, G., Maunu, T.: An overview of robust subspace recovery. Proc. IEEE 106(8), 1380–1410 (2018)

29. Lerman, G., Zhang, T.: Robust recovery of multiple subspaces by geometric \(l_p\) minimization. Ann. Stat. 39(5), 2686–2715 (2011)

30. Lerman, G., Zhang, T.: \(l_p\)-recovery of the most significant subspace among multiple subspaces with outliers. Constr. Approx. 40(3), 329–385 (2014)

31. Lerman, G., McCoy, M., Tropp, J.A., Zhang, T.: Robust computation of linear models by convex relaxation. Found. Comput. Math. 15(1), 363–410 (2015)

32. Leroy, A.M., Rousseeuw, P.J.: Robust Regression and Outlier Detection. Wiley Series in Probability and Mathematical Statistics. Wiley, Chichester (1987)

33. Li, G., Chen, Z.: Projection-pursuit approach to robust dispersion matrices and principal components: primary theory and Monte Carlo. J. Am. Stat. Assoc. 80, 759–766 (1985)

34. Li, L., Huang, W., Gu, I.Y.-H., Tian, Q.: Statistical modeling of complex backgrounds for foreground object detection. IEEE Trans. Image Process. 13(11), 1459–1472 (2004)

35. Łojasiewicz, S.: Une propriété topologique des sous-ensembles analytiques réels. In: Les Équations aux Dérivées Partielles (Paris, 1962), pp. 87–89. Éditions du Centre National de la Recherche Scientifique, Paris (1963)

36. Luss, R., Teboulle, M.: Conditional gradient algorithms for rank-one matrix approximations with a sparsity constraint. SIAM Rev. 55(1), 65–98 (2013)

37. Markopoulos, P.P., Karystinos, G.N., Pados, D.A.: Optimal algorithms for \(l_1\)-subspace signal processing. IEEE Trans. Signal Process. 62(19), 5046–5058 (2014)

38. Markopoulos, P.P., Kundu, S., Chamadia, S., Pados, D.A.: Efficient \(l_1\)-norm principal-component analysis via bit flipping. IEEE Trans. Signal Process. 65(16), 4252–4264 (2017)

39. Maronna, R.A., Martin, R.D., Yohai, V.J.: Robust Statistics: Theory and Methods. Wiley Series in Probability and Statistics. Wiley, Chichester (2006)

40. Massart, D.L., Kaufman, L., Rousseeuw, P.J., Leroy, A.: Least median of squares: a robust method for outlier and model error detection in regression and calibration. Anal. Chim. Acta 187, 171–179 (1986)

41. Maunu, T., Zhang, T., Lerman, G.: A well-tempered landscape for non-convex robust subspace recovery (2017). arXiv:1706.03896

42. McCoy, M., Tropp, J.A.: Two proposals for robust PCA using semidefinite programming. Electron. J. Stat. 5, 1123–1160 (2011)

43. Mordukhovich, B., Nam, N.M., Yen, N.D.: Fréchet subdifferential calculus and optimality conditions in nondifferentiable programming. Optimization 55(5–6), 685–708 (2006)

44. Neumayer, S., Nimmer, M., Steidl, G., Stephani, H.: On a projected Weiszfeld algorithm. In: Lauze, F., Dong, Y., Dahl, A.B. (eds.) Scale Space and Variational Methods in Computer Vision. Lecture Notes in Computer Science, vol. 10302, pp. 486–497. Springer, New York (2017)

45. Neumayer, S., Nimmer, M., Setzer, S., Steidl, G.: On the rotational invariant \(L_1\)-norm PCA (2019). http://arxiv.org/pdf/1902.03840v1

46. Nie, F., Huang, H., Ding, C., Luo, D., Wang, H.: Robust principal component analysis with non-greedy \(\ell_1\)-norm maximization. In: IJCAI Proceedings—International Joint Conference on Artificial Intelligence, vol. 22, p. 1433 (2011)

47. Ostresh Jr., L.M.: On the convergence of a class of iterative methods for solving the Weber location problem. Oper. Res. 26(4), 597–609 (1978)

48. Pearson, K.: On lines and planes of closest fit to systems of points in space. Philos. Mag. 2(11), 559–572 (1901)

49. Pennec, X.: Barycentric subspace analysis on manifolds (2016). arXiv:1607.02833

50. Pennec, X.: Sample-limited \(l_p\) barycentric subspace analysis on constant curvature spaces. In: International Conference on Geometric Science of Information, pp. 20–28. Springer (2017)

51. Podosinnikova, A., Setzer, S., Hein, M.: Robust PCA: optimization of the robust reconstruction error over the Stiefel manifold. In: German Conference on Pattern Recognition, pp. 121–131. Springer (2014)

52. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)

53. Rousseeuw, P.J., Leroy, A.M.: Robust Regression and Outlier Detection. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. Wiley, New York (1987)

54. Schneck, G.: Robust principal component analysis. Bachelor Thesis, TU Kaiserslautern (2018)

55. Setzer, S., Steidl, G., Teuber, T.: On vector and matrix median computation. J. Comput. Appl. Math. 236(8), 2200–2222 (2012)

56. Sommer, S., Lauze, F., Nielsen, M.: Optimization over geodesics for exact principal component analysis. Adv. Comput. Math. 40, 283–313 (2014)

57. Vardi, Y., Zhang, C.H.: A modified Weiszfeld algorithm for the Fermat–Weber location problem. Math. Program. 90, 559–566 (2001)

58. Vazsonyi, A.: Pure mathematics and Weiszfeld algorithm. Decis. Line 33(3), 12–13 (2002)

59. Weiszfeld, E.: Sur le point pour lequel la somme des distances de \(n\) points donnés est minimum. Tôhoku Math. J. 43, 355–386 (1937)

60. Xu, H., Caramanis, C., Sanghavi, S.: Robust PCA via outlier pursuit. IEEE Trans. Inf. Theory 58(3), 3047–3064 (2012)


Acknowledgements

Funding by the German Research Foundation (DFG) within the Research Training Group 1932, project area P3, is gratefully acknowledged.

Author information


Correspondence to Max Nimmer.


Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: One-Sided Derivatives and Minimizers on Embedded Manifolds

The one-sided directional derivative of a function \(f:\mathbb {R}^d \rightarrow \mathbb {R}\), \(d \in {\mathbb {N}}\), at a point \(x \in \mathbb {R}^d\) in direction \(h \in \mathbb {R}^d\) is defined by

$$\begin{aligned} Df(x;h) :=\lim _{\alpha \downarrow 0}\frac{f(x+\alpha h) - f(x)}{\alpha }. \end{aligned}$$
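As a simple illustration (our own example), consider the Euclidean norm \(f = \Vert \cdot \Vert \): it is not differentiable at the origin, yet it possesses one-sided derivatives in every direction,

$$\begin{aligned} Df(0;h) = \lim _{\alpha \downarrow 0}\frac{\Vert \alpha h\Vert - \Vert 0\Vert }{\alpha } = \Vert h\Vert . \end{aligned}$$

This is the kind of non-differentiability that arises at the anchor points and anchor directions of Weiszfeld-type energies.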

When f is restricted to a submanifold \({\mathcal {M}} \subseteq \mathbb {R}^d\), it suffices to consider directions \(h \in T_x {\mathcal {M}}\). Recall that \({\mathcal {M}} \subseteq \mathbb {R}^d\) is an m-dimensional submanifold of \(\mathbb {R}^d\) if for each point \(x \in {\mathcal {M}}\) there exist an open neighborhood \(U \subseteq \mathbb {R}^d\), an open set \(\Omega \subseteq \mathbb {R}^m\) and a so-called parametrization \(\varphi \in C^1(\Omega , \mathbb {R}^d)\) of \({\mathcal {M}}\) with the properties

(i) \(\varphi (\Omega ) = {\mathcal {M}} \cap U\),

(ii) \(\varphi \) is injective and its inverse \(\varphi ^{-1}:{\mathcal {M}} \cap U \rightarrow \Omega \) is continuous, and

(iii) \(D \varphi (\omega )\) has full rank m for all \(\omega \in \Omega \).
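A concrete example, added here for illustration, is the unit sphere, on which the directions of lines naturally live,

$$\begin{aligned} \mathbb {S}^{d-1} = \{x \in \mathbb {R}^d : \Vert x\Vert = 1\}, \qquad T_x \mathbb {S}^{d-1} = \{h \in \mathbb {R}^d : \langle x, h \rangle = 0\}. \end{aligned}$$

It is a \((d-1)\)-dimensional submanifold of \(\mathbb {R}^d\): near the north pole \(e_d\), a parametrization is \(\varphi (\omega ) = \big (\omega , \sqrt{1-\Vert \omega \Vert ^2}\big )\) with \(\omega \) in the open unit ball of \(\mathbb {R}^{d-1}\), and \(D\varphi (\omega )\) has full rank \(d-1\).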

To establish the relation between one-sided directional derivatives and local minima of functions on manifolds we need the following lemma. A proof can be found in [45, Lemma B.1].

Lemma A.1

Let \({\mathcal {M}}\subset \mathbb {R}^{d}\) be an m-dimensional submanifold of \(\mathbb {R}^{d}\). Then, for every \(x \in {\mathcal {M}}\), the tangent space \(T_{x}{\mathcal {M}}\) and the tangent cone

$$\begin{aligned} {\mathcal {T}}_x{\mathcal {M}} := \left\{ u\in \mathbb {R}^{d}:\ \exists \text { sequence } (x_k)_{k\in \mathbb {N}}\subset {\mathcal {M}}\setminus \{x\} \text { with } x_k\rightarrow x \text { such that } \frac{x_k-x}{\Vert x_k-x\Vert }\rightarrow \frac{u}{\Vert u\Vert }\right\} \cup \{0\} \end{aligned}$$

coincide.
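As a quick numerical illustration of the lemma (a Python sketch under our own choice of manifold, base point and sequence, not part of the paper), take \({\mathcal {M}} = \mathbb {S}^2\): the normalized secants of a spherical sequence \(x_k \rightarrow x\) become orthogonal to x, i.e., they approach \(T_x \mathbb {S}^2\).

```python
import numpy as np

# Illustration of Lemma A.1 on the sphere S^2 (toy setup): the normalized
# secants (x_k - x)/||x_k - x|| of a sequence x_k -> x on the manifold
# approach the tangent space T_x S^2 = {h : <x, h> = 0}.
x = np.array([0.0, 0.0, 1.0])               # base point (north pole)
h = np.array([1.0, 0.0, 0.0])               # tangent direction, <x, h> = 0

for t in [1e-1, 1e-3, 1e-6]:
    x_k = np.cos(t) * x + np.sin(t) * h     # point on the great circle toward h
    u = (x_k - x) / np.linalg.norm(x_k - x) # normalized secant direction
    print(t, u @ x)                         # -> 0: u tends into T_x S^2
```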

The following theorem gives general necessary and sufficient conditions for local minimizers of Lipschitz continuous functions on embedded manifolds in terms of one-sided derivatives. In the Euclidean setting \({\mathcal {M}} = \mathbb {R}^d\), the first part holds trivially for any function \(f:\mathbb {R}^d \rightarrow \mathbb {R}\), while a proof of the sufficient minimality condition in the second part was given in [4]. Moreover, the authors of [4] gave an example showing that Lipschitz continuity in the second part cannot be weakened to mere continuity.

Theorem A.2

Let \({\mathcal {M}} \subset \mathbb {R}^d\) be an m-dimensional submanifold of \(\mathbb {R}^d\) and \(f:\mathbb {R}^d \rightarrow \mathbb {R}\) a locally Lipschitz continuous function. Then the following holds true:

1. If \({\hat{x}} \in {\mathcal {M}}\) is a local minimizer of f on \({\mathcal {M}}\), then \(Df({\hat{x}};h) \ge 0\) for all \(h \in T_{{\hat{x}}}{\mathcal {M}}\).

2. If \(Df({\hat{x}};h) > 0\) for all \(h\in T_{{\hat{x}}} {\mathcal {M}} \setminus \{0\}\), then \({\hat{x}}\) is a strict local minimizer of f on \({\mathcal {M}}\).

A proof can be found in [45, Thm. 6.1], along with an example demonstrating that Lipschitz continuity of f is indeed necessary for the first part of the theorem in the manifold setting. Furthermore, note that \(Df({\hat{x}};h) \ge 0\) for all \(h\in T_{{\hat{x}}} {\mathcal {M}} \setminus \{0\}\) does not imply that \({\hat{x}}\) is a local minimizer of f on \({\mathcal {M}}\).
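The necessary condition in part 1 can also be checked numerically. The following Python sketch uses our own toy function \(f(x) = \Vert x - a\Vert \) on \(\mathbb {S}^2\) (not the paper's energy functional), whose minimizer on the sphere is \({\hat{x}} = a/\Vert a\Vert \); forward differences along tangent directions come out nonnegative, consistent with part 1.

```python
import numpy as np

# Finite-difference check of Theorem A.2, part 1, on the sphere S^2 for the
# Lipschitz function f(x) = ||x - a|| (a toy example); its minimizer on the
# sphere is x_hat = a/||a||, so Df(x_hat; h) >= 0 for all tangent h.
rng = np.random.default_rng(1)
a = np.array([2.0, -1.0, 0.5])
x_hat = a / np.linalg.norm(a)
f = lambda x: np.linalg.norm(x - a)

t = 1e-6
for _ in range(5):
    h = rng.standard_normal(3)
    h -= (h @ x_hat) * x_hat                 # project h into T_{x_hat} S^2
    h /= np.linalg.norm(h)
    x_t = (x_hat + t * h) / np.linalg.norm(x_hat + t * h)  # step on the sphere
    print((f(x_t) - f(x_hat)) / t)           # nonnegative, up to O(t) error
```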


Cite this article

Neumayer, S., Nimmer, M., Setzer, S. et al. On the Robust PCA and Weiszfeld’s Algorithm. Appl Math Optim 82, 1017–1048 (2020). https://doi.org/10.1007/s00245-019-09566-1
