
2016 | Original Paper | Book Chapter

6. Statistical Methods

Authors: René Vidal, Yi Ma, S. Shankar Sastry

Published in: Generalized Principal Component Analysis

Publisher: Springer New York


Abstract

The algebraic-geometric approach to subspace clustering described in the previous chapter provides a fairly complete characterization of the algebra and geometry of multiple subspaces, which leads to simple and elegant subspace clustering algorithms. However, while these methods can handle some noise in the data, they do not make explicit assumptions about the distribution of the noise or the data inside the subspaces. Therefore, the estimated subspaces need not be optimal from a statistical perspective, e.g., in a maximum likelihood (ML) sense.


Footnotes
1
The reader is encouraged to implement it and test it with simulated data.
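Such a test might look as follows: a minimal sketch, assuming the algorithm being referred to is the K-subspaces iteration of this chapter (the function names and the simulated-data setup are ours, not the book's).

```python
import numpy as np

def k_subspaces(X, n_clusters, dim, n_iters=50, seed=0):
    """Cluster the columns of X (D x N) into n_clusters subspaces of dimension dim."""
    rng = np.random.default_rng(seed)
    D, N = X.shape
    labels = rng.integers(n_clusters, size=N)
    for _ in range(n_iters):
        bases = []
        for i in range(n_clusters):
            Xi = X[:, labels == i]
            if Xi.shape[1] <= dim:                 # re-seed (nearly) empty clusters
                Xi = X[:, rng.integers(N, size=dim + 1)]
            U, _, _ = np.linalg.svd(Xi, full_matrices=False)
            bases.append(U[:, :dim])               # PCA fit of the i-th subspace
        # Assign each point to the subspace with the smallest projection residual.
        residuals = np.stack([np.linalg.norm(X - U @ (U.T @ X), axis=0) for U in bases])
        new_labels = residuals.argmin(axis=0)
        if np.array_equal(new_labels, labels):     # assignments stabilized
            break
        labels = new_labels
    return labels, bases

# Simulated data: two 2-dimensional subspaces in R^5 with small additive noise.
rng = np.random.default_rng(1)
U1, _ = np.linalg.qr(rng.standard_normal((5, 2)))
U2, _ = np.linalg.qr(rng.standard_normal((5, 2)))
X = np.hstack([U1 @ rng.standard_normal((2, 100)), U2 @ rng.standard_normal((2, 100))])
X += 0.01 * rng.standard_normal(X.shape)
labels, bases = k_subspaces(X, n_clusters=2, dim=2)
```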
 
2
If the dimension of the ambient space D is high, the degrees of freedom in the model are very large and the extrema of the likelihood function may not be so salient when the number of samples is not significantly larger than the dimension. Therefore, the optimization algorithm may converge very slowly to an extremum, since so many parameters are not properly regularized. There has been a rich literature in statistics about how to properly regularize the estimates of high-dimensional Gaussians, especially when the number of samples is limited.
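One standard regularization of this kind (an illustrative choice, not necessarily the one used in the literature cited; the shrinkage weight alpha is our own parameter) shrinks the sample covariance toward a scaled identity:

```python
import numpy as np

def shrunk_covariance(X, alpha=0.1):
    """X: D x N zero-mean data; alpha in [0, 1] is the shrinkage weight (our choice)."""
    D, N = X.shape
    S = (X @ X.T) / N                      # sample covariance
    mu = np.trace(S) / D                   # average eigenvalue of S
    # Blend toward a scaled identity so the estimate stays well conditioned
    # even when N is not much larger than D.
    return (1 - alpha) * S + alpha * mu * np.eye(D)
```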
 
3
While the quantity is not an actual distance, notice that it is nonnegative because it is minus the logarithm of a probability.
 
4
Notice that this means that while the subspaces are allowed to have different orientations (different \(U_{i}\)), the distribution of the data points inside the subspaces is the same.
 
5
For simplicity, in the main text, we will derive and present the main results with the zero-mean assumption. However, all the formulas, results, and algorithms can be readily extended to the nonzero-mean case, as shown in Appendix 6.A.
 
6
Strictly speaking, the rate-distortion function for the Gaussian distribution \(\mathcal{N}(0,\Sigma)\) is \(R = \frac{1}{2}\log_{2}\det\big(\frac{D}{\varepsilon^{2}}\Sigma\big)\) when \(\frac{\varepsilon^{2}}{D}\) is smaller than the smallest eigenvalue of \(\Sigma\). Thus the above formula is a good approximation when the distortion \(\varepsilon\) is relatively small. However, when \(\frac{\varepsilon^{2}}{D}\) is larger than some eigenvalues of \(\Sigma\), the rate-distortion function becomes more complicated (Cover and Thomas 1991). Nevertheless, the approximate formula \(R = \frac{1}{2}\log_{2}\det\big(I + \frac{D}{\varepsilon^{2}}\Sigma\big)\) can be viewed as the rate-distortion function of the “regularized” distribution that works for all ranges of \(\varepsilon\).
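A quick numerical check of this claim (a sketch; the test covariance Sigma and distortions eps are arbitrary illustrative values):

```python
import numpy as np

def rate_approx(Sigma, eps):
    """The regularized formula, well defined for all eps."""
    D = Sigma.shape[0]
    return 0.5 * np.log2(np.linalg.det(np.eye(D) + (D / eps**2) * Sigma))

def rate_exact_small_distortion(Sigma, eps):
    """The exact formula, valid only when eps^2 / D is below Sigma's smallest eigenvalue."""
    D = Sigma.shape[0]
    return 0.5 * np.log2(np.linalg.det((D / eps**2) * Sigma))

Sigma = np.diag([4.0, 1.0, 0.25])
for eps in (0.5, 0.1):
    print(eps, rate_exact_small_distortion(Sigma, eps), rate_approx(Sigma, eps))
# The two agree closely for small eps; only the regularized formula remains
# meaningful once eps^2 / D exceeds some eigenvalues of Sigma.
```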
 
7
This can be viewed as the cost of coding the D principal axes of the data covariance \(\frac{1}{N}XX^{\top}\).
 
8
Here we assume that the ordering of the samples is random and entropy coding is the best we can do to code the membership. However, if the samples are ordered such that nearby samples are more likely to belong to the same cluster (e.g., in segmenting pixels of an image), the second term can and should be replaced by a tighter estimate.
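In code, the entropy-coding estimate of the membership cost amounts to the following (a sketch; counts holds the cluster sizes \(N_{i}\), all assumed positive):

```python
import numpy as np

def membership_bits(counts):
    """Bits to code the cluster membership of N samples given the cluster sizes."""
    counts = np.asarray(counts, dtype=float)
    N = counts.sum()
    return float(np.sum(counts * np.log2(N / counts)))

print(membership_bits([100, 100]))  # 200.0: one bit per sample for two equal clusters
```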
 
9
This particular penalty term is justified by noticing that \(ND\cdot\log\varepsilon\) is (within an additive constant) the number of bits required to code the residual \(\boldsymbol{x}-\hat{\boldsymbol{x}}\) up to (very small) distortion \(\delta \ll \varepsilon\).
 
10
However, it may be possible to improve the convergence using more complicated split-and-merge strategies (Ueda et al. 2000). In addition, due to Theorem 6.5 of Section 6.3.4, the globally (asymptotically) optimal clustering can also be computed via concave optimization (Benson 1994), at the cost of potentially exponential computation time.
 
11
Compared to the MDL criterion (2.85), if the term \(N \cdot R(X)\) corresponds to the coding length for the data, the term \(D \cdot R(X)\) then corresponds to the coding length for the model parameter \(\theta\).
 
12
For a more thorough discussion on why some transformations (such as wavelets) are useful for data compression, the reader may refer to (Donoho et al. 1998).
 
13
Especially when the data \(\mathcal{X}\) indeed consist of a mixture of subsets and each cluster is a typical set of samples from an (almost degenerate) Gaussian distribution.
 
14
Notice that the data are not drawn from a mixture of Gaussians. As we have mentioned before, although we have derived the coding length function largely based on the mixture of Gaussians model, the coding length function actually gives a good estimate of the coding length for any subspace-like data; see Appendix 6.A for more details. Hence this experiment tests how well the coding length works when we deviate from the basic mixture of Gaussians model.
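For reference, a sketch of the coding length function in the zero-mean case, following the structure suggested by footnotes 7 and 11 (\(N \cdot R(X)\) bits for the data plus \(D \cdot R(X)\) bits for the model; see Ma et al. (2007) and Appendix 6.A for the precise form):

```python
import numpy as np

def coding_length(X, eps):
    """Total bits to code the D x N matrix X up to distortion eps (zero-mean case)."""
    D, N = X.shape
    # R(X) bits per vector; (N + D) * R(X) covers both the data and the model.
    R = 0.5 * np.log2(np.linalg.det(np.eye(D) + (D / (eps**2 * N)) * (X @ X.T)))
    return (N + D) * R

# Example: two orthogonal lines in R^2 cost fewer bits coded separately than jointly.
rng = np.random.default_rng(0)
X1 = np.vstack([rng.standard_normal(50), np.zeros(50)])   # samples along the first axis
X2 = np.vstack([np.zeros(50), rng.standard_normal(50)])   # samples along the second axis
eps = 0.1
print(coding_length(X1, eps) + coding_length(X2, eps))    # separate: smaller
print(coding_length(np.hstack([X1, X2]), eps))            # joint: larger
```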
 
15
Factor analysis is a probabilistic model for a subspace that generalizes the PPCA model discussed in Section 2.2: in FA, the noise covariance is diagonal, while in PPCA, it is a scaled identity matrix.
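The distinction is easy to state in code (a sketch; W, psi, and sigma2 are illustrative values of our own):

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 5, 2
W = rng.standard_normal((D, d))            # factor loadings (illustrative)
psi = rng.uniform(0.1, 1.0, size=D)        # FA: one noise variance per coordinate
sigma2 = 0.5                               # PPCA: a single shared noise variance
Sigma_fa = W @ W.T + np.diag(psi)          # factor analysis covariance model
Sigma_ppca = W @ W.T + sigma2 * np.eye(D)  # PPCA covariance model
```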
 
16
That is, each sample is assigned to its cluster according to the maximum likelihood rule among all subspaces estimated by the EM algorithm.
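A sketch of this rule, assuming each estimated subspace is summarized by a zero-mean Gaussian covariance (the uniform cluster priors here are an additional assumption):

```python
import numpy as np

def ml_assign(X, Sigmas):
    """Assign each column of X (D x N) to the zero-mean Gaussian with highest likelihood."""
    scores = []
    for S in Sigmas:
        _, logdet = np.linalg.slogdet(S)
        mahal = np.sum(X * np.linalg.solve(S, X), axis=0)  # x^T S^{-1} x, per sample
        scores.append(-0.5 * (logdet + mahal))             # log-likelihood up to a constant
    # Uniform cluster priors are assumed; otherwise add log(pi_i) to scores[i].
    return np.argmax(np.stack(scores), axis=0)
```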
 
17
The dimension of each cluster \(\mathcal{X}_{i}\) is identified using principal component analysis (PCA) by thresholding the singular values of the data matrix \(X_{i}\) with respect to \(\varepsilon\).
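In code, this amounts to counting the singular values above the threshold (a sketch; the exact scaling of the threshold relative to \(\varepsilon\) is an assumption):

```python
import numpy as np

def subspace_dimension(Xi, eps):
    """Estimate the dimension of cluster data Xi (D x N_i) by PCA.

    Counts singular values above eps; the appropriate normalization of the
    threshold (e.g., by the number of samples) is left as an assumption.
    """
    svals = np.linalg.svd(Xi, compute_uv=False)
    return int(np.sum(svals > eps))
```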
 
21
Notice that \(\varepsilon''_{j}\) normally does not increase with the number of vectors N, because \(\lambda_{j}\) increases proportionally to N.
 
References
Agarwal, P., & Mustafa, N. (2004). k-means projective clustering. In ACM Symposium on Principles of Database Systems.
Aldroubi, A., Cabrelli, C., & Molter, U. (2008). Optimal non-linear models for sparsity and sampling. Journal of Fourier Analysis and Applications, 14(5–6), 793–812.
Aldroubi, A., & Zaringhalam, K. (2009). Nonlinear least squares in \(\mathbb{R}^{N}\). Acta Applicandae Mathematicae, 107(1–3), 325–337.
Benson, H. (1994). Concave minimization: Theory, applications and algorithms. In R. Horst & P. M. Pardalos (Eds.), Handbook of global optimization (Vol. 2, pp. 43–148). Springer-Verlag.
Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge: Cambridge University Press.
Bradley, P. S., & Mangasarian, O. L. (2000). k-plane clustering. Journal of Global Optimization, 16(1), 23–32.
Cilibrasi, R., & Vitányi, P. M. (2005). Clustering by compression. IEEE Transactions on Information Theory, 51(4), 1523–1545.
Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39(1), 1–38.
Donoho, D. L., Vetterli, M., DeVore, R., & Daubechies, I. (1998). Data compression and harmonic analysis. IEEE Transactions on Information Theory, 44(6), 2435–2476.
Fazel, M., Hindi, H., & Boyd, S. (2003). Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. In Proceedings of the American Control Conference (pp. 2156–2162).
Figueiredo, M. A. T., & Jain, A. K. (2002). Unsupervised learning of finite mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3), 381–396.
Fischler, M. A., & Bolles, R. C. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381–395.
Forgy, E. (1965). Cluster analysis of multivariate data: Efficiency vs. interpretability of classifications (abstract). Biometrics, 21, 768–769.
Frey, B., Colmenarez, A., & Huang, T. (1998). Mixtures of local linear subspaces for face recognition. In IEEE Conference on Computer Vision and Pattern Recognition.
Ghahramani, Z., & Beal, M. (2000). Variational inference for Bayesian mixtures of factor analysers. Advances in Neural Information Processing Systems, 12, 449–455.
Ghahramani, Z., & Hinton, G. (1996). The EM algorithm for mixtures of factor analyzers. Technical Report CRG-TR-96-1, University of Toronto, Canada.
Hamkins, J., & Zeger, K. (2002). Gaussian source coding with spherical codes. IEEE Transactions on Information Theory, 48(11), 2980–2989.
Hansen, M., & Yu, B. (2001). Model selection and the principle of minimum description length. Journal of the American Statistical Association, 96, 746–774.
Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning. New York: Springer.
Ho, J., Yang, M., Lim, J., Lee, K., & Kriegman, D. (2003). Clustering appearances of objects under varying illumination conditions. In Proceedings of International Conference on Computer Vision and Pattern Recognition.
Horn, R. A., & Johnson, C. R. (1985). Matrix analysis. Cambridge: Cambridge University Press.
Huang, K., Ma, Y., & Vidal, R. (2004). Minimum effective dimension for mixtures of subspaces: A robust GPCA algorithm and its applications. In IEEE Conference on Computer Vision and Pattern Recognition (Vol. II, pp. 631–638).
Jancey, R. (1966). Multidimensional group analysis. Australian Journal of Botany, 14, 127–130.
Kamvar, S., Klein, D., & Manning, C. (2002). Interpreting and extending classical agglomerative clustering methods using a model-based approach. Technical Report 2002-11, Stanford University Department of Computer Science.
Lloyd, S. (1957). Least squares quantization in PCM. Technical Report, Bell Laboratories. Published in 1982 in IEEE Transactions on Information Theory, 28, 128–137.
Ma, Y., Derksen, H., Hong, W., & Wright, J. (2007). Segmentation of multivariate mixed data via lossy coding and compression. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(9), 1546–1562.
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (pp. 281–297).
Madiman, M., Harrison, M., & Kontoyiannis, I. (2004). Minimum description length vs. maximum likelihood in lossy data compression. In Proceedings of the 2004 IEEE International Symposium on Information Theory.
McLachlan, G. J., & Krishnan, T. (1997). The EM algorithm and extensions. Wiley Series in Probability and Statistics. New York: John Wiley & Sons.
Neal, R., & Hinton, G. (1998). A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. Jordan (Ed.), Learning in graphical models (pp. 355–368). Boston: Kluwer Academic.
Recht, B., Fazel, M., & Parrilo, P. (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3), 471–501.
Rissanen, J. (1978). Modeling by shortest data description. Automatica, 14, 465–471.
Rose, K. (1998). Deterministic annealing for clustering, compression, classification, regression, and related optimization problems. Proceedings of the IEEE, 86(11), 2210–2239.
Selim, S., & Ismail, M. A. (1984). K-means-type algorithms: A generalized convergence theorem and characterization of local optimality. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(1), 81–87.
Shi, J., & Malik, J. (1998). Motion segmentation and tracking using normalized cuts. In IEEE International Conference on Computer Vision (pp. 1154–1160).
Tipping, M., & Bishop, C. (1999a). Mixtures of probabilistic principal component analyzers. Neural Computation, 11(2), 443–482.
Torr, P., Szeliski, R., & Anandan, P. (2001). An integrated Bayesian approach to layer extraction from image sequences. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(3), 297–303.
Tse, D., & Viswanath, P. (2005). Fundamentals of wireless communication. Cambridge: Cambridge University Press.
Ueda, N., Nakano, R., & Ghahramani, Z. (2000). SMEM algorithm for mixture models. Neural Computation, 12, 2109–2128.
Ward, J. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236–244.
Wright, J., Ma, Y., Tao, Y., Lin, Z., & Shum, H.-Y. (2009b). Classification via minimum incremental coding length (MICL). SIAM Journal on Imaging Sciences, 2(2), 367–395.
Yang, M. H., Ahuja, N., & Kriegman, D. (2000). Face detection using mixtures of linear subspaces. In IEEE International Conference on Automatic Face and Gesture Recognition.
Zhang, T., Szlam, A., & Lerman, G. (2009). Median k-flats for hybrid linear modeling with many outliers. In Workshop on Subspace Methods.
Zhang, T., Szlam, A., Wang, Y., & Lerman, G. (2010). Randomized hybrid linear modeling via local best-fit flats. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 1927–1934).
Metadata
Title
Statistical Methods
Authors
René Vidal
Yi Ma
S. Shankar Sastry
Copyright year
2016
Publisher
Springer New York
DOI
https://doi.org/10.1007/978-0-387-87811-9_6
