
2016 | Original Paper | Book Chapter

6. Statistical Methods

Authors: René Vidal, Yi Ma, S. Shankar Sastry

Published in: Generalized Principal Component Analysis

Publisher: Springer New York


Abstract

The algebraic-geometric approach to subspace clustering described in the previous chapter provides a fairly complete characterization of the algebra and geometry of multiple subspaces, which leads to simple and elegant subspace clustering algorithms. However, while these methods can handle some noise in the data, they do not make explicit assumptions about the distribution of the noise or the data inside the subspaces. Therefore, the estimated subspaces need not be optimal from a statistical perspective, e.g., in a maximum likelihood (ML) sense.


Footnotes
1
The reader is encouraged to implement it and test it with simulated data.
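Such a test might look as follows: a minimal sketch, assuming the algorithm being referred to is the K-subspaces iteration of this chapter (the function names and the simulated-data setup are ours, not the book's).

```python
import numpy as np

def k_subspaces(X, n_clusters, dim, n_iters=50, seed=0):
    """Cluster the columns of X (D x N) into n_clusters subspaces of dimension dim."""
    rng = np.random.default_rng(seed)
    D, N = X.shape
    labels = rng.integers(n_clusters, size=N)
    for _ in range(n_iters):
        bases = []
        for i in range(n_clusters):
            Xi = X[:, labels == i]
            if Xi.shape[1] <= dim:                 # re-seed (nearly) empty clusters
                Xi = X[:, rng.integers(N, size=dim + 1)]
            U, _, _ = np.linalg.svd(Xi, full_matrices=False)
            bases.append(U[:, :dim])               # PCA fit of the i-th subspace
        # Assign each point to the subspace with the smallest projection residual.
        residuals = np.stack([np.linalg.norm(X - U @ (U.T @ X), axis=0) for U in bases])
        new_labels = residuals.argmin(axis=0)
        if np.array_equal(new_labels, labels):     # assignments stabilized
            break
        labels = new_labels
    return labels, bases

# Simulated data: two 2-dimensional subspaces in R^5 with small additive noise.
rng = np.random.default_rng(1)
U1, _ = np.linalg.qr(rng.standard_normal((5, 2)))
U2, _ = np.linalg.qr(rng.standard_normal((5, 2)))
X = np.hstack([U1 @ rng.standard_normal((2, 100)), U2 @ rng.standard_normal((2, 100))])
X += 0.01 * rng.standard_normal(X.shape)
labels, bases = k_subspaces(X, n_clusters=2, dim=2)
```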
 
2
If the dimension of the ambient space D is high, the degrees of freedom in the model are very large and the extrema of the likelihood function may not be so salient when the number of samples is not significantly larger than the dimension. Therefore, the optimization algorithm may converge very slowly to an extremum, since so many parameters are not properly regularized. There has been a rich literature in statistics about how to properly regularize the estimates of high-dimensional Gaussians, especially when the number of samples is limited.
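One standard regularization of this kind (an illustrative choice, not necessarily the one used in the literature cited; the shrinkage weight alpha is our own parameter) shrinks the sample covariance toward a scaled identity:

```python
import numpy as np

def shrunk_covariance(X, alpha=0.1):
    """X: D x N zero-mean data; alpha in [0, 1] is the shrinkage weight (our choice)."""
    D, N = X.shape
    S = (X @ X.T) / N                      # sample covariance
    mu = np.trace(S) / D                   # average eigenvalue of S
    # Blend toward a scaled identity so the estimate stays well conditioned
    # even when N is not much larger than D.
    return (1 - alpha) * S + alpha * mu * np.eye(D)
```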
 
3
While the quantity is not an actual distance, notice that it is nonnegative because it is minus the logarithm of a probability.
 
4
Notice that this means that while the subspaces are allowed to have different orientations (different \(U_{i}\)), the distribution of the data points inside the subspaces is the same.
 
5
For simplicity, in the main text, we will derive and present the main results with the zero-mean assumption. However, all the formulas, results, and algorithms can be readily extended to the nonzero-mean case, as shown in Appendix 6.A.
 
6
Strictly speaking, the rate-distortion function for the Gaussian distribution \(\mathcal{N}(0,\Sigma)\) is \(R = \frac{1}{2}\log_{2}\det\big(\frac{D}{\varepsilon^{2}}\Sigma\big)\) when \(\frac{\varepsilon^{2}}{D}\) is smaller than the smallest eigenvalue of \(\Sigma\). Thus the above formula is a good approximation when the distortion \(\varepsilon\) is relatively small. However, when \(\frac{\varepsilon^{2}}{D}\) is larger than some eigenvalues of \(\Sigma\), the rate-distortion function becomes more complicated (Cover and Thomas 1991). Nevertheless, the approximate formula \(R = \frac{1}{2}\log_{2}\det\big(I + \frac{D}{\varepsilon^{2}}\Sigma\big)\) can be viewed as the rate-distortion function of the “regularized” distribution that works for all ranges of \(\varepsilon\).
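A quick numerical check of this claim (a sketch; the test covariance Sigma and distortions eps are arbitrary illustrative values):

```python
import numpy as np

def rate_approx(Sigma, eps):
    """The regularized formula, well defined for all eps."""
    D = Sigma.shape[0]
    return 0.5 * np.log2(np.linalg.det(np.eye(D) + (D / eps**2) * Sigma))

def rate_exact_small_distortion(Sigma, eps):
    """The exact formula, valid only when eps^2 / D is below Sigma's smallest eigenvalue."""
    D = Sigma.shape[0]
    return 0.5 * np.log2(np.linalg.det((D / eps**2) * Sigma))

Sigma = np.diag([4.0, 1.0, 0.25])
for eps in (0.5, 0.1):
    print(eps, rate_exact_small_distortion(Sigma, eps), rate_approx(Sigma, eps))
# The two agree closely for small eps; only the regularized formula remains
# meaningful once eps^2 / D exceeds some eigenvalues of Sigma.
```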
 
7
This can be viewed as the cost of coding the D principal axes of the data covariance \(\frac{1}{N}XX^{\top}\).
 
8
Here we assume that the ordering of the samples is random and entropy coding is the best we can do to code the membership. However, if the samples are ordered such that nearby samples are more likely to belong to the same cluster (e.g., in segmenting pixels of an image), the second term can and should be replaced by a tighter estimate.
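In code, the entropy-coding estimate of the membership cost amounts to the following (a sketch; counts holds the cluster sizes \(N_{i}\), all assumed positive):

```python
import numpy as np

def membership_bits(counts):
    """Bits to code the cluster membership of N samples given the cluster sizes."""
    counts = np.asarray(counts, dtype=float)
    N = counts.sum()
    return float(np.sum(counts * np.log2(N / counts)))

print(membership_bits([100, 100]))  # 200.0: one bit per sample for two equal clusters
```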
 
9
This particular penalty term is justified by noticing that \(ND\cdot\log\varepsilon\) is (within an additive constant) the number of bits required to code the residual \(\boldsymbol{x}-\hat{\boldsymbol{x}}\) up to (very small) distortion \(\delta \ll \varepsilon\).
 
10
However, it may be possible to improve the convergence using more complicated split-and-merge strategies (Ueda et al. 2000). In addition, due to Theorem 6.5 of Section 6.3.4, the globally (asymptotically) optimal clustering can also be computed via concave optimization (Benson 1994), at the cost of potentially exponential computation time.
 
11
Compared to the MDL criterion (2.85), if the term \(N \cdot R(X)\) corresponds to the coding length for the data, the term \(D \cdot R(X)\) then corresponds to the coding length for the model parameter \(\theta\).
 
12
For a more thorough discussion on why some transformations (such as wavelets) are useful for data compression, the reader may refer to (Donoho et al. 1998).
 
13
Especially when the data \(\mathcal{X}\) indeed consist of a mixture of subsets and each cluster is a typical set of samples from an (almost degenerate) Gaussian distribution.
 
14
Notice that the data are not drawn from a mixture of Gaussians. As we have mentioned before, although we have derived the coding length function largely based on the mixture of Gaussians model, the coding length function actually gives a good estimate of the coding length for any subspace-like data; see Appendix 6.A for more details. Hence this experiment tests how well the coding length works when we deviate from the basic mixture of Gaussians model.
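For reference, a sketch of the coding length function in the zero-mean case, following the structure suggested by footnotes 7 and 11 (\(N \cdot R(X)\) bits for the data plus \(D \cdot R(X)\) bits for the model; see Ma et al. (2007) and Appendix 6.A for the precise form):

```python
import numpy as np

def coding_length(X, eps):
    """Total bits to code the D x N matrix X up to distortion eps (zero-mean case)."""
    D, N = X.shape
    # R(X) bits per vector; (N + D) * R(X) covers both the data and the model.
    R = 0.5 * np.log2(np.linalg.det(np.eye(D) + (D / (eps**2 * N)) * (X @ X.T)))
    return (N + D) * R

# Example: two orthogonal lines in R^2 cost fewer bits coded separately than jointly.
rng = np.random.default_rng(0)
X1 = np.vstack([rng.standard_normal(50), np.zeros(50)])   # samples along the first axis
X2 = np.vstack([np.zeros(50), rng.standard_normal(50)])   # samples along the second axis
eps = 0.1
print(coding_length(X1, eps) + coding_length(X2, eps))    # separate: smaller
print(coding_length(np.hstack([X1, X2]), eps))            # joint: larger
```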
 
15
Factor analysis is a probabilistic model for a subspace that generalizes the PPCA model discussed in Section 2.2: in FA, the noise covariance is diagonal, while in PPCA, it is a scaled identity matrix.
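The distinction is easy to state in code (a sketch; W, psi, and sigma2 are illustrative values of our own):

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 5, 2
W = rng.standard_normal((D, d))            # factor loadings (illustrative)
psi = rng.uniform(0.1, 1.0, size=D)        # FA: one noise variance per coordinate
sigma2 = 0.5                               # PPCA: a single shared noise variance
Sigma_fa = W @ W.T + np.diag(psi)          # factor analysis covariance model
Sigma_ppca = W @ W.T + sigma2 * np.eye(D)  # PPCA covariance model
```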
 
16
That is, each sample is assigned to its cluster according to the maximum likelihood rule among all subspaces estimated by the EM algorithm.
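A sketch of this rule, assuming each estimated subspace is summarized by a zero-mean Gaussian covariance (the uniform cluster priors here are an additional assumption):

```python
import numpy as np

def ml_assign(X, Sigmas):
    """Assign each column of X (D x N) to the zero-mean Gaussian with highest likelihood."""
    scores = []
    for S in Sigmas:
        _, logdet = np.linalg.slogdet(S)
        mahal = np.sum(X * np.linalg.solve(S, X), axis=0)  # x^T S^{-1} x, per sample
        scores.append(-0.5 * (logdet + mahal))             # log-likelihood up to a constant
    # Uniform cluster priors are assumed; otherwise add log(pi_i) to scores[i].
    return np.argmax(np.stack(scores), axis=0)
```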
 
17
The dimension of each cluster \(\mathcal{X}_{i}\) is identified using principal component analysis (PCA) by thresholding the singular values of the data matrix \(X_{i}\) with respect to \(\varepsilon\).
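In code, this amounts to counting the singular values above the threshold (a sketch; the exact scaling of the threshold relative to \(\varepsilon\) is an assumption):

```python
import numpy as np

def subspace_dimension(Xi, eps):
    """Estimate the dimension of cluster data Xi (D x N_i) by PCA.

    Counts singular values above eps; the appropriate normalization of the
    threshold (e.g., by the number of samples) is left as an assumption.
    """
    svals = np.linalg.svd(Xi, compute_uv=False)
    return int(np.sum(svals > eps))
```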
 
21
Notice that \(\varepsilon''_{j}\) normally does not increase with the number of vectors N, because \(\lambda_{j}\) increases proportionally to N.
 
References
Agarwal, P., & Mustafa, N. (2004). k-means projective clustering. In ACM Symposium on Principles of Database Systems.
Aldroubi, A., Cabrelli, C., & Molter, U. (2008). Optimal non-linear models for sparsity and sampling. Journal of Fourier Analysis and Applications, 14(5–6), 793–812.
Aldroubi, A., & Zaringhalam, K. (2009). Nonlinear least squares in \(\mathbb{R}^{N}\). Acta Applicandae Mathematicae, 107(1–3), 325–337.
Benson, H. (1994). Concave minimization: Theory, applications and algorithms. In R. Horst & P. M. Pardalos (Eds.), Handbook of global optimization (Vol. 2, pp. 43–148). Springer-Verlag.
Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge: Cambridge University Press.
Bradley, P. S., & Mangasarian, O. L. (2000). k-plane clustering. Journal of Global Optimization, 16(1), 23–32.
Cilibrasi, R., & Vitányi, P. M. (2005). Clustering by compression. IEEE Transactions on Information Theory, 51(4), 1523–1545.
Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39(1), 1–38.
Donoho, D. L., Vetterli, M., DeVore, R., & Daubechies, I. (1998). Data compression and harmonic analysis. IEEE Transactions on Information Theory, 44(6), 2435–2476.
Fazel, M., Hindi, H., & Boyd, S. (2003). Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. In Proceedings of the American Control Conference (pp. 2156–2162).
Figueiredo, M. A. T., & Jain, A. K. (2002). Unsupervised learning of finite mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3), 381–396.
Fischler, M. A., & Bolles, R. C. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381–395.
Forgy, E. (1965). Cluster analysis of multivariate data: Efficiency vs. interpretability of classifications (abstract). Biometrics, 21, 768–769.
Frey, B., Colmenarez, A., & Huang, T. (1998). Mixtures of local linear subspaces for face recognition. In IEEE Conference on Computer Vision and Pattern Recognition.
Ghahramani, Z., & Beal, M. (2000). Variational inference for Bayesian mixtures of factor analysers. Advances in Neural Information Processing Systems, 12, 449–455.
Ghahramani, Z., & Hinton, G. (1996). The EM algorithm for mixtures of factor analyzers. Technical Report CRG-TR-96-1, University of Toronto, Canada.
Hamkins, J., & Zeger, K. (2002). Gaussian source coding with spherical codes. IEEE Transactions on Information Theory, 48(11), 2980–2989.
Hansen, M., & Yu, B. (2001). Model selection and the principle of minimum description length. Journal of the American Statistical Association, 96, 746–774.
Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning. New York: Springer.
Ho, J., Yang, M., Lim, J., Lee, K., & Kriegman, D. (2003). Clustering appearances of objects under varying illumination conditions. In Proceedings of International Conference on Computer Vision and Pattern Recognition.
Horn, R. A., & Johnson, C. R. (1985). Matrix analysis. Cambridge: Cambridge University Press.
Huang, K., Ma, Y., & Vidal, R. (2004). Minimum effective dimension for mixtures of subspaces: A robust GPCA algorithm and its applications. In IEEE Conference on Computer Vision and Pattern Recognition (Vol. II, pp. 631–638).
Jancey, R. (1966). Multidimensional group analysis. Australian Journal of Botany, 14, 127–130.
Kamvar, S., Klein, D., & Manning, C. (2002). Interpreting and extending classical agglomerative clustering methods using a model-based approach. Technical Report 2002-11, Stanford University Department of Computer Science.
Lloyd, S. (1957). Least squares quantization in PCM. Technical Report, Bell Laboratories. Published in 1982 in IEEE Transactions on Information Theory, 28, 128–137.
Ma, Y., Derksen, H., Hong, W., & Wright, J. (2007). Segmentation of multivariate mixed data via lossy coding and compression. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(9), 1546–1562.
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (pp. 281–297).
Madiman, M., Harrison, M., & Kontoyiannis, I. (2004). Minimum description length vs. maximum likelihood in lossy data compression. In Proceedings of the 2004 IEEE International Symposium on Information Theory.
McLachlan, G. J., & Krishnan, T. (1997). The EM algorithm and extensions. Wiley Series in Probability and Statistics. New York: John Wiley & Sons.
Neal, R., & Hinton, G. (1998). A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. Jordan (Ed.), Learning in graphical models (pp. 355–368). Boston: Kluwer Academic.
Recht, B., Fazel, M., & Parrilo, P. (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3), 471–501.
Rissanen, J. (1978). Modeling by shortest data description. Automatica, 14, 465–471.
Rose, K. (1998). Deterministic annealing for clustering, compression, classification, regression, and related optimization problems. Proceedings of the IEEE, 86(11), 2210–2239.
Selim, S., & Ismail, M. A. (1984). K-means-type algorithms: A generalized convergence theorem and characterization of local optimality. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(1), 81–87.
Shi, J., & Malik, J. (1998). Motion segmentation and tracking using normalized cuts. In IEEE International Conference on Computer Vision (pp. 1154–1160).
Tipping, M., & Bishop, C. (1999a). Mixtures of probabilistic principal component analyzers. Neural Computation, 11(2), 443–482.
Torr, P., Szeliski, R., & Anandan, P. (2001). An integrated Bayesian approach to layer extraction from image sequences. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(3), 297–303.
Tse, D., & Viswanath, P. (2005). Fundamentals of wireless communication. Cambridge: Cambridge University Press.
Ueda, N., Nakano, R., & Ghahramani, Z. (2000). SMEM algorithm for mixture models. Neural Computation, 12, 2109–2128.
Ward, J. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236–244.
Wright, J., Ma, Y., Tao, Y., Lin, Z., & Shum, H.-Y. (2009b). Classification via minimum incremental coding length (MICL). SIAM Journal on Imaging Sciences, 2(2), 367–395.
Yang, M. H., Ahuja, N., & Kriegman, D. (2000). Face detection using mixtures of linear subspaces. In IEEE International Conference on Automatic Face and Gesture Recognition.
Zhang, T., Szlam, A., & Lerman, G. (2009). Median k-flats for hybrid linear modeling with many outliers. In Workshop on Subspace Methods.
Zhang, T., Szlam, A., Wang, Y., & Lerman, G. (2010). Randomized hybrid linear modeling via local best-fit flats. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 1927–1934).
Metadata
Title
Statistical Methods
Authors
René Vidal
Yi Ma
S. Shankar Sastry
Copyright year
2016
Publisher
Springer New York
DOI
https://doi.org/10.1007/978-0-387-87811-9_6
