Clustering Large Graphs via the Singular Value Decomposition

Drineas, P.; Frieze, A.; Kannan, R.; Vempala, S.; Vinay, V.

doi:10.1023/B:MACH.0000033113.59016.96

Clustering Large Graphs via the Singular Value Decomposition

Published: July 2004

Volume 56, pages 9–33, (2004)
Cite this article

Download PDF

Machine Learning Aims and scope Submit manuscript

Clustering Large Graphs via the Singular Value Decomposition

Download PDF

P. Drineas¹,
A. Frieze²,
R. Kannan³,
S. Vempala⁴ &
…
V. Vinay⁵

3346 Accesses
282 Citations
3 Altmetric
Explore all metrics

Abstract

We consider the problem of partitioning a set of m points in the n-dimensional Euclidean space into k clusters (usually m and n are variable, while k is fixed), so as to minimize the sum of squared distances between each point and its cluster center. This formulation is usually the objective of the k-means clustering algorithm (Kanungo et al. (2000)). We prove that this problem in NP-hard even for k = 2, and we consider a continuous relaxation of this discrete problem: find the k-dimensional subspace V that minimizes the sum of squared distances to V of the m points. This relaxation can be solved by computing the Singular Value Decomposition (SVD) of the m × n matrix A that represents the m points; this solution can be used to get a 2-approximation algorithm for the original problem. We then argue that in fact the relaxation provides a generalized clustering which is useful in its own right.

Finally, we show that the SVD of a random submatrix—chosen according to a suitable probability distribution—of a given matrix provides an approximation to the SVD of the whole matrix, thus yielding a very fast randomized algorithm. We expect this algorithm to be the main contribution of this paper, since it can be applied to problems of very large size which typically arise in modern applications.

References

Achlioptas, D., & McSherry, F. (2001). Fast computation of low rank approximations. In Proceedings of the 33rd Annual Symposium on Theory of Computing (pp. 337–346).
Achlioptas, D., & McSherry, F. (2003). Fast computation of low rank matrix approximations. Manuscript.
Andrews, H. C., & Patterson, C. L. (1976a). Singular value decomposition image coding. IEEE Trans. on Communications, 4, 425–432.
Google Scholar
Andrews, H. C., & Patterson, C. L. (1976b). Singular value decompositions and digital image processing. IEEE Trans. ASSP 26–53.
Azar, Y., Fiat, A., Karlin, A., McSherry, F., & Saia, J. (2001). Spectral analysis of data. In Proc. of the 33rd ACM Symposium on Theory of Computing (pp. 619–626).
Berry, M. J., & Linoff, G. (1997). Data mining techniques. John-Wiley.
Bar-Yossef, Z. (2002). The complexity of massive dataset computations. Ph.D. thesis, University of California, Berkeley.
Google Scholar
Bar-Yossef, Z. (2003). Sampling lower bounds via information theory. In Proceedings of the 35th Annual Symposium on Theory of Computing (pp. 335–344).
Charikar, M., & Guha, S. (1999). Improved combinatorial algorithms for the facility location and k-median problems. In Proceedings of the 40th IEEE Symposium on Foundations of Computer Science (pp. 378–388).
Charikar, M., Guha, S., Shmoys, D., & Tardos, E. (2002). A constant factor approximation algorithm for the k-median problem. Journal of Computer and System Sciences, 65:1, 129–149.
Google Scholar
Drineas, P., & Kannan, R. (2001). Fast Monte-Carlo algorithms for approximate matrix multiplication. In Proceedings of the 42nd Annual Symposium on Foundations of Computer Science (pp. 452–459).
Frieze, A., Kannan, R., & Vempala, S. (1998). Fast Monte-Carlo algorithms for finding low rank approximations. In Proceedings of the 39th Annual Symposium on Foundations of Computer Science (pp. 370–378).
Furedi, Z., & Komlos, J. (1981). The eigenvalues of random symmetric matrices. Combinatorica, 1, 233–241.
Google Scholar
Gersho, A., & Gray, R. M. (1991). Vector quantization and signal compression. Kluwer Academic.
Gibson, D., Kleinberg, J., & Raghavan, P. (1998). Clustering categorical data: An approach based on dynamical systems. Very Large Data Bases (VLDB) 311–322.
Goldreich, O., Goldwasser, S., & Ron, D. (1998). Property testing and its connection to learning and approximation. Journal of the ACM, 5:4, 653–750.
Google Scholar
Golub, G., & Van Loan, C. (1989). Matrix computations. Johns Hopkins University Press.
Huang, T., & Narendra, P. (1974). Image restoration by singular value decomposition. Applied Optics, 14:9, 2213–2216.
Google Scholar
Jain, A. K., & Dubes, R. C. (1988). Algorithms for clustering data. Prentice Hall.
Jain, K., & Vazirani, V. (1999). Primal-dual approximation algorithms for metric facility location and k-median problems. In Proceedings of the 40th IEEE Symposium on Foundations of Computer Science (pp. 2–13).
Jambu, M., & Lebeaux, M.-O. (1983). Cluster analysis and data analysis. North Holland.
Kanungo, T., Mount, D. M., Netanyahu, N. S., Piatko, C. D., Silverman, R., & Wu, A. Y. (2000). The analysis of a simple k-means clustering algorithm. In Symposium on Computational Geometry (pp. 100–109).
Kleinberg, J. (1998). Authoritative sources in a hyperlinked environment. In Proceedings of 9th Annual ACM-SIAM Symposium on Discrete Algorithms (pp. 668–677).
McDiarmid, C. J. H. (1989). On the method of bounded differences. In Surveys in Combinatorics: Invited Papers at the 12th British Combinatorial Conference (pp. 148–188).
Ostrovsky, R., & Rabani, Y. (2002). Polynomial time approximation schemes for geometric k-clustering. Journal of the ACM, 49:2, 139–156.
Google Scholar
Papadimitriou, C. H., Raghavan, P., Tamaki, H., & Vempala, S. (2000). Latent semantic indexing: A probabilistic analysis. Journal of Computer and System Sciences, 61:2, 217–235.
Google Scholar
Parlett, B. (1997). The symmetric eigenvalue problem. Classics in Applied Mathematics, SIAM.

Download references

Author information

Authors and Affiliations

Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY, 12180
P. Drineas
Department of Mathematical Sciences, Carnegie Mellon University, Pittsburgh, PA, 15213
A. Frieze
Computer Science Department, Yale University, New Haven, CT, 06520
R. Kannan
Department of Mathematics, M.I.T., Cambridge, MA, 02139
S. Vempala
Indian Institute of Science, Bangalore, India
V. Vinay

Authors

P. Drineas
View author publications
You can also search for this author in PubMed Google Scholar
A. Frieze
View author publications
You can also search for this author in PubMed Google Scholar
R. Kannan
View author publications
You can also search for this author in PubMed Google Scholar
S. Vempala
View author publications
You can also search for this author in PubMed Google Scholar
V. Vinay
View author publications
You can also search for this author in PubMed Google Scholar

Rights and permissions

Reprints and permissions

About this article

Cite this article

Drineas, P., Frieze, A., Kannan, R. et al. Clustering Large Graphs via the Singular Value Decomposition. Machine Learning 56, 9–33 (2004). https://doi.org/10.1023/B:MACH.0000033113.59016.96

Download citation

Issue Date: July 2004
DOI: https://doi.org/10.1023/B:MACH.0000033113.59016.96

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Clustering Large Graphs via the Singular Value Decomposition

Abstract

Article PDF

Similar content being viewed by others

Fast Spectral Clustering via the Nyström Method

A method for searching for a globally optimal k-partition of higher-dimensional datasets

A competitive optimization approach for data clustering and orthogonal non-negative matrix factorization

References

Author information

Authors and Affiliations

Rights and permissions

About this article

Cite this article

Navigation

Clustering Large Graphs via the Singular Value Decomposition

Abstract

Article PDF

Similar content being viewed by others

Fast Spectral Clustering via the Nyström Method

A method for searching for a globally optimal k-partition of higher-dimensional datasets

A competitive optimization approach for data clustering and orthogonal non-negative matrix factorization

References

Author information

Authors and Affiliations

Rights and permissions

About this article

Cite this article

Share this article

Search

Navigation