research-article

Graph clustering based on structural/attribute similarities

Authors:
Yang Zhou

The Chinese University of Hong Kong

The Chinese University of Hong Kong
View Profile

,
Hong Cheng

The Chinese University of Hong Kong

The Chinese University of Hong Kong
View Profile

,
Jeffrey Xu Yu

The Chinese University of Hong Kong

The Chinese University of Hong Kong
View Profile

Proceedings of the VLDB Endowment Volume 2 Issue 1pp 718–729https://doi.org/10.14778/1687627.1687709

Published:01 August 2009Publication History

Proceedings of the VLDB Endowment

Abstract

The goal of graph clustering is to partition vertices in a large graph into different clusters based on various criteria such as vertex connectivity or neighborhood similarity. Graph clustering techniques are very useful for detecting densely connected groups in a large graph. Many existing graph clustering methods mainly focus on the topological structure for clustering, but largely ignore the vertex properties which are often heterogenous. In this paper, we propose a novel graph clustering algorithm, SA-Cluster, based on both structural and attribute similarities through a unified distance measure. Our method partitions a large graph associated with attributes into k clusters so that each cluster contains a densely connected subgraph with homogeneous attribute values. An effective method is proposed to automatically learn the degree of contributions of structural similarity and attribute similarity. Theoretical analysis is provided to show that SA-Cluster is converging. Extensive experimental results demonstrate the effectiveness of SA-Cluster through comparison with the state-of-the-art graph clustering and summarization methods.

References

R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. In Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD '98), pages 94--105, Seattle, WA, June 1998. Google ScholarDigital Library
T. M. Apostol. Calculus, Vol. 1: One-Variable Calculus, with an Introduction to Linear Algebra, 2nd edition. Wiley, 1967.Google Scholar
L. Botton and Y. Bengio. Convergence properties of the k-means algorithms. In Advances in Neural Information Processing Systems 7 (NIPS'94), pages 585--592, Denver, CO, Dec. 1994.Google Scholar
D. Cai, Z. Shao, X. He, X. Yan, and J. Han. Mining hidden community in heterogeneous social networks. In Proc. Workshop on Link Discovery: Issues, Approaches and Applications (LinkKDD'05), pages 58--65, Chicago, IL, Aug. 2005. Google ScholarDigital Library
R. Descartes. The Geometry of René Descartes. Dover Publications, 1954.Google Scholar
M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases. In Proc. 1996 Int. Conf. Knowledge Discovery and Data Mining (KDD'96), pages 226--231, Portland, OR, Aug. 1996.Google Scholar
C. Faloutsos, K. McCurley, and A. Tomkins. Fast discovery of connection subgraphs. In Proc. 2004 ACM SIGKDD Int. Conf. Knowledge Discovery in Databases (KDD'04), pages 118--127, Seattle, WA, Aug. 2004. Google ScholarDigital Library
B. Fine and G. Rosenber. The Fundamental Theorem of Algebra (Undergraduate Texts in Mathematics). Springer-Verlag, 1997.Google Scholar
D. Gibson, J. Kleinberg, and P. Raghavan. Inferring web communities from link topology. In Proc. 9th ACM Conf. Hypertext and Hypermedia, pages 225--234, Pittsburgh, PA, June 1998. Google ScholarDigital Library
A. Hinneburg and D. A. Keim. An efficient approach to clustering in large multimedia databases with noise. In Proc. 1998 Int. Conf. Knowledge Discovery and Data Mining (KDD'98), pages 58--65, New York, NY, Aug. 1998.Google Scholar
T. Hofmann. Probabilistic latent semantic indexing. In Proc. 1999 Int. ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR'99), pages 50--57, Berkeley, CA, Aug. 1998. Google ScholarDigital Library
G. Jeh and J. Widom. SimRank: a measure of structural-context similarity. In Proc. 2002 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'02), pages 538--543, Edmonton, Canada, July 2002. Google ScholarDigital Library
L. Kaufman and P. J. Rousseeuw. Clustering by means of medoids. Statistical Data Analysis based on the L1 Norm, pages 405--416, 1987.Google Scholar
Z. Liu, J. X. Yu, Y. Ke, X. Lin, and L. Chen. Spotting significant changing subgraphs in evolving graphs. In Proc. 2008 Int. Conf. Data Mining (ICDM'08), Pisa, Italy, Dec. 2008. Google ScholarDigital Library
S. Navlakha, R. Rastogi, and N. Shrivastava. Graph summarization with bounded error. In Proc. 2008 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'08), pages 419--432, Vancouver, Canada, June 2008. Google ScholarDigital Library
M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. In Phys. Rev. E 69, 026113, 2004.Google ScholarCross Ref
R. Ng and J. Han. Efficient and effective clustering method for spatial data mining. In Proc. 1994 Int. Conf. Very Large Data Bases (VLDB'94), pages 144--155, Santiago, Chile, Sept. 1994. Google ScholarDigital Library
P. Pons and M. Latapy. Computing communities in large networks using random walks. Journal of Graph Algorithms and Applications, 10(2):191--218, 2006.Google ScholarCross Ref
J. Shi and J. Malik. Normalized cuts and image segmentation. In IEEE Trans. Pattern Analysis and Machine Intelligence, volume 22, no. 8, pages 888--905, 2000. Google ScholarDigital Library
J. Sun, C. Faloutsos, S. Papadimitriou, and P. S. Yu. Graphscope: parameter-free mining of large time-evolving graphs. In Proc. 2007 ACM SIGKDD Int. Conf. Knowledge Discovery in Databases (KDD'07), pages 687--696, San Jose, CA, 2007. Google ScholarDigital Library
Y. Sun, J. Han, P. Zhao, Z. Yin, H. Cheng, and T. Wu. Rankclus: Integrating clustering with ranking for heterogenous information network analysis. In Proc. 2009 Int. Conf. Extending Database Technology (EDBT'09), pages 565--576, Saint Petersburg, Russia, Mar. 2009. Google ScholarDigital Library
Y. Tian, R. A. Hankins, and J. M. Patel. Efficient aggregation for graph summarization. In Proc. 2008 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'08), pages 567--580, Vancouver, Canada, June 2008. Google ScholarDigital Library
H. Tong and C. Faloutsos. Center-piece subgraphs: problem definition and fast solutions. In Proc. 2006 ACM SIGKDD Int. Conf. Knowledge Discovery in Databases (KDD'06), pages 404--413, Philadelphia, PA, 2006. Google ScholarDigital Library
H. Tong, C. Faloutsos, and J.-Y. Pan. Fast random walk with restart and its applications. In Proc. 2006 Int. Conf. on Data Mining (ICDM'06), pages 613--622, Hong Kong, Dec. 2006. Google ScholarDigital Library
C.-Y. Tsai and C.-C. Chui. Developing a feature weight self-adjustment mechanism for a k-means clustering algorithm. Computational Statistics and Data Analysis, 52:4658--4672, 2008. Google ScholarDigital Library
X. Xu, N. Yuruk, Z. Feng, and T. A. J. Schweiger. Scan: a structural clustering algorithm for networks. In Proc. 2007 Int. Conf. Knowledge Discovery and Data Mining (KDD'07), pages 824--833, San Jose, CA, Aug. 2007. Google ScholarDigital Library
C. Zhai, A. Velivelli, and B. Yu. A cross-collection mixture model for comparative text mining. In Proc. 2004 ACM SIGKDD Int. Conf. Knowledge Discovery in Databases (KDD'04), pages 743--748, Seattle, WA, Aug. 2004. Google ScholarDigital Library

Index Terms

Graph clustering based on structural/attribute similarities

Recommendations

Online structural graph clustering using frequent subgraph mining
ECMLPKDD'10: Proceedings of the 2010th European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part III

The goal of graph clustering is to partition objects in a graph database into different clusters based on various criteria such as vertex connectivity, neighborhood similarity or the size of the maximum common subgraph. This can serve to structure the ...
Read More
Graph clustering using k-Neighbourhood Attribute Structural similarity

Display Omitted Novel graph clustering algorithms (kNAS) is proposed, for overlapping community detection in large graph by combining the topological and attribute similarity that partitions the large graph into m clusters having high intracluster and ...
Read More
Efficient structural graph clustering: an index-based approach

Graph clustering is a fundamental problem widely experienced across many industries. The structural graph clustering (SCAN) method obtains not only clusters but also hubs and outliers. However, the clustering results closely depend on two sensitive ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Proceedings of the VLDB Endowment Volume 2, Issue 1
August 2009
1293 pages
ISSN:2150-8097
Editors:
Serge Abiteboul,
Tova Milo,
Jignesh Patel,
Philippe Rigaux
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 August 2009
Published in pvldb Volume 2, Issue 1
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 264
  Total Citations
  View Citations
- 4,133
  Total Downloads
- Downloads (Last 12 months)302
- Downloads (Last 6 weeks)29
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Graph clustering based on structural/attribute similarities

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Online structural graph clustering using frequent subgraph mining

Graph clustering using k-Neighbourhood Attribute Structural similarity

Efficient structural graph clustering: an index-based approach

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Graph clustering based on structural/attribute similarities

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Online structural graph clustering using frequent subgraph mining

Graph clustering using k-Neighbourhood Attribute Structural similarity

Efficient structural graph clustering: an index-based approach

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media