Skip to main content
Erschienen in: Journal of Intelligent Information Systems 3/2013

01.06.2013

Clustering based on a near neighbor graph and a grid cell graph

verfasst von: Xinquan Chen

Erschienen in: Journal of Intelligent Information Systems | Ausgabe 3/2013

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

This paper presents two novel graph-clustering algorithms, Clustering based on a Near Neighbor Graph (CNNG) and Clustering based on a Grid Cell Graph (CGCG). CNNG algorithm inspired by the idea of near neighbors is an improved graph-clustering method based on Minimum Spanning Tree (MST). In order to analyze massive data sets more efficiently, CGCG algorithm, which is a kind of graph-clustering method based on MST on the level of grid cells, is presented. To clearly describe the two algorithms, we give some important concepts, such as near neighbor point set, near neighbor undirected graph, grid cell, and so on. To effectively implement the two algorithms, we use some efficient partitioning and index methods, such as multidimensional grid partition method, multidimensional index tree, and so on. From simulation experiments of some artificial data sets and seven real data sets, we observe that the time cost of CNNG algorithm can be decreased by using some improving techniques and approximate methods while attaining an acceptable clustering quality, and CGCG algorithm can approximately analyze some dense data sets with linear time cost. Moreover, comparing some classical clustering algorithms, CNNG algorithm can often get better clustering quality or quicker clustering speed.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Literatur
Zurück zum Zitat Agrawal, R., Gehrke, J., Gunopolos, D., et al. (1998). Automatic subspace clustering of high dimensional data for data mining application. In Proceeding of the ACM SIGMOD international conference on management of data (pp. 94–105). Agrawal, R., Gehrke, J., Gunopolos, D., et al. (1998). Automatic subspace clustering of high dimensional data for data mining application. In Proceeding of the ACM SIGMOD international conference on management of data (pp. 94–105).
Zurück zum Zitat Anders, K.H. (2003). A hierarchical graph-clustering approach to find groups of objects. In The 5th workshop on progress in automated map generalization (pp. 1–8). Anders, K.H. (2003). A hierarchical graph-clustering approach to find groups of objects. In The 5th workshop on progress in automated map generalization (pp. 1–8).
Zurück zum Zitat Cormen, T.H., Leiserson, C.E., Rivest, R.L., et al. (2009). Introduction to algorithms (3rd ed.). Cambridge: The MIT Press.MATH Cormen, T.H., Leiserson, C.E., Rivest, R.L., et al. (2009). Introduction to algorithms (3rd ed.). Cambridge: The MIT Press.MATH
Zurück zum Zitat Costa, A.F.B.F., Pimentel, B.A., de Souza, R.M.C.R. (2013). Clustering interval data through kernel-induced feature space. Journal of Intelligent Information Systems, 40(1), 109–140.CrossRef Costa, A.F.B.F., Pimentel, B.A., de Souza, R.M.C.R. (2013). Clustering interval data through kernel-induced feature space. Journal of Intelligent Information Systems, 40(1), 109–140.CrossRef
Zurück zum Zitat Ester, M., Kriegel, H.P., Sander, J., Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial data sets with noise. In The 2th international conference on knowledge discovery and data mining (pp. 226–231). Portland. Ester, M., Kriegel, H.P., Sander, J., Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial data sets with noise. In The 2th international conference on knowledge discovery and data mining (pp. 226–231). Portland.
Zurück zum Zitat Gabriel, K., & Sokal, R. (1969). A new statistical approach to geographic variation analysis. Systematic Zoology, 18, 259–278.CrossRef Gabriel, K., & Sokal, R. (1969). A new statistical approach to geographic variation analysis. Systematic Zoology, 18, 259–278.CrossRef
Zurück zum Zitat Gower, J.C., & Ross, G.J.S. (1969). Minimum spanning trees and single linkage cluster analysis. Applied Statistics, 18(1), 54–64.MathSciNetCrossRef Gower, J.C., & Ross, G.J.S. (1969). Minimum spanning trees and single linkage cluster analysis. Applied Statistics, 18(1), 54–64.MathSciNetCrossRef
Zurück zum Zitat Guha, S., Rastogi, R., Shim, K. (1998). Cure: an efficient clustering algorithm for large databases. In Proceeding of the ACM SIGMOD international conference on management of data (pp. 73–84). Seattle: ACM Press. Guha, S., Rastogi, R., Shim, K. (1998). Cure: an efficient clustering algorithm for large databases. In Proceeding of the ACM SIGMOD international conference on management of data (pp. 73–84). Seattle: ACM Press.
Zurück zum Zitat Jain, A.K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31(8), 651–666.CrossRef Jain, A.K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31(8), 651–666.CrossRef
Zurück zum Zitat Jain, A.K., Murty, M.N., Flynn, P.J. (1999). Data clustering: a review. ACM Computing Surveys, 31(3), 264–323.CrossRef Jain, A.K., Murty, M.N., Flynn, P.J. (1999). Data clustering: a review. ACM Computing Surveys, 31(3), 264–323.CrossRef
Zurück zum Zitat Jaromczyk, J.W., Godfried, T. (1992). Relative neighborhood graphs and their relatives. Proceedings of the IEEE, 80(9), 1502–1517.CrossRef Jaromczyk, J.W., Godfried, T. (1992). Relative neighborhood graphs and their relatives. Proceedings of the IEEE, 80(9), 1502–1517.CrossRef
Zurück zum Zitat Karypis, G., Han, E.H., Kumar, V. (1999). Chameleon: a hierarchical clustering algorithm using dynamic modeling. IEEE Computer, 32(8), 68–75.CrossRef Karypis, G., Han, E.H., Kumar, V. (1999). Chameleon: a hierarchical clustering algorithm using dynamic modeling. IEEE Computer, 32(8), 68–75.CrossRef
Zurück zum Zitat Lee, D.T. (1980). Two dimensional voronoi diagram in the l p metric. Journal of ACM, 27(4), 604–618.CrossRefMATH Lee, D.T. (1980). Two dimensional voronoi diagram in the l p metric. Journal of ACM, 27(4), 604–618.CrossRefMATH
Zurück zum Zitat Li, C.B., Yin, W.M., Li, R.R., et al. (2009). Tutorial to data structures (3rd ed.). Beijing: The Tsinghua University Press. Li, C.B., Yin, W.M., Li, R.R., et al. (2009). Tutorial to data structures (3rd ed.). Beijing: The Tsinghua University Press.
Zurück zum Zitat Schölkopf, B., Smola, A., Müller, K.R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5), 1299–1319.CrossRef Schölkopf, B., Smola, A., Müller, K.R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5), 1299–1319.CrossRef
Zurück zum Zitat Tan, P.N., Steinbach, M., Kumar, V. (2005). Introduction to data mining. Addison Wesley. Tan, P.N., Steinbach, M., Kumar, V. (2005). Introduction to data mining. Addison Wesley.
Zurück zum Zitat Theodoridis, S., & Koutroumbas, K. (2006). Pattern recognition (3rd ed.). Academic Press. Theodoridis, S., & Koutroumbas, K. (2006). Pattern recognition (3rd ed.). Academic Press.
Zurück zum Zitat Wang, X.C., Wang, X.L., Wilkes, D.M. (2009). A divide-and-conquer approach for minimum spanning tree-based clustering. IEEE Transactions on Knowledge and Data Engineering, 21(7), 945–958.CrossRef Wang, X.C., Wang, X.L., Wilkes, D.M. (2009). A divide-and-conquer approach for minimum spanning tree-based clustering. IEEE Transactions on Knowledge and Data Engineering, 21(7), 945–958.CrossRef
Zurück zum Zitat Wang, W., Yang, J., Muntz, R.R. (1997). STING: a statistical information grid approach to spatial data mining. In Proceedings of the 23rd VLDB conference (pp. 186–195). Athens, Greece. Wang, W., Yang, J., Muntz, R.R. (1997). STING: a statistical information grid approach to spatial data mining. In Proceedings of the 23rd VLDB conference (pp. 186–195). Athens, Greece.
Zurück zum Zitat Yao, A.C. (1975). An O(∣E∣ ·loglog∣V∣) algorithm for finding minimum spanning trees. Information Processing Letters, 4(1), 21–23.CrossRefMATH Yao, A.C. (1975). An O(∣E∣ ·loglog∣V∣) algorithm for finding minimum spanning trees. Information Processing Letters, 4(1), 21–23.CrossRefMATH
Zurück zum Zitat Yao, A.C. (1982). On constructing minimum spanning trees in k-dimensional spaces and related problems. SIAM Journal on Computing, 11(5), 721–736.MathSciNetCrossRefMATH Yao, A.C. (1982). On constructing minimum spanning trees in k-dimensional spaces and related problems. SIAM Journal on Computing, 11(5), 721–736.MathSciNetCrossRefMATH
Zurück zum Zitat Zahn, C.T. (1971). Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Transactions on Computers, C-20(1), 68–86. Zahn, C.T. (1971). Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Transactions on Computers, C-20(1), 68–86.
Zurück zum Zitat Zhang, N.X. (2006). Algorithms and data structures: Described in C language (2nd ed.). Beijing: The Higher Education Press. Zhang, N.X. (2006). Algorithms and data structures: Described in C language (2nd ed.). Beijing: The Higher Education Press.
Zurück zum Zitat Zhang, T., Ramakrishnan, R., Linvy, M. (1997). BIRCH: an efficient data clustering method for very large data sets. Data Mining and Knowledge Discovery, 1(2), 141–182.CrossRef Zhang, T., Ramakrishnan, R., Linvy, M. (1997). BIRCH: an efficient data clustering method for very large data sets. Data Mining and Knowledge Discovery, 1(2), 141–182.CrossRef
Zurück zum Zitat Zhou, C.M., Miao, D.Q., Wang, R.Z. (2010). A graph-theoretical clustering method based on two rounds of minimum spanning trees. Pattern Recognition, 43(3), 752–766.CrossRef Zhou, C.M., Miao, D.Q., Wang, R.Z. (2010). A graph-theoretical clustering method based on two rounds of minimum spanning trees. Pattern Recognition, 43(3), 752–766.CrossRef
Metadaten
Titel
Clustering based on a near neighbor graph and a grid cell graph
verfasst von
Xinquan Chen
Publikationsdatum
01.06.2013
Verlag
Springer US
Erschienen in
Journal of Intelligent Information Systems / Ausgabe 3/2013
Print ISSN: 0925-9902
Elektronische ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-013-0236-9

Weitere Artikel der Ausgabe 3/2013

Journal of Intelligent Information Systems 3/2013 Zur Ausgabe

Premium Partner