Skip to main content
Top

2017 | OriginalPaper | Chapter

Clustering of High Dimensional Handwritten Data by an Improved Hypergraph Partition Method

Authors : Tian Wang, Yonggang Lu, Yuxuan Han

Published in: Intelligent Computing Methodologies

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

High dimensional data clustering is a difficult task due to the curse of dimensionality. Traditional clustering methods usually fail to produce meaningful results for high dimensional data. Hypergraph partition is believed to be a promising method for dealing with this challenge. In this work, a new high dimensional clustering method called Merging Dense SubGraphs (MDSG) is proposed. A graph G is first constructed from the data by defining an adjacency relationship between the data points using Shared k Nearest Neighbors (SNN). Then a hypergraph is created from the graph G by defining the hyperedges to be all the maximal cliques in the graph. After the hypergraph is produced, an improved hypergraph partitioning method is used to produce the final clustering results. The proposed MDSG method is evaluated on several real high dimensional handwritten datasets, and the experimental results show that the proposed method is superior to the traditional clustering method and other hypergraph partition methods for high dimensional handwritten data clustering.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Hamerly, G., Elkan, C.: Alternatives to the k-means algorithm that find better clusterings. In: Proceedings of the Eleventh International Conference on Information and Knowledge Management, pp. 600–607. ACM (2002) Hamerly, G., Elkan, C.: Alternatives to the k-means algorithm that find better clusterings. In: Proceedings of the Eleventh International Conference on Information and Knowledge Management, pp. 600–607. ACM (2002)
2.
go back to reference Matas, J., Kittler, J.: Spatial and feature space clustering: applications in image analysis. In: Hlaváč, V., Šára, R. (eds.) CAIP 1995. LNCS, vol. 970, pp. 162–173. Springer, Heidelberg (1995). doi:10.1007/3-540-60268-2_293 CrossRef Matas, J., Kittler, J.: Spatial and feature space clustering: applications in image analysis. In: Hlaváč, V., Šára, R. (eds.) CAIP 1995. LNCS, vol. 970, pp. 162–173. Springer, Heidelberg (1995). doi:10.​1007/​3-540-60268-2_​293 CrossRef
3.
go back to reference Natali, A., Toschi, E., Baldeweg, S., et al.: Clustering of insulin resistance with vascular dysfunction and low-grade inflammation in type 2 diabetes. Diabetes 55(4), 1133–1140 (2006)CrossRef Natali, A., Toschi, E., Baldeweg, S., et al.: Clustering of insulin resistance with vascular dysfunction and low-grade inflammation in type 2 diabetes. Diabetes 55(4), 1133–1140 (2006)CrossRef
4.
go back to reference Ben-Dor, A., Shamir, R., Yakhini, Z.: Clustering gene expression patterns. J. Comput. Biol. 6(3–4), 281–297 (1999)CrossRef Ben-Dor, A., Shamir, R., Yakhini, Z.: Clustering gene expression patterns. J. Comput. Biol. 6(3–4), 281–297 (1999)CrossRef
5.
go back to reference Steinbach, M., Karypis, G., Kumar, V.: A comparison of document clustering techniques. In: KDD Workshop on Text Mining, vol. 400, no. (1), pp. 525–526 (2000) Steinbach, M., Karypis, G., Kumar, V.: A comparison of document clustering techniques. In: KDD Workshop on Text Mining, vol. 400, no. (1), pp. 525–526 (2000)
6.
go back to reference Hu, T., Liu, C., Tang, Y., et al.: High-dimensional clustering: a clique-based hypergraph partitioning framework. Knowl. Inf. Syst. 39(1), 61–88 (2014)CrossRef Hu, T., Liu, C., Tang, Y., et al.: High-dimensional clustering: a clique-based hypergraph partitioning framework. Knowl. Inf. Syst. 39(1), 61–88 (2014)CrossRef
7.
go back to reference Bouveyron, C., Brunet-Saumard, C.: Model-based clustering of high-dimensional data: a review. Comput. Stat. Data Anal. 71, 52–78 (2014)MathSciNetCrossRefMATH Bouveyron, C., Brunet-Saumard, C.: Model-based clustering of high-dimensional data: a review. Comput. Stat. Data Anal. 71, 52–78 (2014)MathSciNetCrossRefMATH
8.
go back to reference Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500), pp. 2323–2326 (2000) Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500), pp. 2323–2326 (2000)
9.
go back to reference Zhu, X., Huang, Z., Yang, Y., et al.: Self-taught dimensionality reduction on the high-dimensional small-sized data. Pattern Recogn. 46(1), 215–229 (2013)CrossRefMATH Zhu, X., Huang, Z., Yang, Y., et al.: Self-taught dimensionality reduction on the high-dimensional small-sized data. Pattern Recogn. 46(1), 215–229 (2013)CrossRefMATH
10.
go back to reference Song, Q., Ni, J., Wang, G.: A fast clustering-based feature subset selection algorithm for high-dimensional data. IEEE Trans. Knowl. Data Eng. 25(1), 1–14 (2013)CrossRef Song, Q., Ni, J., Wang, G.: A fast clustering-based feature subset selection algorithm for high-dimensional data. IEEE Trans. Knowl. Data Eng. 25(1), 1–14 (2013)CrossRef
12.
go back to reference Bouveyron, C.: Model-based clustering of high-dimensional data in astrophysics. EAS Publ. Ser. 77, 91–119 (2016)CrossRef Bouveyron, C.: Model-based clustering of high-dimensional data in astrophysics. EAS Publ. Ser. 77, 91–119 (2016)CrossRef
13.
go back to reference Han, E.H., Karypis, G., Kumar, V., et al.: Hypergraph based clustering in high-dimensional data sets: a summary of results. IEEE Data Eng. Bull. 21(1), 15–22 (1998) Han, E.H., Karypis, G., Kumar, V., et al.: Hypergraph based clustering in high-dimensional data sets: a summary of results. IEEE Data Eng. Bull. 21(1), 15–22 (1998)
14.
go back to reference Sun, L., Ji, S., Ye, J.: Hypergraph spectral learning for multi-label classification. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 668–676. ACM (2008) Sun, L., Ji, S., Ye, J.: Hypergraph spectral learning for multi-label classification. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 668–676. ACM (2008)
15.
go back to reference Huang, Y., Liu, Q., Zhang, S., et al.: Image retrieval via probabilistic hypergraph ranking. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3376–3383. IEEE (2010) Huang, Y., Liu, Q., Zhang, S., et al.: Image retrieval via probabilistic hypergraph ranking. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3376–3383. IEEE (2010)
16.
go back to reference Wang, M., Liu, X., Wu, X.: Visual classification by ℓ1-hypergraph modeling. IEEE Trans. Knowl. Data Eng. 27(9), 2564–2574 (2015)CrossRef Wang, M., Liu, X., Wu, X.: Visual classification by ℓ1-hypergraph modeling. IEEE Trans. Knowl. Data Eng. 27(9), 2564–2574 (2015)CrossRef
17.
go back to reference Fiduccia, C.M., Mattheyses, R.M.: A linear-time heuristic for improving network partitions. In: Papers on Twenty-Five Years of Electronic Design Automation, pp. 241–247. ACM (1988) Fiduccia, C.M., Mattheyses, R.M.: A linear-time heuristic for improving network partitions. In: Papers on Twenty-Five Years of Electronic Design Automation, pp. 241–247. ACM (1988)
18.
go back to reference Huang, D.J.H., Kahng, A.B.: When clusters meet partitions: new density-based methods for circuit decomposition. In: Proceedings of the 1995 European Conference on Design and Test. IEEE Computer Society (1995) Huang, D.J.H., Kahng, A.B.: When clusters meet partitions: new density-based methods for circuit decomposition. In: Proceedings of the 1995 European Conference on Design and Test. IEEE Computer Society (1995)
19.
go back to reference Karypis, G., Aggarwal, R., Kumar, V., et al.: Multilevel hypergraph partitioning: applications in VLSI domain. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 7(1), 69–79 (1999)CrossRef Karypis, G., Aggarwal, R., Kumar, V., et al.: Multilevel hypergraph partitioning: applications in VLSI domain. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 7(1), 69–79 (1999)CrossRef
20.
go back to reference Cai, W., Young, E.F.Y.: A fast hypergraph bipartitioning algorithm. In: 2014 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 607–612. IEEE (2014) Cai, W., Young, E.F.Y.: A fast hypergraph bipartitioning algorithm. In: 2014 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 607–612. IEEE (2014)
23.
go back to reference Liu, H., Latecki, L.J., Yan, S.: Dense subgraph partition of positive hypergraphs. IEEE Trans. Pattern Anal. Mach. Intell. 37(3), 541–554 (2015)CrossRef Liu, H., Latecki, L.J., Yan, S.: Dense subgraph partition of positive hypergraphs. IEEE Trans. Pattern Anal. Mach. Intell. 37(3), 541–554 (2015)CrossRef
24.
go back to reference Jagannathan, J., Sherajdheen, A., Deepak, R.M.V., et al.: License plate character segmentation using horizontal and vertical projection with dynamic thresholding. In: 2013 International Conference on Emerging Trends in Computing, Communication and Nanotechnology (ICE-CCN), pp. 700–705. IEEE (2013) Jagannathan, J., Sherajdheen, A., Deepak, R.M.V., et al.: License plate character segmentation using horizontal and vertical projection with dynamic thresholding. In: 2013 International Conference on Emerging Trends in Computing, Communication and Nanotechnology (ICE-CCN), pp. 700–705. IEEE (2013)
25.
go back to reference Tuba, E., Bacanin, N.: An algorithm for handwritten digit recognition using projection histograms and SVM classifier. In: 2015 23rd Telecommunications Forum Telfor (TELFOR), pp. 464–467. IEEE (2015) Tuba, E., Bacanin, N.: An algorithm for handwritten digit recognition using projection histograms and SVM classifier. In: 2015 23rd Telecommunications Forum Telfor (TELFOR), pp. 464–467. IEEE (2015)
26.
go back to reference Hinton, G., Roweis, S.: Stochastic neighbor embedding. In: NIPS, vol. 15, pp. 833–840 (2002) Hinton, G., Roweis, S.: Stochastic neighbor embedding. In: NIPS, vol. 15, pp. 833–840 (2002)
27.
go back to reference Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(Nov), 2579–2605 (2008) Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(Nov), 2579–2605 (2008)
28.
go back to reference Eppstein, D., Löffle, M., Strash, D.: Listing all maximal cliques in sparse graphs in near-optimal time. In: Cheong, O., Chwa, K.Y., Park, K. (eds.) ISAAC 2010, vol. 6506, pp. 403–414. Springer, Heidelberg (2010). doi:10.1007/978-3-642-17517-6_36 Eppstein, D., Löffle, M., Strash, D.: Listing all maximal cliques in sparse graphs in near-optimal time. In: Cheong, O., Chwa, K.Y., Park, K. (eds.) ISAAC 2010, vol. 6506, pp. 403–414. Springer, Heidelberg (2010). doi:10.​1007/​978-3-642-17517-6_​36
30.
go back to reference Fowlkes, E.B., Mallows, C.L.: A method for comparing two hierarchical clusterings. J. Am. Stat. Assoc. 78(383), 553–569 (1983)CrossRefMATH Fowlkes, E.B., Mallows, C.L.: A method for comparing two hierarchical clusterings. J. Am. Stat. Assoc. 78(383), 553–569 (1983)CrossRefMATH
34.
go back to reference Van der Maaten, L.: A new benchmark dataset for handwritten character recognition. Tilburg University, pp. 2–5 (2009) Van der Maaten, L.: A new benchmark dataset for handwritten character recognition. Tilburg University, pp. 2–5 (2009)
35.
go back to reference Kaufman, L., Rousseeuw, P.: Clustering by Means of Medoids. North-Holland, Amsterdam (1987) Kaufman, L., Rousseeuw, P.: Clustering by Means of Medoids. North-Holland, Amsterdam (1987)
36.
go back to reference Sun, X., Tian, S., Lu, Y.: High dimensional data clustering by partitioning the hypergraphs using dense subgraph partition. In: Ninth International Symposium on Multispectral Image Processing and Pattern Recognition (MIPPR2015). International Society for Optics and Photonics (2015) Sun, X., Tian, S., Lu, Y.: High dimensional data clustering by partitioning the hypergraphs using dense subgraph partition. In: Ninth International Symposium on Multispectral Image Processing and Pattern Recognition (MIPPR2015). International Society for Optics and Photonics (2015)
Metadata
Title
Clustering of High Dimensional Handwritten Data by an Improved Hypergraph Partition Method
Authors
Tian Wang
Yonggang Lu
Yuxuan Han
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-63315-2_28

Premium Partner