Abstract
This article proposes a new concept, the Cluster Similar Coefficient (CSC), for discrete elements. CSC serves not only as a criterion for building clusters by hierarchical and non-hierarchical approaches but also as a measure for evaluating the quality of the established clusters. Based on CSC, we also propose four algorithms: one to determine a suitable number of clusters, one to analyze non-fuzzy clusters, one to analyze fuzzy clusters, and one to build clusters with a given CSC. The proposed algorithms are implemented as Matlab procedures so that users can apply them efficiently and conveniently in practice. Numerical examples demonstrate the suitability and advantages of using CSC as a criterion for building clusters in comparison with other criteria.
Cite this article
VoVan, T., Nguyen Trang, T. Similar Coefficient of Cluster for Discrete Elements. Sankhya B 80, 19–36 (2018). https://doi.org/10.1007/s13571-018-0159-0
DOI: https://doi.org/10.1007/s13571-018-0159-0