Abstract
Outliers in a data set are elements that differ significantly from the others. For many reasons, outliers arise in data analysis across different fields. Because an outlier can cause serious problems in statistical analyses, the topic has attracted the interest of many researchers. This article proposes a fuzzy clustering algorithm for interval data with outliers, based on the robust exponent distance, to overcome a drawback of traditional clustering algorithms, which require the outliers to be cleaned before clustering. The outstanding advantage of this algorithm is that it simultaneously finds a suitable number of clusters, clusters interval data containing outliers, and determines the probability that each interval belongs to each cluster. The proposed algorithm is described step by step through numerical examples and can be performed effectively with a Matlab procedure. In addition, it is applied to real air pollution, mushroom, and image data sets. These applications demonstrate the robustness of the proposed algorithm in comparison with existing ones.
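To make the idea of fuzzy memberships under a bounded, outlier-tolerant distance concrete, here is a minimal Python sketch. It is not the algorithm from the article: the abstract does not define the robust exponent distance, so the form 1 - exp(-beta * ||x - v||^2), the parameter names `beta` and `m`, the function name `fuzzy_memberships`, and the fixed cluster centers are all assumptions made for illustration. Each interval is represented by its (lower, upper) endpoints, and the number of clusters is given rather than found automatically.

```python
import numpy as np

def fuzzy_memberships(intervals, centers, m=2.0, beta=1.0, eps=1e-12):
    """Fuzzy c-means style memberships for intervals given as (lower, upper).

    Assumed dissimilarity (not taken from the article):
        d(x, v) = 1 - exp(-beta * ||x - v||^2),
    which is bounded by 1, so a far-away outlier cannot dominate.
    """
    X = np.asarray(intervals, dtype=float)   # shape (n, 2)
    V = np.asarray(centers, dtype=float)     # shape (c, 2)
    # Squared Euclidean distance between endpoint vectors, shape (n, c).
    sq = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    # Bounded exponential dissimilarity; eps avoids division by zero.
    d = 1.0 - np.exp(-beta * sq) + eps
    # Standard fuzzy membership update with fuzzifier m > 1.
    w = d ** (-1.0 / (m - 1.0))
    return w / w.sum(axis=1, keepdims=True)

# Hypothetical data: three ordinary intervals and one outlier.
intervals = [[1.0, 2.0], [1.2, 2.1], [5.0, 6.0], [50.0, 60.0]]
centers = [[1.0, 2.0], [5.0, 6.0]]
U = fuzzy_memberships(intervals, centers)
print(U.round(3))
```

In this sketch each row of `U` sums to 1 and can be read as the degree to which an interval belongs to each cluster. Because the assumed distance saturates at 1, the outlier `[50, 60]` is treated as equally far from both centers and receives near-equal memberships instead of being pulled strongly toward one cluster.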
Acknowledgments
For Khanh Nguyenhuu and Tai Vovan, this research is funded by the Ministry of Education and Training in Viet Nam under grant number B2021-TCT-01.
About this article
Cite this article
Phamtoan, D., Nguyenhuu, K. & Vovan, T. Fuzzy clustering algorithm for outlier-interval data based on the robust exponent distance. Appl Intell 52, 6276–6291 (2022). https://doi.org/10.1007/s10489-021-02773-w
DOI: https://doi.org/10.1007/s10489-021-02773-w