
Fuzzy clustering algorithm for outlier-interval data based on the robust exponent distance

Applied Intelligence

Abstract

Outliers are elements of a data set that differ significantly from the others. They arise for many reasons and are encountered in data analysis across many fields. Because an outlier can cause serious problems in statistical analyses, the topic has attracted the interest of many researchers. This article proposes a fuzzy clustering algorithm for outlier-interval data based on the robust exponent distance, which overcomes a drawback of traditional clustering algorithms: the need to clean the outliers before clustering. The main advantage of the algorithm is that it simultaneously finds a suitable number of clusters, clusters interval data containing outliers, and determines the probability that each interval belongs to each cluster. The proposed algorithm is described step by step through numerical examples and can be run efficiently with a Matlab procedure. It is also applied to real air pollution, mushroom, and image data sets. These applications demonstrate the robustness of the proposed algorithm in comparison with existing ones.
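
To illustrate the general idea of fuzzy clustering of intervals with a bounded, exponential-type distance, a minimal Python sketch follows. It is not the authors' algorithm: the abstract does not state the formula of the robust exponent distance, so the sketch assumes the generic form 1 - exp(-beta * squared endpoint difference), uses standard fuzzy c-means style membership and fixed-point center updates, and takes the number of clusters and the initial centers as given rather than finding them automatically. The function names, the parameters `beta`, `m`, `iters`, and the toy data are all hypothetical.

```python
import math

def robust_exp_distance(a, b, beta=1.0):
    """Assumed exponential-type distance between intervals a=(lo, hi), b=(lo, hi).

    The value never exceeds 1, so far-apart intervals saturate near 1
    instead of growing without bound. This limits the pull of outliers.
    """
    sq = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return 1.0 - math.exp(-beta * sq)

def memberships(intervals, centers, m=2.0, beta=1.0, eps=1e-12):
    """Fuzzy c-means style membership degrees; each row sums to 1."""
    rows = []
    for x in intervals:
        inv = [(1.0 / max(robust_exp_distance(x, c, beta), eps)) ** (1.0 / (m - 1.0))
               for c in centers]
        total = sum(inv)
        rows.append([w / total for w in inv])
    return rows

def update_centers(intervals, centers, u, m=2.0, beta=1.0):
    """Fixed-point update: intervals far from a center get weight near 0."""
    new_centers = []
    for i, c in enumerate(centers):
        w = [(u[k][i] ** m) * (1.0 - robust_exp_distance(x, c, beta))
             for k, x in enumerate(intervals)]
        total = sum(w)
        if total == 0.0:
            new_centers.append(c)  # no interval is close enough; keep the center
            continue
        lo = sum(wk * x[0] for wk, x in zip(w, intervals)) / total
        hi = sum(wk * x[1] for wk, x in zip(w, intervals)) / total
        new_centers.append((lo, hi))
    return new_centers

def fuzzy_cluster(intervals, centers, m=2.0, beta=1.0, iters=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    for _ in range(iters):
        u = memberships(intervals, centers, m, beta)
        centers = update_centers(intervals, centers, u, m, beta)
    return centers, memberships(intervals, centers, m, beta)

# Toy data (hypothetical): two groups of intervals plus one outlier interval.
data = [(0.0, 1.0), (0.1, 1.1), (0.2, 0.9),
        (5.0, 6.0), (5.1, 6.2), (4.9, 5.8),
        (100.0, 101.0)]
centers, u = fuzzy_cluster(data, centers=[(0.0, 1.0), (5.0, 6.0)])
```

In this sketch the outlier interval is left in the data: its distance to every center saturates, so it receives roughly equal membership in all clusters and contributes almost nothing to the center updates.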



Acknowledgments

For Khanh Nguyenhuu and Tai Vovan, this research is funded by the Ministry of Education and Training in Viet Nam under grant number B2021 – TCT – 01.

Author information

Corresponding author

Correspondence to Tai Vovan.

About this article

Cite this article

Phamtoan, D., Nguyenhuu, K. & Vovan, T. Fuzzy clustering algorithm for outlier-interval data based on the robust exponent distance. Appl Intell 52, 6276–6291 (2022). https://doi.org/10.1007/s10489-021-02773-w

  • DOI: https://doi.org/10.1007/s10489-021-02773-w
