26-05-2021 | Theoretical advances

Fuzzy kernel K-medoids clustering algorithm for uncertain data objects

Authors: Behnam Tavakkol, Youngdoo Son

Published in: Pattern Analysis and Applications | Issue 3/2021

Abstract

Most data mining algorithms are designed for the traditional type of data object, referred to as a certain data object. Certain data objects carry no uncertainty information and are each represented by a single point. Capturing uncertainty can improve the performance of algorithms, since it may allow them to generate more accurate results. Uncertainty in data objects can be modeled in different ways; two of the most popular are (1) representing each object by a group of points and (2) representing each object by a probability density function (pdf). Objects modeled in these ways are referred to as uncertain data objects. Fuzzy clustering is a well-established field of research for certain data. Fuzzy clustering algorithms assign each object a degree of membership in every cluster, which makes it possible to express that an object can belong to more than one cluster. To the best of our knowledge, only one fuzzy clustering algorithm for uncertain data exists in the literature. However, this existing uncertain fuzzy clustering algorithm cannot properly create non-convex clusters, and it therefore does not perform well on uncertain data sets with arbitrary-shaped clusters, that is, clusters that are non-convex, unconventional, and possibly nonlinearly separable. In this paper, we propose a novel fuzzy kernel K-medoids clustering algorithm for uncertain objects that works well on data sets with arbitrary-shaped clusters. Through several experiments on synthetic and real data, we show that the proposed algorithm outperforms the competing algorithms: the certain fuzzy K-medoids and the uncertain fuzzy K-medoids algorithms.
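The general idea of fuzzy, kernel-based K-medoids clustering of uncertain objects can be illustrated with a minimal Python sketch. The assumptions here are ours, not the paper's. Each uncertain object is modeled as a group of sample points (model (1) above). The distance between two objects is taken as the squared distance between their kernel mean embeddings under a Gaussian (RBF) kernel, which is the biased maximum mean discrepancy (MMD) estimate. Clustering then follows a fuzzy c-medoids style iteration on the resulting distance matrix. The helper names (`rbf_kernel`, `kernel_object_distance`, `distance_matrix`, `memberships`, `fuzzy_k_medoids`), the kernel parameter `gamma`, the fuzzifier `m`, and the farthest-first initialization are all hypothetical choices made for illustration. This is not the authors' algorithm or distance measure.

```python
import numpy as np


def rbf_kernel(X, Y, gamma):
    """Gaussian (RBF) kernel matrix with entries exp(-gamma * ||x - y||^2)."""
    sq = (np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kernel_object_distance(A, B, gamma):
    """Squared RKHS distance between the kernel mean embeddings of two uncertain
    objects, each an (n_samples, n_features) array. This is the biased MMD^2
    estimate, an assumed choice of object-to-object distance."""
    d2 = (rbf_kernel(A, A, gamma).mean() + rbf_kernel(B, B, gamma).mean()
          - 2.0 * rbf_kernel(A, B, gamma).mean())
    return max(d2, 0.0)  # clip tiny negative values caused by round-off


def distance_matrix(objects, gamma):
    """Symmetric matrix of pairwise kernel distances between uncertain objects."""
    n = len(objects)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = kernel_object_distance(objects[i], objects[j], gamma)
    return D


def memberships(D, medoids, m):
    """Fuzzy memberships u_ic proportional to D[i, medoid_c]^(-1/(m-1)); rows sum to 1."""
    d = D[:, medoids] + 1e-12  # avoid division by zero for the medoids themselves
    inv = d ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def fuzzy_k_medoids(D, k, m=2.0, max_iter=100):
    """Fuzzy c-medoids style iteration on a precomputed distance matrix D.
    Returns (medoid indices, membership matrix of shape (n, k))."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    # Deterministic farthest-first initialization (an assumption, not from the paper):
    # start from the most central object, then add the object farthest from all medoids.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)

    for _ in range(max_iter):
        U = memberships(D, medoids, m)
        # cost[q, c] = sum_j D[q, j] * u_jc^m: cost of using object q as medoid of cluster c.
        cost = D @ (U ** m)
        new = []
        for c in range(k):
            # Pick the cheapest object that is not already a medoid of another cluster.
            q = next(int(i) for i in np.argsort(cost[:, c]) if int(i) not in new)
            new.append(q)
        new = np.array(new)
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, memberships(D, medoids, m)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Ten uncertain objects, each a cloud of 30 noisy samples around its own centre;
    # the first five lie near (0, 0) and the last five near (5, 5).
    centres = np.vstack([rng.normal(0.0, 0.5, (5, 2)), rng.normal(5.0, 0.5, (5, 2))])
    objects = [c + rng.normal(0.0, 0.3, (30, 2)) for c in centres]
    D = distance_matrix(objects, gamma=0.1)
    medoids, U = fuzzy_k_medoids(D, k=2)
    print("medoids:", medoids)
    print("hard labels:", U.argmax(axis=1))  # derived from the fuzzy memberships
```

Because medoids are actual objects and the loop only reads the pairwise distance matrix `D`, the kernel enters solely through that matrix. Swapping in another kernel or another object-to-object distance (for example, one defined between pdfs for model (2)) leaves the clustering loop unchanged. The fuzzifier must satisfy `m > 1`. Values very close to 1 make the membership exponent large and can overflow.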

Metadata
Title: Fuzzy kernel K-medoids clustering algorithm for uncertain data objects
Authors: Behnam Tavakkol, Youngdoo Son
Publication date: 26-05-2021
Publisher: Springer London
Published in: Pattern Analysis and Applications, Issue 3/2021
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI: https://doi.org/10.1007/s10044-021-00983-z