Skip to main content
Erschienen in: International Journal of Machine Learning and Cybernetics 4/2023

22.10.2022 | Original Article

Balance-driven automatic clustering for probability density functions using metaheuristic optimization

verfasst von: Thao Nguyen-Trang, Trung Nguyen-Thoi, Kim-Ngan Nguyen-Thi, Tai Vo-Van

Erschienen in: International Journal of Machine Learning and Cybernetics | Ausgabe 4/2023

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

For solving the clustering for probability density functions (CDF) problem with a given number of clusters, the metaheuristic optimization (MO) algorithms have been widely studied because of their advantages in searching for the global optimum. However, the existing approaches cannot be directly extended to the automatic CDF problem for determining the number of clusters k. Besides, balance-driven clustering, an essential research direction recently developed in the problem of discrete-element clustering, has not been considered in the field of CDF. This paper pioneers a technique to apply an MO algorithm for resolving the balance-driven automatic CDF. The proposed method not only can automatically determine the number of clusters but also can approximate the global optimal solution in which both the clustering compactness and the clusters’ size similarity are considered. The experiments on one-dimensional and multidimensional probability density functions demonstrate that the new method possesses higher quality clustering solutions than the other conventional techniques. The proposed method is also applied in analyzing the difficulty levels of entrance exam questions.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Weitere Produktempfehlungen anzeigen
Literatur
1.
Zurück zum Zitat Banerjee A, Ghosh J (2004) Frequency-sensitive competitive learning for scalable balanced clustering on high-dimensional hyperspheres. IEEE Trans Neural Netw 15(3):702–719CrossRef Banerjee A, Ghosh J (2004) Frequency-sensitive competitive learning for scalable balanced clustering on high-dimensional hyperspheres. IEEE Trans Neural Netw 15(3):702–719CrossRef
2.
Zurück zum Zitat Bezdek JC, Ehrlich R, Full W (1984) FCM: the fuzzy c-means clustering algorithm. Comput Geosci 10(2–3):191–203CrossRef Bezdek JC, Ehrlich R, Full W (1984) FCM: the fuzzy c-means clustering algorithm. Comput Geosci 10(2–3):191–203CrossRef
4.
Zurück zum Zitat Chen JH, Hung WL (2015) An automatic clustering algorithm for probability density functions. J Stat Comput Simul 85(15):3047–3063MathSciNetCrossRefMATH Chen JH, Hung WL (2015) An automatic clustering algorithm for probability density functions. J Stat Comput Simul 85(15):3047–3063MathSciNetCrossRefMATH
5.
Zurück zum Zitat Chen JH, Hung WL (2021) A jackknife entropy-based clustering algorithm for probability density functions. J Stat Comput Simul 91(5):861–875MathSciNetCrossRefMATH Chen JH, Hung WL (2021) A jackknife entropy-based clustering algorithm for probability density functions. J Stat Comput Simul 91(5):861–875MathSciNetCrossRefMATH
6.
Zurück zum Zitat Chen TL, Shiu SY (2007) A new clustering algorithm based on self-updating process. In: JSM proceedings, statistical computing section, Salt Lake City, Utah, pp 2034–2038 Chen TL, Shiu SY (2007) A new clustering algorithm based on self-updating process. In: JSM proceedings, statistical computing section, Salt Lake City, Utah, pp 2034–2038
3.
Zurück zum Zitat Chen J, Chang Y, Hung W (2018) A robust automatic clustering algorithm for probability density functions with application to categorizing color images. Commun Stat Simul Comput 47(7):2152–2168MathSciNetCrossRefMATH Chen J, Chang Y, Hung W (2018) A robust automatic clustering algorithm for probability density functions with application to categorizing color images. Commun Stat Simul Comput 47(7):2152–2168MathSciNetCrossRefMATH
7.
Zurück zum Zitat Costa LR, Aloise D, Mladenovic N (2017) Less is more: basic variable neighborhood search heuristic for balanced minimum sum-of-squares clustering. Inf Sci 415:247–253CrossRef Costa LR, Aloise D, Mladenovic N (2017) Less is more: basic variable neighborhood search heuristic for balanced minimum sum-of-squares clustering. Inf Sci 415:247–253CrossRef
8.
Zurück zum Zitat Deep K, Singh KP, Kansal ML et al (2009) A real coded genetic algorithm for solving integer and mixed integer optimization problems. Appl Math Comput 212(2):505–518MathSciNetMATH Deep K, Singh KP, Kansal ML et al (2009) A real coded genetic algorithm for solving integer and mixed integer optimization problems. Appl Math Comput 212(2):505–518MathSciNetMATH
9.
Zurück zum Zitat Demiriz A, Bennett KP, Bradley PS (2008) Using assignment constraints to avoid empty clusters in k-means clustering. Constrained clustering: advances in algorithms, theory, and applications, p 201 Demiriz A, Bennett KP, Bradley PS (2008) Using assignment constraints to avoid empty clusters in k-means clustering. Constrained clustering: advances in algorithms, theory, and applications, p 201
10.
Zurück zum Zitat Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc: Ser B (Methodol) 39(1):1–22MathSciNetMATH Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc: Ser B (Methodol) 39(1):1–22MathSciNetMATH
11.
Zurück zum Zitat Diem HK, Trung VD, Trung NT et al (2018) A differential evolution-based clustering for probability density functions. IEEE Access 6:41325–41336CrossRef Diem HK, Trung VD, Trung NT et al (2018) A differential evolution-based clustering for probability density functions. IEEE Access 6:41325–41336CrossRef
12.
Zurück zum Zitat Elsisi M (2019) Future search algorithm for optimization. Evol Intel 12(1):21–31CrossRef Elsisi M (2019) Future search algorithm for optimization. Evol Intel 12(1):21–31CrossRef
13.
Zurück zum Zitat Ester M, Kriegel HP, Sander J et al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Kdd, pp 226–231 Ester M, Kriegel HP, Sander J et al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Kdd, pp 226–231
14.
Zurück zum Zitat Everitt BS (1985) Mixture distributions-I. Encyclopedia of statistical sciences Everitt BS (1985) Mixture distributions-I. Encyclopedia of statistical sciences
15.
Zurück zum Zitat Fayyad UM, Reina C, Bradley PS (1998) Initialization of iterative refinement clustering algorithms. In: KDD, pp 194–198 Fayyad UM, Reina C, Bradley PS (1998) Initialization of iterative refinement clustering algorithms. In: KDD, pp 194–198
16.
Zurück zum Zitat Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7(2):179–188CrossRef Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7(2):179–188CrossRef
17.
Zurück zum Zitat Fukunaga K (2013) Introduction to statistical pattern recognition. Academic Press Inc, San DiegoMATH Fukunaga K (2013) Introduction to statistical pattern recognition. Academic Press Inc, San DiegoMATH
18.
Zurück zum Zitat Goh A, Vidal R (2008) Unsupervised Riemannian clustering of probability density functions. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, pp 377–392 Goh A, Vidal R (2008) Unsupervised Riemannian clustering of probability density functions. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, pp 377–392
19.
Zurück zum Zitat Hellinger E (1909) Neue begründung der theorie quadratischer formen von unendlichvielen veränderlichen. Journal für die Reine und Angewandte Mathematik 1909(136):210–271CrossRefMATH Hellinger E (1909) Neue begründung der theorie quadratischer formen von unendlichvielen veränderlichen. Journal für die Reine und Angewandte Mathematik 1909(136):210–271CrossRefMATH
20.
Zurück zum Zitat Ho-Kieu D, Vo-Van T, Nguyen-Trang T (2018) Clustering for probability density functions by new-medoids method. Scientific Programming Ho-Kieu D, Vo-Van T, Nguyen-Trang T (2018) Clustering for probability density functions by new-medoids method. Scientific Programming
21.
Zurück zum Zitat Holland JH et al (1992) Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. MIT Press, LondonCrossRef Holland JH et al (1992) Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. MIT Press, LondonCrossRef
22.
Zurück zum Zitat Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall Inc, HobokenMATH Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall Inc, HobokenMATH
23.
Zurück zum Zitat Kaufmann L (1987) Clustering by means of medoids. In: Proc. Statistical Data Analysis Based on the L1 Norm Conference, Neuchatel, pp 405–416 Kaufmann L (1987) Clustering by means of medoids. In: Proc. Statistical Data Analysis Based on the L1 Norm Conference, Neuchatel, pp 405–416
24.
Zurück zum Zitat Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of ICNN’95-international conference on neural networks, IEEE, pp 1942–1948 Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of ICNN’95-international conference on neural networks, IEEE, pp 1942–1948
25.
Zurück zum Zitat Kim J, Billard L (2018) Double monothetic clustering for histogram-valued data. Commun Stat Appl Methods 25(3):263–274 Kim J, Billard L (2018) Double monothetic clustering for histogram-valued data. Commun Stat Appl Methods 25(3):263–274
26.
Zurück zum Zitat Lebesgue H (1902) Intégrale, longueur, aire. Annali di Matematica Pura ed Applicata (1898-1922) 7(1):231–359CrossRefMATH Lebesgue H (1902) Intégrale, longueur, aire. Annali di Matematica Pura ed Applicata (1898-1922) 7(1):231–359CrossRefMATH
27.
Zurück zum Zitat Li L, Zhou X, Li Y et al (2020) An improved genetic algorithm with Lagrange and density method for clustering. Concurr Comput Pract Exp 32(24):e5969CrossRef Li L, Zhou X, Li Y et al (2020) An improved genetic algorithm with Lagrange and density method for clustering. Concurr Comput Pract Exp 32(24):e5969CrossRef
28.
Zurück zum Zitat Liao Y, Qi H, Li W (2012) Load-balanced clustering algorithm with distributed self-organization for wireless sensor networks. IEEE Sens J 13(5):1498–1506CrossRef Liao Y, Qi H, Li W (2012) Load-balanced clustering algorithm with distributed self-organization for wireless sensor networks. IEEE Sens J 13(5):1498–1506CrossRef
29.
Zurück zum Zitat Liu H, Han J, Nie F et al (2017) Balanced clustering with least square regression. In: Proceedings of the AAAI Conference on Artificial Intelligence Liu H, Han J, Nie F et al (2017) Balanced clustering with least square regression. In: Proceedings of the AAAI Conference on Artificial Intelligence
30.
Zurück zum Zitat MacQueen J et al (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Oakland, CA, USA, pp 281–297 MacQueen J et al (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Oakland, CA, USA, pp 281–297
31.
Zurück zum Zitat Malinen MI, Fränti P (2014) Balanced k-means for clustering. In: Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR), Springer, pp 32–41 Malinen MI, Fränti P (2014) Balanced k-means for clustering. In: Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR), Springer, pp 32–41
32.
Zurück zum Zitat Matusita K (1967) On the notion of affinity of several distributions and some of its applications. Ann Inst Stat Math 19(1):181–192MathSciNetCrossRefMATH Matusita K (1967) On the notion of affinity of several distributions and some of its applications. Ann Inst Stat Math 19(1):181–192MathSciNetCrossRefMATH
33.
34.
Zurück zum Zitat Mukhopadhyay A, Maulik U, Bandyopadhyay S (2015) A survey of multiobjective evolutionary clustering. ACM Comput Surv (CSUR) 47(4):1–46CrossRef Mukhopadhyay A, Maulik U, Bandyopadhyay S (2015) A survey of multiobjective evolutionary clustering. ACM Comput Surv (CSUR) 47(4):1–46CrossRef
35.
Zurück zum Zitat Nguyen-Trang T, Nguyen-Thoi T, Truong-Khac T et al (2019) An efficient hybrid optimization approach using adaptive elitist differential evolution and spherical quadratic steepest descent and its application for clustering. Scientific Programming Nguyen-Trang T, Nguyen-Thoi T, Truong-Khac T et al (2019) An efficient hybrid optimization approach using adaptive elitist differential evolution and spherical quadratic steepest descent and its application for clustering. Scientific Programming
36.
Zurück zum Zitat Pham-Toan D, Vo-Van T, Pham-Chau A et al (2019) A new binary adaptive elitist differential evolution based automatic k-medoids clustering for probability density functions. Mathematical Problems in Engineering Pham-Toan D, Vo-Van T, Pham-Chau A et al (2019) A new binary adaptive elitist differential evolution based automatic k-medoids clustering for probability density functions. Mathematical Problems in Engineering
37.
Zurück zum Zitat Storn R, Price K (1997) Differential evolution-a simple and efficient heuristic for global optimization over continuous spaces. J Global Optim 11(4):341–359MathSciNetCrossRefMATH Storn R, Price K (1997) Differential evolution-a simple and efficient heuristic for global optimization over continuous spaces. J Global Optim 11(4):341–359MathSciNetCrossRefMATH
38.
Zurück zum Zitat Tai V, Thao N, Ha C (2016) Clustering for probability density functions based on genetic algorithm. In: Applied Mathematics in Engineering and Reliability, Proceedings of the 1st International Conference on Applied Mathematics in Engineering and Reliability (Ho Chi Minh City, Vietnam, May 2016), pp 51–57 Tai V, Thao N, Ha C (2016) Clustering for probability density functions based on genetic algorithm. In: Applied Mathematics in Engineering and Reliability, Proceedings of the 1st International Conference on Applied Mathematics in Engineering and Reliability (Ho Chi Minh City, Vietnam, May 2016), pp 51–57
39.
Zurück zum Zitat Toussaint GT (1972) Feature evaluation criteria and contextual decoding algorithms in statistical pattern recognition. PhD thesis, University of British Columbia Toussaint GT (1972) Feature evaluation criteria and contextual decoding algorithms in statistical pattern recognition. PhD thesis, University of British Columbia
42.
Zurück zum Zitat Vo-Van T, Nguyen-Hai A, Tat-Hong M et al (2020) A new clustering algorithm and its application in assessing the quality of underground water. Scientific Programming Vo-Van T, Nguyen-Hai A, Tat-Hong M et al (2020) A new clustering algorithm and its application in assessing the quality of underground water. Scientific Programming
44.
Zurück zum Zitat VoVan T, NguyenTrang T (2018) Similar coefficient for cluster of probability density functions. Commun Stat Theory Methods 47(8):1792–1811MathSciNetCrossRefMATH VoVan T, NguyenTrang T (2018) Similar coefficient for cluster of probability density functions. Commun Stat Theory Methods 47(8):1792–1811MathSciNetCrossRefMATH
45.
Zurück zum Zitat Webb AR (2003) Statistical pattern recognition. Wiley, EnglandMATH Webb AR (2003) Statistical pattern recognition. Wiley, EnglandMATH
46.
Zurück zum Zitat Xu L, Hu Q, Hung E et al (2015) Large margin clustering on uncertain data by considering probability distribution similarity. Neurocomputing 158:81–89CrossRef Xu L, Hu Q, Hung E et al (2015) Large margin clustering on uncertain data by considering probability distribution similarity. Neurocomputing 158:81–89CrossRef
47.
Zurück zum Zitat Zhang Y, Wang JZ, Li J (2015) Parallel massive clustering of discrete distributions. ACM Trans Multimed Comput Commun Appl (TOMM) 11(4):1–24CrossRef Zhang Y, Wang JZ, Li J (2015) Parallel massive clustering of discrete distributions. ACM Trans Multimed Comput Commun Appl (TOMM) 11(4):1–24CrossRef
48.
Zurück zum Zitat Zhou Q, Hao JK, Wu Q (2021) Responsive threshold search based memetic algorithm for balanced minimum sum-of-squares clustering. Inf Sci 569:184–204MathSciNetCrossRef Zhou Q, Hao JK, Wu Q (2021) Responsive threshold search based memetic algorithm for balanced minimum sum-of-squares clustering. Inf Sci 569:184–204MathSciNetCrossRef
49.
Zurück zum Zitat Zong Y, Xu G, Zhang Y et al (2010) A robust iterative refinement clustering algorithm with smoothing search space. Knowl-Based Syst 23(5):389–396CrossRef Zong Y, Xu G, Zhang Y et al (2010) A robust iterative refinement clustering algorithm with smoothing search space. Knowl-Based Syst 23(5):389–396CrossRef
Metadaten
Titel
Balance-driven automatic clustering for probability density functions using metaheuristic optimization
verfasst von
Thao Nguyen-Trang
Trung Nguyen-Thoi
Kim-Ngan Nguyen-Thi
Tai Vo-Van
Publikationsdatum
22.10.2022
Verlag
Springer Berlin Heidelberg
Erschienen in
International Journal of Machine Learning and Cybernetics / Ausgabe 4/2023
Print ISSN: 1868-8071
Elektronische ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-022-01683-8

Weitere Artikel der Ausgabe 4/2023

International Journal of Machine Learning and Cybernetics 4/2023 Zur Ausgabe

Neuer Inhalt