Skip to main content
Erschienen in: Soft Computing 2/2018

14.02.2017 | Foundations

Enhancing point symmetry-based distance for data clustering

verfasst von: Sriparna Saha

Erschienen in: Soft Computing | Ausgabe 2/2018

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In this paper, at first a new point symmetry-based similarity measurement is proposed which satisfies the closure and the symmetry properties of any distance function. The different desirable properties of the new distance are elaborately explained. Thereafter a new clustering algorithm based on the search capability of genetic algorithm is developed where the newly developed point symmetry-based distance is used for cluster assignment. The allocation of points to different clusters is performed in such a way that the closure property is satisfied. The proposed GA with newly developed point symmetry distance based (GAnPS) clustering algorithm is capable of determining different symmetrical shaped clusters having any sizes or convexities. The effectiveness of the proposed GAnPS clustering technique in identifying the proper partitioning is shown for twenty-one data sets having various characteristics. Performance of GAnPS is compared with existing symmetry-based genetic clustering technique, GAPS, three popular and well-known clustering techniques, K-means, expectation maximization and average linkage algorithm. In a part of the paper, the utility of the proposed clustering technique is shown for partitioning a remote sensing satellite image. The last part of the paper deals with the development of some automatic clustering techniques using the newly proposed symmetry-based distance.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Alander JT (1992) On optimal population size of genetic algorithms. In: Proceedings of computer systems and software engineering, CompEuro ’92, The Hague , Netherlands, pp 65–70 Alander JT (1992) On optimal population size of genetic algorithms. In: Proceedings of computer systems and software engineering, CompEuro ’92, The Hague , Netherlands, pp 65–70
Zurück zum Zitat Alok AK, Saha S, Ekbal A (2015) A new semi-supervised clustering technique using multi-objective optimization. Appl Intell 43(3):633–661CrossRef Alok AK, Saha S, Ekbal A (2015) A new semi-supervised clustering technique using multi-objective optimization. Appl Intell 43(3):633–661CrossRef
Zurück zum Zitat Anderberg MR (2000) Computational geometry: algorithms and applications. Springer, Heidelberg Anderberg MR (2000) Computational geometry: algorithms and applications. Springer, Heidelberg
Zurück zum Zitat Bandyopadhyay S, Maulik U (2001) Nonparametric genetic clustering: comparison of validity indices. IEEE Trans Syst Man Cybern 31(1):120–125CrossRef Bandyopadhyay S, Maulik U (2001) Nonparametric genetic clustering: comparison of validity indices. IEEE Trans Syst Man Cybern 31(1):120–125CrossRef
Zurück zum Zitat Bandyopadhyay S, Maulik U (2002) Genetic clustering for automatic evolution of clusters and application to image classification. Pattern Recognit 35(6):1197–1208CrossRefMATH Bandyopadhyay S, Maulik U (2002) Genetic clustering for automatic evolution of clusters and application to image classification. Pattern Recognit 35(6):1197–1208CrossRefMATH
Zurück zum Zitat Bandyopadhyay S, Saha S (2007) GAPS: a clustering method using a new point symmetry based distance measure. Pattern Recognit 40(12):3430–3451CrossRefMATH Bandyopadhyay S, Saha S (2007) GAPS: a clustering method using a new point symmetry based distance measure. Pattern Recognit 40(12):3430–3451CrossRefMATH
Zurück zum Zitat Bandyopadhyay S, Saha S (2008) A point symmetry based clustering technique for automatic evolution of clusters. IEEE Trans Knowl Data Eng 20(11):1–17CrossRef Bandyopadhyay S, Saha S (2008) A point symmetry based clustering technique for automatic evolution of clusters. IEEE Trans Knowl Data Eng 20(11):1–17CrossRef
Zurück zum Zitat Bandyopadhyay S, Saha S (2013) Unsupervised classification–similarity measures, classical and metaheuristic approaches, and applications. Springer, BerlinMATH Bandyopadhyay S, Saha S (2013) Unsupervised classification–similarity measures, classical and metaheuristic approaches, and applications. Springer, BerlinMATH
Zurück zum Zitat Bentley JL, Weide BW, Yao AC (1980) Optimal expected-time algorithms for closest point problems. ACM Trans Math Softw 6(4):563–580MathSciNetCrossRefMATH Bentley JL, Weide BW, Yao AC (1980) Optimal expected-time algorithms for closest point problems. ACM Trans Math Softw 6(4):563–580MathSciNetCrossRefMATH
Zurück zum Zitat Bezdek JC (1973) Fuzzy mathematics in pattern classification. PhD thesis, Cornell University, Ithaca, NY Bezdek JC (1973) Fuzzy mathematics in pattern classification. PhD thesis, Cornell University, Ithaca, NY
Zurück zum Zitat Bong CW, Rajeswari M (2012) Multiobjective clustering with metaheuristic: current trends and methods in image segmentation. Image Process IET 6:1–10MathSciNetCrossRef Bong CW, Rajeswari M (2012) Multiobjective clustering with metaheuristic: current trends and methods in image segmentation. Image Process IET 6:1–10MathSciNetCrossRef
Zurück zum Zitat Chou C-H, Su M-C, Lai E (2002) Symmetry as a new measure for cluster validity. In: 2nd WSEAS international conference on scientific computation and soft computing, pp 209–213 Chou C-H, Su M-C, Lai E (2002) Symmetry as a new measure for cluster validity. In: 2nd WSEAS international conference on scientific computation and soft computing, pp 209–213
Zurück zum Zitat Chung K-L, Lin J-S (2007) Faster and more robust point symmetry-based K-means algorithm. Pattern Recognit 40(2):410–422CrossRefMATH Chung K-L, Lin J-S (2007) Faster and more robust point symmetry-based K-means algorithm. Pattern Recognit 40(2):410–422CrossRefMATH
Zurück zum Zitat Deb K, Agrawal S (1998) Understanding interactions among genetic algorithm parameters. In: In foundations of genetic algorithms 5, pp 265–286, Morgan Kaufmann Deb K, Agrawal S (1998) Understanding interactions among genetic algorithm parameters. In: In foundations of genetic algorithms 5, pp 265–286, Morgan Kaufmann
Zurück zum Zitat Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30MathSciNetMATH Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30MathSciNetMATH
Zurück zum Zitat Everitt BS, Landau S, Leese M (2001) Cluster analysis. Arnold, LondonMATH Everitt BS, Landau S, Leese M (2001) Cluster analysis. Arnold, LondonMATH
Zurück zum Zitat Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 3:179–188CrossRef Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 3:179–188CrossRef
Zurück zum Zitat Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32(200):675–701CrossRefMATH Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32(200):675–701CrossRefMATH
Zurück zum Zitat Friedman JH, Bently JL, Finkel RA (1977) An algorithm for finding best matches in logarithmic expected time. ACM Trans Math Softw 3(3):209–226CrossRefMATH Friedman JH, Bently JL, Finkel RA (1977) An algorithm for finding best matches in logarithmic expected time. ACM Trans Math Softw 3(3):209–226CrossRefMATH
Zurück zum Zitat Furutani H, Sakamoto M, Katayama S (2005) Influence of finite population size–extinction of favorable schemata. ICNC 2:1025–1034 Furutani H, Sakamoto M, Katayama S (2005) Influence of finite population size–extinction of favorable schemata. ICNC 2:1025–1034
Zurück zum Zitat Furutani H, Fujimaru T, Zhang Y-A, Sakamoto M (2007) Effects of population size on computational performance of genetic algorithm on multiplicative landscape. In: Proceedings of the third international conference on natural computation, vol 03, ICNC ’07, Washington, DC, USA, pp 488–496, IEEE Computer Society Furutani H, Fujimaru T, Zhang Y-A, Sakamoto M (2007) Effects of population size on computational performance of genetic algorithm on multiplicative landscape. In: Proceedings of the third international conference on natural computation, vol 03, ICNC ’07, Washington, DC, USA, pp 488–496, IEEE Computer Society
Zurück zum Zitat Garcia S, Herrera F (2008) An extension on statistical comparisons of classifiers over multiple data setsfor all pairwise comparisons. J Mach Learn Res 9:2677–2694MATH Garcia S, Herrera F (2008) An extension on statistical comparisons of classifiers over multiple data setsfor all pairwise comparisons. J Mach Learn Res 9:2677–2694MATH
Zurück zum Zitat Garcia-Piquer A, Fornells A, Bacardit J, Orriols-Puig A, Golobardes E (2014) Large-scale experimental evaluation of cluster representations for multiobjective evolutionary clustering. IEEE Trans Evol Comput 18:36–53CrossRef Garcia-Piquer A, Fornells A, Bacardit J, Orriols-Puig A, Golobardes E (2014) Large-scale experimental evaluation of cluster representations for multiobjective evolutionary clustering. IEEE Trans Evol Comput 18:36–53CrossRef
Zurück zum Zitat Goldberg DE (1989a) Genetic algorithms in search, optimization and machine learning. Addison-Wesley, New YorkMATH Goldberg DE (1989a) Genetic algorithms in search, optimization and machine learning. Addison-Wesley, New YorkMATH
Zurück zum Zitat Goldberg DE (1989b) Sizing populations for serial and parallel genetic algorithms. In: Proceedings of the third international conference on Genetic algorithms, San Francisco, CA, USA, pp 70–79, Morgan Kaufmann Publishers Inc Goldberg DE (1989b) Sizing populations for serial and parallel genetic algorithms. In: Proceedings of the third international conference on Genetic algorithms, San Francisco, CA, USA, pp 70–79, Morgan Kaufmann Publishers Inc
Zurück zum Zitat Goldberg DE, Deb K (1991) A comparative analysis of selection schemes used in gas. In: Foundations of GAs (FOGA), pp 69–93 Goldberg DE, Deb K (1991) A comparative analysis of selection schemes used in gas. In: Foundations of GAs (FOGA), pp 69–93
Zurück zum Zitat Goldberg DE, Deb K, Clark JH (1992) Genetic algorithms, noise, and the sizing of populations. Complex Syst 6:333–362MATH Goldberg DE, Deb K, Clark JH (1992) Genetic algorithms, noise, and the sizing of populations. Complex Syst 6:333–362MATH
Zurück zum Zitat Goldberg DE, Kargupta H, Horn J, Cantu-Paz E (1995) Critical deme size for serial and parallel genetic algorithms, tech. rep., The Illinois GA Lab, University of Illinois, IlliGAL. Report 95002 Goldberg DE, Kargupta H, Horn J, Cantu-Paz E (1995) Critical deme size for serial and parallel genetic algorithms, tech. rep., The Illinois GA Lab, University of Illinois, IlliGAL. Report 95002
Zurück zum Zitat Grefenstette J (1986) Optimization of control parameters for genetic algorithms. IEEE Trans Syst Man Cybern 16:122–128CrossRef Grefenstette J (1986) Optimization of control parameters for genetic algorithms. IEEE Trans Syst Man Cybern 16:122–128CrossRef
Zurück zum Zitat Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The weka data mining software: an update; SIGKDD explorations. IEEE Trans Pattern Anal Mach Intell 11(1):10–18 Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The weka data mining software: an update; SIGKDD explorations. IEEE Trans Pattern Anal Mach Intell 11(1):10–18
Zurück zum Zitat Handl J, Knowles J (2007) An evolutionary approach to multiobjective clustering. IEEE Trans Evol Comput 11(1):56–76CrossRef Handl J, Knowles J (2007) An evolutionary approach to multiobjective clustering. IEEE Trans Evol Comput 11(1):56–76CrossRef
Zurück zum Zitat Handl J, Knowles J (2013) Evidence accumulation in multiobjective data clustering. In: Purshouse R, Fleming P, Fonseca C, Greco S, Shaw J (eds) Evolutionary multi-criterion optimization, vol 7811., Lecture Notes in Computer ScienceBerlin, Springer, pp 543–557CrossRef Handl J, Knowles J (2013) Evidence accumulation in multiobjective data clustering. In: Purshouse R, Fleming P, Fonseca C, Greco S, Shaw J (eds) Evolutionary multi-criterion optimization, vol 7811., Lecture Notes in Computer ScienceBerlin, Springer, pp 543–557CrossRef
Zurück zum Zitat Holland JH (1975) Adaptation in natural and artificial systems. The University of Michigan Press, Ann Arbor Holland JH (1975) Adaptation in natural and artificial systems. The University of Michigan Press, Ann Arbor
Zurück zum Zitat Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323CrossRef Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323CrossRef
Zurück zum Zitat Jardine N, Sibson R (1971) Mathematical taxonomy. Wiley, New YorkMATH Jardine N, Sibson R (1971) Mathematical taxonomy. Wiley, New YorkMATH
Zurück zum Zitat Maulik U, Bandyopadhyay S (2003) Fuzzy partitioning using a real-coded variable-length genetic algorithm for pixel classification. IEEE Trans Geosci Remote Sens 41(5):1075–1081CrossRef Maulik U, Bandyopadhyay S (2003) Fuzzy partitioning using a real-coded variable-length genetic algorithm for pixel classification. IEEE Trans Geosci Remote Sens 41(5):1075–1081CrossRef
Zurück zum Zitat Nanda SJ, Panda G (2014) A survey on nature inspired metaheuristic algorithms for partitional clustering. Swarm Evolut Comput 16:1–18CrossRef Nanda SJ, Panda G (2014) A survey on nature inspired metaheuristic algorithms for partitional clustering. Swarm Evolut Comput 16:1–18CrossRef
Zurück zum Zitat Nemenyi P (1963) Distribution-free multiple comparisons. PhD thesis Nemenyi P (1963) Distribution-free multiple comparisons. PhD thesis
Zurück zum Zitat Pal P, Chanda B (2002) A symmetry based clustering technique for multi-spectral satellite imagery. In: ICVGIP Pal P, Chanda B (2002) A symmetry based clustering technique for multi-spectral satellite imagery. In: ICVGIP
Zurück zum Zitat Richards JA (1993) Remote sensing digital image analysis: an introduction. Springer, New YorkCrossRef Richards JA (1993) Remote sensing digital image analysis: an introduction. Springer, New YorkCrossRef
Zurück zum Zitat Saha S, Bandyopadhyay S (2008) Application of a new symmetry based cluster validity index for satellite image segmentation. IEEE Geosci Remote Sens Lett 5(2):166–170CrossRef Saha S, Bandyopadhyay S (2008) Application of a new symmetry based cluster validity index for satellite image segmentation. IEEE Geosci Remote Sens Lett 5(2):166–170CrossRef
Zurück zum Zitat Saha S, Bandyopadhyay S (2009a) A new multiobjective simulated annealing based clustering technique using symmetry. Pattern Recognit Lett 30(15):1392–1403CrossRef Saha S, Bandyopadhyay S (2009a) A new multiobjective simulated annealing based clustering technique using symmetry. Pattern Recognit Lett 30(15):1392–1403CrossRef
Zurück zum Zitat Saha S, Bandyopadhyay S (2009b) A new line symmetry distance and its application to data clustering. J Comput Sci Technol 24(3):544–556CrossRef Saha S, Bandyopadhyay S (2009b) A new line symmetry distance and its application to data clustering. J Comput Sci Technol 24(3):544–556CrossRef
Zurück zum Zitat Saha S, Bandyopadhyay S (2010a) A symmetry based multiobjective clustering technique for automatic evolution of clusters. Pattern Recognit 43(3):738–751CrossRefMATH Saha S, Bandyopadhyay S (2010a) A symmetry based multiobjective clustering technique for automatic evolution of clusters. Pattern Recognit 43(3):738–751CrossRefMATH
Zurück zum Zitat Saha S, Bandyopadhyay S (2010b) A new multiobjective clustering technique based on the concepts of stability and symmetry. Knowl Inf Syst 23(1):1–27CrossRef Saha S, Bandyopadhyay S (2010b) A new multiobjective clustering technique based on the concepts of stability and symmetry. Knowl Inf Syst 23(1):1–27CrossRef
Zurück zum Zitat Saha S, Bandyopadhyay S (2011) On principle axis based line symmetry clustering techniques. Memet Comput 3(2):129–144CrossRef Saha S, Bandyopadhyay S (2011) On principle axis based line symmetry clustering techniques. Memet Comput 3(2):129–144CrossRef
Zurück zum Zitat Saha S, Bandyopadhyay S (2013) A generalized automatic clustering algorithm in a multiobjective framework. Appl Soft Comput 13:89–108CrossRef Saha S, Bandyopadhyay S (2013) A generalized automatic clustering algorithm in a multiobjective framework. Appl Soft Comput 13:89–108CrossRef
Zurück zum Zitat Saha S, Maulik U (2011) A new line symmetry distance based automatic clustering technique: application to image segmentation. Int J Imaging Syst Technol 21(1):86–100CrossRef Saha S, Maulik U (2011) A new line symmetry distance based automatic clustering technique: application to image segmentation. Int J Imaging Syst Technol 21(1):86–100CrossRef
Zurück zum Zitat Saha S, Spandana R, Ekbal A, Bandyopadhyay S (2015) Simultaneous feature selection and symmetry based clustering using multiobjective framework. Appl Soft Comput 29:479–486 Saha S, Spandana R, Ekbal A, Bandyopadhyay S (2015) Simultaneous feature selection and symmetry based clustering using multiobjective framework. Appl Soft Comput 29:479–486
Zurück zum Zitat Sheng W, Swift S, Zhang L, Liu X (2005) A weighted sum validity function for clustering with a hybrid niching genetic algorithm. IEEE Trans Syst Man Cybern Part B Cybern 35(6):56–67CrossRef Sheng W, Swift S, Zhang L, Liu X (2005) A weighted sum validity function for clustering with a hybrid niching genetic algorithm. IEEE Trans Syst Man Cybern Part B Cybern 35(6):56–67CrossRef
Zurück zum Zitat Srinivas M, Patnaik LM (1994) Adaptive probabilities of crossover and mutation in genetic algorithms. IEEE Trans Syst Man Cybern 24(4):656–667CrossRef Srinivas M, Patnaik LM (1994) Adaptive probabilities of crossover and mutation in genetic algorithms. IEEE Trans Syst Man Cybern 24(4):656–667CrossRef
Zurück zum Zitat Su M-C, Chou C-H (2001) A modified version of the K-means algorithm with a distance based on cluster symmetry. IEEE Trans Pattern Anal Mach Intell 23(6):674–680CrossRef Su M-C, Chou C-H (2001) A modified version of the K-means algorithm with a distance based on cluster symmetry. IEEE Trans Pattern Anal Mach Intell 23(6):674–680CrossRef
Zurück zum Zitat Zabrodsky H, Peleg S, Avnir D (1995) Symmetry as a continuous feature. IEEE Trans Pattern Anal Mach Intell 17(12):1154–1166CrossRef Zabrodsky H, Peleg S, Avnir D (1995) Symmetry as a continuous feature. IEEE Trans Pattern Anal Mach Intell 17(12):1154–1166CrossRef
Metadaten
Titel
Enhancing point symmetry-based distance for data clustering
verfasst von
Sriparna Saha
Publikationsdatum
14.02.2017
Verlag
Springer Berlin Heidelberg
Erschienen in
Soft Computing / Ausgabe 2/2018
Print ISSN: 1432-7643
Elektronische ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-016-2477-3

Weitere Artikel der Ausgabe 2/2018

Soft Computing 2/2018 Zur Ausgabe