Published in: Soft Computing 6/2019

24.11.2017 | Methodologies and Application

Gravitational search algorithm and K-means for simultaneous feature selection and data clustering: a multi-objective approach

Authors: Jay Prakash, Pramod Kumar Singh

Abstract

Clustering is an unsupervised classification method used to group the objects of an unlabeled data set. High-dimensional data sets generally contain irrelevant and redundant features in addition to relevant ones, and these deteriorate the clustering result. Feature selection is therefore necessary: selecting a subset of relevant features improves the discrimination ability of the original feature set, which in turn improves the clustering result. Although many metaheuristics have been suggested to select a subset of relevant features in a wrapper framework based on some criteria, most of them suffer from three key issues. First, they require the class information of the objects a priori, which is unknown in unsupervised feature selection. Second, feature subset selection is based on a single validity measure; hence, it produces a single best solution biased toward the cardinality of the feature subset. Third, they have difficulty avoiding local optima owing to a lack of balance between exploration and exploitation in the feature search space. To deal with the first issue, we use an unsupervised feature selection method in which no class information is required. To address the second issue, we follow a Pareto-based approach to obtain diverse trade-off solutions by optimizing two conflicting validity measures, the silhouette index (Sil) and the feature cardinality (d). For the third issue, we introduce a genetic crossover operator to improve diversity in the binary gravitational search algorithm (BGSA), a recent metaheuristic based on Newton's law of gravity, in a multi-objective optimization scenario; the resulting method is named improved multi-objective BGSA for feature selection (IMBGSAFS). We use ten real-world data sets to compare the results of IMBGSAFS with three multi-objective wrapper methods, MBGSA, MOPSO, and NSGA-II, and with a multi-objective filter method based on Pearson's linear correlation coefficient (FM-CC).
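The wrapper evaluation described above (cluster on a candidate feature subset, score it by Sil and d, keep the Pareto non-dominated subsets) can be illustrated with a small sketch. It is not the authors' implementation: it assumes a binary mask over features, plain Lloyd's K-means with a deterministic farthest-first initialization (chosen here only so the example is reproducible), the standard mean silhouette width as Sil, and Pareto dominance in which Sil is maximized and d is minimized. All function names (`project`, `kmeans`, `objectives`, `dominates`, `non_dominated`) are hypothetical, and the BGSA search itself is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = List[float]
Objectives = Tuple[float, int]  # (silhouette Sil, feature cardinality d)


def project(data: Sequence[Sequence[float]], mask: Sequence[int]) -> List[Point]:
    """Keep only the columns whose bit in the binary feature mask is 1."""
    return [[x for x, bit in zip(row, mask) if bit] for row in data]


def nearest(p: Sequence[float], centroids: List[Point]) -> int:
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points: List[Point], k: int, iters: int = 100) -> List[int]:
    """Lloyd's K-means; farthest-first initialization is an assumption for determinism."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette width; singleton clusters score 0 by the usual convention."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i, p in enumerate(points):
        dists: Dict[int, List[float]] = {c: [] for c in clusters}
        for j, q in enumerate(points):
            if j != i:
                dists[labels[j]].append(math.dist(p, q))
        own = dists[labels[i]]
        if not own:
            continue  # singleton cluster: s(i) = 0
        a = sum(own) / len(own)  # mean intra-cluster distance
        b = min(sum(ds) / len(ds) for c, ds in dists.items() if c != labels[i])
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def objectives(data: Sequence[Sequence[float]], mask: Sequence[int], k: int) -> Objectives:
    """Evaluate one candidate subset: (Sil to maximize, d to minimize)."""
    d = sum(mask)
    if d == 0:
        raise ValueError("empty feature subset")
    points = project(data, mask)
    return silhouette(points, kmeans(points, k)), d


def dominates(a: Objectives, b: Objectives) -> bool:
    """a is no worse than b in both objectives and strictly better in at least one."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])


def non_dominated(objs: List[Objectives]) -> List[int]:
    """Indices of the Pareto non-dominated objective vectors."""
    return [i for i, a in enumerate(objs) if not any(dominates(b, a) for b in objs)]


if __name__ == "__main__":
    data = [[0, 0, 7], [0, 1, -7], [10, 0, 7], [10, 1, -7]]  # toy data
    print(objectives(data, [1, 1, 0], k=2))  # Sil ≈ 0.90 with d = 2
```

On this toy data the third feature also splits the points into two tight groups, so the subset `[0, 0, 1]` alone reaches a silhouette of 1.0 with d = 1. This illustrates how an internal index can reward a feature that does not match the true grouping, one reason an external measure is useful when a single final solution has to be chosen.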
We employ four multi-objective quality measures: convergence, diversity, coverage, and ONVG. The obtained results show the superiority of IMBGSAFS over its competitors. An external clustering validity index, the F-measure, also confirms this finding. As the decision maker picks only a single solution from the set of trade-off solutions, we employ the F-measure to select a final single solution from the external archive. The final solution achieved by IMBGSAFS is superior to those of its competitors in terms of clustering accuracy and/or smaller subset size.
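Two of the quality measures and the final-selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: ONVG is taken as the number of vectors in an obtained front, coverage as Zitzler and Thiele's C-metric (the fraction of front B weakly dominated by some member of front A), and the F-measure as the common class-weighted form in which each class is matched with its best-fitting cluster. Objective vectors are (Sil, d) with Sil maximized and d minimized. The smaller-subset tie-break in `pick_final` and all function names are hypothetical; the convergence and diversity measures are omitted.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Objectives = Tuple[float, int]  # (silhouette Sil, feature cardinality d)


def weakly_dominates(a: Objectives, b: Objectives) -> bool:
    """a covers b: Sil at least as high and d at most as large."""
    return a[0] >= b[0] and a[1] <= b[1]


def onvg(front: List[Objectives]) -> int:
    """Overall non-dominated vector generation: size of the obtained front."""
    return len(front)


def coverage(front_a: List[Objectives], front_b: List[Objectives]) -> float:
    """C(A, B): fraction of B covered by at least one member of A."""
    if not front_b:
        return 0.0
    covered = sum(1 for b in front_b if any(weakly_dominates(a, b) for a in front_a))
    return covered / len(front_b)


def f_measure(true_labels: Sequence[int], cluster_labels: Sequence[int]) -> float:
    """Class-weighted F-measure: each class is matched with its best cluster."""
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_labels)
    joint = Counter(zip(true_labels, cluster_labels))  # missing pairs count as 0
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint[(cls, clu)]
            if n_ij:
                precision, recall = n_ij / n_j, n_ij / n_i
                best = max(best, 2 * precision * recall / (precision + recall))
        total += n_i / n * best
    return total


def pick_final(archive: List[Tuple[List[int], List[int]]], true_labels: Sequence[int]) -> int:
    """Index of the archive entry (feature mask, cluster labels) with the highest
    F-measure; ties go to the smaller feature subset (a hypothetical rule)."""
    return max(
        range(len(archive)),
        key=lambda i: (f_measure(true_labels, archive[i][1]), -sum(archive[i][0])),
    )


if __name__ == "__main__":
    archive = [([1, 1, 1], [0, 1, 0, 1]), ([1, 1, 0], [0, 0, 1, 1])]
    print(pick_final(archive, [0, 0, 1, 1]))  # prints 1
```

Because the F-measure compares clusters against known class labels, this selection step assumes ground-truth labels are available for evaluation, even though the search itself is unsupervised.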

Metadata
Title
Gravitational search algorithm and K-means for simultaneous feature selection and data clustering: a multi-objective approach
Authors
Jay Prakash
Pramod Kumar Singh
Publication date
24.11.2017
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 6/2019
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-017-2923-x
