nach oben

Journal of Classification

Erschienen in:

11.07.2019

An Ensemble Feature Ranking Algorithm for Clustering Analysis

verfasst von: Jaehong Yu, Hua Zhong, Seoung Bum Kim

Erschienen in: Journal of Classification | Ausgabe 2/2020

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Feature ranking is a widely used feature selection method. It uses importance scores to evaluate features and selects those with high scores. Conventional unsupervised feature ranking methods do not consider the information on cluster structures; therefore, these methods may be unable to select the relevant features for clustering analysis. To address this limitation, we propose a feature ranking algorithm based on silhouette decomposition. The proposed algorithm calculates the ensemble importance scores by decomposing the average silhouette widths of random subspaces. By doing so, the contribution of a feature in generating cluster structures can be represented more clearly. Experiments on different benchmark data sets examined the properties of the proposed algorithm and compared it with the existing ensemble-based feature ranking methods. The experiments demonstrated that the proposed algorithm outperformed its existing counterparts.

Vorheriger Artikel Suboptimal Comparison of Partitions

Nächster Artikel Where Should I Submit My Work for Publication? An Asymmetrical Classification Model to Optimize Choice

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Nur mit Berechtigung zugänglich

http://genome-www.stanford.edu/hcc/

https://github.com/jwmi/BayesianMixtures.jl/tree/master/examples/datasets/gene-deSouto

ligarto.org/rdiaz/Papers/rfVS/randomForestVarSel.html

Andrews, J. L., & Mcnicholas, P. D. (2014). Variable selection for clustering and classification. Journal of Classification, 31(2), 136–153.MathSciNetMATH

Arbelaitz, O., Gurrutxaga, I., Muguerrza, J., Pèrez, J. M., & Perona, I. (2013). An extensive comparative study of cluster validity indices. Pattern Recognition, 46(1), 243–256.

Arthur, D., and Vassilvitskii, S. (2007). “k-means++: the advantages of careful seeding”, in Proceedings of the 18th annual ACM-SIAM symposium on discrete algorithms, 2007, pp. 1027–1035.

Ayad, H. G., & Kamel, M. S. (2008). Cumulative voting consensus method for partitions with variable number of clusters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(1), 160–173.

Boutsidis, C., Drineas, P., and Mahoney, M.W. (2009), “Unsupervised feature selection for the k-means clustering problem”, in Proceedings of the Advances in Neural Information Processing Systems, pp. 153–161.

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.MATH

Caliński, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics-theory and Methods, 3(1), 1–27.MathSciNetMATH

Chiang, M. M. T., & Mirkin, B. (2010). Intelligent choice of the number of clusters in k-means clustering: an experimental study with different cluster spreads. Journal of Classification, 27(1), 3–40.MathSciNetMATH

de Amorim, R. C. (2016). A survey on feature weighting based k-means algorithms. Journal of Classification, 33(2), 210–242.MathSciNetMATH

de Amorim, R. C., & Mirkin, B. (2012). Minkowski metric, feature weighting and anomalous cluster initializing in k-means clustering. Pattern Recognition, 45(3), 1061–1075.

de Amorim, R. C., Makarenkov, V., & Mirkin, B. (2016). A-Ward_pβ: Effective hierarchical clustering using the Minkowski metric and a fast k-means initialization. Information Sciences, 370, 343–354.

de Amorim, R. C., Shestakov, A., Mirkin, B., & Makarenkov, V. (2017). The Minkowski central partition as a pointer to a suitable distance exponent and consensus partitioning. Pattern Recognition, 67, 62–72.

Dy, J. G., & Brodley, C. E. (2004). Feature selection for unsupervised learning. Journal of Machine Learning Research, 5, 845–889.MathSciNetMATH

Elghazel, H., & Aussem, A. (2015). Unsupervised feature selection with ensemble learning. Machine Learning, 98(1–2), 157–180.MathSciNetMATH

Fred, A. L., & Jain, A. K. (2005). Combining multiple clusterings using evidence accumulation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6), 835–850.

Guerra, L., Robles, V., Bielza, C., & Larrañaga, P. (2012). A comparison of clustering quality indices using outliers and noise. Intelligent Data Analysis, 16(4), 703–715.

Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157–1182.MATH

Handl, J., & Knowles, J. (2006). Feature subset selection in unsupervised learning via multiobjective optimization. International Journal of Computational Intelligence Research, 2(3), 217–238.MathSciNet

Hartigan, J. A., & Wong, M. A. (1979). Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society: Series C: Applied Statistics, 28(1), 100–108.MATH

HE, X., CAI, D., and NIYOGI, P. (2005), “Laplacian score for feature selection”, in Proceedings of the Advances in Neural Information Processing Systems, pp. 507–514.

Herrero, J., Dìaz-uriarte, R., & Dopazo, J. (2003). Gene expression data preprocessing. Bioinformatics, 19(5), 655–656.

Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), 832–844.

Hong, Y., Kwong, S., Chang, Y., & Ren, Q. (2008a). Consensus unsupervised feature ranking from multiple views. Pattern Recognition Letters, 29(5), 595–602.

Hong, Y., Kwong, S., Chang, Y., & Ren, Q. (2008b). Unsupervised feature selection using clustering ensembles and population based incremental learning algorithm. Pattern Recognition, 41(9), 2742–2756.MATH

Huang, J. Z., Ng, M. K., Rong, H., & LI, Z. (2005). Automated variable weighting in k-means type clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5), 657–668.

Hubrert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.

Iam-on, N., Boongoen, T., Garrett, S., & Price, C. (2011). A link-based approach to the cluster ensemble problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12), 2396–2409.

Ketchen, D. J., Jr., & Shook, C. L. (1996). The application of cluster analysis in strategic management research: an analysis and critique. Strategic Management Journal, 17, 441–458.

Kim, S. B., & Rattakorn, P. (2011). Unsupervised feature selection using weighted principal components. Expert Systems with Applications, 38(5), 5704–5710.

Kim, E. Y., Kim, S. Y., Ashlock, D., & Nam, D. (2009). MULTI-K: Accurate classification of microarray subtypes using ensemble k-means clustering. BMC Bioinformatics, 10(1), 260.

Kuncheva, L. I., & Vetrov, D. P. (2006). Evaluation of stability of k-means cluster ensembles with respect to random initialization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(11), 1798–1808.

Lai, C., Reinders, M. J., & Wessels, L. (2006). Random subspace method for multivariate feature selection. Pattern Recognition Letters, 27(10), 1067–1076.

Li, F., Zhang, Z., & Jin, C. (2016). Feature selection with partition differentiation entropy for large-scale data sets. Information Sciences, 329, 690–700.MATH

Liu, Y., Li, Z., Xiong, H., Gao, X., and Wu, J. (2010), “Understanding of internal clustering validation measures”, in Proceedings of IEEE 10th International Conference on Data Mining (ICDM), pp. 911–916.

MacQueen, J. (1967), “Some methods for classification and analysis of multivariate observations”, in Proceedings of the 5th Berkeley symposium on mathematical statistics and probability, 1(14), pp. 281–297.

Makarenkov, V., & Legendre, P. (2001). Optimal variable weighting for ultrametric and additive trees and K-means partitioning: methods and software. Journal of Classification, 18(2), 245–271.MathSciNetMATH

Mitra, P., Murthy, C. A., & Pal, S. K. (2002). Unsupervised feature selection using feature similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3), 301–312.

Oehmen, C., & Nieplocha, J. (2006). ScalaBLAST: a scalable implementation of BLAST for high-performance data-intensive bioinformatics analysis. IEEE Transactions on Parallel and Distributed Systems, 17(8), 740–749.

Panday, D., de Amorim, R. C., & Lane, P. (2018). Feature weighting as a tool for unsupervised feature selection. Information Processing Letters, 129, 44–52.MathSciNetMATH

Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65.MATH

Steinley, D., & Brusco, M. J. (2007). Initializing k-means batch clustering: a critical evaluation of several techniques. Journal of Classification, 24(1), 99–121.MathSciNetMATH

Tan, P. N., Steinbach, M., & Kumar, V. (2006). Introduction to data mining. Boston: Addison-Wesley.

Vendramin, L., Campello, R. J., & Hruschka, E. R. (2010). Relative clustering validity criteria: a comparative overview. Statistical Analysis and Data Mining: the ASA Data Science Journal, 3(4), 209–235.MathSciNet

Xu, R. F., & Lee, S. J. (2015). Dimensionality reduction by feature clustering for regression problems. Information Sciences, 299, 42–57.MathSciNetMATH

Yang, C., Wan, B., and Gao, X. (2006), “Effectivity of Internal Validation Techniques for Gene Clustering”, in Proceedings of International Symposium on Biological and Medical Data Analysis, pp. 49–59.

Yu, J., & Kim, S. B. (2016). A density-based Noisy graph partitioning algorithm. Neurocomputing, 175, 473–491.

Yu, Z., Wang, D., You, J., Wong, H. S., Wu, S., Zhang, J., & Han, G. (2016). Progressive subspace ensemble learning. Pattern Recognition, 60, 692–705.

Zhang, S., Wong, H. S., Shen, Y., & Xie, D. (2012). A new unsupervised feature ranking method for gene expression data based on consensus affinity. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(4), 1257–1263.

Zhong, C., Yue, X., Zhang, Z., & Lei, J. (2015). A clustering ensemble: Two-level refined co-association matrix with path-based transformation. Pattern Recognition, 48(8), 2699–2709.MATH

Titel: An Ensemble Feature Ranking Algorithm for Clustering Analysis
verfasst von: Jaehong Yu
Hua Zhong
Seoung Bum Kim
Publikationsdatum: 11.07.2019
Verlag: Springer US
Erschienen in: Journal of Classification / Ausgabe 2/2020
Print ISSN: 0176-4268
Elektronische ISSN: 1432-1343
DOI: https://doi.org/10.1007/s00357-019-09330-8

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Weitere Artikel der Ausgabe 2/2020

Classification for Time Series Data. An Unsupervised Approach Based on Reduction of Dimensionality

Editorial: Journal of Classification Vol. 37-2

Where Should I Submit My Work for Publication? An Asymmetrical Classification Model to Optimize Choice

Cluster Validation for Mixtures of Regressions via the Total Sum of Squares Decomposition

Adjusting Person Fit Index for Skewness in Cognitive Diagnosis Modeling

Moduli Space of Families of Positive (n − 1)-Weights

Premium Partner