2019 | OriginalPaper | Book chapter

Using Clustering for Supervised Feature Selection to Detect Relevant Features

Authors: Christoph Lohrmann, Pasi Luukka

Published in: Machine Learning, Optimization, and Data Science

Publisher: Springer International Publishing

Abstract

In many machine learning applications, large quantities of features and information are available, but these can be of low quality. A novel filter method for feature selection in classification, termed COLD, is presented that uses class-wise clustering to reduce the dimensionality of the data. The idea behind this approach is that if a relevant feature is removed from the set of features, the separation of clusters belonging to different classes deteriorates. COLD is compared with several popular filter methods on four artificial examples and two real-world data sets. On the artificial examples, only COLD consistently ranks the features according to their contribution to the separation of the classes. On the real-world Dermatology and Arrhythmia data sets, COLD removes a large number of features while improving the classification accuracy or, at a minimum, not degrading the performance considerably.
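
The intuition stated in the abstract can be illustrated with a small sketch. This is not the COLD algorithm itself: the abstract does not give the clustering procedure or the separation criterion, so everything below is an assumption made for illustration. It uses a plain k-means per class, measures separation as the smallest distance between cluster centers of different classes, and scores each feature by how much that distance shrinks when the feature is left out. The function names (`kmeans`, `classwise_centers`, `separation`, `rank_features`), the choice `k=2`, and the synthetic data are all hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (assumed stand-in); returns the k cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster went empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

def classwise_centers(X, y, k):
    """Cluster each class separately; maps class label -> array of centers."""
    return {c: kmeans(X[y == c], k) for c in np.unique(y)}

def separation(centers_by_class, keep):
    """Smallest distance between centers of different classes,
    computed only on the feature indices in `keep` (assumed criterion)."""
    classes = list(centers_by_class)
    best = np.inf
    for i, a in enumerate(classes):
        for b in classes[i + 1:]:
            A = centers_by_class[a][:, keep]
            B = centers_by_class[b][:, keep]
            pairwise = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
            best = min(best, pairwise.min())
    return best

def rank_features(X, y, k=2):
    """Score each feature by the loss in separation when it is removed."""
    centers = classwise_centers(X, y, k)
    all_features = np.arange(X.shape[1])
    base = separation(centers, all_features)
    drop = np.array([
        base - separation(centers, all_features[all_features != f])
        for f in all_features
    ])
    # Larger drop = removing the feature hurts separation more.
    return np.argsort(-drop), drop

# Synthetic demo: feature 0 separates the two classes, features 1 and 2 are noise.
rng = np.random.default_rng(42)
n = 100
X = rng.normal(size=(2 * n, 3))
y = np.repeat([0, 1], n)
X[y == 1, 0] += 6.0  # shift class 1 along the relevant feature only

ranking, drop = rank_features(X, y)
print("feature ranking (most relevant first):", ranking)
print("separation lost per removed feature:", np.round(drop, 3))
```

In this toy setting the relevant feature should lose far more separation than the noise features when removed, which mirrors the reasoning in the abstract. Taking the minimum center distance is only one of several plausible separation measures and was chosen here for brevity.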

Title: Using Clustering for Supervised Feature Selection to Detect Relevant Features
Authors: Christoph Lohrmann, Pasi Luukka
Publisher: Springer International Publishing
Book: Machine Learning, Optimization, and Data Science
Print ISBN: 978-3-030-37598-0
Electronic ISBN: 978-3-030-37599-7
Copyright year: 2019
DOI: https://doi.org/10.1007/978-3-030-37599-7_23
