
03.09.2018 | Original Article

Filter-based unsupervised feature selection using Hilbert–Schmidt independence criterion

Authors: Samaneh Liaghat, Eghbal G. Mansoori

Published in: International Journal of Machine Learning and Cybernetics | Issue 9/2019


Abstract

Feature selection is a fundamental preprocessing step before the actual learning, especially in the unsupervised setting where the data are unlabeled. When a problem involves too many features, dimensionality reduction by discarding weak features is highly desirable. In this paper, we present a framework for unsupervised feature selection based on maximizing the dependency between the pairwise sample-similarity matrices computed before and after deleting a feature. To this end, a novel estimator of the Hilbert–Schmidt independence criterion (HSIC), better suited to high-dimensional data with small sample sizes, is introduced. Its key idea is that eliminating redundant features, and/or features with high inter-relevance, does not seriously affect the pairwise sample similarities. Also, a heuristic trick that reduces the dynamic range of matrix values is used to handle diagonally dominant matrices. To speed up the proposed scheme, the gap statistic and k-means clustering methods are employed. To assess the performance of our method, experiments are conducted on benchmark datasets; the obtained results confirm the efficiency of our unsupervised feature selection scheme.
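To make the scoring idea concrete, below is a minimal sketch of a dependency-based feature score, assuming an RBF similarity matrix and the standard biased empirical HSIC estimator of Gretton et al. (2005), HSIC(K, L) = tr(KHLH)/(n-1)^2 with centering matrix H = I - (1/n)11^T. This is not the paper's novel estimator, and it omits the diagonal-dominance trick and the gap-statistic/k-means speedup described above; all function names and the kernel choice are illustrative assumptions.

    import numpy as np

    def rbf_kernel(X, sigma=1.0):
        # Pairwise RBF similarity: K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)).
        sq = np.sum(X ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
        return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

    def hsic(K, L):
        # Biased empirical HSIC between two n x n similarity matrices:
        # HSIC = tr(K H L H) / (n - 1)^2, with H the centering matrix.
        n = K.shape[0]
        H = np.eye(n) - np.ones((n, n)) / n
        return np.trace(K @ H @ L @ H) / (n - 1) ** 2

    def score_features(X, sigma=1.0):
        # Score feature j by the HSIC dependency between the similarity matrix
        # of the full data and the one computed with feature j removed.  A high
        # score means deleting j barely disturbs the pairwise sample
        # similarities, i.e. j is redundant or weak.
        K_full = rbf_kernel(X, sigma)
        scores = np.empty(X.shape[1])
        for j in range(X.shape[1]):
            scores[j] = hsic(K_full, rbf_kernel(np.delete(X, j, axis=1), sigma))
        return scores

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.normal(size=(30, 8))
        X[:, 7] = X[:, 0] + 0.01 * rng.normal(size=30)  # make feature 7 redundant
        print(np.argsort(-score_features(X)))           # most expendable first

Running the script ranks features from most to least expendable; in the toy data, feature 7 is a near copy of feature 0, so deleting either barely changes the similarity matrix and one of the pair should rank near the top.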

Metadata
Title
Filter-based unsupervised feature selection using Hilbert–Schmidt independence criterion
Authors
Samaneh Liaghat
Eghbal G. Mansoori
Publication date
03.09.2018
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 9/2019
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-018-0869-7
