2006 | Book

Feature Extraction

Foundations and Applications

Editors: Isabelle Guyon, Masoud Nikravesh, Steve Gunn, Lotfi A. Zadeh

Publisher: Springer Berlin Heidelberg

Book Series: Studies in Fuzziness and Soft Computing

About this book

Everyone loves a good competition. As I write this, two billion fans are eagerly anticipating the 2006 World Cup. Meanwhile, a fan base that is somewhat smaller (but presumably includes you, dear reader) is equally eager to read all about the results of the NIPS 2003 Feature Selection Challenge, contained herein. Fans of Radford Neal and Jianguo Zhang (or of Bayesian neural networks and Dirichlet diffusion trees) are gloating “I told you so” and looking for proof that their win was not a fluke. But the matter is by no means settled, and fans of SVMs are shouting “wait ’til next year!” You know this book is a bit more edgy than your standard academic treatise as soon as you see the dedication: “To our friends and foes.” Competition breeds improvement. Fifty years ago, the champion in 100m butterfly swimming was 22 percent slower than today’s champion; the women’s marathon champion from just 30 years ago was 26 percent slower. Who knows how much better our machine learning algorithms would be today if Turing in 1950 had proposed an effective competition rather than his elusive Test? But what makes an effective competition? The field of Speech Recognition has had NIST-run competitions since 1988; error rates have been reduced by a factor of three or more, but the field has not yet had the impact expected of it. Information Retrieval has had its TREC competition since 1992; progress has been steady and refugees from the competition have played important roles in the hundred-billion-dollar search industry. Robotics has had the DARPA Grand Challenge for only two years, but in that time we have seen the results go from complete failure to resounding success (although it may have helped that the second year’s course was somewhat easier than the first’s).

Table of Contents

Frontmatter

An Introduction to Feature Extraction

This chapter introduces the reader to the various aspects of feature extraction covered in this book. Section 1 reviews definitions and notations and proposes a unified view of the feature extraction problem. Section 2 is an overview of the methods and results presented in the book, emphasizing novel contributions. Section 3 provides the reader with an entry point in the field of feature extraction by showing small revealing examples and describing simple but effective algorithms. Finally, Section 4 introduces a more theoretical formalism and points to directions of research and open problems.

Isabelle Guyon, André Elisseeff

Feature Extraction Fundamentals

Frontmatter
Chapter 1. Learning Machines

Learning from data may be a very complex task. To satisfactorily solve a variety of problems, many different types of algorithms may need to be combined. Feature extraction algorithms are valuable tools, which prepare data for other learning methods. To estimate their usefulness, one must examine the whole complex processes of which they are a part.

Norbert Jankowski, Krzysztof Grabczewski
Chapter 2. Assessment Methods

This chapter aims at providing the reader with the tools required for a statistically significant assessment of feature relevance and of the outcome of feature selection. The methods presented in this chapter can be integrated in feature selection wrappers and can serve to select the number of features for filters or feature ranking methods. They can also serve for hyper-parameter selection or model selection. Finally, they can be helpful for assessing the confidence in predictions made by learning machines on fresh data. The concept of model complexity is ubiquitous in this chapter. Readers with little or rusty knowledge of basic statistics should first delve into Appendix A before starting this chapter; for others, the appendix may serve as a quick reference guide for useful definitions and properties. The first section of the present chapter is devoted to the basic statistical tools for feature selection; it puts the task of feature selection into the appropriate statistical perspective, and describes important tools such as hypothesis tests - which are of general use - and random probes, which are more specifically dedicated to feature selection. The use of hypothesis tests is exemplified, and caveats about the reliability of the results of multiple tests are given, leading to the Bonferroni correction and to the definition of the false discovery rate. The use of random probes is also exemplified, in conjunction with forward selection. The second section of the chapter is devoted to validation and cross-validation; these are general tools for assessing the ability of models to generalize. In the present chapter, we show how they can be used specifically in the context of feature selection; attention is drawn to the limitations of those methods.
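
As an illustration of the random-probe idea mentioned above (a minimal sketch of ours, not code from the chapter), the snippet below appends randomly permuted "probe" columns whose scores can only be due to chance, and keeps the real features whose univariate score beats a high quantile of the probe scores. The scoring function, `n_probes`, and the quantile threshold are illustrative assumptions.

```python
# Sketch of random-probe feature screening, assuming numpy only.
import numpy as np

def probe_screen(X, y, n_probes=100, quantile=0.95, rng=None):
    rng = np.random.default_rng(rng)
    n, d = X.shape
    # Univariate score: absolute Pearson correlation with the target.
    def score(col):
        return abs(np.corrcoef(col, y)[0, 1])
    real_scores = np.array([score(X[:, j]) for j in range(d)])
    probe_scores = []
    for _ in range(n_probes):
        j = rng.integers(d)
        probe_scores.append(score(rng.permutation(X[:, j])))  # chance-level score
    threshold = np.quantile(probe_scores, quantile)
    return np.flatnonzero(real_scores > threshold)

# Toy usage: only the first three of 50 features carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = (X[:, 0] + X[:, 1] - X[:, 2] + 0.5 * rng.normal(size=200) > 0).astype(int)
print(probe_screen(X, y, rng=0))
```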

Gérard Dreyfus, Isabelle Guyon
Chapter 3. Filter Methods

Feature ranking and feature selection algorithms may roughly be divided into three types. The first type encompasses algorithms that are built into adaptive systems for data analysis (predictors), for example feature selection that is part of embedded methods (such as neural training algorithms). Algorithms of the second type are wrapped around predictors, providing them with subsets of features and receiving their feedback (usually accuracy). These wrapper approaches are aimed at improving the results of the specific predictors they work with. The third type includes feature selection algorithms that are independent of any predictors, filtering out features that have little chance to be useful in the analysis of data. These filter methods are based on performance evaluation metrics calculated directly from the data, without direct feedback from the predictors that will finally be used on the data with a reduced number of features. Such algorithms are usually computationally less expensive than those from the first or second group. This chapter is devoted to filter methods.
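
The sketch below illustrates the filter idea in its simplest form (our example, not the chapter's code): features are scored directly from the data, here with mutual information as one possible metric, and ranked without any feedback from the downstream predictor. The dataset and the choice of keeping ten features are arbitrary.

```python
# A minimal filter-style ranking: score features from the data, rank, truncate.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=300, n_features=40, n_informative=5,
                           random_state=0)
scores = mutual_info_classif(X, y, random_state=0)
ranking = np.argsort(scores)[::-1]      # best-scoring features first
top_k = ranking[:10]                    # keep, say, the ten top-ranked features
print("top features:", top_k)
X_reduced = X[:, top_k]                 # hand this to any predictor afterwards
```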

Włodzisław Duch
Chapter 4. Search Strategies

In order to search for good variable subsets, one has to know which subsets are good and which are not. In other words, an evaluation mechanism for an individual variable subset needs to be defined first.
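
To make the point concrete, here is a rough sketch (ours, not the chapter's) of one classic search strategy, greedy forward selection, where the subset evaluation mechanism is taken to be cross-validated accuracy of a chosen estimator; the estimator and `max_features` are illustrative assumptions.

```python
# Greedy forward search over variable subsets with a CV-accuracy evaluator.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_search(X, y, estimator, max_features=5):
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        # Evaluate every subset obtained by adding one more candidate feature.
        scores = {j: cross_val_score(estimator, X[:, selected + [j]], y,
                                     cv=5).mean() for j in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected

X, y = make_classification(n_samples=200, n_features=20, n_informative=4,
                           random_state=0)
print(forward_search(X, y, LogisticRegression(max_iter=1000)))
```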

Juha Reunanen
Chapter 5. Embedded Methods

Although many embedded feature selection methods have been introduced during the last few years, a unifying theoretical framework has not been developed to date. We start this chapter by defining such a framework, which we think is general enough to cover many embedded methods. We will then discuss embedded methods based on how they solve the feature selection problem.

Thomas Navin Lal, Olivier Chapelle, Jason Weston, André Elisseeff
Chapter 6. Information-Theoretic Methods

Shannon’s seminal work on information theory provided the conceptual framework for communication through noisy channels (Shannon, 1948). This work, quantifying the information content of coded messages, established the basis for all current systems aiming to transmit information through any medium.

Kari Torkkola
Chapter 7. Ensemble Learning

Supervised ensemble methods construct a set of base learners (experts) and use their weighted outcome to predict new data. Numerous empirical studies confirm that ensemble methods often outperform any single base learner (Freund and Schapire, 1996, Bauer and Kohavi, 1999, Dietterich, 2000b). The improvement is intuitively clear when a base algorithm is unstable: in an unstable algorithm, small changes in the training data lead to large changes in the resulting base learner (as with decision trees, neural networks, etc.). Recently, a series of theoretical developments (Bousquet and Elisseeff, 2000, Poggio et al., 2002, Mukherjee et al., 2003, Poggio et al., 2004) also confirmed the fundamental role of stability for the generalization (ability to perform well on unseen data) of any learning engine. Given a multivariate learning algorithm, model selection and feature selection are closely related problems (the latter is a special case of the former). Thus, it is sensible that model-based feature selection methods (wrappers, embedded) would benefit from the regularization effect provided by ensemble aggregation. This is especially true for the fast, greedy and unstable learners often used for feature evaluation.
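
As a small illustration of the ensemble idea above (a sketch under our own assumptions, not the chapter's recipe), the snippet below fits unstable base learners (fully grown trees) on bootstrap resamples and combines them by a simple uniform-weight vote, then compares against a single tree.

```python
# Bagging unstable base learners and combining their votes.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
members = []
for _ in range(25):
    idx = rng.integers(0, len(X_tr), size=len(X_tr))   # bootstrap resample
    members.append(DecisionTreeClassifier().fit(X_tr[idx], y_tr[idx]))

votes = np.mean([m.predict(X_te) for m in members], axis=0)  # uniform weights
ensemble_pred = (votes >= 0.5).astype(int)
single_pred = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr).predict(X_te)
print("ensemble acc:", (ensemble_pred == y_te).mean(),
      "single tree acc:", (single_pred == y_te).mean())
```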

Eugene Tuv
Chapter 8. Fuzzy Neural Networks

The theory of fuzzy logic, founded by Zadeh (1965), deals with the linguistic notion of graded membership, unlike the computational functions of the digital computer with bivalent propositions. Since mentation and cognitive functions of the brain are based on relative grades of information acquired by the natural (biological) sensory systems, fuzzy logic has been used as a powerful tool for modeling human thinking and cognition (Gupta and Sinha, 1999, Gupta et al., 2003). The perceptions and actions of the cognitive process thus act on the graded information associated with fuzzy concepts, fuzzy judgment, fuzzy reasoning, and cognition. The most successful domain of fuzzy logic has been in the field of feedback control of various physical and chemical processes such as temperature, electric current, flow of liquid/gas, and the motion of machines (Gupta, 1994, Rao and Gupta, 1994, Sun and Jang, 1993, Gupta and Kaufmann, 1988, Kiszka et al., 1985, Berenji and Langari, 1992, Lee, 1990a,b). Fuzzy logic principles can also be applied to other areas. For example, these fuzzy principles have been used in areas such as fuzzy knowledge-based systems that use fuzzy IF-THEN rules, fuzzy software engineering, which may incorporate fuzziness in data and programs, and fuzzy database systems in the fields of medicine, economics, and management. It is exciting to note that some consumer electronic and automotive industry products in the current market have used technology based on fuzzy logic, and the performance of these products has significantly improved (Al-Holou et al., 2002, Eichfeld et al., 1996).

Madan M. Gupta, Noriyasu Homma, Zeng-Guang Hou

Feature Selection Challenge

Frontmatter
Chapter 9. Design and Analysis of the NIPS2003 Challenge

We organized in 2003 a benchmark of feature selection methods, whose results are summarized and analyzed in this chapter. The top-ranking entrants of the competition describe their methods and results in more detail in the following chapters. We provided participants with five datasets from different application domains and called for classification results using a minimal number of features. Participants were asked to make on-line submissions on two test sets: a validation set and a “final” test set, with performance on the validation set presented immediately to the participant and performance on the final test set presented at the end of the competition. The competition took place over a period of 13 weeks and attracted 78 research groups. In total, 1863 entries were made on the validation sets during the development period and 135 entries on all test sets for the final competition. The winners used a combination of Bayesian neural networks with ARD priors and Dirichlet diffusion trees. Other top entries used a variety of methods for feature selection, combining filters and/or wrapper or embedded methods, with Random Forests, kernel methods, or neural networks as the classification engine. The classification engines most often used after feature selection are regularized kernel methods, including SVMs. The results of the benchmark (including the predictions made by the participants and the features they selected) and the scoring software are publicly available. The benchmark is available at http://www.nipsfsc.ecs.soton.ac.uk/ for post-challenge submissions to stimulate further research.

Isabelle Guyon, Steve Gunn, Asa Ben Hur, Gideon Dror
Chapter 10. High Dimensional Classification with Bayesian Neural Networks and Dirichlet Diffusion Trees

Our winning entry in the NIPS 2003 challenge was a hybrid, in which our predictions for the five data sets were made using different methods of classification, or, for the Madelon data set, by averaging the predictions produced using two methods. However, two aspects of our approach were the same for all data sets:

We reduced the number of features used for classification to no more than a few hundred, either by selecting a subset of features using simple univariate significance tests, or by performing a global dimensionality reduction using Principal Component Analysis (PCA).

We then applied a classification method based on Bayesian learning, using an Automatic Relevance Determination (ARD) prior that allows the model to determine which of these features are most relevant.
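
A rough sketch of the reduction step described above is given below (the Bayesian neural network / ARD modeling itself is outside the scope of this snippet). It shows the two alternatives: keeping features that pass a simple univariate two-sample t-test, or projecting onto a few principal components; the p-value threshold and the number of components are illustrative assumptions.

```python
# Dimensionality reduction by univariate significance tests or by PCA.
import numpy as np
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

X, y = make_classification(n_samples=300, n_features=500, n_informative=10,
                           random_state=0)

# Option 1: univariate significance tests (two-sample t-test per feature).
_, pvals = stats.ttest_ind(X[y == 0], X[y == 1], axis=0)
selected = np.flatnonzero(pvals < 0.01)        # threshold chosen for illustration
X_univ = X[:, selected]

# Option 2: global dimensionality reduction with PCA.
X_pca = PCA(n_components=40, random_state=0).fit_transform(X)

print(X_univ.shape, X_pca.shape)
```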

Radford M. Neal, Jianguo Zhang
Chapter 11. Ensembles of Regularized Least Squares Classifiers for High-Dimensional Problems

It has recently been pointed out that the Regularized Least Squares Classifier (RLSC) continues to be a viable option for binary classification problems. We apply RLSC to the datasets of the NIPS 2003 Feature Selection Challenge using Gaussian kernels. Since RLSC is sensitive to noise variables, ensemble-based variable filtering is applied first, and RLSC makes use of the best-ranked variables only. We compare the performance of a stochastic ensemble of RLSCs to a single best RLSC. Our results indicate that in terms of classification error rate the two are similar on the challenge data. However, especially with large data sets, ensembles could provide other advantages, which we list.
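
For readers unfamiliar with RLSC, the compact sketch below shows its core mechanics (our simplification; the variable filtering and ensembling discussed in the chapter are omitted): labels in {-1, +1}, coefficients from a single linear solve with a Gaussian kernel, prediction by the sign of the kernel expansion. The hyperparameters are arbitrary.

```python
# Regularized least squares classification with a Gaussian (RBF) kernel.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
y = 2 * y - 1                                   # map labels to {-1, +1}
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

gamma, lam = 0.05, 1e-2                         # illustrative hyperparameters
K = rbf_kernel(X_tr, X_tr, gamma=gamma)
alpha = np.linalg.solve(K + lam * len(X_tr) * np.eye(len(X_tr)), y_tr)
pred = np.sign(rbf_kernel(X_te, X_tr, gamma=gamma) @ alpha)
print("test accuracy:", (pred == y_te).mean())
```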

Kari Torkkola, Eugene Tuv
Chapter 12. Combining SVMs with Various Feature Selection Strategies

This article investigates the performance of combining support vector machines (SVM) with various feature selection strategies. Some of them are filter-type approaches: general feature selection methods independent of SVM; others are wrapper-type methods: modifications of SVM which can be used to select features. We applied these strategies while participating in the NIPS 2003 Feature Selection Challenge and ranked third as a group.

Yi-Wei Chen, Chih-Jen Lin
Chapter 13. Feature Selection with Transductive Support Vector Machines

SVM-related feature selection has been shown to be effective, while feature selection with transductive SVMs has been less studied. This paper investigates the use of transductive SVMs for feature selection, based on three SVM-related feature selection methods: filtering scores + SVM wrapper, recursive feature elimination (RFE) and multiplicative updates (MU). We show that transductive SVMs can be tailored to feature selection by embracing feature scores for feature filtering, or by acting as wrappers and embedded feature selectors. We conduct experiments on the feature selection competition tasks to demonstrate the performance of transductive SVMs in feature selection and classification.
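
As a point of reference for the RFE component mentioned above, here is a sketch of ordinary (inductive, not transductive) SVM-based RFE: a linear SVM is refit repeatedly and the features with the smallest |w| are dropped each round. The transductive extension in the chapter additionally exploits the unlabeled test set, which is not shown here; dataset and parameters are our assumptions.

```python
# Recursive feature elimination driven by linear-SVM weights.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, n_features=100, n_informative=8,
                           random_state=0)
svm = LinearSVC(C=1.0, max_iter=5000)
# step=0.1 drops 10% of the remaining features per elimination round.
rfe = RFE(estimator=svm, n_features_to_select=10, step=0.1).fit(X, y)
print("kept feature indices:", list(rfe.get_support(indices=True)))
```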

Zhili Wu, Chunhung Li
Chapter 14. Variable Selection using Correlation and Single Variable Classifier Methods: Applications

Correlation and single variable classifier methods are very simple algorithms for selecting a subset of variables in a dimension reduction problem; they use measures that detect the relevance of a single variable to the target classes without considering the properties of the predictor to be used. In this paper, along with the description of correlation and single variable classifier ranking methods, the application of these algorithms to the NIPS 2003 Feature Selection Challenge problems is also presented. The results show that these methods can be used as primary, computationally efficient, and easy-to-implement techniques which have good performance, especially when the variable space is very large. It has also been shown that in all cases using an ensemble averaging predictor results in better performance than a single stand-alone predictor.
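
A minimal sketch of the two rankings described above (our illustration): (i) the absolute Pearson correlation of each variable with the target, and (ii) the cross-validated accuracy of a classifier that sees one variable at a time. The choice of logistic regression as the single-variable classifier is an assumption for illustration.

```python
# Correlation ranking and single-variable-classifier ranking.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=30, n_informative=5,
                           random_state=0)

corr_scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                        for j in range(X.shape[1])])
svc_scores = np.array([cross_val_score(LogisticRegression(), X[:, [j]], y,
                                       cv=5).mean()
                       for j in range(X.shape[1])])

print("correlation ranking:", np.argsort(corr_scores)[::-1][:10])
print("single-variable classifier ranking:", np.argsort(svc_scores)[::-1][:10])
```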

Amir Reza Saffari Azar Alamdari
Chapter 15. Tree-Based Ensembles with Dynamic Soft Feature Selection

Tree-based ensembles have been proven to be among the most accurate and versatile state-of-the-art learning machines. The best known are MART (gradient tree boosting) and RF (Random Forests). Usage of such ensembles in supervised problems with a very high dimensional input space can be challenging. Modelling with MART becomes computationally infeasible, and RF can produce low quality models when only a small subset of predictors is relevant. We propose an importance-based sampling scheme where only a small sample of variables is selected at every step of ensemble construction. The sampling distribution is modified at every iteration to promote variables more relevant to the target. Experiments show that this method gives MART a very substantial performance boost with at least the same level of accuracy. It also adds a bias correction element to RF for very noisy problems. MART with dynamic feature selection produced very competitive results at the NIPS 2003 feature selection challenge.
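
The toy sketch below conveys the dynamic soft feature selection idea in a simplified form (ours, with details that differ from the chapter): at each iteration a small sample of variables is drawn with probability proportional to current importance weights, a shallow tree is fit on that sample, and the weights are updated from the tree's importances.

```python
# Importance-weighted sampling of variables during ensemble construction.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=200, n_informative=6,
                           random_state=0)
rng = np.random.default_rng(0)
d, sample_size = X.shape[1], 20
weights = np.ones(d)                           # start from a uniform distribution

for _ in range(50):
    probs = weights / weights.sum()
    cols = rng.choice(d, size=sample_size, replace=False, p=probs)
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X[:, cols], y)
    # Promote variables the tree found useful at this step.
    weights[cols] += tree.feature_importances_

print("most promoted variables:", np.argsort(weights)[::-1][:10])
```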

Alexander Borisov, Victor Eruhimov, Eugene Tuv
Chapter 16. Sparse, Flexible and Efficient Modeling using L1 Regularization

We consider the generic regularized optimization problem $\hat{w}(\lambda) = \arg\min_{w} \sum_{k=1}^{m} L(y_k, x_k^T w) + \lambda J(w)$. We derive a general characterization of the properties of (loss $L$, penalty $J$) pairs which give piecewise linear coefficient paths. Such pairs allow us to efficiently generate the full regularized coefficient paths. We illustrate how we can use our results to build robust, efficient and adaptable modeling tools.
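
As a concrete instance of the (L, J) framework above (our example, not taken from the chapter), the lasso pairs squared-error loss with an L1 penalty, a combination known to yield coefficient paths that are piecewise linear in $\lambda$:

```latex
% The lasso as an instance of the generic regularized problem:
% squared-error loss L with an L1 penalty J.
\hat{w}(\lambda) \;=\; \arg\min_{w}\; \sum_{k=1}^{m} \bigl(y_k - x_k^{T} w\bigr)^{2}
\;+\; \lambda \lVert w \rVert_{1},
\qquad \lVert w \rVert_{1} = \sum_{j} \lvert w_j \rvert .
```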

Saharon Rosset, Ji Zhu
Chapter 17. Margin Based Feature Selection and Infogain with Standard Classifiers

The decision to devote a week or two to playing with the feature selection challenge (FSC) turned into a major effort that took up most of our time for a few months. In most cases we used standard algorithms, with obvious modifications for the balanced error measure. Surprisingly enough, the naïve methods we used turned out to be among the best submissions to the FSC.

Ran Gilad-Bachrach, Amir Navot
Chapter 18. Bayesian Support Vector Machines for Feature Ranking and Selection

In this chapter, we develop and evaluate a feature selection algorithm for Bayesian support vector machines. The relevance levels of the features are represented by ARD (automatic relevance determination) parameters, which are optimized by maximizing the model evidence in the Bayesian framework. The features are ranked in descending order using the optimal ARD values, and then forward selection is carried out to determine the minimal set of relevant features. In the numerical experiments, our approach using ARD for feature ranking can achieve a more compact feature set than standard ranking techniques, along with better generalization performance.

Wei Chu, S. Sathiya Keerthi, Chong Jin Ong, Zoubin Ghahramani
Chapter 19. Nonlinear Feature Selection with the Potential Support Vector Machine

We describe the “Potential Support Vector Machine” (P-SVM) which is a new filter method for feature selection. The idea of the P-SVM feature selection is to exchange the role of features and data points in order to construct “support features”. The “support features” are the selected features. The P-SVM uses a novel objective function and novel constraints — one constraint for each feature. As with standard SVMs, the objective function represents a complexity or capacity measure whereas the constraints enforce low empirical error. In this contribution we extend the P-SVM in two directions. First, we introduce a parameter which controls the redundancy among the selected features. Secondly, we propose a nonlinear version of the P-SVM feature selection which is based on neural network techniques. Finally, the linear and nonlinear P-SVM feature selection approach is demonstrated on toy data sets and on data sets from the NIPS 2003 feature selection challenge.

Sepp Hochreiter, Klaus Obermayer
Chapter 20. Combining a Filter Method with SVMs

Our goal for the competition was to evaluate the usefulness of simple machine learning techniques. We decided to use the Fisher criterion (see Chapter 2) as a feature selection method and Support Vector Machines (see Chapter 1) for the classification part. Here we explain how we chose the regularization parameter C of the SVM, how we determined the kernel parameter σ, and how we estimated the number of features used for each data set. All analyses were carried out on the training sets of the competition data. We chose the data set Arcene as an example to explain the approach step by step.

In our view the point of this competition was the construction of a well-performing classifier rather than the systematic analysis of a specific approach. This is why our search for the best classifier was only guided by the described methods and why we deviated from the road map on several occasions. All calculations were done with the Spider software (Spider, 2005).
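
For concreteness, here is a small sketch of Fisher-criterion feature scoring as it is commonly defined for two classes (our illustration, not the authors' code): the squared difference of class means divided by the sum of within-class variances, computed per feature.

```python
# Fisher-criterion scores for a binary classification problem.
import numpy as np
from sklearn.datasets import make_classification

def fisher_scores(X, y):
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0) + 1e-12   # guard against zero variance
    return num / den

X, y = make_classification(n_samples=200, n_features=50, n_informative=6,
                           random_state=0)
scores = fisher_scores(X, y)
print("top 10 features by Fisher criterion:", np.argsort(scores)[::-1][:10])
```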

Thomas Navin Lal, Olivier Chapelle, Bernhard Schölkopf
Chapter 21. Feature Selection via Sensitivity Analysis with Direct Kernel PLS

This chapter introduces Direct Kernel Partial Least Squares (DK-PLS) and feature selection via sensitivity analysis for DK-PLS. The overall feature selection strategy for the five data sets used in the NIPS competition is outlined as well.

Mark J. Embrechts, Robert A. Bress, Robert H. Kewley
Chapter 22. Information Gain, Correlation and Support Vector Machines

We report on our approach, CBAmethod3E, which was submitted to the NIPS 2003 Feature Selection Challenge on Dec. 8, 2003. Our approach combines filtering techniques for variable selection, information gain and feature correlation, with Support Vector Machines for induction. We ranked 13th overall and 6th as a group. It is worth pointing out that our feature selection method was very successful in selecting the second smallest set of features among the top-20 submissions, and in identifying almost all probes in the datasets, resulting in the challenge’s best performance on the latter benchmark.

Danny Roobaert, Grigoris Karakoulas, Nitesh V. Chawla
Chapter 23. Mining for Complex Models Comprising Feature Selection and Classification

Different classification tasks require different learning schemes to be satisfactorily solved. Most real-world datasets can be modeled only by complex structures resulting from deep data exploration with a number of different classification and data transformation methods. The search through the space of complex structures must be augmented with reliable validation strategies. All these techniques were necessary to build accurate models for the five high-dimensional datasets of the NIPS 2003 Feature Selection Challenge. Several feature selection algorithms (e.g. based on variance, correlation coefficients, decision trees) and several classification schemes (e.g. nearest neighbors, Normalized RBF, Support Vector Machines) were used to build complex models which transform the data and then classify it. Committees of feature selection models and ensemble classifiers were also very helpful in constructing models with high generalization abilities.

Krzysztof Grabczewski, Norbert Jankowski
Chapter 24. Combining Information-Based Supervised and Unsupervised Feature Selection

The filter is a simple and practical method for feature selection, but it can introduce biases resulting in decreased prediction performance. We propose an enhanced filter method that exploits features from two information-based filtering steps: one supervised and one unsupervised. By combining the features from these steps we attempt to reduce biases caused by misleading causal relations induced in the supervised selection procedure. When tested on the five datasets given at the NIPS 2003 Feature Extraction Workshop, our approach attained significant performance, considering its simplicity. We expect the combined information-based method to be a promising substitute for classical filter methods.

Sang-Kyun Lee, Seung-Joon Yi, Byoung-Tak Zhang
Chapter 25. An Enhanced Selective Naïve Bayes Method with Optimal Discretization

In this chapter, we present an extension of the wrapper approach applied to the naïve Bayes predictor. The originality is to use the area under the training lift curve as a criterion of feature set optimality and to preprocess the numeric variables with a new optimal discretization method. The method is evaluated on the NIPS 2003 datasets, both as a wrapper and as a filter for a multi-layer perceptron.

Marc Boullé
Chapter 26. An Input Variable Importance Definition based on Empirical Data Probability Distribution

We propose in this chapter a new method to score subsets of variables according to their usefulness for a given model. It can be qualified as a variable ranking method ‘in the context of other variables’. The method consists in replacing a variable’s value with another value obtained by randomly choosing among the other values of that variable in the training set. The impact of this change on the output is measured and averaged over all training examples and over the changes made to that variable for a given training example. As a search strategy, backward elimination is used. The method is applicable to every kind of model and to classification or regression tasks. We assess the efficiency of the method with our results on the NIPS 2003 feature selection challenge.
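
A sketch of the importance measure described above follows (our simplification: values from other training examples are drawn via a random permutation of the column, and the backward-elimination search around the measure is omitted). The model and all names are illustrative assumptions.

```python
# Variable importance by substituting values drawn from other training examples
# and measuring the average change in the model's output.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def substitution_importance(model, X, j, n_draws=10, rng=None):
    rng = np.random.default_rng(rng)
    base = model.predict_proba(X)[:, 1]
    deltas = []
    for _ in range(n_draws):
        X_mod = X.copy()
        X_mod[:, j] = rng.permutation(X[:, j])   # values taken from other examples
        deltas.append(np.abs(model.predict_proba(X_mod)[:, 1] - base).mean())
    return float(np.mean(deltas))

X, y = make_classification(n_samples=300, n_features=15, n_informative=4,
                           random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)
scores = [substitution_importance(model, X, j, rng=0) for j in range(X.shape[1])]
print("importance ranking:", np.argsort(scores)[::-1])
```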

V. Lemaire, F. Clérot

New Perspectives in Feature Extraction

Frontmatter
Chapter 27. Spectral Dimensionality Reduction

In this chapter, we study and put under a common framework a number of non-linear dimensionality reduction methods, such as Locally Linear Embedding, Isomap, Laplacian eigenmaps and kernel PCA, which are based on performing an eigen-decomposition (hence the name “spectral”). That framework also includes classical methods such as PCA and metric multidimensional scaling (MDS), as well as the data transformation step used in spectral clustering. We show that in all of these cases the learning algorithm estimates the principal eigenfunctions of an operator that depends on the unknown data density and on a kernel that is not necessarily positive semi-definite. This helps generalize some of these algorithms so as to predict an embedding for out-of-sample examples without having to retrain the model. It also makes it more transparent what these algorithms are minimizing on the empirical data and gives a corresponding notion of generalization error.
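
The common spectral recipe can be sketched briefly (our illustration, assuming the kernel-PCA variant): build a kernel or affinity matrix on the data, center it, take its leading eigenvectors, and use the scaled eigenvectors as the low-dimensional embedding. The other methods in the chapter differ mainly in how the kernel/affinity is constructed; the dataset and `gamma` below are arbitrary.

```python
# Spectral embedding via eigen-decomposition of a centered kernel matrix.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
K = rbf_kernel(X, X, gamma=15.0)

n = K.shape[0]
H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
Kc = H @ K @ H
eigvals, eigvecs = np.linalg.eigh(Kc)        # eigenvalues in ascending order
idx = np.argsort(eigvals)[::-1][:2]          # indices of the two largest
embedding = eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))
print(embedding.shape)                       # (200, 2) low-dimensional embedding
```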

Yoshua Bengio, Olivier Delalleau, Nicolas Le Roux, Jean-François Paiement, Pascal Vincent, Marie Ouimet
Chapter 28. Constructing Orthogonal Latent Features for Arbitrary Loss

A boosting framework for constructing orthogonal features targeted to a given loss function is developed. Combined with techniques from spectral methods such as PCA and PLS, an orthogonal boosting algorithm for linear hypotheses is used to efficiently construct orthogonal latent features selected to optimize the given loss function. The method is generalized to construct orthogonal nonlinear features using the kernel trick. The resulting method, Boosted Latent Features (BLF), is demonstrated both to construct valuable orthogonal features and to be a competitive inference method for a variety of loss functions. For the least squares loss, BLF reduces to the PLS algorithm and preserves all the attractive properties of that algorithm. As in PCA and PLS, the resulting nonlinear features are valuable for visualization, dimensionality reduction, improving generalization by regularization, and use in other learning algorithms, but now these features can be targeted to a specific inference task or loss function. The data matrix is factorized by the extracted features. The low-rank approximation of the data matrix provides efficiency and stability in computation, an attractive characteristic of PLS-type methods. Computational results demonstrate the effectiveness of the approach on a wide range of classification and regression problems.

Michinari Momma, Kristin P. Bennett
Chapter 29. Large Margin Principles for Feature Selection

In this paper we introduce a margin-based feature selection criterion and apply it to measure the quality of sets of features. Using margins we devise novel selection algorithms for multi-class categorization problems and provide theoretical generalization bounds. We also study the well-known Relief algorithm and show that it resembles a gradient ascent over our margin criterion. We report promising results on various datasets.

Ran Gilad-Bachrach, Amir Navot, Naftali Tishby
Chapter 30. Feature Extraction for Classification of Proteomic Mass Spectra: A Comparative Study

To satisfy the ever growing need for effective screening and diagnostic tests, medical practitioners have turned their attention to high resolution, high throughput methods. One approach is to use mass spectrometry based methods for disease diagnosis. Effective diagnosis is achieved by classifying the mass spectra as belonging to healthy or diseased individuals. Unfortunately, high resolution mass spectrometry data contains a large degree of noisy, redundant and irrelevant information, making accurate classification difficult. To overcome these obstacles, feature extraction methods are used to select or create small sets of relevant features. This paper compares existing feature selection methods to a novel wrapper-based feature selection and centroid-based classification method. A key contribution is the exposition of different feature extraction techniques, which encompass dimensionality reduction and feature selection methods. The experiments, on two cancer data sets, indicate that feature selection algorithms tend to both reduce data dimensionality and increase classification accuracy, while the dimensionality reduction techniques sacrifice performance as a result of lowering the number of features. In order to evaluate the dimensionality reduction and feature selection techniques, we use a simple classifier, thereby making the approach tractable. In relation to previous research, the proposed algorithm is very competitive in terms of (i) classification accuracy, (ii) size of feature sets, and (iii) usage of computational resources during both training and classification phases.

Ilya Levner, Vadim Bulitko, Guohui Lin
Chapter 31. Sequence Motifs: Highly Predictive Features of Protein Function

Protein function prediction, i.e. classification of proteins according to their biological function, is an important task in bioinformatics. In this chapter, we illustrate that sequence motifs — elements that are conserved across different proteins — are highly discriminative features for predicting the function of a protein. This is in agreement with the biological thinking that considers motifs to be the building blocks of protein sequences. We focus on proteins annotated as enzymes, and show that despite the fact that motif composition is a very high dimensional representation of a sequence, most classes of enzymes can be classified using a handful of motifs, yielding accurate and interpretable classifiers. The enzyme data falls into a large number of classes; we find that the one-against-the-rest multi-class method works better than the one-against-one method on this data.

Asa Ben-Hur, Douglas Brutlag
Backmatter
Metadata
Title
Feature Extraction
Editors
Isabelle Guyon
Masoud Nikravesh
Steve Gunn
Lotfi A. Zadeh
Copyright Year
2006
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-35488-8
Print ISBN
978-3-540-35487-1
DOI
https://doi.org/10.1007/978-3-540-35488-8
