In the presented study, the problem of interactive feature extraction, i.e., feature extraction supported by interaction with users, is discussed, and several innovative approaches to automating feature creation and selection are proposed. The current state of knowledge on feature extraction processes in commercial applications is reviewed. The problems associated with processing big data sets, as well as approaches to processing high-dimensional time series, are discussed. The introduced feature extraction methods were subjected to experimental verification on real-life problems and data. Beyond these experiments, practical case studies and applications of the developed techniques in selected scientific projects are presented.
Feature extraction addresses the problem of finding the most compact and informative data representation, resulting in improved efficiency of data storage and processing and facilitating the subsequent learning and generalization steps. Feature extraction not only simplifies the data representation but also enables the acquisition of features that can be easily utilized by both analysts and learning algorithms. In its most common flow, the process starts from an initial set of measured data and builds derived features intended to be informative and non-redundant. Logically, there are two phases of this process: the first is the construction of new attributes based on the original data (sometimes referred to as feature engineering); the second is the selection of the most important of those attributes (sometimes referred to as feature selection). There are many approaches to feature creation and selection that are well described in the literature. Still, it is hard to find methods facilitating interaction with users which would take into consideration users' knowledge about the domain, their experience, and their preferences.
In the study on the interactiveness of feature extraction, the problems of deriving useful and understandable attributes from raw sensor readings, and of reducing the number of those attributes to achieve the simplest possible yet accurate models, are addressed. The proposed methods go beyond current standards by enabling a more efficient way to express the domain knowledge associated with the most important subsets of attributes. The proposed algorithms for the construction and selection of features can use various forms of information granulation, problem decomposition, and parallelization. They can also tackle large spaces of derivable features and ensure a satisfactory (according to a given criterion) level of information about the target variable (decision), even after removing a substantial number of features.
The proposed approaches have been developed based on the experience gained in the course of several research projects in the fields of data analysis and processing multi-sensor data streams. The methods have been validated in terms of the quality of the extracted features, as well as throughput, scalability, and robustness of their operation. The discussed methodology has been verified in open data mining competitions to confirm its usefulness.
We call a matrix \(X \in \mathbb {C}^{n \times n}\) unitary iff \(X X^{H} = X^{H} X = \mathbb {I}\). For a real matrix \(X \in \mathbb {R}^{n \times n}\), we have \(X^{H} = X^{T}\), and we say that the matrix is orthogonal iff \(X X^{T} = X^{T} X = \mathbb {I}\).
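As a minimal sketch of the orthogonality condition above (the rotation matrix used here is an illustrative choice, not taken from the text), rotation matrices satisfy \(X X^{T} = X^{T} X = \mathbb {I}\):

```python
import numpy as np

# A 2x2 rotation matrix is a standard example of an orthogonal matrix.
theta = np.pi / 4
X = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Verify both defining identities: X X^T = X^T X = I.
assert np.allclose(X @ X.T, np.eye(2))
assert np.allclose(X.T @ X, np.eye(2))
```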
In the literature, matrix \(\textbf{V}\) is often denoted as \(\textbf{W}\), whereas \(\mathbf {\Sigma }\) is denoted as \(\mathbf {\Lambda }\). We, however, continue with the notation as introduced with the SVD example above.
For a given corpus, the co-occurrence of two words is the number of times they appear together (and close enough to each other, e.g., no more than 30 words separate them in the text) in documents.
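A minimal sketch of such window-based co-occurrence counting (the function name, tokenization by whitespace, and the default window size are illustrative assumptions):

```python
from collections import Counter


def co_occurrences(documents, window=30):
    """Count, per unordered word pair, how many times the two words
    appear within `window` positions of each other in a document."""
    counts = Counter()
    for doc in documents:
        tokens = doc.split()  # naive whitespace tokenization for illustration
        for i, word in enumerate(tokens):
            # Only look ahead, so each pair occurrence is counted once.
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                counts[tuple(sorted((word, tokens[j])))] += 1
    return counts


pairs = co_occurrences(["the cat sat on the mat"], window=2)
```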
It is worth mentioning that the neural network input is a numeric vector embedding for each word (typically, word vectorization is performed after the initial preprocessing).
Entropy is one of the basic measures of information contained in data. For a discrete random variable X with possible values \(\{x_1, \ldots , x_m\}\), it is defined as: \(H(X) = -\sum _{i=1}^{m} p(x_i)\log (p(x_i))\).
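The definition above can be sketched as follows, estimating \(p(x_i)\) from empirical frequencies in a sample (the function name and the choice of base-2 logarithm, giving entropy in bits, are illustrative assumptions):

```python
import math
from collections import Counter


def entropy(values):
    """Empirical Shannon entropy H(X) = -sum_i p(x_i) log2 p(x_i),
    with probabilities estimated as relative frequencies."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# A uniform two-valued sample carries one bit of entropy.
entropy(["h", "t", "h", "t"])  # → 1.0
```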
If attribute domains overlap, i.e., there exist \(a_i,a_j \in A\) for which \(V_{a_i} \cap V_{a_j} \ne \emptyset \), then the concatenation may include a delimiter \(\mid _A\) such that for each \(a \in A\) we have \(\mid _A \notin V_a\).
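A minimal sketch of why the delimiter is needed (the function name and the choice of "|" as delimiter are illustrative assumptions): without a delimiter outside every attribute domain, distinct value tuples can concatenate to the same string.

```python
def concat_values(values, delimiter="|"):
    """Concatenate attribute values with a delimiter that, by assumption,
    occurs in no attribute domain, keeping the result unambiguous."""
    assert all(delimiter not in v for v in values)
    return delimiter.join(values)


# Without a delimiter, ("12", "3") and ("1", "23") both collapse to "123";
# with one, the two tuples remain distinguishable.
concat_values(["12", "3"])  # → "12|3"
concat_values(["1", "23"])  # → "1|23"
```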