2022 | OriginalPaper | Book chapter
Selected Aspects of Interactive Feature Extraction
Written by: Marek Grzegorowski
Published in: Transactions on Rough Sets XXIII
Publisher: Springer Berlin Heidelberg
Abstract
The chapter delves into the intricacies of interactive feature extraction and emphasizes the need for efficient processing of data from various sensors and domains. It discusses the challenges arising from the variety, variability, and velocity of data, and the importance of interactive data exploration techniques for discovering dependencies and formulating hypotheses. The text also examines the importance of embedding domain knowledge into data analysis to create meaningful features and improve model performance. In addition, it introduces the concept of resilient feature selection, which aims to create feature sets that are robust to missing data, and discusses the theoretical and computational aspects of this approach. The chapter concludes with a detailed experimental evaluation of the proposed methods in various domains, demonstrating their effectiveness and versatility.
AI-generated
This summary of the content was generated with the help of AI.
Abstract
This study discusses the problem of interactive feature extraction, i.e., feature extraction supported by interaction with users, and proposes several innovative approaches to automating feature creation and selection. It presents the current state of knowledge on feature extraction processes in commercial applications and discusses the problems associated with processing big data sets, as well as approaches to processing high-dimensional time series. The introduced feature extraction methods were verified experimentally on real-life problems and data. Beyond the experiments, practical case studies and applications of the developed techniques in selected scientific projects are shown.
Feature extraction addresses the problem of finding the most compact and informative data representation, which improves the efficiency of data storage and processing and facilitates the subsequent learning and generalization steps. It not only simplifies the data representation but also yields features that both analysts and learning algorithms can readily use. In its most common flow, the process starts from an initial set of measured data and builds derived features intended to be informative and non-redundant. Logically, the process has two phases: the first is the construction of new attributes from the original data (sometimes referred to as feature engineering), and the second is the selection of the most important of these attributes (sometimes referred to as feature selection). Many approaches to feature creation and selection are well described in the literature. Still, it is hard to find methods that facilitate interaction with users and take into consideration users' knowledge of the domain, their experience, and their preferences.
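The two phases described above can be illustrated with a minimal sketch. It is not the method proposed in the chapter: the window aggregates, the correlation-based ranking, the function names (`construct_features`, `select_features`), and the toy data are all assumptions chosen only to make the construction and selection steps concrete.

```python
from statistics import mean, pstdev


def construct_features(window):
    """Phase 1 (feature construction): derive aggregates from one window of raw readings."""
    return {
        "mean": mean(window),
        "std": pstdev(window),
        "min": min(window),
        "max": max(window),
        "range": max(window) - min(window),
    }


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def select_features(rows, target, k):
    """Phase 2 (feature selection): keep the k features most correlated with the target."""
    names = list(rows[0])
    scores = {n: abs(pearson([r[n] for r in rows], target)) for n in names}
    return sorted(names, key=lambda n: scores[n], reverse=True)[:k]


# Hypothetical toy data: four windows of sensor readings and a numeric target
# that, by construction, equals the range of each window.
windows = [[1, 2, 3, 4], [2, 2, 2, 2], [0, 5, 0, 5], [3, 3, 4, 4]]
target = [3.0, 0.0, 5.0, 1.0]

rows = [construct_features(w) for w in windows]
selected = select_features(rows, target, k=2)
print(selected)
```

A simple univariate ranking like this ignores redundancy between the derived features; it is used here only because it keeps the two phases easy to tell apart.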
The study of interactive feature extraction addresses two problems: deriving useful and understandable attributes from raw sensor readings, and reducing the number of those attributes to obtain the simplest possible, yet still accurate, models. The proposed methods go beyond the current standards by enabling a more efficient way to express the domain knowledge associated with the most important subsets of attributes. The proposed algorithms for the construction and selection of features can use various forms of information granulation, problem decomposition, and parallelization. They can also tackle large spaces of derivable features and ensure a satisfactory (according to a given criterion) level of information about the target variable (decision), even after a substantial number of features have been removed.
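The last property, retaining enough information about the decision after features are removed, can be made concrete with a small sketch. It assumes one particular criterion that the abstract does not specify: a feature set counts as acceptable if objects that agree on all its features always share the same decision. The function names (`is_consistent`, `is_resilient`) and the toy table are hypothetical, and the exhaustive check over subsets is for illustration only; it does not scale to large feature spaces.

```python
from itertools import combinations


def is_consistent(rows, decisions, features):
    """True if objects that agree on `features` always share the same decision."""
    seen = {}
    for row, decision in zip(rows, decisions):
        key = tuple(row[f] for f in features)
        if seen.setdefault(key, decision) != decision:
            return False
    return True


def is_resilient(rows, decisions, features, r):
    """True if the decision stays determined after dropping any r of `features`."""
    keep = len(features) - r
    if keep < 0:
        return False
    # Checking subsets of size len(features) - r is sufficient: any larger
    # subset contains one of them and therefore stays consistent as well.
    return all(
        is_consistent(rows, decisions, subset)
        for subset in combinations(features, keep)
    )


# Hypothetical decision table: "a" and "b" each determine the decision, "c" is noise.
rows = [
    {"a": 0, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 1},
    {"a": 0, "b": 0, "c": 0},
    {"a": 1, "b": 1, "c": 0},
]
decisions = [0, 1, 0, 1]

print(is_resilient(rows, decisions, ["a", "b", "c"], r=1))  # any single feature may be lost
print(is_resilient(rows, decisions, ["a", "b", "c"], r=2))  # fails when only "c" remains
```

In this toy table the redundancy between "a" and "b" is what makes the set tolerate the loss of one feature; a minimal feature set such as ["a"] alone would have no such tolerance.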
The proposed approaches were developed on the basis of experience gained in several research projects in the fields of data analysis and the processing of multi-sensor data streams. The methods have been validated in terms of the quality of the extracted features, as well as the throughput, scalability, and robustness of their operation. The discussed methodology has also been verified in open data mining competitions to confirm its usefulness.