Skip to main content
Erschienen in:
Buchtitelbild

2009 | OriginalPaper | Buchkapitel

Learning with Missing or Incomplete Data

verfasst von : Bogdan Gabrys

Erschienen in: Image Analysis and Processing – ICIAP 2009

Verlag: Springer Berlin Heidelberg

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

The problem of learning with missing or incomplete data has received a lot of attention in the literature [6,10,13,21,23]. The reasons for missing data can be multi-fold ranging from sensor failures in engineering applications to deliberate withholding of some information in medical questioners in the case of missing input feature values or lack of solved (labelled) cases required in supervised learning algorithms in the case of missing labels. And though such problems are very interesting from the practical and theoretical point of view, there are very few pattern recognition techniques which can deal with missing values in a straightforward and efficient manner. It is in a sharp contrast to the very efficient way in which humans deal with unknown data and are able to perform various pattern recognition tasks given only a subset of input features or few labelled reference cases.

In the context of pattern recognition or classification systems the problem of missing labels and the problem of missing features are very often treated separately.

The availability or otherwise of labels determines the type of the learning algorithm that can be used and has led to the well known split into supervised, unsupervised or more recently introduced hybrid/semi-supervised classes of learning algorithms.

Commonly, using supervised learning algorithms enables designing of robust and well performing classifiers. Unfortunately, in many real world applications labelling of the data is costly and thus possible only to some extent. Unlabelled data on the other hand is often available in large quantities but a classifier built using unsupervised learning is likely to demonstrate performance inferior to its supervised counterpart. The interest in a mixed supervised and unsupervised learning is thus a natural consequence of this state of things and various approaches have been discussed in the literature [2,5,10,12,14,15,18,19]. Our experimental results have shown [10] that when supported by unlabelled samples much less labelled data is generally required to build a classifier without compromising the classification performance. If only a very limited amount of labelled data is available the results based on random selection of labelled samples show high variability and the performance of the final classifier is more dependent on how reliable the labelled data samples are rather than use of additional unlabelled data. This points to a very interesting discussion point related to the issue of the trade-off between the information content in the observed data (in this case available labels) versus the impact that can be achieved by employing sophisticated data processing algorithms which we will also revisit when discussing approaches dealing with missing feature values.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Metadaten
Titel
Learning with Missing or Incomplete Data
verfasst von
Bogdan Gabrys
Copyright-Jahr
2009
Verlag
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-642-04146-4_1