Internet data sources provide us with large image datasets which are mostly without any explicit labeling. This setting is ideal for semi-supervised learning which seeks to exploit labeled data as well as a large pool of unlabeled data points to improve learning and classification. While we have made considerable progress on the theory and algorithms, we have seen limited success to translate such progress to the large scale datasets which these methods are inspired by. We investigate the computational complexity of popular graph-based semi-supervised learning algorithms together with different possible speed-ups. Our findings lead to a new algorithm that scales up to 40 times larger datasets in comparison to previous approaches and even increases the classification performance. Our method is based on the key insights that by employing a density-based measure unlabeled data points can be selected similar to an active learning scheme. This leads to a compact graph resulting in an improved performance up to 11.6% at reduced computational costs.
Weitere Kapitel dieses Buchs durch Wischen aufrufen
Bitte loggen Sie sich ein, um Zugang zu diesem Inhalt zu erhalten
Sie möchten Zugang zu diesem Inhalt erhalten? Dann informieren Sie sich jetzt über unsere Produkte:
- Semi-Supervised Learning on a Budget: Scaling Up to Large Datasets
- Springer Berlin Heidelberg
Neuer Inhalt/© ITandMEDIA