
2001 | Original Paper | Book Chapter

Active Hidden Markov Models for Information Extraction

Written by: Tobias Scheffer, Christian Decomain, Stefan Wrobel

Published in: Advances in Intelligent Data Analysis

Publisher: Springer Berlin Heidelberg


Information extraction from HTML documents requires a classifier capable of assigning semantic labels to the words or word sequences to be extracted. If completely labeled documents are available for training, well-known Markov model techniques can be used to learn such classifiers. In this paper, we consider the more challenging task of learning hidden Markov models (HMMs) when only partially (sparsely) labeled documents are available for training. We first give a detailed account of the task and its appropriate loss function, and show how this loss can be minimized given an HMM. We describe an EM-style algorithm for learning HMMs from partially labeled data. We then present an active learning algorithm that selects “difficult” unlabeled tokens and asks the user to label them. We empirically study how much active learning reduces the required data labeling effort, or how much it increases the quality of the learned model achievable with a given amount of user effort.
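
The token-selection idea in the abstract can be illustrated with a small sketch. The abstract does not say how "difficult" is measured, so the code below assumes one common choice: the margin between the two most probable states in each token's posterior, with a small margin meaning a difficult token. The function names `token_posteriors` and `most_difficult_tokens`, the matrix layout, and the toy parameters in the usage note are all hypothetical and are not taken from the paper. Known labels are handled by zeroing the emission probability of every state that contradicts a label, which is one simple way to run forward-backward on partially labeled sequences.

```python
def token_posteriors(obs, pi, A, B, labels=None):
    """Forward-backward posteriors P(state_t = i | obs, known labels).

    obs: list of observation indices; pi[i]: initial state probabilities;
    A[i][j]: transition probabilities; B[i][o]: emission probabilities;
    labels: dict mapping a position to its known state (partial labeling).
    Unscaled recursions: fine for short sequences, underflows on long ones.
    Assumes the labels are consistent with the model (nonzero likelihood).
    """
    labels = labels or {}
    n, T = len(pi), len(obs)

    def emit(t, i):
        # A labeled token forces its state: all other states get zero mass.
        if t in labels and labels[t] != i:
            return 0.0
        return B[i][obs[t]]

    # Forward pass.
    alpha = [[0.0] * n for _ in range(T)]
    for i in range(n):
        alpha[0][i] = pi[i] * emit(0, i)
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = emit(t, j) * sum(
                alpha[t - 1][i] * A[i][j] for i in range(n))

    # Backward pass.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(
                A[i][j] * emit(t + 1, j) * beta[t + 1][j] for j in range(n))

    # Normalize alpha * beta per position to get state posteriors.
    post = []
    for t in range(T):
        row = [alpha[t][i] * beta[t][i] for i in range(n)]
        z = sum(row)
        post.append([p / z for p in row])
    return post


def most_difficult_tokens(post, labels=None, k=1):
    """Return the k unlabeled positions with the smallest posterior margin.

    Margin = P(best state) - P(second-best state); this difficulty measure
    is an assumption of this sketch.
    """
    labels = labels or {}
    margins = []
    for t, row in enumerate(post):
        if t in labels:
            continue  # already labeled, nothing to ask the user
        top = sorted(row, reverse=True)
        margins.append((top[0] - top[1], t))
    margins.sort()
    return [t for _, t in margins[:k]]
```

In an active learning loop, one would call `token_posteriors` with the current model, ask the user for the labels of the positions returned by `most_difficult_tokens`, add those answers to `labels`, and re-estimate the model. For example, with two states, `pi = [0.5, 0.5]`, `A = [[0.9, 0.1], [0.1, 0.9]]`, `B = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]`, and `obs = [0, 2, 1]`, the middle token is emitted equally well by both states, so it is the one selected for labeling.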

Metadata
Title
Active Hidden Markov Models for Information Extraction
Written by
Tobias Scheffer
Christian Decomain
Stefan Wrobel
Copyright year
2001
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/3-540-44816-0_31
