
2002 | Original Paper | Book Chapter

Information Extraction in Structured Documents Using Tree Automata Induction

Authors: Raymond Kosala, Jan Van den Bussche, Maurice Bruynooghe, Hendrik Blockeel

Published in: Principles of Data Mining and Knowledge Discovery

Publisher: Springer Berlin Heidelberg


Information extraction (IE) addresses the problem of extracting specific information from a collection of documents. Much of the previous work for IE from structured documents formatted in HTML or XML uses techniques for IE from strings, such as grammar and automata induction. However, such documents have a tree structure. Hence it is natural to investigate methods that are able to recognise and exploit this tree structure. We do this by exploring the use of tree automata for IE in structured documents. Experimental results on benchmark data sets show that our approach compares favorably with previous approaches.

Metadata
Title: Information Extraction in Structured Documents Using Tree Automata Induction
Authors: Raymond Kosala, Jan Van den Bussche, Maurice Bruynooghe, Hendrik Blockeel
Copyright Year: 2002
Publisher: Springer Berlin Heidelberg
DOI: https://doi.org/10.1007/3-540-45681-3_25
