
2002 | OriginalPaper | Book chapter

Machine Learning of Generalized Document Templates for Data Extraction

Written by: Janusz Wnek

Published in: Document Analysis Systems V

Publisher: Springer Berlin Heidelberg


The purpose of this research is to reverse engineer the process of encoding data in structured documents and then to automate the extraction of that data. We assume a broad category of structured documents that goes beyond form processing: the documents may have flexible layouts and may span a varying number of pages. The data extraction method (DataX) employs general templates generated by the Inductive Template Generator (InTeGen). The InTeGen method uses inductive learning from example documents in which the data elements have been identified. Both methods achieve a high degree of automation with minimal user input.
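To make the two-stage idea concrete, here is a minimal toy sketch of learning a template from annotated examples and then applying it to a new document. It is not the InTeGen or DataX algorithm, which the abstract does not specify. The function names `induce_template` and `extract`, the token-list document representation, and the "most common preceding token" rule are all assumptions made for illustration.

```python
from collections import Counter

def induce_template(examples):
    """Learn one anchor token per field from annotated examples.

    Hypothetical representation: each example is (tokens, labels), where
    labels maps a field name to the index of its value in tokens.
    The anchor is the token that most often directly precedes the value.
    """
    fields = sorted({f for _, labels in examples for f in labels})
    template = {}
    for field in fields:
        anchors = Counter()
        for tokens, labels in examples:
            idx = labels.get(field)
            if idx is not None and idx > 0:
                anchors[tokens[idx - 1]] += 1
        if anchors:
            template[field] = anchors.most_common(1)[0][0]
    return template

def extract(template, tokens):
    """Apply a learned template: take the token right after each anchor."""
    result = {}
    for field, anchor in template.items():
        for i, tok in enumerate(tokens[:-1]):
            if tok == anchor:
                result[field] = tokens[i + 1]
                break
    return result

# Two annotated examples with different layouts (made-up data).
doc1 = "Invoice No: 1001 Date: 2002-01-05 Total: 40.00".split()
doc2 = "ACME Corp Invoice No: 1002 Shipped Total: 12.50".split()
examples = [
    (doc1, {"number": 2, "total": 6}),
    (doc2, {"number": 4, "total": 7}),
]

template = induce_template(examples)
new_doc = "Page 1 Invoice No: 1003 Notes none Total: 99.90".split()
extracted = extract(template, new_doc)
print(template)
print(extracted)
```

The point of the sketch is only the division of labor: the user labels data elements in a few examples, a generalization step turns them into a layout-independent template, and extraction then runs without further input.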

Metadata
Title
Machine Learning of Generalized Document Templates for Data Extraction
Author
Janusz Wnek
Copyright year
2002
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/3-540-45869-7_48
