2004 | OriginalPaper | Buchkapitel
Schema-Based Web Wrapping
verfasst von : Sergio Flesca, Andrea Tagarelli
Erschienen in: Conceptual Modeling – ER 2004
Verlag: Springer Berlin Heidelberg
Enthalten in: Professional Book Archive
Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.
Wählen Sie Textabschnitte aus um mit Künstlicher Intelligenz passenden Patente zu finden. powered by
Markieren Sie Textabschnitte, um KI-gestützt weitere passende Inhalte zu finden. powered by
An effective solution to automate information integration is represented by wrappers, i.e. programs which are designed for extracting relevant contents from a particular information source, such as web pages. Wrappers allow such contents to be delivered through a self-describing and easily processable representation model. However, most existing approaches to wrapper designing focus mainly on how to generate extraction rules, while do not weigh the importance of specifying and exploiting the desired schema of the extracted information. In this paper, we propose a new wrapping approach which encompasses both extraction rules and the schema of required information in wrapper definitions. We investigate the advantages of suitably exploiting extraction schemata, and we define a clean declarative wrapper semantics by introducing (preferred) extraction models for source HTML documents with respect to a given wrapper.