
2004 | OriginalPaper | Chapter

Schema-Based Web Wrapping

Authors : Sergio Flesca, Andrea Tagarelli

Published in: Conceptual Modeling – ER 2004

Publisher: Springer Berlin Heidelberg


Wrappers, i.e. programs designed to extract relevant content from a particular information source such as web pages, are an effective solution for automating information integration. Wrappers allow such content to be delivered through a self-describing, easily processable representation model. However, most existing approaches to wrapper design focus mainly on how to generate extraction rules and do not weigh the importance of specifying and exploiting the desired schema of the extracted information. In this paper, we propose a new wrapping approach whose wrapper definitions encompass both extraction rules and the schema of the required information. We investigate the advantages of suitably exploiting extraction schemata, and we define a clean declarative wrapper semantics by introducing (preferred) extraction models for source HTML documents with respect to a given wrapper.
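
To make the idea of pairing extraction rules with a target schema more concrete, here is a toy sketch in Python. It is not the authors' formalism: the DTD-like schema (each content model written as a regular expression over child element names), the regex-based extraction rules, and the names `SCHEMA`, `RULES` and `extract` are all hypothetical. The sketch only shows how a schema can guide which rules are applied and reject extracted content that does not conform; it does not implement the paper's extraction models or their preference semantics.

```python
import re
from typing import Dict, Optional

# Hypothetical schema: element name -> DTD-like content model, written as a
# regex over the space-terminated names of the element's children.
SCHEMA: Dict[str, str] = {
    "books": r"(book )*",
    "book": r"title (author )+price ",
}

# Hypothetical extraction rules: element name -> regex whose group 1 is the
# HTML fragment of one instance of that element.
RULES: Dict[str, str] = {
    "book": r'<div class="book">(.*?)</div>',
    "title": r"<h2>(.*?)</h2>",
    "author": r'<span class="author">(.*?)</span>',
    "price": r'<span class="price">(.*?)</span>',
}


def extract(name: str, fragment: str) -> Optional[dict]:
    """Extract `name` from `fragment`; return None if the result violates SCHEMA."""
    if name not in SCHEMA:
        # Leaf element: keep its text content with any tags stripped.
        return {"tag": name, "text": re.sub(r"<[^>]+>", "", fragment).strip()}
    # Only the rules for children named in the content model are applied.
    child_names = set(re.findall(r"[a-z]+", SCHEMA[name]))
    found = []  # (position in fragment, child name, child fragment)
    for child in child_names:
        for m in re.finditer(RULES[child], fragment, re.DOTALL):
            found.append((m.start(), child, m.group(1)))
    found.sort()  # restore document order
    sequence = "".join(child + " " for _, child, _ in found)
    if not re.fullmatch(SCHEMA[name], sequence):
        return None  # the extracted children do not fit the content model
    children = [extract(child, frag) for _, child, frag in found]
    if any(c is None for c in children):
        return None  # some descendant violated the schema
    return {"tag": name, "children": children}


PAGE = """
<div class="book"><h2>Web Data</h2><span class="author">A. Smith</span>
<span class="price">$30</span></div>
<div class="book"><h2>Wrappers</h2><span class="author">B. Jones</span>
<span class="author">C. Lee</span><span class="price">$25</span></div>
"""

# A book without a price does not conform to the "book" content model.
BAD = '<div class="book"><h2>X</h2><span class="author">Y</span></div>'

if __name__ == "__main__":
    print(extract("books", PAGE))
    print(extract("books", BAD))
```

Because the schema decides which rules run under each element, a stray `price` span outside any `book` would simply be ignored, and a structurally incomplete record causes the whole extraction to be rejected rather than silently yielding partial data.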

Metadata
DOI
https://doi.org/10.1007/978-3-540-30464-7_23
