2002 | OriginalPaper | Chapter

Automatic Information Extraction for Multiple Singular Web Pages

Authors: Chia-Hui Chang, Shih-Chien Kuo, Kuo-Yu Hwang, Tsung-Hsin Ho, Chih-Lung Lin

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer Berlin Heidelberg

The World Wide Web is now undeniably the richest and densest source of information, yet its structure makes it difficult to use that information in a systematic way. This paper extends a pattern discovery approach called IEPAD to the rapid generation of information extractors that can extract structured data from semi-structured Web documents. IEPAD was proposed to automate wrapper generation from a multiple-record Web page without user-labeled examples. In this paper, we consider a different case, in which multiple Web pages are available but each input page contains only one record (called singular Web pages). To handle this case, a hierarchical multiple string alignment is proposed to allow wrapper induction for multiple singular Web pages.
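
The idea of inducing a wrapper by aligning several singular pages can be illustrated with a small sketch. This is not the hierarchical multiple string alignment of the paper, whose details are not given in the abstract; it is a simplified stand-in that folds a pairwise alignment from Python's `difflib.SequenceMatcher` over the pages, treats the tokens shared by every page as the template, and reads the tokens between template matches as record fields. The function names (`tokenize`, `common_subsequence`, `induce_template`, `extract`) and the sample pages are hypothetical.

```python
import re
from difflib import SequenceMatcher


def tokenize(html):
    """Split a page into tag tokens and text tokens, dropping empty text."""
    return [t.strip() for t in re.findall(r"<[^>]+>|[^<]+", html) if t.strip()]


def common_subsequence(a, b):
    """Return a common subsequence of two token lists.

    SequenceMatcher is a heuristic aligner, so the result is a common
    subsequence but not necessarily the longest one.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    shared = []
    for block in matcher.get_matching_blocks():
        shared.extend(a[block.a:block.a + block.size])
    return shared


def induce_template(pages):
    """Fold the pairwise alignment over all pages to keep only shared tokens."""
    template = tokenize(pages[0])
    for page in pages[1:]:
        template = common_subsequence(template, tokenize(page))
    return template


def extract(template, page):
    """Return the token runs of a page that fall between template matches."""
    tokens = tokenize(page)
    matcher = SequenceMatcher(None, template, tokens, autojunk=False)
    fields, pos = [], 0
    for block in matcher.get_matching_blocks():
        if block.b > pos:  # unmatched page tokens before this block are data
            fields.append(" ".join(tokens[pos:block.b]))
        pos = block.b + block.size
    return fields


# Hypothetical singular pages: one record per page, same layout.
pages = [
    "<html><h1>Dune</h1><p>Author: <b>Frank Herbert</b></p></html>",
    "<html><h1>Emma</h1><p>Author: <b>Jane Austen</b></p></html>",
    "<html><h1>Ulysses</h1><p>Author: <b>James Joyce</b></p></html>",
]

template = induce_template(pages)
print(extract(template, pages[0]))  # ['Dune', 'Frank Herbert']
```

Tokens that recur on every page, such as the tags and the `Author:` label, end up in the template, while the title and author name differ per page and are returned as fields. A real wrapper would also need to handle optional and repeated fields, which this sketch ignores.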

DOI
https://doi.org/10.1007/3-540-47887-6_29