2011 | OriginalPaper | Chapter
Concluding Pattern of Web Page Based on String Pattern Matching
Authors : Yiqing Cai, Xinjun Wang, Chunsheng Lu, Zhongmin Yan, Zhaohui Peng
Published in: Web Information Systems and Mining
Publisher: Springer Berlin Heidelberg
Activate our intelligent search to find suitable subject content or patents.
Select sections of text to find matching patents with Artificial Intelligence. powered by
Select sections of text to find additional relevant content using AI-assisted search. powered by
Presently, each Web site has its own topics and formats to arrange the page structure and present information. Therefore, there is a great need for value-added service that extracts information from multiple sources. Data extraction from HTML is usually performed by software modules called wrappers. In many studies of constructing wrapper, concluding the pattern of the Web site is a importance task in the beginning. This paper studies the problem of concluding pattern from a Web page that contains several nested structure and repeated structure. In our method, the algorithm bases on string pattern matching can discover the nested structure and the repeated structure in a Web page. Then a regular expression will be generated as the pattern of the Web site.