2005 | OriginalPaper | Chapter
WebArc: Website Archival Using a Structured Approach
Authors: Ee-Peng Lim, Maria Marissa
Published in: Digital Libraries: Implementing Strategies and Sharing Experiences
Publisher: Springer Berlin Heidelberg
Website archival refers to the task of monitoring and storing snapshots of one or more websites for future retrieval and analysis. This task is particularly important for websites whose content changes over time, with older information constantly overwritten by newer information. In this paper, we propose WebArc, a set of software tools that allows users to construct a logical structure for a website to be archived. Classifiers are trained to determine relevant web pages and their categories, and are subsequently used in website downloading. The archival schedule can be specified and executed by a scheduler. A website viewer is also developed to browse one or more versions of archived web pages.
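To make the classifier-guided downloading step concrete, here is a minimal sketch. It is not WebArc's implementation. Several pieces are hypothetical stand-ins: the keyword rules in `CATEGORY_KEYWORDS` replace the trained classifiers, the in-memory `site` dictionary simulates fetching pages over HTTP, and the `Snapshot` record with a timestamp stands in for one archived version of a page. The crawl is breadth-first. Links are followed from every page, relevant or not, so an irrelevant home page can still lead to relevant pages. Only pages that receive a category are stored.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


@dataclass
class Snapshot:
    """One archived version of a page (hypothetical record layout)."""
    url: str
    category: str
    content: str
    fetched_at: datetime


# Hypothetical stand-in for trained classifiers: keyword rules per category.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "news": ["announcement", "press"],
    "people": ["professor", "staff"],
}


def classify(content: str) -> Optional[str]:
    """Return a category for a relevant page, or None if it is irrelevant."""
    text = content.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return None


def archive_site(site: Dict[str, Tuple[str, List[str]]], start: str) -> List[Snapshot]:
    """Breadth-first download that keeps only pages the classifier accepts.

    `site` maps URL -> (page content, outgoing links) and simulates fetching.
    """
    fetched_at = datetime.now(timezone.utc)  # one timestamp per archival run
    seen = {start}
    queue = deque([start])
    snapshots: List[Snapshot] = []
    while queue:
        url = queue.popleft()
        if url not in site:  # broken link: nothing to fetch
            continue
        content, links = site[url]
        category = classify(content)
        if category is not None:  # store relevant pages only
            snapshots.append(Snapshot(url, category, content, fetched_at))
        for link in links:  # follow links from every page, relevant or not
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return snapshots
```

For example, on a toy site where `/` links to `/news` ("Press announcement: new lab"), `/staff` ("Professor Lee") and `/about` ("History of the school"), the run stores `/news` under `news` and `/staff` under `people`, and skips `/` and `/about`. A scheduler would repeat this run at chosen intervals, and each run would add a new timestamped version for a viewer to browse.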