2005 | OriginalPaper | Chapter
wHunter: A Focused Web Crawler – A Tool for Digital Library
Authors : Yun Huang, YunMing Ye
Published in: Digital Libraries: International Collaboration and Cross-Fertilization
Publisher: Springer Berlin Heidelberg
Topic-driven web crawlers, or focused crawlers, are the key tool for building online web information libraries. Achieving good performance efficiently with limited time and space resources is a challenging problem. This paper proposes wHunter, a focused web crawler that implements incremental, multi-strategy learning by combining the advantages of SVM (support vector machines) and naïve Bayes. On the one hand, initial performance is guaranteed by an SVM classifier; on the other hand, once enough web pages have been obtained, the classifier is switched to naïve Bayes so that online incremental learning is achieved. Experimental results show that the proposed algorithm is efficient and easy to implement.
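The abstract does not give wHunter's features, SVM training procedure, or switching criterion, so the following is only a minimal sketch of the general SVM-then-naïve-Bayes idea, not the authors' implementation. It assumes bag-of-words features, a linear SVM trained in batch with Pegasos-style subgradient steps, and a hypothetical `switch_threshold` (a count of labelled pages) as the switching rule. While few pages are labelled, the SVM is retrained on all of them. Once the threshold is reached, the crawler relies on multinomial naïve Bayes, whose "training" on a new page is only a count update.

```python
import math
import random
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercased whitespace tokenization (a simplifying assumption).
    return text.lower().split()

class LinearSVM:
    """Linear SVM (no bias term) trained in batch with Pegasos-style steps."""
    def __init__(self, lam=0.01, epochs=50, seed=0):
        self.lam, self.epochs, self.seed = lam, epochs, seed
        self.w = {}

    def fit(self, texts, labels):
        rng = random.Random(self.seed)
        data = [(Counter(tokenize(t)), 1 if y else -1) for t, y in zip(texts, labels)]
        self.w, t = {}, 0
        for _ in range(self.epochs):
            rng.shuffle(data)
            for x, y in data:
                t += 1
                eta = 1.0 / (self.lam * t)
                margin = y * sum(self.w.get(f, 0.0) * v for f, v in x.items())
                for f in self.w:  # L2 regularization shrinks every weight
                    self.w[f] *= 1 - eta * self.lam
                if margin < 1:    # hinge-loss subgradient step
                    for f, v in x.items():
                        self.w[f] = self.w.get(f, 0.0) + eta * y * v
        return self

    def predict(self, text):
        x = Counter(tokenize(text))
        return sum(self.w.get(f, 0.0) * v for f, v in x.items()) > 0

class IncrementalNaiveBayes:
    """Multinomial naive Bayes; learning a page only updates counts."""
    def __init__(self):
        self.doc_counts = Counter()              # pages seen per label
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.vocab = set()

    def update(self, text, label):
        words = tokenize(text)
        self.doc_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def predict(self, text):
        total = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)  # Laplace smoothing
            score = math.log(n_docs / total)
            for w in tokenize(text):
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

class SwitchingClassifier:
    """Batch SVM until `switch_threshold` pages are labelled, then naive Bayes.

    `switch_threshold` is a hypothetical parameter, not taken from the paper.
    """
    def __init__(self, switch_threshold=200):
        self.switch_threshold = switch_threshold
        self.texts, self.labels = [], []
        self.svm = LinearSVM()
        self.nb = IncrementalNaiveBayes()
        self.using_nb = False

    def learn(self, pages):
        """Add (text, is_relevant) pairs and update the active model."""
        for text, label in pages:
            self.texts.append(text)
            self.labels.append(label)
            self.nb.update(text, label)  # cheap; keeps naive Bayes ready
        if len(self.texts) >= self.switch_threshold:
            self.using_nb = True         # enough data: stop batch retraining
        if not self.using_nb:
            self.svm.fit(self.texts, self.labels)  # costly full retrain

    def predict(self, text):
        return self.nb.predict(text) if self.using_nb else self.svm.predict(text)
```

A usage example: call `learn` with a few labelled seed pages, use `predict` to decide whether a fetched page is on-topic, and keep calling `learn` as the crawler labels more pages. The design point the sketch illustrates is cost: each SVM update here retrains on every stored page, whereas a naïve Bayes update touches only the new page's word counts, which is what makes online incremental learning cheap after the switch.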