2015 | OriginalPaper | Buchkapitel
Adaptive Focused Crawling of Linked Data
Erschienen in: Web Information Systems Engineering – WISE 2015
Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.
Wählen Sie Textabschnitte aus um mit Künstlicher Intelligenz passenden Patente zu finden. powered by
Markieren Sie Textabschnitte, um KI-gestützt weitere passende Inhalte zu finden. powered by
Given the evolution of publicly available Linked Data, crawling and preservation have become increasingly important challenges. Due to the scale of available data on the Web, efficient focused crawling approaches which are able to capture the relevant semantic neighborhood of seed entities are required. Here, determining relevant entities for a given set of seed entities is a crucial problem. While the weight of seeds within a seed list vary significantly with respect to the crawl intent, we argue that an adaptive crawler is required, which considers such characteristics when configuring the crawling and relevance detection approach. To address this problem, we introduce a crawling configuration, which considers seed list-specific features as part of its crawling and ranking algorithm. We evaluate it through extensive experiments in comparison to a number of baseline methods and crawling parameters. We demonstrate that, configurations which consider seed list features outperform the baselines and present further insights gained from our experiments.