2014 | OriginalPaper | Book chapter
Automatic Extraction of Logical Web Lists
Written by: Pasqua Fabiana Lanotte, Fabio Fumarola, Michelangelo Ceci, Andrea Scarpino, Michele Damiano Torelli, Donato Malerba
Published in: Foundations of Intelligent Systems
Publisher: Springer International Publishing
Recently, there has been increased interest in the extraction of structured data from the Web (both the “Surface” Web and the “Hidden” Web). In this paper we focus on the automatic extraction of Web lists. Although this task has been studied extensively, existing approaches assume that lists are wholly contained in a single Web page. They do not consider that many websites spread their listings across several Web pages and show on each of these only a partial view. Similar to databases, where a view can represent a subset of the data contained in a table, these websites split a logical list into multiple views (view lists). Automatic extraction of logical lists is an open problem. To tackle this issue we propose an unsupervised and domain-independent algorithm for logical list extraction. Experimental results on real-life and data-intensive Web sites confirm the effectiveness of our approach.
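The relation between view lists and a logical list can be illustrated with a minimal Python sketch. It is not the algorithm proposed in the chapter, which the abstract does not describe. It assumes the view lists have already been extracted from each page and are supplied in page order. The function name `merge_view_lists` and the sample data are hypothetical.

```python
from typing import Iterable, List


def merge_view_lists(view_lists: Iterable[List[str]]) -> List[str]:
    """Concatenate view lists in page order, dropping repeated items.

    Illustrative only: assumes each view list was already extracted
    from its page and that items can be compared as plain strings.
    """
    seen = set()
    logical_list = []
    for view in view_lists:
        for item in view:
            if item not in seen:
                seen.add(item)
                logical_list.append(item)
    return logical_list


# Hypothetical data: three result pages of one paginated listing.
page_1 = ["Camera A", "Camera B", "Camera C"]
page_2 = ["Camera D", "Camera E", "Camera F"]
page_3 = ["Camera G"]

logical_list = merge_view_lists([page_1, page_2, page_3])
```

Each page here plays the role of a database view: it exposes only a subset of the items, and the logical list is recovered only by combining all of them.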