2012 | OriginalPaper | Book chapter
Generating Xpath Expressions for Structured Web Data Record Segmentation
Authors: Tomas Grigalis, Antanas Čenys
Published in: Information and Software Technologies
Publisher: Springer Berlin Heidelberg
Record segmentation is a core problem in structured web data extraction. In this paper we present a novel technique that segments structured web data into individual data records originating from an underlying database. The proposed technique exploits both visual and structural features of web page elements to group them into semantically similar clusters. The resulting clusters reflect the page structure and are used to segment data records. During the segmentation process the technique also generates XPath expressions. These expressions can later be used to extract data records directly from web pages generated from the same template, without repeating the clustering and segmentation steps. The extracted structured data can be reused in a wide range of applications, such as price comparison portals, meta-search, and knowledge bases. Experimental evaluation of the proposed technique on three publicly available benchmark data sets demonstrates nearly perfect results in terms of precision and recall.
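To illustrate the reuse step described in the abstract, the following minimal sketch applies a single XPath expression to two pages that share a template. It is not the authors' implementation: the page fragments, the `RECORD_XPATH` expression, and the `extract_records` helper are all hypothetical, and the expression is written by hand here rather than generated by clustering. It uses only the limited XPath subset supported by Python's standard `xml.etree.ElementTree`, so it assumes well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragments of two pages built from the same template.
PAGE_1 = """
<html><body>
  <div id="nav"><a href="/">Home</a></div>
  <div class="results">
    <div class="item"><span class="title">Laptop A</span><span class="price">499</span></div>
    <div class="item"><span class="title">Laptop B</span><span class="price">649</span></div>
  </div>
</body></html>
"""

PAGE_2 = """
<html><body>
  <div id="nav"><a href="/">Home</a></div>
  <div class="results">
    <div class="item"><span class="title">Phone C</span><span class="price">299</span></div>
  </div>
</body></html>
"""

# Hypothetical expression of the kind a segmentation step might emit:
# one matching node per data record.
RECORD_XPATH = ".//div[@class='results']/div[@class='item']"


def extract_records(page_source, record_xpath):
    """Return one dict per record node matched by record_xpath."""
    root = ET.fromstring(page_source.strip())
    records = []
    for node in root.findall(record_xpath):
        # Use each child's class attribute as the field name.
        record = {child.get("class"): (child.text or "").strip() for child in node}
        records.append(record)
    return records


# The same expression is applied to both pages; no clustering is repeated.
records_1 = extract_records(PAGE_1, RECORD_XPATH)
records_2 = extract_records(PAGE_2, RECORD_XPATH)
```

Real web pages are rarely well-formed XML, so a practical version would need an HTML-tolerant parser with full XPath support; the standard-library parser is used here only to keep the sketch self-contained.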