2011 | OriginalPaper | Buchkapitel
Automatic Extraction Rules Generation Based on XPath Pattern Learning
verfasst von : Jingwei Zhang, Can Zhang, Weining Qian, Aoying Zhou
Erschienen in: Web Information Systems Engineering – WISE 2010 Workshops
Verlag: Springer Berlin Heidelberg
Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.
Wählen Sie Textabschnitte aus um mit Künstlicher Intelligenz passenden Patente zu finden. powered by
Markieren Sie Textabschnitte, um KI-gestützt weitere passende Inhalte zu finden. powered by
Web forums have become important information sources on the Web due to their rich content contributed by millions of Internet users every day. Data extraction from Web pages is a key but cumbersome step for data analysis because of significant human intervention. Web forums have fairly regular structures which allow us to generate extraction rules automatically according to their paths. In this paper, we introduce formal expressions for XPath patterns and pattern mapping rules, and advise machine learning methods to generate extraction rules for automatic data extraction from Web forums. The experimental results on real-life Web forums show good feasibility and accuracy for forum data.