2013 | OriginalPaper | Chapter
Learning Effective XML Classifiers Based on Discriminatory Structures and Nested Content
Authors: Gianni Costa, Riccardo Ortale, Ettore Ritacco
Published in: Knowledge Discovery, Knowledge Engineering and Knowledge Management
Publisher: Springer Berlin Heidelberg
Supervised classification aims to learn a model (or a classifier) from a collection of XML documents, each marked with one of a predefined set of class labels. The learnt classifier isolates each class through the content and structural regularities observed within the respective labeled XML documents and thus makes it possible to predict the unknown class of unlabeled XML documents from their content and structural features. Classifying unlabeled XML documents into the predefined classes is a valuable support for more effective and efficient XML search, retrieval and filtering.
We discuss an approach for learning intelligible XML classifiers. XML documents are represented as transactions in a space of boolean features that are informative of their content and structure. Learning algorithms induce compact associative classifiers of superior effectiveness from this transactional XML representation. A preprocessing step helps the approach scale with the size of XML corpora.
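To make the transactional representation more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that structural features are root-to-node tag paths and that content features are lowercased terms paired with the path in which they occur; the abstract does not specify the actual feature definitions. The names `xml_to_transaction`, `predict` and `rules`, as well as the sample document and rule, are hypothetical and introduced only for illustration.

```python
import xml.etree.ElementTree as ET


def xml_to_transaction(xml_text):
    """Return the set of boolean features that hold for one XML document.

    Assumption: a feature is either a tag path (structure) or a term
    nested in a tag path (content). A feature is "true" if it is in the set.
    """
    root = ET.fromstring(xml_text)
    features = set()

    def visit(node, prefix):
        path = f"{prefix}/{node.tag}"
        features.add(("path", path))  # structural feature
        for term in (node.text or "").lower().split():
            features.add(("term", path, term))  # content nested in structure
        for child in node:
            visit(child, path)

    visit(root, "")
    return frozenset(features)


def predict(transaction, rules, default):
    """Apply an ordered list of (antecedent, label) rules.

    The first rule whose antecedent features all occur in the
    transaction decides the class; otherwise the default class is used.
    """
    for antecedent, label in rules:
        if antecedent <= transaction:
            return label
    return default


# Hypothetical document and hand-written rule, for illustration only.
doc = "<paper><title>XML mining</title><author>Costa</author></paper>"
transaction = xml_to_transaction(doc)
rules = [(frozenset({("term", "/paper/title", "xml")}), "research")]
label = predict(transaction, rules, default="other")
```

In this sketch each rule reads directly as "if these structural and content features are present, then assign this class", which is one way to see why associative classifiers are regarded as intelligible. In the approach described above such rules would be induced by the learning algorithms rather than written by hand.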