2010 | OriginalPaper | Chapter
Representation of Hypertext Documents Based on Terms, Links and Text Compressibility
Authors: Julian Szymański, Włodzisław Duch
Published in: Neural Information Processing. Theory and Algorithms
Publisher: Springer Berlin Heidelberg
Three methods for representing hypertext, based on links, on terms, and on text compressibility, are compared for their usefulness in document classification. The documents were selected from Wikipedia articles in five distinct categories. For each representation, dimensionality was reduced by Principal Component Analysis (PCA), which also provides a rough visual presentation of the data. The compression-based feature space needed about five times fewer principal components than the term-based or link-based representations to reach 90% cumulative variance, while giving comparable classification results with Support Vector Machines.
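As a rough illustration of the ideas in the abstract, the sketch below builds compression-based features as normalized compression distances to a few reference texts, and counts how many principal components are needed to reach a cumulative-variance threshold. The abstract does not say which compressor, distance, or reference documents the authors used, so the choice of zlib, the NCD formula, and the names `compression_features` and `components_for_variance` are all assumptions made for this example, not the chapter's method.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance (assumed measure, not from the source)."""
    ca, cb = compressed_size(a), compressed_size(b)
    cab = compressed_size(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)


def compression_features(doc: str, references: list[str]) -> list[float]:
    """Represent doc by its distance to each (hypothetical) reference text."""
    return [ncd(doc, ref) for ref in references]


def components_for_variance(eigenvalues: list[float], threshold: float = 0.9) -> int:
    """Smallest number of leading PCA eigenvalues whose share of the
    total variance reaches the threshold."""
    total = sum(eigenvalues)
    running = 0.0
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += value
        if running / total >= threshold:
            return k
    return len(eigenvalues)


if __name__ == "__main__":
    # Toy stand-ins for category reference texts (illustrative only).
    refs = [
        "the cat sat on the mat " * 50,
        "quantum chromodynamics describes gluon interactions " * 50,
    ]
    print(compression_features("the cat sat on the mat " * 50, refs))
    print(components_for_variance([7.0, 2.5, 0.5]))
```

With features of this kind, each document becomes a vector with one entry per reference text, so the feature space is much smaller than a term or link vocabulary. That is one plausible reason fewer principal components would be needed, though the abstract itself does not give an explanation.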