2013 | OriginalPaper | Chapter
Clustering with Probabilistic Topic Models on Arabic Texts
Authors: Abdessalem Kelaiaia, Hayet Farida Merouani
Published in: Modeling Approaches and Algorithms for Advanced Computer Applications
Publisher: Springer International Publishing
Recently, probabilistic topic models such as LDA (Latent Dirichlet Allocation) have been widely used in many text mining tasks, such as retrieval, summarization, and clustering, across different languages. In this paper we present a first comparative study between LDA and K-means, two well-known methods for topic identification and clustering respectively, applied to Arabic texts. Our aim is to compare how the morpho-syntactic characteristics of the Arabic language affect the performance of the first method relative to the second. To study different aspects of these methods, the study is conducted on a benchmark document collection, and the quality of clustering is measured with two well-known evaluation measures, F-measure and Entropy. The results show that LDA outperforms K-means in most cases.
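The abstract names the two evaluation measures but does not give their formulas. As an illustration only, the sketch below uses one common formulation from the document clustering literature: the class-weighted best-match F-measure and the cluster-size-weighted entropy. The function names `clustering_f_measure` and `clustering_entropy`, and the choice of a base-2 logarithm, are assumptions made here, and the chapter's exact definitions may differ.

```python
import math
from collections import Counter


def clustering_f_measure(labels, clusters):
    """Class-weighted best-match F-measure (assumed formulation).

    For each true class, take the best F1 over all clusters, then
    average those values weighted by class size. Higher is better.
    """
    n = len(labels)
    class_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(labels, clusters))  # (class, cluster) -> count

    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = joint.get((c, k), 0)
            if n_ck == 0:
                continue
            precision = n_ck / n_k
            recall = n_ck / n_c
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_c / n) * best
    return total


def clustering_entropy(labels, clusters):
    """Cluster-size-weighted entropy of class labels (assumed formulation).

    Each cluster's entropy is computed over the class distribution of
    its members (log base 2). Lower is better; 0 means pure clusters.
    """
    n = len(labels)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(labels, clusters))

    total = 0.0
    for k, n_k in cluster_sizes.items():
        e = 0.0
        for (c, kk), n_ck in joint.items():
            if kk == k:
                p = n_ck / n_k
                e -= p * math.log2(p)
        total += (n_k / n) * e
    return total


# Toy data, not taken from the chapter: four documents, two classes.
labels = ["a", "a", "b", "b"]
pure = [0, 0, 1, 1]    # each cluster holds exactly one class
mixed = [0, 1, 0, 1]   # each cluster holds one document of each class
```

Under these definitions a perfect clustering scores an F-measure of 1 and an entropy of 0, so a comparison such as LDA versus K-means would favor the method with the higher F-measure and the lower entropy.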