2013 | OriginalPaper | Chapter
Semi-supervised Constituent Grammar Induction Based on Text Chunking Information
Authors : Jesús Santamaría, Lourdes Araujo
Published in: Computational Linguistics and Intelligent Text Processing
Publisher: Springer Berlin Heidelberg
Activate our intelligent search to find suitable subject content or patents.
Select sections of text to find matching patents with Artificial Intelligence. powered by
Select sections of text to find additional relevant content using AI-assisted search. powered by
There is a growing interest in unsupervised grammar induction, which does not require syntactic annotations, but provides less accurate results than the supervised approach. Aiming at improving the accuracy of the unsupervised approach, we have resorted to additional information, which can be obtained more easily. Shallow parsing or chunking identifies the sentence constituents (noun phrases, verb phrases, etc.), but without specifying their internal structure. There exist highly accurate systems to perform this task, and thus this information is available even for languages for which large syntactically annotated corpora are lacking. In this work we have investigated how the results of a pattern-based unsupervised grammar induction system improve as data on new kind of phrases are added, leading to a significant improvement in performance. We have analyzed the results for three different languages. We have also shown that the system is able to significantly improve the results of the unsupervised system using the chunks provided by automatic chunkers.