2011 | OriginalPaper | Book chapter
Automatic Recognition of Chinese Unknown Word for Single-Character and Affix Models
Written by: Xin Jiang, Ling Wang, Yanjiao Cao, Zhao Lu
Published in: Knowledge Engineering and Management
Publisher: Springer Berlin Heidelberg
This paper presents a novel method for recognizing Chinese unknown words in a short-text corpus, based on our observation of the single-character and affix models of Chinese unknown words. We collect news titles from a news site and treat these titles as short texts. The approach has three steps. First, all collected news titles are segmented with ICTCLAS, and statistics on potential unknown words are gathered. Second, each potential unknown word is classified into either the single-character model or the affix model according to its structure, and filtering methods are applied to remove garbage strings. Finally, unknown words are extracted according to their frequencies. The method achieves high precision and recall, especially for the single-character model. The experimental results show that the approach is simple yet effective.
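The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact candidate rules, the affix inventory, or the filtering criteria, so everything below is an assumption made for illustration. The sketch assumes the titles have already been segmented (for example by ICTCLAS) into token lists, treats a run of adjacent one-character tokens as a single-character candidate, treats a multi-character token followed by an entry from a hypothetical `AFFIXES` set as an affix candidate, and uses a hypothetical `min_freq` threshold as the only filter.

```python
from collections import Counter

# Hypothetical affix list; the paper's actual affix inventory is not given here.
AFFIXES = {"族", "门", "哥"}


def candidates(tokens):
    """Yield (candidate, model) pairs from one already-segmented title."""
    i = 0
    while i < len(tokens):
        # Single-character model (assumed rule): a maximal run of two or
        # more adjacent one-character tokens is merged into one candidate.
        j = i
        while j < len(tokens) and len(tokens[j]) == 1:
            j += 1
        if j - i >= 2:
            yield "".join(tokens[i:j]), "single"
            i = j
            continue
        # Affix model (assumed rule): a multi-character token directly
        # followed by a known affix character forms one candidate.
        if (len(tokens[i]) > 1 and i + 1 < len(tokens)
                and tokens[i + 1] in AFFIXES):
            yield tokens[i] + tokens[i + 1], "affix"
            i += 2
            continue
        i += 1


def extract_unknown_words(segmented_titles, min_freq=2):
    """Count candidates over all titles and keep the frequent ones.

    min_freq is a hypothetical threshold standing in for the paper's
    frequency-based extraction and garbage-string filtering.
    """
    counts = Counter()
    for tokens in segmented_titles:
        counts.update(candidates(tokens))
    return {cand: n for cand, n in counts.items() if n >= min_freq}


# Toy input: each title is a list of tokens as a segmenter might emit them.
titles = [
    ["这个", "给", "力", "政策"],
    ["给", "力", "新闻"],
    ["月光", "族", "增多"],
    ["月光", "族", "生活"],
    ["他", "说", "今天"],
]
result = extract_unknown_words(titles)
```

On this toy input, `result` keeps `("给力", "single")` and `("月光族", "affix")`, each seen twice, while the one-off run `他说` falls below the threshold and is discarded. A real system would need stronger filters than a raw frequency cutoff, since frequent function-word runs would otherwise pass.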