
Word identification for Mandarin Chinese sentences

Published: 23 August 1992

ABSTRACT

Chinese sentences are composed of strings of characters with no blanks to mark word boundaries. However, the basic unit for sentence parsing and understanding is the word. Therefore, the first step in processing Chinese sentences is to identify the words. The difficulties of identifying words include (1) the identification of complex words, such as determinative-measure compounds, reduplications, and derived words, (2) the identification of proper names, and (3) the resolution of ambiguous segmentations. In this paper, we propose possible solutions to these difficulties. We adopt a matching algorithm with six different heuristic rules to resolve the ambiguities and achieve a success rate of 99.77%. The statistical data support the conclusion that maximal matching is the most effective heuristic.
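To illustrate the idea of maximal matching mentioned in the abstract, the following is a minimal sketch of plain greedy forward maximal matching against a lexicon. It is not the authors' algorithm: the abstract does not spell out the six heuristic rules, so none of them are reproduced here. The function name `max_match`, the `max_len` parameter, and the toy lexicon are all assumptions made for illustration.

```python
def max_match(sentence, lexicon, max_len=None):
    """Greedy forward maximal matching (illustrative sketch only).

    At each position, take the longest lexicon entry that starts there.
    A single character is always accepted as a fallback, so the loop
    always advances even when no lexicon entry matches.
    """
    if max_len is None:
        # Longest entry in the lexicon bounds the candidate length.
        max_len = max((len(w) for w in lexicon), default=1)

    words = []
    i = 0
    while i < len(sentence):
        longest = min(max_len, len(sentence) - i)
        # Try the longest candidate first, then shrink.
        for size in range(longest, 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy lexicon, chosen to show an ambiguous segmentation.
toy_lexicon = {"研究", "研究生", "生命", "起源"}

unambiguous = max_match("生命起源", toy_lexicon)
ambiguous = max_match("研究生命起源", toy_lexicon)
```

With this toy lexicon, `unambiguous` comes out as `["生命", "起源"]`, while `ambiguous` comes out as `["研究生", "命", "起源"]` rather than the intended 研究 / 生命 / 起源. A purely greedy match can therefore commit to a long word too early, which is the kind of ambiguous segmentation that additional disambiguation rules are meant to resolve.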

Published in

COLING '92: Proceedings of the 14th conference on Computational linguistics - Volume 1
August 1992
418 pages

Publisher: Association for Computational Linguistics, United States
