Word identification for Mandarin Chinese sentences

Authors:
Keh-Jiann Chen, Institute of Information Science, Academia Sinica
Shing-Huan Liu, Institute of Information Science, Academia Sinica

COLING '92: Proceedings of the 14th conference on Computational linguistics - Volume 1, August 1992, Pages 101–107, https://doi.org/10.3115/992066.992085

Published: 23 August 1992

ABSTRACT

Chinese sentences are composed of strings of characters without blanks to mark word boundaries. However, the basic unit for sentence parsing and understanding is the word. Therefore, the first step in processing Chinese sentences is to identify the words. The difficulties of identifying words include (1) identifying complex words, such as determinative-measure compounds, reduplications, and derived words; (2) identifying proper names; and (3) resolving ambiguous segmentations. In this paper, we propose possible solutions to these difficulties. We adopt a matching algorithm with six different heuristic rules to resolve the ambiguities and achieve a success rate of 99.77%. The statistical data support the conclusion that maximal matching is the most effective heuristic.
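
To make the idea of maximal matching concrete, here is a minimal sketch of a greedy forward variant: at each position it takes the longest string found in a lexicon and falls back to a single character when nothing matches. This is an illustration only, not the authors' algorithm; the abstract does not spell out the six heuristic rules, and the function name `max_match` and the toy lexicon are assumptions made for this example.

```python
def max_match(sentence, lexicon, max_len=None):
    """Greedy forward maximal matching (illustrative sketch, not the paper's method).

    At each position, take the longest substring that is in `lexicon`;
    if none matches, emit a single character and move on.
    """
    if max_len is None:
        # Longest lexicon entry bounds how far ahead we need to look.
        max_len = max((len(w) for w in lexicon), default=1)
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, shrinking toward one character.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy lexicon, chosen to expose an ambiguous segmentation.
TOY_LEXICON = {"研究", "研究生", "生命", "的", "起源"}

print(max_match("研究生命的起源", TOY_LEXICON))
```

On this input the greedy pass picks 研究生 ("graduate student") first and leaves 命 stranded, whereas the intended reading is 研究 / 生命 / 的 / 起源 ("study the origin of life"). Cases like this are one reason a plain longest-match pass may need additional disambiguation rules of the kind the abstract mentions.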


Published in
COLING '92: Proceedings of the 14th conference on Computational linguistics - Volume 1
August 1992, 418 pages
Program Chair: Antonio Zampolli (Pisa)
Publisher: Association for Computational Linguistics, United States