Introduction to CKIP Chinese word segmentation system for the first international Chinese Word Segmentation Bakeoff

Authors:
Wei-Yun Ma (Institute of Information Science, Academia Sinica),
Keh-Jiann Chen (Institute of Information Science, Academia Sinica)

SIGHAN '03: Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17, July 2003, Pages 168–171, https://doi.org/10.3115/1119250.1119276

Published: 11 July 2003

ABSTRACT

In this paper, we outline the procedures of our segmentation system, including the methods for resolving segmentation ambiguities and identifying unknown words. The CKIP group of Academia Sinica participated in the open and closed tracks for the Peking University (PK) and Hong Kong CityU (HK) corpora. The evaluation results show that our system performs very well on both the HK open and HK closed tracks, but only acceptably on the PK tracks. We present some explanations and analysis of these results.

Published in
SIGHAN '03: Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
July 2003, 193 pages
Conference Chairs: Qing Ma (Ryukoku University, Japan), Fei Xia (IBM, USA)
Publisher: Association for Computational Linguistics, United States
