Abstract
This study presents a novel approach to error diagnosis of Chinese sentences written by learners of Chinese as a second language (CSL). A penalized probabilistic First-Order Inductive Learning (pFOIL) algorithm is proposed for this task. The pFOIL algorithm integrates inductive logic programming (ILP), First-Order Inductive Learning (FOIL), and a penalized log-likelihood function. It takes into account the uncertain, imperfect, and conflicting characteristics of Chinese sentences in order to infer error types and to produce human-interpretable rules for subsequent error correction. In the pFOIL algorithm, two kinds of background knowledge, relation pattern background knowledge and quantized t-score background knowledge, are proposed to characterize a sentence and are then used for likelihood estimation. The relation pattern background knowledge captures the morphological, syntactic, and semantic relations among the words in a sentence; one or two kinds of the extracted relations are then combined into a pattern that characterizes the sentence. For the quantized t-score background knowledge, the various relations in a sentence are characterized by quantized t-score values. A decomposition-based testing mechanism, which decomposes a sentence into the background knowledge set needed for each error type, is then proposed to infer all potential error types and causes in the sentence. With the pFOIL method, CSL learners are given not only the error types but also the error causes and positions. Experimental results show that the pFOIL method outperforms the C4.5, maximum entropy, and Naive Bayes classifiers in error classification.
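The abstract does not specify how the t-scores are computed or quantized. As a minimal sketch, assuming the standard collocation t-test approximation t = (x̄ − μ) / √(x̄ / N) from Manning and Schütze's textbook, where x̄ is the observed bigram probability, μ = P(w1)P(w2) is its expected value if the words were independent, and N is the corpus size in tokens, the Python below scores adjacent word pairs and bins each score into a discrete level. Restricting relations to adjacent pairs, the cut points `THRESHOLDS` (1.0 is arbitrary; 2.576 is the t critical value for α = 0.005), and the fact name `tscore_level` are all hypothetical choices for illustration, not the paper's actual settings.

```python
import math
from bisect import bisect_right
from collections import Counter

# Hypothetical cut points: below 1.0 -> level 0, [1.0, 2.576) -> level 1,
# 2.576 and above -> level 2 (2.576 is the t critical value for alpha = 0.005).
THRESHOLDS = (1.0, 2.576)


def bigram_t_scores(tokens):
    """Collocation t-score for every adjacent word pair in a token list.

    Uses the approximation t = (x_bar - mu) / sqrt(x_bar / N), where x_bar is
    the observed bigram probability, mu = P(w1) * P(w2) is its expected value
    under independence, and N is the number of tokens.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), count in bigrams.items():
        x_bar = count / n
        mu = (unigrams[w1] / n) * (unigrams[w2] / n)
        scores[(w1, w2)] = (x_bar - mu) / math.sqrt(x_bar / n)
    return scores


def quantize(t, thresholds=THRESHOLDS):
    """Map a real-valued t-score to a discrete level (number of cut points <= t)."""
    return bisect_right(thresholds, t)


def tscore_facts(sentence, scores, thresholds=THRESHOLDS):
    """Emit one Prolog-style background fact per adjacent pair in a sentence.

    Pairs never seen in the corpus get t = -inf, i.e. the lowest level.
    """
    facts = []
    for w1, w2 in zip(sentence, sentence[1:]):
        level = quantize(scores.get((w1, w2), float("-inf")), thresholds)
        facts.append(f"tscore_level('{w1}', '{w2}', {level}).")
    return facts
```

For example, with scores built from the toy corpus `"a b a b a b c d".split()`, `tscore_facts(["a", "b", "c"], scores)` returns `["tscore_level('a', 'b', 1).", "tscore_level('b', 'c', 0)."]`. Quantizing is what makes the scores usable by a FOIL-style learner: a rule body can test a small set of discrete constants (such as level 0 for a weakly associated pair) rather than a continuous value.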
Index Terms
- Error Diagnosis of Chinese Sentences Using Inductive Learning Algorithm and Decomposition-Based Testing Mechanism