research-article

Error Diagnosis of Chinese Sentences Using Inductive Learning Algorithm and Decomposition-Based Testing Mechanism

Published: 01 March 2012

Abstract

This study presents a novel approach to error diagnosis of Chinese sentences for Chinese as a second language (CSL) learners. The approach is built on a penalized probabilistic First-Order Inductive Learning (pFOIL) algorithm, which integrates inductive logic programming (ILP), First-Order Inductive Learning (FOIL), and a penalized log-likelihood function. The algorithm accounts for the uncertain, imperfect, and conflicting characteristics of Chinese sentences in order to infer error types and to produce human-interpretable rules for subsequent error correction. In the pFOIL algorithm, two kinds of background knowledge are proposed to characterize a sentence and are then used for likelihood estimation: relation pattern background knowledge and quantized t-score background knowledge. The relation pattern background knowledge captures the morphological, syntactic, and semantic relations among the words in a sentence; one or two kinds of the extracted relations are then integrated into a pattern that characterizes the sentence. The quantized t-score background knowledge uses quantized t-score values to characterize the various relations in a sentence. A decomposition-based testing mechanism, which decomposes a sentence into the background knowledge set needed for each error type, is then proposed to infer all potential error types and causes of the sentence. With the pFOIL method, CSL learners can be given not only the error types but also the error causes and positions. Experimental results show that the pFOIL method outperforms the C4.5, maximum entropy, and Naive Bayes classifiers in error classification.
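
The abstract does not say how the t-score is computed or quantized, so the following is only a minimal sketch of one plausible reading. It assumes the standard collocation t-score from the statistical NLP literature, applied to an ordered word pair, and then maps the score onto a small number of bins. The function names, the zero-count fallback, and the bin thresholds (0.0 and 2.576) are illustrative assumptions, not values taken from the article.

```python
import bisect
import math


def t_score(pair_count, left_count, right_count, total):
    """Collocation t-score for an ordered word pair.

    pair_count:  corpus frequency of the pair (w1, w2)
    left_count:  corpus frequency of w1
    right_count: corpus frequency of w2
    total:       number of tokens in the corpus
    """
    if pair_count == 0:
        # Assumption: an unseen pair gets a neutral score instead of an error.
        return 0.0
    observed = pair_count / total                            # sample mean
    expected = (left_count / total) * (right_count / total)  # mean under independence
    variance = observed * (1.0 - observed)                   # Bernoulli variance
    return (observed - expected) / math.sqrt(variance / total)


def quantize(score, thresholds=(0.0, 2.576)):
    """Map a t-score to a bin index (hypothetical thresholds).

    With the defaults: 0 for negative scores, 1 for weak association,
    2 for scores at or above the 2.576 critical value.
    """
    return bisect.bisect_right(thresholds, score)


if __name__ == "__main__":
    # Hypothetical counts for a word pair in a corpus of about 14.3M tokens.
    t = t_score(pair_count=8, left_count=15828, right_count=4675, total=14307668)
    print(round(t, 3), quantize(t))
```

Quantizing the score turns a continuous statistic into a discrete symbol, which is the form that a rule learner in the FOIL family can use as a literal in a clause body.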

    • Published in

      ACM Transactions on Asian Language Information Processing, Volume 11, Issue 1
      March 2012
      72 pages
      ISSN:1530-0226
      EISSN:1558-3430
      DOI:10.1145/2090176

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 March 2012
      • Accepted: 1 June 2011
      • Revised: 1 April 2011
      • Received: 1 November 2010

      Qualifiers

      • research-article
      • Research
      • Refereed
