
2018 | OriginalPaper | Chapter

Exploring the Impact of Linguistic Features for Chinese Readability Assessment

Authors: Xinying Qiu, Kebin Deng, Likun Qiu, Xin Wang

Published in: Natural Language Processing and Chinese Computing

Publisher: Springer International Publishing


Abstract

Readability assessment plays an important role in selecting suitable reading materials for language learners and is applicable to many NLP tasks, such as text simplification and document summarization. In this study, we designed 100 factors to systematically evaluate the impact of four levels of linguistic features (shallow, POS, syntactic, and discourse) on predicting text difficulty for L1 Chinese learners. We further selected 22 significant features using regression. Our experimental results show that both the 100-feature model and the 22-feature model match the predictive accuracy of the bag-of-words (BOW) baseline for the majority of text difficulty levels and perform significantly better than the baseline for the others. Using 18 of the 22 features, we derived one of the first readability formulas for contemporary simplified Chinese.
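The abstract names neither the individual features nor the coefficients of the derived formula, so the following is only a minimal sketch of the general idea: compute a few shallow features from a Chinese text and combine them in a linear formula. The feature names, the sentence-splitting rule, and the weights in `DEMO_WEIGHTS` are all assumptions made for illustration; they are not the paper's features or its 18-feature formula.

```python
import re

# Sentence-final punctuation used to split the text (an assumption of this sketch).
SENTENCE_END = "。！？"
# Characters in the CJK Unified Ideographs block are counted as Chinese characters.
HANZI = re.compile(r"[\u4e00-\u9fff]")


def shallow_features(text):
    """Return a few illustrative shallow features of a Chinese text."""
    # Keep only segments that contain at least one Chinese character.
    sentences = [s for s in re.split(f"[{SENTENCE_END}]", text) if HANZI.search(s)]
    chars = HANZI.findall(text)
    n_sent = len(sentences)
    n_char = len(chars)
    return {
        "n_chars": n_char,
        "n_sentences": n_sent,
        "avg_sentence_len": n_char / n_sent if n_sent else 0.0,
        "char_type_ratio": len(set(chars)) / n_char if n_char else 0.0,
    }


def linear_score(features, weights, intercept=0.0):
    """Evaluate a linear formula: intercept + sum of weight * feature value."""
    return intercept + sum(w * features[name] for name, w in weights.items())


# Hypothetical weights, chosen only to show the mechanics of a linear formula.
DEMO_WEIGHTS = {"avg_sentence_len": 0.1, "char_type_ratio": 2.0}

sample = "我喜欢读书。你喜欢什么？"
feats = shallow_features(sample)
score = linear_score(feats, DEMO_WEIGHTS, intercept=1.0)
```

In a regression-based setting such as the one the abstract describes, the weights would be estimated from texts with known difficulty levels rather than set by hand as they are here.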


Footnotes
3
This 18-feature formula and another 22-feature formula not presented in this paper are pending patent application.
 
Metadata

Title: Exploring the Impact of Linguistic Features for Chinese Readability Assessment
Authors: Xinying Qiu, Kebin Deng, Likun Qiu, Xin Wang
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-73618-1_67
