
2020 | OriginalPaper | Chapter

Automatic Classification and Comparison of Words by Difficulty

Authors: Shengyao Zhang, Qi Jia, Libin Shen, Yinggong Zhao

Published in: Neural Information Processing

Publisher: Springer International Publishing


Abstract

Vocabulary knowledge is essential for both native and foreign language learning. Classifying words by difficulty helps students progress through the different stages of study and gives teachers a standard to follow when preparing tutorials. However, classifying words by difficulty is time-consuming and labor-intensive. In this paper, we propose to classify and compare word difficulty by analyzing multi-faceted features, including intra-word, syntactic, and semantic features. The results show that our method is robust across different language environments.
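As a rough illustration of the two tasks named in the abstract, the sketch below computes a few surface-level intra-word features and compares two difficulty labels on the six-level CEFR scale described in the footnotes. The feature set (`length`, `vowels`, `distinct_letters`) and the function names are assumptions made for this example only; the abstract does not specify which features or which model the authors use.

```python
# Illustrative sketch only: feature names and helpers are hypothetical,
# not the method described in the chapter.

# CEFR levels ordered from lowest difficulty (A1) to highest (C2).
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]
CEFR_RANK = {level: rank for rank, level in enumerate(CEFR_LEVELS)}

VOWELS = set("aeiou")


def intra_word_features(word: str) -> dict:
    """Compute simple character-level features of a single word."""
    w = word.lower()
    return {
        "length": len(w),
        "vowels": sum(ch in VOWELS for ch in w),
        "distinct_letters": len(set(w)),
    }


def compare_difficulty(level_a: str, level_b: str) -> int:
    """Return -1 if level_a is easier than level_b, 0 if equal, 1 if harder."""
    rank_a, rank_b = CEFR_RANK[level_a], CEFR_RANK[level_b]
    return (rank_a > rank_b) - (rank_a < rank_b)


if __name__ == "__main__":
    print(intra_word_features("Apple"))
    print(compare_difficulty("A1", "C2"))
```

In a full system, features like these would be combined with syntactic and semantic features and passed to a trained classifier; the comparison helper only orders labels that are already known.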


Footnotes
1
The Corpus of Contemporary American English: https://www.english-corpora.org/coca/.
 
2
The Common European Framework of Reference for Languages (CEFR) defines six difficulty levels {A1, A2, B1, B2, C1, C2}, where A1 is the lowest difficulty and C2 is the highest.
 
Metadata
Title
Automatic Classification and Comparison of Words by Difficulty
Authors
Shengyao Zhang
Qi Jia
Libin Shen
Yinggong Zhao
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-63820-7_72
