DOI: 10.5555/1613715.1613742

Revisiting readability: a unified framework for predicting text quality

Published: 25 October 2008

ABSTRACT

We combine lexical, syntactic, and discourse features to produce a highly predictive model of human readers' judgments of text readability. This is the first study to take into account such a variety of linguistic factors and the first to empirically demonstrate that discourse relations are strongly associated with the perceived quality of text. We show that various surface metrics generally expected to be related to readability are not very good predictors of readability judgments in our Wall Street Journal corpus. We also establish that readability predictors behave differently depending on the task: predicting the readability of a text or ranking texts by readability. Our experiments indicate that discourse relations are the one class of features that exhibits robustness across these two tasks.


Published in

EMNLP '08: Proceedings of the Conference on Empirical Methods in Natural Language Processing
October 2008
1129 pages

Publisher

Association for Computational Linguistics, United States

Qualifiers

research-article

Acceptance Rates

Overall acceptance rate: 73 of 234 submissions, 31%
