ABSTRACT
We combine lexical, syntactic, and discourse features to produce a highly predictive model of human readers' judgments of text readability. This is the first study to take into account such a variety of linguistic factors and the first to empirically demonstrate that discourse relations are strongly associated with the perceived quality of text. We show that various surface metrics generally expected to be related to readability are not very good predictors of readability judgments in our Wall Street Journal corpus. We also establish that readability predictors behave differently depending on the task: predicting the readability of a text or ranking texts by readability. Our experiments indicate that discourse relations are the one class of features that exhibits robustness across these two tasks.