Revisiting readability: a unified framework for predicting text quality

Authors:
Emily Pitler, University of Pennsylvania, Philadelphia, PA
Ani Nenkova, University of Pennsylvania, Philadelphia, PA

EMNLP '08: Proceedings of the Conference on Empirical Methods in Natural Language Processing, October 2008, Pages 186–195

Published: 25 October 2008

ABSTRACT

We combine lexical, syntactic, and discourse features to produce a highly predictive model of human readers' judgments of text readability. This is the first study to take into account such a variety of linguistic factors and the first to empirically demonstrate that discourse relations are strongly associated with the perceived quality of text. We show that various surface metrics generally expected to be related to readability are not very good predictors of readability judgments in our Wall Street Journal corpus. We also establish that readability predictors behave differently depending on the task: predicting the readability of a text or ranking texts by readability. Our experiments indicate that discourse relations are the one class of features that exhibits robustness across these two tasks.
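
The distinction the abstract draws between predicting readability and ranking by readability can be illustrated with a minimal sketch. The abstract does not describe the evaluation procedure, so everything below is an assumption made for illustration: the function names `pearson` and `pairwise_accuracy`, the choice of these two measures, and the toy feature values and ratings, which are invented and are not data from the paper.

```python
from itertools import combinations
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between a feature and human ratings (prediction view)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def pairwise_accuracy(xs, ys):
    """Fraction of text pairs with different ratings that the feature
    orders the same way as the ratings do (ranking view)."""
    pairs = [(i, j) for i, j in combinations(range(len(ys)), 2) if ys[i] != ys[j]]
    correct = sum((xs[i] - xs[j]) * (ys[i] - ys[j]) > 0 for i, j in pairs)
    return correct / len(pairs)


# Hypothetical values for four texts: one feature score and one rating each.
feature = [1.0, 2.0, 3.0, 10.0]
ratings = [2.0, 3.0, 4.0, 4.5]

print("correlation with ratings:", round(pearson(feature, ratings), 3))
print("pairwise ranking accuracy:", pairwise_accuracy(feature, ratings))
```

In this toy case the feature orders every pair of texts correctly while its linear correlation with the ratings stays below 1, which is one way a predictor can look different under the two tasks.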

Published in
EMNLP '08: Proceedings of the Conference on Empirical Methods in Natural Language Processing
October 2008
1129 pages
Program Chairs: Mirella Lapata (University of Edinburgh), Hwee Tou Ng (National University of Singapore)
Publisher: Association for Computational Linguistics, United States
Acceptance Rates
Overall Acceptance Rate: 73 of 234 submissions, 31%