
A survey of OCR evaluation tools and metrics

Published: 31 October 2021
DOI: 10.1145/3476887.3476888

ABSTRACT

The millions of pages of historical documents digitized in libraries are increasingly used in contexts with more specific requirements for OCR quality than keyword search. How can the quality of OCR results be assessed comprehensively, efficiently, and reliably in mass digitization, when ground truth can only ever be produced for a very small fraction of the material? Due to gaps in specifications, OCR evaluation tools can return different results, and due to differences in implementation, even commonly used error rates are often not directly comparable. OCR evaluation metrics and sampling methods are also insufficient where they do not take the accuracy of layout analysis into account, since for advanced use cases such as Natural Language Processing or the Digital Humanities, accurate layout analysis and detection of the reading order are crucial. We provide an overview of OCR evaluation metrics and tools, describe two advanced use cases for OCR results, and perform an OCR evaluation experiment with multiple evaluation tools and different metrics on two distinct datasets. We analyze the differences and commonalities in light of the presented use cases and suggest areas for future work.
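To make the comparability problem concrete: even when tools agree on the underlying edit distance, they can normalize it differently. The following Python sketch (illustrative only; not taken from the paper or from any of the tools it surveys) computes a single Levenshtein distance [17] and derives two different character error rates (CER) from it, depending solely on the normalization convention:

```python
# Illustrative sketch: why CER values from different tools may disagree.
# The edit distance is identical; only the normalization convention differs.

def levenshtein(a: str, b: str) -> int:
    """Edit distance with insertions, deletions, and substitutions (all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,                 # deletion
                curr[j - 1] + 1,             # insertion
                prev[j - 1] + (ca != cb),    # substitution (0 if chars equal)
            ))
        prev = curr
    return prev[-1]

ground_truth = "The quick brown fox"
ocr_output = "Tne qu1ck brown fox,"   # two substitutions, one insertion

dist = levenshtein(ground_truth, ocr_output)

# Convention A: normalize by ground-truth length (CER may exceed 1.0).
cer_gt = dist / len(ground_truth)
# Convention B: normalize by the longer string (CER is bounded by 1.0).
cer_max = dist / max(len(ground_truth), len(ocr_output))

print(f"edit distance: {dist}")               # 3
print(f"CER, GT-normalized:  {cer_gt:.3f}")   # 0.158
print(f"CER, max-normalized: {cer_max:.3f}")  # 0.150
```

Even on this toy example the two conventions disagree (0.158 vs. 0.150), and the gap grows when the OCR output drops or hallucinates large amounts of text. Neither convention is wrong, but scores computed under different conventions are not directly comparable.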

References

  1. Beatrice Alex and John Burns. 2014. Estimating and rating the quality of optically character recognised text. In Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage. ACM, NY, USA, 97–102.
  2. Hildelies Balk and Aly Conteh. 2011. IMPACT: centre of competence in text digitisation. In Proceedings of the 2011 Workshop on Historical Document Imaging and Processing. ACM, NY, USA, 155–160.
  3. Matthias Boenig, Konstantin Baierer, Volker Hartmann, Maria Federbusch, and Clemens Neudecker. 2019. Labelling OCR Ground Truth for Usage in Repositories. In Proceedings of the Third International Conference on Digital Access to Textual Cultural Heritage. ACM, NY, USA, 3–8.
  4. Christian Clausner, Christos Papadopoulos, Stefan Pletschacher, and Apostolos Antonacopoulos. 2015. The ENP image and ground truth dataset of historical newspapers. In 2015 13th International Conference on Document Analysis and Recognition (ICDAR). IEEE, NY, USA, 931–935.
  5. Christian Clausner, Stefan Pletschacher, and Apostolos Antonacopoulos. 2011. Scenario driven in-depth performance evaluation of document layout analysis methods. In 2011 International Conference on Document Analysis and Recognition. IEEE, NY, USA, 1404–1408.
  6. Christian Clausner, Stefan Pletschacher, and Apostolos Antonacopoulos. 2013. The significance of reading order in document recognition and its evaluation. In 2013 12th International Conference on Document Analysis and Recognition. IEEE, NY, USA, 688–692.
  7. Christian Clausner, Stefan Pletschacher, and Apostolos Antonacopoulos. 2016. Quality prediction system for large-scale digitisation workflows. In 2016 12th IAPR Workshop on Document Analysis Systems (DAS). IEEE, NY, USA, 138–143.
  8. Christian Clausner, Stefan Pletschacher, and Apostolos Antonacopoulos. 2020. Flexible character accuracy measure for reading-order-independent evaluation. Pattern Recognition Letters 131 (2020), 390–397.
  9. Gregory Crane and Alison Jones. 2006. The challenge of Virginia Banks: an evaluation of named entity analysis in a 19th-century newspaper collection. In Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries. IEEE, NY, USA, 31–40.
  10. Maud Ehrmann, Matteo Romanello, Alex Flückiger, and Simon Clematide. 2020. Extended overview of CLEF HIPE 2020: named entity processing on historical newspapers. In CLEF 2020 Working Notes, Conference and Labs of the Evaluation Forum, Vol. 2696. CEUR, Aachen, Germany, 1–38.
  11. Ahmed Hamdi, Axel Jean-Caurant, Nicolas Sidere, Mickaël Coustaty, and Antoine Doucet. 2019. An analysis of the performance of named entity recognition over OCRed documents. In 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL). IEEE, NY, USA, 333–334.
  12. Mark J Hill and Simon Hengchen. 2019. Quantifying the impact of dirty OCR on historical text analysis: Eighteenth Century Collections Online as a case study. Digital Scholarship in the Humanities 34, 4 (2019), 825–843.
  13. Rose Holley. 2009. How good can it get? Analysing and improving OCR accuracy in large scale historic newspaper digitisation programs. D-Lib Magazine 15, 3/4 (2009), Unpaginated.
  14. Kimmo Kettunen, Eetu Mäkelä, Teemu Ruokolainen, Juha Kuokkala, and Laura Löfberg. 2017. Old content and modern tools: searching named entities in a Finnish OCRed historical newspaper collection 1771–1910. Digital Humanities Quarterly 11, 3 (2017), 24.
  15. Vladimir Kluzner, Asaf Tzadok, Yuval Shimony, Eugene Walach, and Apostolos Antonacopoulos. 2009. Word-based adaptive OCR for historical books. In 2009 10th International Conference on Document Analysis and Recognition. IEEE, NY, USA, 501–505.
  16. Gundram Leifert, Roger Labahn, Tobias Grüning, and Svenja Leifert. 2019. End-To-End Measure for Text Recognition. In 2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE, NY, USA, 1424–1431.
  17. Vladimir I Levenshtein. 1966. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10, 8 (1966), 707–710.
  18. Daniel Lopresti. 2009. Optical character recognition errors and their effects on natural language processing. International Journal on Document Analysis and Recognition (IJDAR) 12, 3 (2009), 141–151.
  19. Margot Mieskes and Stefan Schmunk. 2019. OCR Quality and NLP Preprocessing. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy. ACL, Stroudsburg PA, USA, 102–105.
  20. Christos Papadopoulos, Stefan Pletschacher, Christian Clausner, and Apostolos Antonacopoulos. 2013. The IMPACT dataset of historical document images. In Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing. ACM, NY, USA, 123–130.
  21. Stefan Pletschacher and Apostolos Antonacopoulos. 2010. The PAGE (Page Analysis and Ground-truth Elements) format framework. In 2010 20th International Conference on Pattern Recognition. IEEE, NY, USA, 257–260.
  22. Elvys Linhares Pontes, Ahmed Hamdi, Nicolas Sidere, and Antoine Doucet. 2019. Impact of OCR quality on named entity linking. In International Conference on Asian Digital Libraries. Springer, NY, USA, 102–115.
  23. Ulrich Reffle and Christoph Ringlstetter. 2013. Unsupervised profiling of OCRed historical documents. Pattern Recognition 46, 5 (2013), 1346–1357.
  24. Georg Rehm, Peter Bourgonje, Stefanie Hegele, Florian Kintzel, Julián Moreno Schneider, Malte Ostendorff, Karolina Zaczynska, Armin Berger, Stefan Grill, Sören Räuchle, et al. 2020. QURATOR: Innovative Technologies for Content and Data Curation. CEUR-WS 2535, 1 (2020), 15.
  25. Stephen Vincent Rice. 1996. Measuring the accuracy of page-reading systems. UNLV, Las Vegas, NV.
  26. Stephen V Rice and Thomas A Nartker. 1996. The ISRI analytic tools for OCR evaluation. UNLV/Information Science Research Institute, TR-96 2 (1996), 45.
  27. Ahmed Ben Salah, Jean Philippe Moreux, Nicolas Ragot, and Thierry Paquet. 2015. OCR performance prediction using cross-OCR alignment. In Proceedings of the 13th International Conference on Document Analysis and Recognition (ICDAR). IEEE, NY, USA, 556–560.
  28. Eddie A Santos. 2019. OCR evaluation tools for the 21st century. In Proceedings of the Workshop on Computational Methods for Endangered Languages, Vol. 1. ACL, Stroudsburg PA, USA, 23–27.
  29. Prashant Singh, Ekta Vats, and Anders Hast. 2018. Learning surrogate models of document image quality metrics for automated document image processing. In 2018 13th IAPR International Workshop on Document Analysis Systems (DAS). IEEE, NY, USA, 67–72.
  30. David A Smith and Ryan Cordell. 2018. A research agenda for historical and multilingual optical character recognition. Northeastern University, Boston, MA.
  31. Ray Smith. 2011. Limits on the application of frequency-based language models to OCR. In 2011 International Conference on Document Analysis and Recognition. IEEE, NY, USA, 538–542.
  32. Uwe Springmann, Florian Fink, and Klaus U Schulz. 2016. Automatic quality evaluation and (semi-)automatic improvement of OCR models for historical printings. arXiv preprint arXiv:1606.05157 (2016), 8.
  33. Uwe Springmann, Christian Reul, Stefanie Dipper, and Johannes Baiter. 2018. Ground Truth for training OCR engines on historical documents in German Fraktur and Early Modern Latin. arXiv preprint arXiv:1809.05501 (2018), 8.
  34. Simon Tanner, Trevor Muñoz, and Pich Hemy Ros. 2009. Measuring mass text digitization quality and usefulness. D-Lib Magazine 15, 7/8 (2009), Unpaginated.
  35. Myriam C Traub, Jacco van Ossenbruggen, and Lynda Hardman. 2015. Impact analysis of OCR quality on research tasks in digital archives. In International Conference on Theory and Practice of Digital Libraries. Springer, NY, USA, 252–263.
  36. Esko Ukkonen. 1995. On-line construction of suffix trees. Algorithmica 14, 3 (1995), 249–260.
  37. Daniel van Strien, Kaspar Beelen, Mariona Coll Ardanuy, Kasra Hosseini, Barbara McGillivray, and Giovanni Colavizza. 2020. Assessing the Impact of OCR Quality on Downstream NLP Tasks. In ICAART, Vol. 1. SCITEPRESS, Setúbal, Portugal, 484–496.
  38. Maria Wernersson. 2015. Evaluation von automatisch erzeugten OCR-Daten am Beispiel der Allgemeinen Zeitung [Evaluation of automatically generated OCR data using the example of the Allgemeine Zeitung]. ABI Technik 35, 1 (2015), 23–35.

Published in

HIP '21: Proceedings of the 6th International Workshop on Historical Document Imaging and Processing
September 2021, 72 pages
ISBN: 9781450386906
DOI: 10.1145/3476887
Copyright © 2021 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Acceptance rate: 52 of 90 submissions, 58%
