Towards the Processing of Historic Documents

Gottfried, Björn; Meyer-Lerbs, Lothar

doi:10.1007/978-3-642-23160-5_2

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6699)

Abstract

This chapter describes the methods required to transform complex document images into text. The goal is to make the contents of documents that are not born-digital, but have been converted from a physical medium to a digital format, available to search engines. Established optical character recognition methods fail for documents about whose symbols, which are probably unknown, no assumptions can be made; historic documents are the example domain par excellence. This paper, however, has a much broader goal: it outlines fundamental problems, as well as a methodology for dealing with documents containing unknown and arbitrary symbols, in order to provide a basis for discussion and future work within the digital library community. In particular, future advances will require closer interaction among researchers concerned with such diverse topics as document digitisation, reproduction, and preservation, as well as search engines, cross-language processing, mobile libraries, and many further areas. By adopting a general view of the issues presented, the chapter aims to sensitise researchers in these areas to the problems met in processing complex, and especially historic, documents.

Author information

Authors and Affiliations

Centre for Computing and Communication Technologies, University of Bremen, Germany
Björn Gottfried & Lothar Meyer-Lerbs

Editor information

Editors and Affiliations

University of Trento, Povo, Italy
Raffaella Bernardi & Ilya Zaihrayeu
The European Library, c/o De Koninklijke Bibliotheek, The National Library of the Netherlands, The Hague, The Netherlands
Sally Chambers
University of Bremen, Germany
Björn Gottfried
Xerox Research Centre Europe, Meylan, France
Frédérique Segond

About this paper

Cite this paper

Gottfried, B., Meyer-Lerbs, L. (2011). Towards the Processing of Historic Documents. In: Bernardi, R., Chambers, S., Gottfried, B., Segond, F., Zaihrayeu, I. (eds) Advanced Language Technologies for Digital Libraries. NLP4DL 2009, AT4DL 2009. Lecture Notes in Computer Science, vol 6699. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23160-5_2

DOI: https://doi.org/10.1007/978-3-642-23160-5_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23159-9
Online ISBN: 978-3-642-23160-5
eBook Packages: Computer Science, Computer Science (R0)