International Journal on Document Analysis and Recognition (IJDAR)

23 July 2019 | Special Issue Paper

A two-stage method for text line detection in historical documents

Authors: Tobias Grüning, Gundram Leifert, Tobias Strauß, Johannes Michael, Roger Labahn

Published in: International Journal on Document Analysis and Recognition (IJDAR) | Issue 3/2019

Abstract

This work presents a two-stage text line detection method for historical documents. Each detected text line is represented by its baseline. In the first stage, a deep neural network called ARU-Net labels each pixel as belonging to one of three classes: baseline, separator, or other. The separator class marks the beginning and end of each text line. The ARU-Net can be trained from scratch with a manageably small number of manually annotated example images (fewer than 50), which is achieved by data augmentation strategies. The network predictions serve as input for the second stage, which performs a bottom-up clustering to build baselines. The method handles complex layouts as well as curved and arbitrarily oriented text lines, and it substantially outperforms current state-of-the-art approaches. For example, on the complex track of the cBAD: ICDAR2017 Competition on Baseline Detection, the F value increases from 0.859 to 0.922. The framework to train and run the ARU-Net is open source.
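The two-stage idea described in the abstract can be illustrated with a small toy sketch. It is not the authors' ARU-Net or their clustering procedure: the class indices, the function names `label_pixels` and `cluster_baselines`, and the use of plain 8-connected grouping as a stand-in for the bottom-up clustering are all assumptions made for illustration. Stage one picks the most probable of the three classes per pixel; stage two groups neighbouring baseline pixels, so that a separator pixel splits two text lines that lie side by side.

```python
from collections import deque

# Class indices are an assumption made for this sketch.
BASELINE, SEPARATOR, OTHER = 0, 1, 2


def label_pixels(probs):
    """Stage-1 stand-in: choose the most probable class for every pixel.

    probs[y][x] is a 3-tuple of scores (baseline, separator, other).
    """
    return [[max(range(3), key=lambda c: px[c]) for px in row] for row in probs]


def cluster_baselines(labels):
    """Stage-2 stand-in: group 8-connected baseline pixels into clusters.

    Separator and other pixels are never joined to a cluster, so a
    separator column splits two horizontally adjacent lines.
    Returns a list of clusters, each a sorted list of (x, y) points.
    """
    h, w = len(labels), len(labels[0])
    seen = [[False] * w for _ in range(h)]
    clusters = []
    for y in range(h):
        for x in range(w):
            if labels[y][x] != BASELINE or seen[y][x]:
                continue
            queue, cluster = deque([(y, x)]), []
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                cluster.append((cx, cy))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and not seen[ny][nx]
                                and labels[ny][nx] == BASELINE):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            clusters.append(sorted(cluster))
    return clusters


def toy_probs():
    """A hypothetical 3x7 score map: two baselines split by one separator."""
    b, s, o = (0.8, 0.1, 0.1), (0.1, 0.8, 0.1), (0.1, 0.1, 0.8)
    return [
        [o, o, o, o, o, o, o],
        [b, b, b, s, b, b, b],
        [o, o, o, o, o, o, o],
    ]


baselines = cluster_baselines(label_pixels(toy_probs()))
for points in baselines:
    print(points)
```

On the toy map, the separator pixel in the middle column keeps the left and right runs of baseline pixels in separate clusters, which is the role the abstract assigns to the separator class.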

Optical Character Recognition + Handwritten Text Recognition.

https://github.com/TobiasGruening/ARU-Net.

https://transkribus.eu.

https://zenodo.org/record/218236.

http://www.primaresearch.org/tools.

https://zenodo.org/record/257972.

A separable MDLSTM layer is a concatenation of two (x- and y-direction) BLSTM layers.

The competition training data were not available to the authors.

http://diuf.unifr.ch/main/hisdoc/diva-hisdb.

Title: A two-stage method for text line detection in historical documents
Authors: Tobias Grüning, Gundram Leifert, Tobias Strauß, Johannes Michael, Roger Labahn
Publication date: 23 July 2019
Publisher: Springer Berlin Heidelberg
Published in: International Journal on Document Analysis and Recognition (IJDAR), Issue 3/2019
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI: https://doi.org/10.1007/s10032-019-00332-1
