International Journal on Document Analysis and Recognition (IJDAR)

3 March 2021 | Original Paper

Combination of deep neural networks and logical rules for record segmentation in historical handwritten registers using few examples

Authors: Solène Tarride, Aurélie Lemaitre, Bertrand Coüasnon, Sophie Tardivel

Published in: International Journal on Document Analysis and Recognition (IJDAR) | Issue 1-2/2021

Abstract

This work focuses on the layout analysis of historical handwritten registers in which local religious ceremonies were recorded. The aim is to delimit each record using little available training data. To this end, two approaches are proposed. First, three state-of-the-art object detection networks are explored and compared, and further experiments are conducted on Mask R-CNN, as it yields the best performance. Second, we introduce and investigate Deep&Syntax, a hybrid system that takes advantage of recurrent patterns to delimit each record by combining U-shaped networks and logical rules. Finally, the two approaches are evaluated on 3708 French records (sixteenth–eighteenth centuries), as well as on the Esposalles public database, which contains 253 Spanish records (seventeenth century). While both systems perform well on homogeneous documents, we observe a significant drop in performance with Mask R-CNN on more challenging documents, especially when it is trained on a small, non-representative subset. By contrast, Deep&Syntax relies on steady patterns and is therefore able to process a wider range of documents with less training data. When both systems are trained on 120 documents, Deep&Syntax produces 15% more match configurations and reduces the ZoneMap surface error metric by 30%. It also outperforms Mask R-CNN when trained on a database three times smaller. As Deep&Syntax generalizes better, we believe it can be used for massive parish register processing, since collecting and annotating a sufficiently large and representative set of training data is not always achievable.
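To make the notion of a "match configuration" concrete, the following is a minimal Python sketch of how predicted record zones could be sorted into match, split, merge, miss, and false-alarm configurations. It is an illustration only, not the authors' evaluation code and not the full ZoneMap metric: the box format, the `min_cover` threshold, and the function names `linked` and `count_configurations` are all assumptions introduced here, and surface errors are not computed.

```python
from collections import Counter

# A zone is an axis-aligned box (x0, y0, x1, y1); this format is an assumption.
Box = tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as empty."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def overlap(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes."""
    inter = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return area(inter)


def linked(a: Box, b: Box, min_cover: float = 0.5) -> bool:
    """Hypothetical link rule: the overlap covers enough of the smaller box."""
    smaller = min(area(a), area(b))
    return smaller > 0 and overlap(a, b) / smaller >= min_cover


def count_configurations(truth, predicted, min_cover=0.5) -> Counter:
    """Count simplified configurations between true and predicted zones."""
    t_links = [[j for j, p in enumerate(predicted) if linked(t, p, min_cover)]
               for t in truth]
    p_links = [[i for i, t in enumerate(truth) if linked(t, p, min_cover)]
               for p in predicted]
    counts = Counter()
    for links in t_links:
        if not links:
            counts["miss"] += 1          # true record with no prediction
        elif len(links) > 1:
            counts["split"] += 1         # one record cut into several zones
        elif len(p_links[links[0]]) == 1:
            counts["match"] += 1         # one-to-one correspondence
    for links in p_links:
        if not links:
            counts["false_alarm"] += 1   # prediction with no true record
        elif len(links) > 1:
            counts["merge"] += 1         # several records fused into one zone
    return counts


# Four stacked records; the predictions fuse the two middle ones,
# miss the last one, and add a spurious zone in the margin.
truth = [(0, 0, 100, 50), (0, 50, 100, 100), (0, 100, 100, 150), (0, 150, 100, 200)]
predicted = [(0, 0, 100, 52), (0, 50, 100, 150), (200, 0, 250, 50)]
result = count_configurations(truth, predicted)
```

Under this simplified rule, a system that delimits each record with exactly one zone accumulates matches, whereas fused or fragmented records show up as merges and splits.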

Title: Combination of deep neural networks and logical rules for record segmentation in historical handwritten registers using few examples
Authors: Solène Tarride, Aurélie Lemaitre, Bertrand Coüasnon, Sophie Tardivel
Publication date: 3 March 2021
Publisher: Springer Berlin Heidelberg
Published in: International Journal on Document Analysis and Recognition (IJDAR) / Issue 1-2/2021
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI: https://doi.org/10.1007/s10032-021-00362-8
