International Journal on Document Analysis and Recognition (IJDAR)

29 August 2022 | Special Issue Paper

Textline alignment on the image domain

Authors: Boraq Madi, Ahmad Droby, Jihad El-Sana

Published in: International Journal on Document Analysis and Recognition (IJDAR) | Issue 4/2022

Abstract

Editing and publishing a historical manuscript involves a research phase to recover the original manuscript and reconstruct the transmission of its text based on the relations between its surviving copies. Manuscript alignment, which aims to locate the shared and the differing text among a set of copies of the same manuscript, is essential for this phase. In this paper, we present an alignment algorithm for historical handwritten documents that works directly on the image domain, for two reasons: no accurate handwritten text recognition (HTR) system exists for handwritten historical documents, and the original manuscripts need to be visualized in parallel to examine features beyond the transcribed text. Our approach extracts subwords, estimates the similarity among these subwords, and establishes an alignment among them. We extract subwords from textline images and convert them into sequences of subword images. We estimate the similarity between two subwords using a Siamese network model and apply the Longest Common Subsequence (LCS) algorithm to establish the alignment between two image sequences. We have implemented our algorithm, trained the Siamese model, and evaluated its performance using textline images from historical documents. Our algorithm outperformed the state of the art by large margins. Unlike the state of the art, the framework builds the alignment from scratch without requiring any prior knowledge concerning subword boundaries. In addition, we built a new dataset for textline alignment of historical documents, which includes ten pairs of pages taken from two copies of two Arabic manuscripts and annotated at the subword level.
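The alignment step named in the abstract, LCS over two sequences with a learned similarity in place of exact equality, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `lcs_align`, the `similarity` callable (a stand-in for the Siamese model's score), and the `threshold` value are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def lcs_align(
    seq_a: Sequence[T],
    seq_b: Sequence[T],
    similarity: Callable[[T, T], float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[int, int]]:
    """Return ordered index pairs (i, j) of items judged to match."""
    n, m = len(seq_a), len(seq_b)

    # match[i][j] is True when the similarity score (here a placeholder for
    # a Siamese network output) reaches the threshold.
    match = [[similarity(a, b) >= threshold for b in seq_b] for a in seq_a]

    # dp[i][j] holds the LCS length of the suffixes seq_a[i:] and seq_b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if match[i][j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk forward through the table to recover one optimal alignment.
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < n and j < m:
        if match[i][j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def exact(a: str, b: str) -> float:
    """Toy similarity for demonstration: 1.0 for equal items, else 0.0."""
    return 1.0 if a == b else 0.0
```

In this sketch, items left unpaired on either side correspond to text present in only one copy. Calling `lcs_align(list("abc"), list("abc"), exact)` pairs every position with itself; with subword images, `similarity` would be replaced by a model that scores two image crops.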

Title: Textline alignment on the image domain
Authors: Boraq Madi, Ahmad Droby, Jihad El-Sana
Publication date: 29 August 2022
Publisher: Springer Berlin Heidelberg
Published in: International Journal on Document Analysis and Recognition (IJDAR), Issue 4/2022
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI: https://doi.org/10.1007/s10032-022-00408-5
