Published in: International Journal on Document Analysis and Recognition (IJDAR) 4/2022

29-08-2022 | Special Issue Paper

Textline alignment on the image domain

Authors: Boraq Madi, Ahmad Droby, Jihad El-Sana

Abstract

Editing and publishing a historical manuscript involves a research phase to recover the original manuscript and reconstruct the transmission of its text based on the relations between its surviving copies. Manuscript alignment, which aims to locate the shared and differing text among a set of copies of the same manuscript, is essential for this phase. In this paper, we present an alignment algorithm for historical handwritten documents that works directly on the image domain, both because no accurate handwritten text recognition (HTR) system exists for historical handwritten documents and because the original manuscripts need to be viewed in parallel to examine features beyond the transcribed text. Our approach extracts subwords, estimates the similarity among these subwords, and establishes an alignment among them. We extract subwords from textline images and convert each textline into a sequence of subword images. We estimate the similarity between two subwords using a Siamese network model and apply Longest Common Subsequence (LCS) to establish the alignment between two image sequences. We have implemented our algorithm, trained the Siamese model, and evaluated its performance using textline images from historical documents. Our algorithm outperformed the state of the art by large margins. Unlike the state of the art, the framework builds the alignment from scratch without requiring any prior knowledge concerning subword boundaries. In addition, we built a new dataset for textline alignment of historical documents, which includes ten pairs of pages taken from two copies of two Arabic manuscripts and annotated at the subword level.
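
The alignment step described above can be illustrated with a minimal sketch of LCS over two sequences whose elements are compared by a learned similarity score rather than exact equality. This is not the authors' implementation: the function name `lcs_align`, the `similarity` callback (standing in for the Siamese network), and the `threshold` value are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def lcs_align(
    a: Sequence[T],
    b: Sequence[T],
    similarity: Callable[[T, T], float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) forming a longest common subsequence of a and b.

    Two items count as "the same" when similarity(a[i], b[j]) >= threshold.
    In the paper's setting, the items would be subword images and the
    similarity would come from a Siamese model; here it is any callable.
    """
    n, m = len(a), len(b)

    # Precompute the binary match relation so the scorer runs once per pair.
    match = [[similarity(a[i], b[j]) >= threshold for j in range(m)]
             for i in range(n)]

    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if match[i][j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk the table from the start to recover the aligned index pairs.
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < n and j < m:
        if match[i][j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1  # skip an unmatched item of a
        else:
            j += 1  # skip an unmatched item of b
    return pairs


if __name__ == "__main__":
    # Toy stand-in: strings play the role of subword images, and exact
    # equality plays the role of the learned similarity score.
    def toy_similarity(x: str, y: str) -> float:
        return 1.0 if x == y else 0.0

    line_1 = ["a", "b", "c", "d"]
    line_2 = ["a", "c", "e", "d"]
    print(lcs_align(line_1, line_2, toy_similarity))
```

Items left out of the returned pairs correspond to text present in only one of the two sequences, which is the kind of difference manuscript alignment aims to locate.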

Metadata
Title
Textline alignment on the image domain
Authors
Boraq Madi
Ahmad Droby
Jihad El-Sana
Publication date
29-08-2022
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Document Analysis and Recognition (IJDAR) / Issue 4/2022
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI
https://doi.org/10.1007/s10032-022-00408-5
