Published in: International Journal on Document Analysis and Recognition (IJDAR) 1/2022

22.09.2021 | Original Paper

TG²: text-guided transformer GAN for restoring document readability and perceived quality

Written by: Oldřich Kodym, Michal Hradiš

Abstract

Most image enhancement methods for restoring digitized textual documents are limited to cases where the text information is still preserved in the input image, which may often not be the case. In this work, we propose a novel generative document restoration method that allows conditioning the restoration on a guiding signal in the form of a target text transcription and that does not need paired high- and low-quality images for training. We introduce a neural network architecture with an implicit text-to-image alignment module. We demonstrate good results on inpainting, debinarization and deblurring tasks, and we show that the trained models can be used to manually alter text in document images. A user study shows that human observers confuse the outputs of the proposed enhancement method with reference high-quality images in as many as 30% of cases.
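The abstract only names the core mechanism: a generator restores a text-line image while attending to the characters of a target transcription through an implicit text-to-image alignment module, and the model is trained adversarially. The PyTorch-style sketch below is purely illustrative and is not the authors' implementation; the class name `TextGuidedRestorer`, the layer sizes, and the use of a single cross-attention layer are assumptions, and the adversarial discriminator and recognition-based losses used in the paper are omitted.

```python
# Minimal sketch (not the authors' code, assuming PyTorch >= 1.9):
# image features attend to character embeddings of the target transcription
# via cross-attention, which plays the role of an implicit text-to-image alignment.
import torch
import torch.nn as nn

class TextGuidedRestorer(nn.Module):
    def __init__(self, vocab_size=128, feat_dim=64, n_heads=4):
        super().__init__()
        # CNN encoder turns the degraded line image into a grid of features.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Character embedding of the guiding transcription.
        self.char_emb = nn.Embedding(vocab_size, feat_dim)
        # Cross-attention: image positions (queries) align to characters (keys/values).
        self.cross_attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)
        # Decoder upsamples the fused features back to an image.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat_dim, feat_dim, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(feat_dim, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, degraded, transcription_ids):
        f = self.encoder(degraded)                      # (B, C, H, W)
        b, c, h, w = f.shape
        q = f.flatten(2).transpose(1, 2)                # (B, H*W, C) image queries
        kv = self.char_emb(transcription_ids)           # (B, T, C) character keys/values
        fused, _ = self.cross_attn(q, kv, kv)           # implicit text-to-image alignment
        fused = (q + fused).transpose(1, 2).reshape(b, c, h, w)
        return self.decoder(fused)

# Example: restore a 32x128 text-line crop guided by a 20-character transcription.
model = TextGuidedRestorer()
out = model(torch.rand(1, 3, 32, 128), torch.randint(0, 128, (1, 20)))
print(out.shape)  # torch.Size([1, 3, 32, 128])
```

The point of the sketch is the alignment step: every spatial position of the encoded image queries the character embeddings, so the decoder receives features that already carry information about which glyph should appear where, which is what makes transcription-guided restoration and manual text alteration possible.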


Footnotes
1
The demonstration tool with the trained newspaper restoration and inpainting models, along with image examples, is publicly available at https://github.com/DCGM/pero-enhance. The repository also includes training scripts and links to training data.
 
Metadata
Title
TG²: text-guided transformer GAN for restoring document readability and perceived quality
Written by
Oldřich Kodym
Michal Hradiš
Publication date
22.09.2021
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Document Analysis and Recognition (IJDAR) / Issue 1/2022
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI
https://doi.org/10.1007/s10032-021-00387-z
