Published in: International Journal on Document Analysis and Recognition (IJDAR) 1/2022

22.09.2021 | Original Paper

TG²: text-guided transformer GAN for restoring document readability and perceived quality

Written by: Oldřich Kodym, Michal Hradiš

Abstract

Most image enhancement methods for restoring digitized textual documents are limited to cases where the text information is still preserved in the input image, which may often not be the case. In this work, we propose a novel generative document restoration method that allows conditioning the restoration on a guiding signal in the form of a target text transcription and that does not need paired high- and low-quality images for training. We introduce a neural network architecture with an implicit text-to-image alignment module. We demonstrate good results on inpainting, debinarization and deblurring tasks, and we show that the trained models can be used to manually alter text in document images. A user study shows that human observers confuse the outputs of the proposed enhancement method with reference high-quality images in as many as 30% of cases.
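The abstract only names the core mechanism: a generator restores a text-line image while attending to the characters of a target transcription through an implicit text-to-image alignment module, and the model is trained adversarially. The PyTorch-style sketch below is purely illustrative and is not the authors' implementation; the class name `TextGuidedRestorer`, the layer sizes, and the use of a single cross-attention layer are assumptions, and the adversarial discriminator and recognition-based losses used in the paper are omitted.

```python
# Minimal sketch (not the authors' code, assuming PyTorch >= 1.9):
# image features attend to character embeddings of the target transcription
# via cross-attention, which plays the role of an implicit text-to-image alignment.
import torch
import torch.nn as nn

class TextGuidedRestorer(nn.Module):
    def __init__(self, vocab_size=128, feat_dim=64, n_heads=4):
        super().__init__()
        # CNN encoder turns the degraded line image into a grid of features.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Character embedding of the guiding transcription.
        self.char_emb = nn.Embedding(vocab_size, feat_dim)
        # Cross-attention: image positions (queries) align to characters (keys/values).
        self.cross_attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)
        # Decoder upsamples the fused features back to an image.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat_dim, feat_dim, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(feat_dim, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, degraded, transcription_ids):
        f = self.encoder(degraded)                      # (B, C, H, W)
        b, c, h, w = f.shape
        q = f.flatten(2).transpose(1, 2)                # (B, H*W, C) image queries
        kv = self.char_emb(transcription_ids)           # (B, T, C) character keys/values
        fused, _ = self.cross_attn(q, kv, kv)           # implicit text-to-image alignment
        fused = (q + fused).transpose(1, 2).reshape(b, c, h, w)
        return self.decoder(fused)

# Example: restore a 32x128 text-line crop guided by a 20-character transcription.
model = TextGuidedRestorer()
out = model(torch.rand(1, 3, 32, 128), torch.randint(0, 128, (1, 20)))
print(out.shape)  # torch.Size([1, 3, 32, 128])
```

The point of the sketch is the alignment step: every spatial position of the encoded image queries the character embeddings, so the decoder receives features that already carry information about which glyph should appear where, which is what makes transcription-guided restoration and manual text alteration possible.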


Footnotes
1
The demonstration tool with the trained newspaper restoration and inpainting models, along with image examples, is publicly available at https://github.com/DCGM/pero-enhance. The repository also includes training scripts and links to training data.
 
Metadata
Title
TG²: text-guided transformer GAN for restoring document readability and perceived quality
Written by
Oldřich Kodym
Michal Hradiš
Publication date
22.09.2021
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Document Analysis and Recognition (IJDAR) / Issue 1/2022
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI
https://doi.org/10.1007/s10032-021-00387-z
