
2019 | Original Paper | Book Chapter

Learning to Clean: A GAN Perspective

Authors: Monika Sharma, Abhishek Verma, Lovekesh Vig

Published in: Computer Vision – ACCV 2018 Workshops

Publisher: Springer International Publishing


Abstract

In the big data era, the impetus to digitize the vast reservoirs of data trapped in unstructured scanned documents such as invoices, bank documents, courier receipts and contracts has gained fresh momentum. The scanning process often introduces artifacts such as salt-and-pepper/background noise, blur due to camera motion or shake, watermarks, coffee stains, wrinkles, or faded text. These artifacts pose readability challenges for current text recognition algorithms and significantly degrade their performance. Existing learning-based denoising techniques require a dataset of noisy documents paired with clean versions of the same documents, in which case a model can be trained to generate clean documents from their noisy counterparts. In the real world, however, such a paired dataset is often unavailable, and all that is available for training a denoising model are unpaired sets of noisy and clean images. This paper explores the use of Generative Adversarial Networks (GANs) to generate denoised versions of noisy documents. Where paired data are available, we formulate the problem as an image-to-image translation task, i.e., translating a document from the noisy domain (background noise, blur, fading, watermarks) to a clean target document using a conditional GAN. In the absence of paired training images, we employ CycleGAN, which learns a mapping between the distributions of noisy and clean images from unpaired data to achieve image-to-image translation for document cleaning. We compare the performance of CycleGAN trained on unpaired images with that of a conditional GAN trained on paired data from the same dataset. Experiments were performed on a public document dataset with different types of artificially induced noise; the results demonstrate that CycleGAN learns a more robust mapping from the space of noisy documents to clean documents.
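
As a concrete point of reference for the two objectives compared above, the sketch below illustrates how the generator-side losses might look in PyTorch for noisy-to-clean document translation. It is not the authors' implementation: the network modules (G, F, D_clean, D_noisy, and a conditioned discriminator D), the least-squares adversarial formulation, and the weighting constants (lambda_l1 = 100 and lambda_cyc = 10, the defaults of pix2pix and CycleGAN respectively) are assumptions made for illustration.

```python
# Minimal sketch of the two generator objectives contrasted in the abstract.
# Assumes PyTorch and generators/discriminators defined elsewhere as nn.Modules.
import torch
import torch.nn.functional as F_nn


def cgan_generator_loss(G, D, noisy, clean, lambda_l1=100.0):
    """Paired setting (conditional GAN): adversarial term plus an L1 term
    that ties the generated page to its known clean counterpart."""
    fake_clean = G(noisy)
    # Assumption: D is conditioned on the noisy input as well as the output.
    pred = D(noisy, fake_clean)
    adv = F_nn.mse_loss(pred, torch.ones_like(pred))  # least-squares GAN loss
    l1 = F_nn.l1_loss(fake_clean, clean)
    return adv + lambda_l1 * l1


def cyclegan_generator_loss(G, F, D_clean, D_noisy, noisy, clean, lambda_cyc=10.0):
    """Unpaired setting (CycleGAN): two adversarial terms plus cycle-consistency,
    so no noisy page needs to be paired with its clean version."""
    fake_clean, fake_noisy = G(noisy), F(clean)
    pred_c, pred_n = D_clean(fake_clean), D_noisy(fake_noisy)
    adv = (F_nn.mse_loss(pred_c, torch.ones_like(pred_c)) +
           F_nn.mse_loss(pred_n, torch.ones_like(pred_n)))
    # Round-trip reconstructions should recover the original images.
    cyc = (F_nn.l1_loss(F(fake_clean), noisy) +
           F_nn.l1_loss(G(fake_noisy), clean))
    return adv + lambda_cyc * cyc
```

In practice each discriminator is also trained with a symmetric real/fake objective, and CycleGAN optionally adds an identity loss; both are omitted here for brevity.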


Metadata
Title
Learning to Clean: A GAN Perspective
Authors
Monika Sharma
Abhishek Verma
Lovekesh Vig
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-21074-8_14
