International Journal on Document Analysis and Recognition (IJDAR)

21 April 2021 | Special Issue Paper

CNN-based segmentation of speech balloons and narrative text boxes from comic book page images

Authors: Arpita Dutta, Samit Biswas, Amit Kumar Das

Published in: International Journal on Document Analysis and Recognition (IJDAR) | Issue 1-2/2021

Abstract

Much recent research on comic document images has been driven by the digital reading and distribution of comics. This work addresses the extraction of narrative text boxes and speech balloons, which contain the conversations among comic characters along with their feelings. Because drawing styles vary widely, speech balloons take complex shapes and are difficult to extract. We present a shape-aware dual-stream convolutional neural network for segmenting narrative text boxes and speech balloons of various shapes. In this dual-stream architecture, an added shape module processes the edge information of speech balloons and narrative text boxes alongside the main module. The outputs of the two modules are then concatenated to produce a more accurate segmentation. The proposed method achieves significant improvements in both region accuracy (mIoU) and boundary accuracy (F-measure and Hausdorff distance) over other state-of-the-art methods on several publicly available comic datasets in different languages, namely eBDtheque, DCM, and a subset of Manga109. In addition, we have developed a new dataset (BCBId) for comics in Bangla, the eighth most spoken language in the world, and propose a semiautomatic method for producing ground-truth images.
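
The abstract names two of its evaluation measures, mIoU for region accuracy and the Hausdorff distance for boundary accuracy, without defining them. The sketch below illustrates the standard textbook definitions on small binary masks. It is not the authors' evaluation code. The function names (`mask_to_pixels`, `iou`, `mean_iou`, `hausdorff`) are hypothetical, the two-class (balloon versus background) averaging is an assumption, and the Hausdorff distance is computed here over all foreground pixels, whereas a boundary evaluation would pass only contour pixels.

```python
import math


def mask_to_pixels(mask):
    """Return the set of (row, col) coordinates where a binary mask is non-zero."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def iou(pred, truth):
    """Intersection over union of two pixel sets; taken as 1.0 when both are empty."""
    union = pred | truth
    if not union:
        return 1.0
    return len(pred & truth) / len(union)


def mean_iou(pred_mask, truth_mask):
    """Average the IoU of the foreground class and the background class (assumed two-class setup)."""
    rows, cols = len(truth_mask), len(truth_mask[0])
    all_pixels = {(r, c) for r in range(rows) for c in range(cols)}
    pred_fg = mask_to_pixels(pred_mask)
    truth_fg = mask_to_pixels(truth_mask)
    fg = iou(pred_fg, truth_fg)
    bg = iou(all_pixels - pred_fg, all_pixels - truth_fg)
    return (fg + bg) / 2


def hausdorff(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(points_a, points_b), directed(points_b, points_a))


# Toy example: a 2x2 ground-truth balloon and a prediction shifted one pixel right.
truth = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]

print(mean_iou(pred, truth))                                    # 11/21, about 0.524
print(hausdorff(mask_to_pixels(pred), mask_to_pixels(truth)))   # 1.0
```

In the toy example, the foreground IoU is 2/6 and the background IoU is 10/14, so their mean is 11/21. The one-pixel shift gives a Hausdorff distance of exactly 1. Unlike mIoU, the Hausdorff distance reports the worst local deviation, which is why a region measure and a boundary measure complement each other.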

Code and data are available at https://github.com/Arpi07/Arpi07-2/tree/Speech_balloon_segmentation.

Title: CNN-based segmentation of speech balloons and narrative text boxes from comic book page images
Authors: Arpita Dutta, Samit Biswas, Amit Kumar Das
Publication date: 21 April 2021
Publisher: Springer Berlin Heidelberg
Published in: International Journal on Document Analysis and Recognition (IJDAR) | Issue 1-2/2021
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI: https://doi.org/10.1007/s10032-021-00366-4
