Skip to main content
Erschienen in: International Journal on Document Analysis and Recognition (IJDAR) 1-2/2021

21.04.2021 | Special Issue Paper

CNN-based segmentation of speech balloons and narrative text boxes from comic book page images

verfasst von: Arpita Dutta, Samit Biswas, Amit Kumar Das

Erschienen in: International Journal on Document Analysis and Recognition (IJDAR) | Ausgabe 1-2/2021

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Most of the recent research works on comic document images have focused on the reading and distribution of comics digitally due to the evolution of technologies. In this work, the extraction of narrative text boxes and speech balloons, which contain the conversations among comic characters along with their feelings, is presented. Due to the huge variety of drawing styles, the shape of these speech balloons is complex, and extraction is difficult. We present a shape-aware dual-stream convolutional neural network for the segmentation of narrative text boxes and speech balloons of various shapes. In our dual-stream architecture, an added shape module processes edge information of the speech balloons and narrative texts with the main module. Later, the concatenation of these two modules produces more accurate segmentation of speech balloons and narrative text boxes. The proposed method achieves significant performance improvements in terms of both region accuracy (mIOU) and boundary accuracy (F-measure and Hausdorff distance) compared to other state-of-the-art methods on various publicly available comic datasets (namely eBDtheque, DCM and Manga 109 dataset subset) in different languages. In addition, we have developed a new dataset (BCBId) for comics in Bangla, the eighth most spoken language in the world, and propose a method for the development of ground-truth images in a semiautomatic way.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat BCBID: sites.google.com/view/banglacomicbookdataset. Accessed 8 Sept 2020 BCBID: sites.google.com/view/banglacomicbookdataset. Accessed 8 Sept 2020
4.
Zurück zum Zitat Arai, K., Tolle, H.: Method for real time text extraction of digital manga comic. Int. J. Image Process. IJIP 4(6), 669–676 (2011) Arai, K., Tolle, H.: Method for real time text extraction of digital manga comic. Int. J. Image Process. IJIP 4(6), 669–676 (2011)
5.
Zurück zum Zitat Augereau, O., Iwata, M., Kise, K.: An overview of comics research in computer science. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 3, pp. 54–59. IEEE (2017) Augereau, O., Iwata, M., Kise, K.: An overview of comics research in computer science. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 3, pp. 54–59. IEEE (2017)
6.
Zurück zum Zitat Augereau, O., Iwata, M., Kise, K.: A survey of comics research in computer science. J. Imaging 4(7), 87 (2018)CrossRef Augereau, O., Iwata, M., Kise, K.: A survey of comics research in computer science. J. Imaging 4(7), 87 (2018)CrossRef
7.
Zurück zum Zitat Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)MATH Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)MATH
8.
Zurück zum Zitat Cao, Y., Pang, X., Chan, A.B., Lau, R.W.: Dynamic manga: animating still manga via camera movement. IEEE Trans. Multimedia 19(1), 160–172 (2016)CrossRef Cao, Y., Pang, X., Chan, A.B., Lau, R.W.: Dynamic manga: animating still manga via camera movement. IEEE Trans. Multimedia 19(1), 160–172 (2016)CrossRef
9.
Zurück zum Zitat Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: ECCV, pp. 801–818 (2018) Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: ECCV, pp. 801–818 (2018)
10.
Zurück zum Zitat Dubray, D., Laubrock, J.: Deep CNN-based speech balloon detection and segmentation for comic books. arXiv preprint arXiv:1902.08137 (2019) Dubray, D., Laubrock, J.: Deep CNN-based speech balloon detection and segmentation for comic books. arXiv preprint arXiv:​1902.​08137 (2019)
11.
Zurück zum Zitat Dubuisson, M.P., Jain, A.K.: A modified Hausdorff distance for object matching. In: Proceedings of 12th International Conference on Pattern Recognition, vol. 1, pp. 566–568. IEEE (1994) Dubuisson, M.P., Jain, A.K.: A modified Hausdorff distance for object matching. In: Proceedings of 12th International Conference on Pattern Recognition, vol. 1, pp. 566–568. IEEE (1994)
12.
Zurück zum Zitat Dunst, A., Laubrock, J., Wildfeuer, J.: Empirical Comics Research: Digital, Multimodal, and Cognitive Methods. Routledge, Milton Park (2018)CrossRef Dunst, A., Laubrock, J., Wildfeuer, J.: Empirical Comics Research: Digital, Multimodal, and Cognitive Methods. Routledge, Milton Park (2018)CrossRef
13.
Zurück zum Zitat Dutta, A., Biswas, S.: CNN based extraction of panels/characters from bengali comic book page images. In: 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), vol. 1, pp. 38–43. IEEE (2019) Dutta, A., Biswas, S.: CNN based extraction of panels/characters from bengali comic book page images. In: 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), vol. 1, pp. 38–43. IEEE (2019)
14.
Zurück zum Zitat Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F.A., Brendel, W.: Imagenet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In: International Conference on Learning Representations (2019) Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F.A., Brendel, W.: Imagenet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In: International Conference on Learning Representations (2019)
15.
Zurück zum Zitat Guérin, C., Rigaud, C., Mercier, A., Ammar-Boudjelal, F., Bertet, K., Bouju, A., Burie, J.C., Louis, G., Ogier, J.M., Revel, A.: eBDtheque: a representative database of comics. In: ICDAR, pp. 1145–1149. IEEE (2013) Guérin, C., Rigaud, C., Mercier, A., Ammar-Boudjelal, F., Bertet, K., Bouju, A., Burie, J.C., Louis, G., Ogier, J.M., Revel, A.: eBDtheque: a representative database of comics. In: ICDAR, pp. 1145–1149. IEEE (2013)
16.
Zurück zum Zitat He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017) He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)
17.
Zurück zum Zitat He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
18.
Zurück zum Zitat Ho, A.K.N., Burie, J.C., Ogier, J.M.: Panel and speech balloon extraction from comic books. In: DAS, 2012, pp. 424–428. IEEE (2012) Ho, A.K.N., Burie, J.C., Ogier, J.M.: Panel and speech balloon extraction from comic books. In: DAS, 2012, pp. 424–428. IEEE (2012)
19.
Zurück zum Zitat Huttenlocher, D.P., Klanderman, G.A., Rucklidge, W.J.: Comparing images using the Hausdorff distance. IEEE Trans. Pattern Anal. Mach. Intell. 15(9), 850–863 (1993)CrossRef Huttenlocher, D.P., Klanderman, G.A., Rucklidge, W.J.: Comparing images using the Hausdorff distance. IEEE Trans. Pattern Anal. Mach. Intell. 15(9), 850–863 (1993)CrossRef
20.
Zurück zum Zitat Kass, M., Witkin, A., Terzopoulos, D.: Snakes: active contour models. Int. J. Comput. Vis. 1(4), 321–331 (1988)CrossRef Kass, M., Witkin, A., Terzopoulos, D.: Snakes: active contour models. Int. J. Comput. Vis. 1(4), 321–331 (1988)CrossRef
21.
Zurück zum Zitat Li, L., Wang, Y., Gao, L., Tang, Z., Suen, C.Y.: Comic2cebx: a system for automatic comic content adaptation. In: IEEE/ACM Joint Conference on Digital Libraries, pp. 299–308. IEEE (2014) Li, L., Wang, Y., Gao, L., Tang, Z., Suen, C.Y.: Comic2cebx: a system for automatic comic content adaptation. In: IEEE/ACM Joint Conference on Digital Libraries, pp. 299–308. IEEE (2014)
22.
Zurück zum Zitat Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015) Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
23.
Zurück zum Zitat Matsui, Y., Ito, K., Aramaki, Y., Fujimoto, A., Ogawa, T., Yamasaki, T., Aizawa, K.: Sketch-based manga retrieval using manga109 dataset. Multimedia Tools Appl. 76(20), 21811–21838 (2017)CrossRef Matsui, Y., Ito, K., Aramaki, Y., Fujimoto, A., Ogawa, T., Yamasaki, T., Aizawa, K.: Sketch-based manga retrieval using manga109 dataset. Multimedia Tools Appl. 76(20), 21811–21838 (2017)CrossRef
24.
Zurück zum Zitat Matsui, Y., Yamasaki, T., Aizawa, K.: Interactive manga retargeting. In: SIGGRAPH Posters, p. 35 (2011) Matsui, Y., Yamasaki, T., Aizawa, K.: Interactive manga retargeting. In: SIGGRAPH Posters, p. 35 (2011)
25.
Zurück zum Zitat Nguyen, N.V., Rigaud, C., Burie, J.C.: Digital comics image indexing based on deep learning. J. Imaging 4(7), 89 (2018)CrossRef Nguyen, N.V., Rigaud, C., Burie, J.C.: Digital comics image indexing based on deep learning. J. Imaging 4(7), 89 (2018)CrossRef
26.
Zurück zum Zitat Nguyen, N.V., Rigaud, C., Burie, J.C.: Comic MTL: optimized multi-task learning for comic book image analysis. Int. J. Doc. Anal. Recognit. IJDAR 22(3), 265–284 (2019)CrossRef Nguyen, N.V., Rigaud, C., Burie, J.C.: Comic MTL: optimized multi-task learning for comic book image analysis. Int. J. Doc. Anal. Recognit. IJDAR 22(3), 265–284 (2019)CrossRef
27.
Zurück zum Zitat Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1520–1528 (2015) Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1520–1528 (2015)
28.
Zurück zum Zitat Ogawa, T., Otsubo, A., Narita, R., Matsui, Y., Yamasaki, T., Aizawa, K.: Object detection for comics using manga109 annotations. arXiv:1803.08670 (2018) Ogawa, T., Otsubo, A., Narita, R., Matsui, Y., Yamasaki, T., Aizawa, K.: Object detection for comics using manga109 annotations. arXiv:​1803.​08670 (2018)
29.
30.
Zurück zum Zitat Prewitt, J.M.: Object enhancement and extraction. Picture Process. Psychopictorics 10(1), 15–19 (1970) Prewitt, J.M.: Object enhancement and extraction. Picture Process. Psychopictorics 10(1), 15–19 (1970)
31.
Zurück zum Zitat Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. In: NIPS, pp. 91–99 (2015) Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. In: NIPS, pp. 91–99 (2015)
32.
Zurück zum Zitat Ribera, J., Guera, D., Chen, Y., Delp, E.J.: Locating objects without bounding boxes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6479–6489 (2019) Ribera, J., Guera, D., Chen, Y., Delp, E.J.: Locating objects without bounding boxes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6479–6489 (2019)
33.
Zurück zum Zitat Rigaud, C., Burie, J.C., Ogier, J.M.: Text-independent speech balloon segmentation for comics and manga. In: International Workshop on Graphics Recognition, pp. 133–147. Springer (2015) Rigaud, C., Burie, J.C., Ogier, J.M.: Text-independent speech balloon segmentation for comics and manga. In: International Workshop on Graphics Recognition, pp. 133–147. Springer (2015)
34.
Zurück zum Zitat Rigaud, C., Burie, J.C., Ogier, J.M., Karatzas, D., Van de Weijer, J.: An active contour model for speech balloon detection in comics. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 1240–1244. IEEE (2013) Rigaud, C., Burie, J.C., Ogier, J.M., Karatzas, D., Van de Weijer, J.: An active contour model for speech balloon detection in comics. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 1240–1244. IEEE (2013)
35.
Zurück zum Zitat Rigaud, C., Guérin, C., Karatzas, D., Burie, J.C., Ogier, J.M.: Knowledge-driven understanding of images in comic books. IJDAR 18(3), 199–221 (2015)CrossRef Rigaud, C., Guérin, C., Karatzas, D., Burie, J.C., Ogier, J.M.: Knowledge-driven understanding of images in comic books. IJDAR 18(3), 199–221 (2015)CrossRef
36.
Zurück zum Zitat Rigaud, C., Le Thanh, N., Burie, J.C., Ogier, J.M., Iwata, M., Imazu, E., Kise, K.: Speech balloon and speaker association for comics and manga understanding. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 351–355. IEEE (2015) Rigaud, C., Le Thanh, N., Burie, J.C., Ogier, J.M., Iwata, M., Imazu, E., Kise, K.: Speech balloon and speaker association for comics and manga understanding. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 351–355. IEEE (2015)
37.
Zurück zum Zitat Rigaud, C., Nguyen, V., Burie, J.C.: Confidence criterion for speech balloon segmentation. In: 13th IAPR International Workshop on Graphics Recognition (2019) Rigaud, C., Nguyen, V., Burie, J.C.: Confidence criterion for speech balloon segmentation. In: 13th IAPR International Workshop on Graphics Recognition (2019)
38.
Zurück zum Zitat Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015) Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
39.
Zurück zum Zitat Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: Imagenet large scale visual recognition challenge. IJCV 115(3), 211–252 (2015)MathSciNetCrossRef Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: Imagenet large scale visual recognition challenge. IJCV 115(3), 211–252 (2015)MathSciNetCrossRef
40.
Zurück zum Zitat Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:​1409.​1556 (2014)
41.
Zurück zum Zitat Sun, W., Kise, K.: Similar manga retrieval using visual vocabulary based on regions of interest. In: 2011 International Conference on Document Analysis and Recognition, pp. 1075–1079. IEEE (2011) Sun, W., Kise, K.: Similar manga retrieval using visual vocabulary based on regions of interest. In: 2011 International Conference on Document Analysis and Recognition, pp. 1075–1079. IEEE (2011)
42.
Zurück zum Zitat Theodoridis, S., Koutroumbas, K.: Pattern Recognition, 4th edn. Academic Press, Boca Raton (2008)MATH Theodoridis, S., Koutroumbas, K.: Pattern Recognition, 4th edn. Academic Press, Boca Raton (2008)MATH
43.
Zurück zum Zitat Woo, S., Park, J., Lee, J.Y., So Kweon, I.: Cbam: Convolutional block attention module. In: ECCV, pp. 3–19 (2018) Woo, S., Park, J., Lee, J.Y., So Kweon, I.: Cbam: Convolutional block attention module. In: ECCV, pp. 3–19 (2018)
44.
Zurück zum Zitat Yamada, M., Budiarto, R., Endo, M., Miyazaki, S.: Comic image decomposition for reading comics on cellular phones. IEICE Trans. Inf. Syst. 87(6), 1370–1376 (2004) Yamada, M., Budiarto, R., Endo, M., Miyazaki, S.: Comic image decomposition for reading comics on cellular phones. IEICE Trans. Inf. Syst. 87(6), 1370–1376 (2004)
45.
Zurück zum Zitat Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks? In: NIPS. Curran Associates (2014) Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks? In: NIPS. Curran Associates (2014)
46.
Zurück zum Zitat Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. In: 4th International Conference on Learning Representations, ICLR 2016 Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. In: 4th International Conference on Learning Representations, ICLR 2016
47.
Zurück zum Zitat Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2881–2890 (2017) Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2881–2890 (2017)
Metadaten
Titel
CNN-based segmentation of speech balloons and narrative text boxes from comic book page images
verfasst von
Arpita Dutta
Samit Biswas
Amit Kumar Das
Publikationsdatum
21.04.2021
Verlag
Springer Berlin Heidelberg
Erschienen in
International Journal on Document Analysis and Recognition (IJDAR) / Ausgabe 1-2/2021
Print ISSN: 1433-2833
Elektronische ISSN: 1433-2825
DOI
https://doi.org/10.1007/s10032-021-00366-4

Weitere Artikel der Ausgabe 1-2/2021

International Journal on Document Analysis and Recognition (IJDAR) 1-2/2021 Zur Ausgabe