2021 | OriginalPaper | Chapter

C2VNet: A Deep Learning Framework Towards Comic Strip to Audio-Visual Scene Synthesis

Authors: Vaibhavi Gupta, Vinay Detani, Vivek Khokar, Chiranjoy Chattopadhyay

Published in: Document Analysis and Recognition – ICDAR 2021

Publisher: Springer International Publishing

Advances in technology have propelled the growth of methods that can create the desired multimedia content. Automatic image synthesis is one such area that has attracted considerable attention among researchers. In contrast, audio-video scene synthesis, especially from document images, remains challenging and less investigated. To bridge this gap, we propose a novel framework, Comic-to-Video Network (C2VNet), which processes a comic strip panel by panel and eventually creates a full-length video (with audio) of a digitized or born-digital storybook. This step-by-step video synthesis process enables the creation of a high-resolution video. The primary contributions of the proposed work are: (1) a novel end-to-end framework for synthesizing audio-video scenes from comic strips, (2) an improved panel and text balloon segmentation technique, and (3) a dataset of a digitized comic storybook in the English language with complete annotation and binary masks of the text balloons. Qualitative and quantitative experimental results demonstrate the effectiveness of the proposed C2VNet framework for automatic audio-visual scene synthesis.
