2021 | Original Paper | Book Chapter

C2VNet: A Deep Learning Framework Towards Comic Strip to Audio-Visual Scene Synthesis

Authors: Vaibhavi Gupta, Vinay Detani, Vivek Khokar, Chiranjoy Chattopadhyay

Published in: Document Analysis and Recognition – ICDAR 2021

Publisher: Springer International Publishing


Abstract

Advances in technology have propelled the growth of methods that can create the desired multimedia content. “Automatic image synthesis” is one such instance that has earned immense importance among researchers. In contrast, audio-visual scene synthesis, especially from document images, remains challenging and less investigated. To bridge this gap, we propose a novel framework, Comic-to-Video Network (C2VNet), which proceeds panel by panel through a comic strip and eventually creates a full-length video (with audio) of a digitized or born-digital storybook. This step-by-step video synthesis process enables the creation of a high-resolution video. The primary contributions of the proposed work are: (1) a novel end-to-end framework for comic strip to audio-visual scene synthesis, (2) an improved panel and text balloon segmentation technique, and (3) a dataset of digitized comic storybooks in the English language with complete annotations and binary masks of the text balloons. Qualitative and quantitative experimental results demonstrate the effectiveness of the proposed C2VNet framework for automatic audio-visual scene synthesis.
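
The abstract outlines a panel-by-panel pipeline: segment the panels of a page, segment and read the text balloons, synthesize narration audio, and assemble the per-panel clips into a full-length video. The following minimal Python sketch illustrates only that control flow; it is not the authors' implementation, and every function name (segment_panels, segment_balloons, recognize_text, synthesize_speech, comic_to_clips) is a hypothetical placeholder standing in for the corresponding C2VNet module.

```python
# Hypothetical sketch of the panel-by-panel flow described in the abstract.
# Every component below is a placeholder, not the authors' code.
from dataclasses import dataclass
from typing import Any, List


@dataclass
class PanelClip:
    panel: Any        # cropped panel image
    narration: Any    # synthesized audio for the panel's balloon text
    duration: float   # clip length in seconds


def segment_panels(page: Any) -> List[Any]:
    """Placeholder for panel segmentation (panel crops in reading order)."""
    return [page]


def segment_balloons(panel: Any) -> List[Any]:
    """Placeholder for text-balloon segmentation (balloon masks/crops)."""
    return []


def recognize_text(balloon: Any) -> str:
    """Placeholder for balloon text recognition (OCR)."""
    return ""


def synthesize_speech(text: str) -> Any:
    """Placeholder for text-to-speech narration."""
    return None


def comic_to_clips(pages: List[Any]) -> List[PanelClip]:
    """Walk the comic panel by panel and build per-panel audio-visual clips."""
    clips: List[PanelClip] = []
    for page in pages:
        for panel in segment_panels(page):
            text = " ".join(recognize_text(b) for b in segment_balloons(panel))
            clips.append(PanelClip(panel, synthesize_speech(text), duration=3.0))
    # A video writer would concatenate these clips with their audio tracks.
    return clips


print(len(comic_to_clips(["page-1.png", "page-2.png"])))  # -> 2
```

Running the script prints 2, one clip per dummy single-panel page; in the actual framework each stage would be a trained model and the resulting clips would be rendered to a full-length video with audio.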


Metadata
Title
C2VNet: A Deep Learning Framework Towards Comic Strip to Audio-Visual Scene Synthesis
Authors
Vaibhavi Gupta
Vinay Detani
Vivek Khokar
Chiranjoy Chattopadhyay
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-86331-9_11