Skip to main content
Top

2019 | OriginalPaper | Chapter

Character Prediction in TV Series via a Semantic Projection Network

Authors : Ke Sun, Zhuo Lei, Jiasong Zhu, Xianxu Hou, Bozhi Liu, Guoping Qiu

Published in: MultiMedia Modeling

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

The goal of this paper is to automatically recognize characters in popular TV series. In contrast to conventional approaches which rely on weak supervision afforded by transcripts, subtitles or character facial data, we formulate the problem as the multi-label classification which requires only label-level supervision. We propose a novel semantic projection network consisting of two stacked subnetworks with specially designed constraints. The first subnetwork is a contractive autoencoder which focuses on reconstructing feature activations extracted from a pre-trained single-label convolutional neural network (CNN). The second subnetwork functions as a region-based multi-label classifier which produces character labels for the input video frame as well as reconstructing the input visual feature from the mapped semantic labels space. Extensive experiments show that the proposed model achieves state-of-the-art performance in comparison with recent approaches on three challenging TV series datasets (the Big Bang Theory, the Defenders and Nirvava in Fire).

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Bojanowski, P., Bach, F., Laptev, I., Ponce, J., Schmid, C., Sivic, J.: Finding actors and actions in movies. In: 2013 IEEE International Conference on Computer Vision (ICCV), pp. 2280–2287. IEEE (2013) Bojanowski, P., Bach, F., Laptev, I., Ponce, J., Schmid, C., Sivic, J.: Finding actors and actions in movies. In: 2013 IEEE International Conference on Computer Vision (ICCV), pp. 2280–2287. IEEE (2013)
2.
go back to reference Cour, T., Sapp, B., Nagle, A., Taskar, B.: Talking pictures: temporal grouping and dialog-supervised person recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1014–1021 (2011) Cour, T., Sapp, B., Nagle, A., Taskar, B.: Talking pictures: temporal grouping and dialog-supervised person recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1014–1021 (2011)
3.
go back to reference Cour, T., Sapp, B., Jordan, C., Taskar, B.: Learning from ambiguously labeled images. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, pp. 919–926 (2009) Cour, T., Sapp, B., Jordan, C., Taskar, B.: Learning from ambiguously labeled images. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, pp. 919–926 (2009)
4.
go back to reference Cour, T., Sapp, B., Nagle, A., Taskar, B.: Talking pictures: temporal grouping and dialog-supervised person recognition. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1014–1021. IEEE (2010) Cour, T., Sapp, B., Nagle, A., Taskar, B.: Talking pictures: temporal grouping and dialog-supervised person recognition. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1014–1021. IEEE (2010)
5.
go back to reference Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, pp. 248–255. IEEE (2009) Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, pp. 248–255. IEEE (2009)
6.
go back to reference Dong, Z., Jia, S., Wu, T., Pei, M.: Face video retrieval via deep learning of binary hash representations. In: AAAI, pp. 3471–3477 (2016) Dong, Z., Jia, S., Wu, T., Pei, M.: Face video retrieval via deep learning of binary hash representations. In: AAAI, pp. 3471–3477 (2016)
7.
go back to reference He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
8.
go back to reference He, Z., Chen, C., Bu, J., Li, P., Cai, D.: Multi-view based multi-label propagation for image annotation. Neurocomputing 168(C), 853–860 (2015)CrossRef He, Z., Chen, C., Bu, J., Li, P., Cai, D.: Multi-view based multi-label propagation for image annotation. Neurocomputing 168(C), 853–860 (2015)CrossRef
9.
go back to reference Iwata, M., Ito, A., Kise, K.: A study to achieve manga character retrieval method for manga images. In: 2014 11th IAPR International Workshop on Document Analysis Systems (DAS), pp. 309–313. IEEE (2014) Iwata, M., Ito, A., Kise, K.: A study to achieve manga character retrieval method for manga images. In: 2014 11th IAPR International Workshop on Document Analysis Systems (DAS), pp. 309–313. IEEE (2014)
10.
go back to reference Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (2015) Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (2015)
11.
go back to reference Kostinger, M., Wohlhart, P., Roth, P.M., Bischof, H.: Learning to recognize faces from videos and weakly related information cues. In: IEEE International Conference on Advanced Video and Signal Based Surveillance, pp. 23–28 (2011) Kostinger, M., Wohlhart, P., Roth, P.M., Bischof, H.: Learning to recognize faces from videos and weakly related information cues. In: IEEE International Conference on Advanced Video and Signal Based Surveillance, pp. 23–28 (2011)
12.
go back to reference Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images. M.Sc. thesis, University of Toronto (2009) Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images. M.Sc. thesis, University of Toronto (2009)
13.
go back to reference Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012) Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
14.
go back to reference Li, C., Kang, Q., Ge, G., Song, Q., Lu, H., Cheng, J.: Deepbe: learning deep binary encoding for multi-label classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 39–46 (2016) Li, C., Kang, Q., Ge, G., Song, Q., Lu, H., Cheng, J.: Deepbe: learning deep binary encoding for multi-label classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 39–46 (2016)
15.
go back to reference Li, Y., Wang, R., Cui, Z., Shan, S., Chen, X.: Compact video code and its application to robust face retrieval in tv-series. In: BMVC (2014) Li, Y., Wang, R., Cui, Z., Shan, S., Chen, X.: Compact video code and its application to robust face retrieval in tv-series. In: BMVC (2014)
16.
go back to reference Li, Y., Wang, R., Shan, S., Chen, X.: Hierarchical hybrid statistic based video binary code and its application to face retrieval in tv-series. In: 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), vol. 1, pp. 1–8. IEEE (2015) Li, Y., Wang, R., Shan, S., Chen, X.: Hierarchical hybrid statistic based video binary code and its application to face retrieval in tv-series. In: 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), vol. 1, pp. 1–8. IEEE (2015)
17.
go back to reference Nagrani, A., Zisserman, A.: From benedict cumberbatch to sherlock holmes: Character identification in tv series without a script. CoRR abs/1801.10442 (2017) Nagrani, A., Zisserman, A.: From benedict cumberbatch to sherlock holmes: Character identification in tv series without a script. CoRR abs/1801.10442 (2017)
18.
go back to reference Nam, J., Kim, J., Loza Mencía, E., Gurevych, I., Fürnkranz, J.: Large-scale multi-label text classification—revisiting neural networks. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) ECML PKDD 2014. LNCS (LNAI), vol. 8725, pp. 437–452. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44851-9_28CrossRef Nam, J., Kim, J., Loza Mencía, E., Gurevych, I., Fürnkranz, J.: Large-scale multi-label text classification—revisiting neural networks. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) ECML PKDD 2014. LNCS (LNAI), vol. 8725, pp. 437–452. Springer, Heidelberg (2014). https://​doi.​org/​10.​1007/​978-3-662-44851-9_​28CrossRef
19.
go back to reference Parkhi, O.M., Rahtu, E., Zisserman, A.: It’s in the bag: stronger supervision for automated face labelling. In: ICCV Workshop, vol. 2, p. 6 (2015) Parkhi, O.M., Rahtu, E., Zisserman, A.: It’s in the bag: stronger supervision for automated face labelling. In: ICCV Workshop, vol. 2, p. 6 (2015)
20.
go back to reference Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS-W (2017) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS-W (2017)
21.
go back to reference Pont-Tuset, J., Arbeláez, P., Barron, J.T., Marques, F., Malik, J.: Multiscale combinatorial grouping for image segmentation and object proposal generation. IEEE Trans. Pattern Anal. Mach. Intell. 39(1), 128–140 (2015)CrossRef Pont-Tuset, J., Arbeláez, P., Barron, J.T., Marques, F., Malik, J.: Multiscale combinatorial grouping for image segmentation and object proposal generation. IEEE Trans. Pattern Anal. Mach. Intell. 39(1), 128–140 (2015)CrossRef
23.
go back to reference Razavian, A.S., Azizpour, H., Sullivan, J., Carlsson, S.: CNN features off-the-shelf: an astounding baseline for recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 512–519 (2014) Razavian, A.S., Azizpour, H., Sullivan, J., Carlsson, S.: CNN features off-the-shelf: an astounding baseline for recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 512–519 (2014)
24.
go back to reference Rifai, S., Vincent, P., Muller, X., Glorot, X., Bengio, Y.: Contractive auto-encoders: explicit invariance during feature extraction. In: ICML (2011) Rifai, S., Vincent, P., Muller, X., Glorot, X., Bengio, Y.: Contractive auto-encoders: explicit invariance during feature extraction. In: ICML (2011)
25.
go back to reference Shan, C.: Face recognition and retrieval in video. Stud. Comput. Intell. 287, 235–260 (2010) Shan, C.: Face recognition and retrieval in video. Stud. Comput. Intell. 287, 235–260 (2010)
26.
go back to reference Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)CrossRef Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)CrossRef
27.
go back to reference Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:​1409.​1556 (2014)
28.
go back to reference Sivic, J., Everingham, M., Zisserman, A.: “who are you?"- learning person specific classifiers from video. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, pp. 1145–1152. IEEE (2009) Sivic, J., Everingham, M., Zisserman, A.: “who are you?"- learning person specific classifiers from video. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, pp. 1145–1152. IEEE (2009)
29.
go back to reference Tapaswi, M., Bäuml, M., Stiefelhagen, R.: Story-based video retrieval in TV series using plot synopses. In: Proceedings of International Conference on Multimedia Retrieval, p. 137. ACM (2014) Tapaswi, M., Bäuml, M., Stiefelhagen, R.: Story-based video retrieval in TV series using plot synopses. In: Proceedings of International Conference on Multimedia Retrieval, p. 137. ACM (2014)
30.
go back to reference Tapaswi, M., Bauml, M., Stiefelhagen, R.: Storygraphs: visualizing character interactions as a timeline. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 827–834 (2014) Tapaswi, M., Bauml, M., Stiefelhagen, R.: Storygraphs: visualizing character interactions as a timeline. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 827–834 (2014)
31.
go back to reference Tsoumakas, G., Katakis, I.: Multi-label classification: an overview. Int. J. Data Warehous. Min. (IJDWM) 3(3), 1–13 (2007)CrossRef Tsoumakas, G., Katakis, I.: Multi-label classification: an overview. Int. J. Data Warehous. Min. (IJDWM) 3(3), 1–13 (2007)CrossRef
32.
go back to reference Wang, J., Yang, Y., Mao, J., Huang, Z., Huang, C., Xu, W.: CNN-RNN: a unified framework for multi-label image classification. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2285–2294. IEEE (2016) Wang, J., Yang, Y., Mao, J., Huang, Z., Huang, C., Xu, W.: CNN-RNN: a unified framework for multi-label image classification. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2285–2294. IEEE (2016)
34.
go back to reference Wei, Y., et al.: HCP: A flexible CNN framework for multi-label image classification. IEEE Trans. Pattern Anal. Mach. Intell. 38(9), 1901–1907 (2016)CrossRef Wei, Y., et al.: HCP: A flexible CNN framework for multi-label image classification. IEEE Trans. Pattern Anal. Mach. Intell. 38(9), 1901–1907 (2016)CrossRef
36.
go back to reference Wu, F., Wang, Z., Zhang, Z., Yang, Y., Luo, J., Zhu, W., Zhuang, Y.: Weakly semi-supervised deep learning for multi-label image annotation. IEEE Trans. Big Data 1(3), 109–122 (2015)CrossRef Wu, F., Wang, Z., Zhang, Z., Yang, Y., Luo, J., Zhu, W., Zhuang, Y.: Weakly semi-supervised deep learning for multi-label image annotation. IEEE Trans. Big Data 1(3), 109–122 (2015)CrossRef
37.
go back to reference Yu, Q., Wang, J., Zhang, S., Gong, Y., Zhao, J.: Combining local and global hypotheses in deep neural network for multi-label image classification. Neurocomputing 235, 38–45 (2017)CrossRef Yu, Q., Wang, J., Zhang, S., Gong, Y., Zhao, J.: Combining local and global hypotheses in deep neural network for multi-label image classification. Neurocomputing 235, 38–45 (2017)CrossRef
38.
go back to reference Zhang, M., Zhou, Z.: ML-KNN: a lazy learning approach to multi-label learning. Pattern Recognit. 40(7), 2038–2048 (2007)CrossRef Zhang, M., Zhou, Z.: ML-KNN: a lazy learning approach to multi-label learning. Pattern Recognit. 40(7), 2038–2048 (2007)CrossRef
39.
go back to reference Zhang, Z., Saligrama, V.: Zero-shot learning via joint latent similarity embedding. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 6034–6042 (2016) Zhang, Z., Saligrama, V.: Zero-shot learning via joint latent similarity embedding. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 6034–6042 (2016)
Metadata
Title
Character Prediction in TV Series via a Semantic Projection Network
Authors
Ke Sun
Zhuo Lei
Jiasong Zhu
Xianxu Hou
Bozhi Liu
Guoping Qiu
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-05710-7_25