Published in: Multimedia Systems 6/2022

15-06-2022 | Special Issue Paper

Understanding videos with face recognition: a complete pipeline and applications

Authors: Pasquale Lisena, Jorma Laaksonen, Raphaël Troncy

Abstract

When browsing or studying a video corpus, one particularly relevant piece of information is the identity of the people appearing in the scenes. In this paper, we show how a combination of state-of-the-art techniques can be organised into a pipeline for the face recognition of celebrities. In particular, we propose a system that combines MTCNN for detecting faces and FaceNet for extracting face embeddings, which are used to train a set of classifiers. The face recognition results obtained at frame level are then combined with those of consecutive frames, relying on automatic object tracking. Unlike previous work, we use images automatically retrieved from web search engines. We evaluate the system on three datasets, including historical videos from 1945 to 1969 and contemporary videos, obtaining good precision scores. In addition, we show how the results can be applied to foster historical studies.
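To make the data flow concrete, the pipeline summarised above can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: `detect`, `embed`, `classify` and `track` are hypothetical callables standing in for MTCNN, FaceNet, the trained classifiers and the object tracker, and each track's identity is taken here as the most frequent frame-level prediction.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height): an assumed box format


def recognise_video(
    frames: Iterable[object],
    detect: Callable[[object], List[Box]],                 # hypothetical stand-in for MTCNN
    embed: Callable[[object, Box], Sequence[float]],       # hypothetical stand-in for FaceNet
    classify: Callable[[Sequence[float]], Optional[str]],  # hypothetical stand-in for the classifiers
    track: Callable[[int, List[Box]], List[Hashable]],     # hypothetical stand-in for the tracker
) -> Dict[Hashable, Optional[str]]:
    """Return one identity per face track by combining frame-level predictions."""
    votes: Dict[Hashable, List[Optional[str]]] = defaultdict(list)
    for index, frame in enumerate(frames):
        boxes = detect(frame)
        track_ids = track(index, boxes)  # one track id per detected box
        for box, track_id in zip(boxes, track_ids):
            votes[track_id].append(classify(embed(frame, box)))
    # Majority vote over the frames of each track; None stands for "unknown".
    return {tid: Counter(labels).most_common(1)[0][0] for tid, labels in votes.items()}
```

Keeping the stages as injected callables means a confidence-weighted vote, or a different detector, only changes one argument or the final aggregation line.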

Footnotes
1
We use the icrawler open-source library: https://github.com/hellock/icrawler/.
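A minimal sketch of how training images could be collected with icrawler, combined with the per-person keyword override described in footnote 5 ("Elizabeth II 1960"). The choice of the Bing crawler, the `max_num` value, the directory layout and the helper names are assumptions for illustration; the paper only states that icrawler and web search engines were used.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical per-person query overrides, e.g. to target a specific period.
QUERY_OVERRIDES: Dict[str, str] = {"Elizabeth II": "Elizabeth II 1960"}


def build_queries(
    names: List[str], overrides: Optional[Dict[str, str]] = None
) -> List[Tuple[str, str]]:
    """Pair each person's name with the keyword sent to the search engine."""
    overrides = QUERY_OVERRIDES if overrides is None else overrides
    return [(name, overrides.get(name, name)) for name in names]


def download_training_images(names: List[str], root_dir: str, max_num: int = 50) -> None:
    """Download up to max_num images per person into root_dir/<name>/."""
    # Imported lazily so that build_queries works without icrawler installed.
    from icrawler.builtin import BingImageCrawler

    for name, keyword in build_queries(names):
        crawler = BingImageCrawler(storage={"root_dir": f"{root_dir}/{name}"})
        crawler.crawl(keyword=keyword, max_num=max_num)
```

One folder per person keeps the downloaded images directly usable as labelled training data for per-person classifiers.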
 
2
We decided to convert to greyscale because preliminary experiments showed that using three colour channels did not improve results enough to justify the increased computational complexity.
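The footnote does not say which greyscale conversion was used. The sketch below assumes the ITU-R BT.601 luma weights (the formula Pillow documents for its "L" mode) and, as a further assumption, replicates the grey value into three channels for an embedding model that expects RGB input.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def to_greyscale(rgb: Sequence[Sequence[Pixel]]) -> List[List[int]]:
    """Convert rows of (R, G, B) pixels to 8-bit luma with BT.601 weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in rgb]


def replicate_channels(grey: Sequence[Sequence[int]]) -> List[List[Pixel]]:
    """Repeat each grey value in three channels for a 3-channel model input."""
    return [[(v, v, v) for v in row] for row in grey]
```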
 
3
We use the implementation provided at https://github.com/ipazc/mtcnn.
 
4
We use \(x_{l} = 0.35w\), \(x_{r} = w - x_{l} = 0.65w\), and \(y_{l} = y_{r} = 0.35h\).
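The footnote does not state what these coordinates refer to. Assuming, as in common eye-based face alignment, that \((x_l, y_l)\) and \((x_r, y_r)\) are the target positions of the two eyes in a \(w \times h\) aligned crop, with the right-eye target at \((1 - 0.35)w\), the similarity transform mapping detected eye centres (for instance MTCNN eye keypoints) onto those targets can be sketched as follows. The result is a 2 × 3 matrix in the layout accepted by OpenCV's `cv2.warpAffine`; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def eye_targets(w: int, h: int, ratio: float = 0.35) -> Tuple[Point, Point]:
    """Target eye positions: x_l = ratio*w, x_r = (1 - ratio)*w, y_l = y_r = ratio*h."""
    return (ratio * w, ratio * h), ((1 - ratio) * w, ratio * h)


def alignment_matrix(left_eye: Point, right_eye: Point, w: int, h: int) -> List[List[float]]:
    """2x3 similarity transform mapping the detected eyes onto the target positions."""
    (tlx, tly), (trx, _) = eye_targets(w, h)
    dx, dy = right_eye[0] - left_eye[0], right_eye[1] - left_eye[1]
    angle = -math.atan2(dy, dx)               # rotate the eye line to horizontal
    scale = (trx - tlx) / math.hypot(dx, dy)  # match the target inter-eye distance
    a, b = scale * math.cos(angle), scale * math.sin(angle)
    # Translation chosen so that the left eye lands exactly on its target.
    tx = tlx - (a * left_eye[0] - b * left_eye[1])
    ty = tly - (b * left_eye[0] + a * left_eye[1])
    return [[a, -b, tx], [b, a, ty]]


def apply(matrix: List[List[float]], p: Point) -> Point:
    """Apply a 2x3 affine matrix to a single point."""
    (a, b, c), (d, e, f) = matrix
    return (a * p[0] + b * p[1] + c, d * p[0] + e * p[1] + f)
```

Because the rotation makes the eye line horizontal and the scale fixes the inter-eye distance, the right eye automatically lands on \((x_r, y_r)\) once the left eye is placed on \((x_l, y_l)\).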
 
5
In this context, we do not handle high visual diversity among the images of a single person, which can be due, for example, to ageing. In cases such as Elizabeth II, for whom pictures spanning several decades are publicly available, we modified the image search keyword to “Elizabeth II 1960”. In other cases with high visual variation over a shorter time span (e.g. Charles De Gaulle in 1940 and 1960), the FaceNet embeddings were similar enough not to require splitting into two different classifiers.
 
6
The SVM outperformed the other classifiers we tested, namely Random Forest, Logistic Regression and k-Nearest Neighbours.
 
7
We also experimented with a single multi-class classifier with n classes instead of the n binary classifiers. While precision was similar, recall was considerably worse for the multi-class solution: 22 percentage points lower than with the binary classifiers.
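One way n independent binary classifiers can be turned into a single decision, while still rejecting faces that no classifier accepts, is sketched below. The decision rule and the threshold of 0 are assumptions, not taken from the paper; any callable returning a signed score can serve as a scorer, for example the `decision_function` of a fitted scikit-learn `SVC`.

```python
from typing import Callable, Dict, Optional, Sequence

# One binary scorer per person: a positive score means "this face is that person".
Scorer = Callable[[Sequence[float]], float]


def predict_identity(
    embedding: Sequence[float], scorers: Dict[str, Scorer], threshold: float = 0.0
) -> Optional[str]:
    """Return the person whose classifier is most confident, or None if none accepts."""
    best_name: Optional[str] = None
    best_score = threshold
    for name, score_fn in scorers.items():
        score = score_fn(embedding)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

The explicit "nobody" outcome is what distinguishes this setup from a closed-set multi-class classifier, which always returns one of its n classes.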
 
8
We used the implementation provided at https://github.com/Linzaer/Face-Track-Detect-Extract with some minor modifications.
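Trackers in the SORT family link detections across frames by matching boxes on intersection over union (IoU). The sketch below shows that association step in simplified form: it matches greedily on last-seen boxes, whereas SORT uses the Hungarian algorithm on Kalman-predicted boxes. The 0.3 minimum IoU and the function names are assumptions, and the sketch is not claimed to mirror the linked implementation.

```python
from typing import Dict, List, Set, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(tracks: Dict[int, Box], detections: List[Box], min_iou: float = 0.3) -> Dict[int, int]:
    """Greedily match track ids to detection indices by decreasing IoU."""
    pairs = sorted(
        ((iou(box, det), tid, di) for tid, box in tracks.items() for di, det in enumerate(detections)),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used: Set[int] = set()
    for score, tid, di in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little
        if tid not in matches and di not in used:
            matches[tid] = di
            used.add(di)
    return matches
```

Unmatched detections would start new tracks, and the frame-level identities of all faces in one track can then be aggregated.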
 
9
The mode is “the number or value that appears most often in a particular set” (Cambridge Dictionary).
 
10
The mode can be seen as a special case of the weighted mode in which all weights (\(c_p\) in our formula) are set to 1.
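The formula itself is not reproduced in these footnotes. Assuming that each frame-level prediction p carries a label and a confidence \(c_p\), and that the weighted mode picks the label with the largest sum of confidences, the two aggregation rules can be sketched as follows; setting every \(c_p\) to 1 recovers the plain mode.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def weighted_mode(predictions: Iterable[Tuple[str, float]]) -> str:
    """Label with the largest total confidence over (label, c_p) pairs."""
    totals: Dict[str, float] = defaultdict(float)
    for label, confidence in predictions:
        totals[label] += confidence
    if not totals:
        raise ValueError("no predictions to aggregate")
    return max(totals, key=totals.get)


def mode(labels: Iterable[str]) -> str:
    """Plain mode: the weighted mode with every weight c_p set to 1."""
    return weighted_mode((label, 1.0) for label in labels)
```

The two rules can disagree: for predictions A (0.4), A (0.4) and B (0.95), the mode returns A, while the weighted mode returns B because 0.95 > 0.8.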
 
13
The corpus can be downloaded from https://dataset.ina.fr/.
 
14
In the following, we define media as the entire video resource (e.g. an MPEG-4 file), segment as a temporal fragment of variable length (possibly composed of different shots), and shot as an uninterrupted recording by the video camera. See also the definitions of MediaResource, Part and Shot in the EBU Core ontology (https://www.ebu.ch/metadata/ontologies/ebucore/).
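The three levels defined above can be modelled as nested records, as in the sketch below. Field names, times in seconds from the start of the media, and the `shots_at` helper are assumptions for illustration; this is not an EBU Core serialisation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Shot:
    """An uninterrupted recording by the camera, in seconds from the media start."""
    start: float
    end: float


@dataclass
class Segment:
    """A temporal fragment of variable length, possibly spanning several shots."""
    start: float
    end: float
    shots: List[Shot] = field(default_factory=list)

    @property
    def duration(self) -> float:
        return self.end - self.start


@dataclass
class Media:
    """The entire video resource, e.g. an MPEG-4 file."""
    uri: str
    segments: List[Segment] = field(default_factory=list)

    def shots_at(self, t: float) -> List[Shot]:
        """Shots (from any segment) that contain time t."""
        return [s for seg in self.segments for s in seg.shots if s.start <= t < s.end]
```

With this nesting, a face detected at time t can be attributed both to its shot and to the enclosing segment.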
 
Metadata
Publisher
Springer Berlin Heidelberg
Print ISSN: 0942-4962
Electronic ISSN: 1432-1882
DOI
https://doi.org/10.1007/s00530-022-00959-x
