Multimedia Systems

15-06-2022 | Special Issue Paper

Understanding videos with face recognition: a complete pipeline and applications

Authors: Pasquale Lisena, Jorma Laaksonen, Raphaël Troncy

Published in: Multimedia Systems | Issue 6/2022

Abstract

When browsing or studying a video corpus, knowing who appears in each scene is particularly relevant information. In this paper, we show how a combination of state-of-the-art techniques can be organised into a pipeline for face recognition of celebrities. In particular, we propose a system that combines MTCNN for detecting faces and FaceNet for extracting face embeddings, which are used to train a set of classifiers. The face recognition results obtained at frame level are then combined with those of consecutive frames, relying on automatic object tracking. Unlike previous work, we use images automatically retrieved by web search engines. We evaluate the system on three datasets, including historical videos from 1945 to 1969 and contemporary videos, obtaining a good precision score. In addition, we show how the obtained results can be applied to foster historical studies.

We use the icrawler open-source library: https://github.com/hellock/icrawler/.

We convert images to greyscale because preliminary experiments showed that using three colour channels did not improve the results enough to justify the additional computational cost.

We use the implementation provided at https://github.com/ipazc/mtcnn.

We use \(x_{l} = 0.35w\), \(x_{r} = (1 - x_{l})\), and \(y_{l} = y_{r} = 0.35h\).
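As an illustration of how such target eye positions are typically used for alignment, the following is a minimal sketch, not the authors' implementation. It assumes that \(x_{r}\) is the mirror image of \(x_{l}\) (that is, \(x_{r} = w - x_{l} = 0.65w\)), that `left_eye` and `right_eye` are the eye centres as they appear in the image, and that alignment consists of a rotation and a uniform scaling. The function names are hypothetical.

```python
import math

def target_eye_positions(w, h, frac=0.35):
    """Desired eye centres in an aligned face image of size w x h."""
    x_l = frac * w
    x_r = w - x_l  # assumption: x_r mirrors x_l, i.e. (1 - 0.35) * w
    y = frac * h   # both eyes lie on the same horizontal line
    return (x_l, y), (x_r, y)

def alignment_params(left_eye, right_eye, w, h, frac=0.35):
    """Rotation angle (degrees) and scale mapping detected eyes to the targets."""
    (tx_l, _), (tx_r, _) = target_eye_positions(w, h, frac)
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = math.degrees(math.atan2(dy, dx))   # tilt of the line joining the eyes
    scale = (tx_r - tx_l) / math.hypot(dx, dy)  # desired / detected eye distance
    return angle, scale

# Hypothetical detected eye centres in a source frame.
angle, scale = alignment_params((40, 60), (80, 60), w=160, h=160)
```

The angle and scale could then be passed to any affine warping routine; which routine the original system uses is not stated here.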

In this context, we do not handle high visual diversity among the images of one person, which can be due, for example, to ageing. For cases such as Elizabeth II, whose publicly available pictures span several decades, we modified the image search keyword to “Elizabeth II 1960”. In other cases with high visual variation over a shorter time span (e.g. Charles De Gaulle in 1940 and 1960), the FaceNet embeddings were similar enough not to require splitting into two different classifiers.

SVM obtained better performance than the other classifiers we tested, namely Random Forest, Logistic Regression and k-Nearest Neighbours.

We also performed experiments on this system using a single multi-class classifier with n classes instead of the n binary classifiers. While the results revealed similar precision scores, the recall of the multi-class solution was considerably worse, 22 percentage points lower than that of the system with binary classifiers.
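To make the "n binary classifiers" arrangement concrete, here is a minimal sketch under stated assumptions: each person has a scoring function returning a value in [0, 1], the highest-scoring person is selected, and a face is left unlabelled when no score reaches a threshold. The threshold, the `identify` helper and the toy distance-based scorers are hypothetical stand-ins for illustration; they are not the trained SVMs of the system.

```python
import math

def make_toy_classifier(centroid):
    """Hypothetical stand-in for a trained binary classifier of one person."""
    def score(embedding):
        # Closer to the centroid means a score nearer to 1.
        return 1.0 / (1.0 + math.dist(embedding, centroid))
    return score

def identify(embedding, classifiers, threshold=0.5):
    """Run every binary classifier and keep the best score, if it is high enough."""
    scores = {name: clf(embedding) for name, clf in classifiers.items()}
    if not scores:
        return None, 0.0
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return None, scores[best]  # no classifier accepts this face
    return best, scores[best]

classifiers = {
    "person_a": make_toy_classifier([0.0, 0.0]),
    "person_b": make_toy_classifier([1.0, 1.0]),
}
known = identify([0.1, 0.0], classifiers)    # close to person_a
unknown = identify([5.0, 5.0], classifiers)  # far from both people
```

In this sketch, every classifier can reject a face, so an embedding far from all known people yields no label at all.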

We used the implementation provided at https://github.com/Linzaer/Face-Track-Detect-Extract with some minor modifications.

The mode is “the number or value that appears most often in a particular set” (Cambridge Dictionary).

The mode can be seen as a special case of the weighted mode, obtained by setting all weights (in our formula, \(c_p\)) to 1.
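The relation between the two can be sketched in a few lines. The snippet below is an illustration only: it assumes that each frame-level prediction of a tracked face comes as a (label, weight) pair, with the weight playing the role of \(c_p\), and that the weighted mode is the label with the largest summed weight. The exact formula of the paper is not reproduced here, and the example values are invented.

```python
from collections import defaultdict

def weighted_mode(predictions):
    """Return the label with the largest total weight, and that total.

    predictions: iterable of (label, weight) pairs, e.g. one per frame of a track.
    """
    totals = defaultdict(float)
    for label, weight in predictions:
        totals[label] += weight
    if not totals:
        return None, 0.0
    best = max(totals, key=totals.get)
    return best, totals[best]

def mode(labels):
    """Plain mode: the weighted mode with every weight set to 1."""
    return weighted_mode((label, 1.0) for label in labels)

# Hypothetical predictions for one tracked face: (person, weight).
track = [("A", 0.9), ("B", 0.3), ("B", 0.4)]
by_weight = weighted_mode(track)                    # "A" wins: 0.9 against 0.7
by_count = mode([label for label, _ in track])      # "B" wins: 2 votes against 1
```

The example shows that the two can disagree: one confident prediction may outweigh several weak ones.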

We used the implementation available in SciPy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.fcluster.html.

https://www.ina.fr/emissions/les-actualites-francaises/.

The corpus can be downloaded from https://dataset.ina.fr/.

In the following, we define media as the entire video resource (e.g. an MPEG-4 file), segment as a temporal fragment of variable length (possibly composed of different shots), and shot as an uninterrupted recording of the video camera. See also the definitions of MediaResource, Part and Shot in the EBU Core ontology (https://www.ebu.ch/metadata/ontologies/ebucore/).
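The three notions can be pictured as a simple containment hierarchy. The sketch below is a simplified illustration, not the EBU Core model: the class and field names are hypothetical, times are assumed to be seconds from the start of the media, and the example URI is invented. The `#t=start,end` suffix follows the temporal dimension of the W3C Media Fragments URI syntax.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Shot:
    start: float  # seconds from the start of the media
    end: float

@dataclass
class Segment:
    shots: List[Shot] = field(default_factory=list)

    @property
    def start(self) -> float:
        return min(shot.start for shot in self.shots)

    @property
    def end(self) -> float:
        return max(shot.end for shot in self.shots)

@dataclass
class Media:
    uri: str
    segments: List[Segment] = field(default_factory=list)

def temporal_fragment(media: Media, start: float, end: float) -> str:
    """Build a URI addressing the interval [start, end) of the media."""
    return f"{media.uri}#t={start:g},{end:g}"

# Hypothetical media with one segment made of two shots.
segment = Segment(shots=[Shot(10.0, 18.0), Shot(18.0, 25.5)])
media = Media(uri="http://example.org/video.mp4", segments=[segment])
fragment = temporal_fragment(media, segment.start, segment.end)
```

A fragment URI of this kind lets an annotation point at a segment or shot without duplicating the video file.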

https://memad.eu/.

https://data.memad.eu/.

https://www.openapis.org/.

https://swagger.io/.

https://www.ebu.ch/metadata/ontologies/ebucore/.

https://www.w3.org/ns/oa.ttl.

https://www.w3.org/TR/media-frags/.

1. Wactlar, H., Christel, M.: Digital Video Archives: Managing through Metadata. In: Building a National Strategy for Digital Preservation: Issues in Digital Media Archiving, pp. 84–99. Library of Congress, Washington, DC, USA (2002)

2. Kilgarriff, A., Grefenstette, G.: Introduction to the Special Issue on the Web as Corpus. Computational Linguistics 29(3), 333–347 (2003)

3. Ma, H., King, I., Lyu, M.R.: Mining Web Graphs for Recommendations. IEEE Transactions on Knowledge and Data Engineering 24, 1051–1064 (2012)

4. Zhang, K., Zhang, Z., Li, Z., Qiao, Y.: Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks. IEEE Signal Processing Letters 23(10), 1499–1503 (2016)

5. Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: A Unified Embedding for Face Recognition and Clustering. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 815–823. IEEE Computer Society, Boston, MA, USA (2015)

6. Lisena, P., Laaksonen, J., Troncy, R.: FaceRec: An Interactive Framework for Face Recognition in Video Archives. In: 2nd International Workshop on Data-driven Personalisation of Television (DataTV-2021), New York, USA (2021). https://doi.org/10.5281/zenodo.4764632

7. Vij, R., Kaushik, B.: A survey on various face detecting and tracking techniques in video sequences. In: 2019 International Conference on Intelligent Computing and Control Systems (ICCS), pp. 69–73 (2019). https://doi.org/10.1109/ICCS45141.2019.9065483

8. Viola, P., Jones, M.J.: Robust Real-Time Face Detection. International Journal of Computer Vision 57(2), 137–154 (2004)

9. Ahonen, T., Hadid, A., Pietikäinen, M.: Face description with local binary patterns: Application to face recognition. IEEE Transactions on Pattern Analysis & Machine Intelligence 28(12), 2037–2041 (2006)

10. King, D.E.: Dlib-ml: A Machine Learning Toolkit. Journal of Machine Learning Research 10, 1755–1758 (2009)

11. Liu, L., Zhang, L., Liu, H., Yan, S.: Toward Large-Population Face Identification in Unconstrained Videos. IEEE Transactions on Circuits and Systems for Video Technology 24(11), 1874–1884 (2014). https://doi.org/10.1109/TCSVT.2014.2319671

12. Huang, Z., Wang, R., Shan, S., Van Gool, L., Chen, X.: Cross euclidean-to-riemannian metric learning with application to face recognition from video. IEEE Transactions on Pattern Analysis and Machine Intelligence 40(12), 2827–2840 (2018). https://doi.org/10.1109/TPAMI.2017.2776154

13. Ding, C., Tao, D.: Trunk-branch ensemble convolutional neural networks for video-based face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 40(4), 1002–1014 (2018). https://doi.org/10.1109/TPAMI.2017.2700390

14. William, I., Ignatius Moses Setiadi, D.R., Rachmawanto, E.H., Santoso, H.A., Sari, C.A.: Face Recognition using FaceNet (Survey, Performance Test, and Comparison). In: 4th International Conference on Informatics and Computing (ICIC). IEEE, Semarang, Indonesia (2019)

15. Guo, G., Zhang, N.: A survey on deep learning based face recognition. Computer Vision and Image Understanding 189 (2019). https://doi.org/10.1016/j.cviu.2019.102805

16. Shafin, M., Hansda, R., Pallavi, E., Kumar, D., Bhattacharyya, S., Kumar, S.: Partial Face Recognition: A Survey. In: 3rd International Conference on Advanced Informatics for Computing Research (ICAICR), pp. 1–6. Association for Computing Machinery, Shimla, India (2019)

17. Ali-Gombe, A., Elyan, E., Zwiegelaar, J.: Towards a Reliable Face Recognition System. In: Iliadis, L., Angelov, P.P., Jayne, C., Pimenidis, E. (eds.) 21st Engineering Applications of Neural Networks Conference (EANN), pp. 304–316. Springer, Cham (2020)

18. Li, S., Deng, W.: Deep facial expression recognition: a survey. IEEE Transactions on Affective Computing (2020). https://doi.org/10.1109/TAFFC.2020.2981446

19. Cao, Q., Shen, L., Xie, W., Parkhi, O.M., Zisserman, A.: VGGFace2: A Dataset for Recognising Faces across Pose and Age. In: 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG), pp. 67–74. IEEE Computer Society, Xi’an, China (2018)

20. Hsu, C.-W., Lin, C.-J.: A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks 13(2), 415–425 (2002)

21. Bewley, A., Ge, Z., Ott, L., Ramos, F., Upcroft, B.: Simple Online and Realtime Tracking. In: IEEE International Conference on Image Processing (ICIP), pp. 3464–3468. IEEE Computer Society, Phoenix, AZ, USA (2016)

22. Beloued, A., Stockinger, P., Lalande, S.: 4. Studio Campus AAR: A Semantic Platform for Analyzing and Publishing Audiovisual Corpuses, pp. 85–133. John Wiley & Sons, Ltd, Hoboken, NJ, USA (2017)

23. Carrive, J., Beloued, A., Goetschel, P., Heiden, S., Laurent, A., Lisena, P., Mazuet, F., Meignier, S., Pinchemin, B., Poels, G., Troncy, R.: Transdisciplinary Analysis of a Corpus of French Newsreels: The ANTRACT Project. Digital Humanities Quarterly, Special Issue on AudioVisual Data in DH 15(1) (2021)

24. Harrando, I., Reboud, A., Lisena, P., Troncy, R., Laaksonen, J., Virkkunen, A., Kurimo, M.: Using Fan-Made Content, Subtitles and Face Recognition for Character-Centric Video Summarization. In: International Workshop on Video Retrieval Evaluation (TRECVID 2020). NIST, Virtual Conference (2020)

25. Santemiz, P., Spreeuwers, L.J., Veldhuis, R.N.J.: Automatic landmark detection and face recognition for side-view face images. In: International Conference of the BIOSIG Special Interest Group (BIOSIG). IEEE, Darmstadt, Germany (2013)

26. Haider, H., Khiyal, M.: Side-View Face Detection using Automatic Landmarks. Journal of Multidisciplinary Engineering Science Studies 3, 1729–1736 (2017)

27. Lee, Y.J., Grauman, K.: Face Discovery with Social Context. In: British Machine Vision Conference (BMVC). BMVA Press, Dundee, UK (2011)

28. Atrey, P.K., Hossain, M.A., El Saddik, A., Kankanhalli, M.S.: Multimodal fusion for multimedia analysis: a survey. Multimedia Systems 16(6), 345–379 (2010)

29. Handa, A., Agarwal, R., Kohli, N.: A survey of face recognition techniques and comparative study of various bi-modal and multi-modal techniques. In: 11th International Conference on Industrial and Information Systems (ICIIS), pp. 274–279. IEEE, Roorkee, India (2016)

30. Zhou, H., Lam, K.-M.: Age-invariant face recognition based on identity inference from appearance age. Pattern Recognition 76, 191–202 (2018)

Title: Understanding videos with face recognition: a complete pipeline and applications
Authors: Pasquale Lisena
Jorma Laaksonen
Raphaël Troncy
Publication date: 15-06-2022
Publisher: Springer Berlin Heidelberg
Published in: Multimedia Systems / Issue 6/2022
Print ISSN: 0942-4962
Electronic ISSN: 1432-1882
DOI: https://doi.org/10.1007/s00530-022-00959-x
