1 Introduction
2 Related work
2.1 Face detection
2.2 Head pose estimation
2.3 Multi-task methods
3 Towards real-time face detection and head pose estimation
3.1 Architecture
3.2 Multi-task loss
3.3 Head pose estimation
4 Experiments and results
4.1 Training
4.1.1 The data processing
4.1.2 Training details
4.2 Results of face detection
4.3 Results of head pose estimation
Method | Yaw | Pitch | Roll | MAE |
---|---|---|---|---|
Dlib | 23.153 | 10.545 | 13.633 | 15.777 |
Hopenet | 8.84 | 15.41 | 14.1 | 12.78 |
Ours | 5.49 | 23.81 | 17.26 | 15.52 |
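The MAE column in the table above appears to be the unweighted mean of the yaw, pitch, and roll errors. This is an inference from the reported numbers, not a definition stated in the text. The minimal sketch below recomputes the column under that assumption; the function name `mean_absolute_error` and the row label `proposed` are illustrative and do not come from the paper.

```python
def mean_absolute_error(yaw, pitch, roll):
    """Average the three per-angle errors into one value.

    Assumption: MAE is the plain mean of the yaw, pitch, and roll errors.
    """
    return (yaw + pitch + roll) / 3.0


# Per-angle errors copied from the table (yaw, pitch, roll).
rows = {
    "Dlib": (23.153, 10.545, 13.633),
    "Hopenet": (8.84, 15.41, 14.1),
    "proposed": (5.49, 23.81, 17.26),  # the last row of the table
}

# Recompute the MAE column, rounded to three decimals.
mae = {name: round(mean_absolute_error(*angles), 3) for name, angles in rows.items()}

for name, value in mae.items():
    print(f"{name}: {value}")
```

Under this assumption the recomputed values agree with the reported MAE column to the precision shown. They also show that the proposed method has the lowest yaw error of the three, while its higher pitch and roll errors push its average above Hopenet's.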
4.4 Inference efficiency
Method | Inference time |
---|---|
Retinaface(mobile) | 0.2 |
yolo_face_dect | 0.017 |
SSH | 0.12 |
Retinaface(mobile)+Hopenet | 0.32 |
yolo_face_dect+Hopenet | 0.137 |
SSH+Hopenet | 0.24 |
Ours | 0.071 |
Number of people | SSH+Hopenet | YOLO_face+Hopenet | Ours |
---|---|---|---|
1 | 0.152 | 0.156 | 0.015 |
5 | 0.183 | 0.18 | 0.016 |
17 | 0.32 | 0.252 | 0.015 |
34 | 0.58 | 0.46 | 0.017 |
Method | Frames per second |
---|---|
Retinaface(mobile)+Hopenet | 9 |
MTCNN+Hopenet | 1.8 |
SSH+Hopenet | 3.9 |
Ours | 40.69 |