Introduction
- A monocular 3D vehicle localization network, CenterLoc3D, is proposed for roadside surveillance cameras in traffic scenes; it directly predicts the projected vertices and dimensions of each vehicle's 3D bounding box.
- A weighted-fusion module is proposed for multi-scale feature fusion to strengthen feature extraction.
- A loss function with embedded spatial constraints is proposed to improve the accuracy of 3D vehicle localization.
- A benchmark comprising a dataset, an annotation tool, and evaluation metrics is proposed for experimental validation, supporting further work on monocular 3D vehicle localization in roadside traffic scenes.
Related work
CenterLoc3D for 3D vehicle localization
Framework
Camera calibration
CenterLoc3D
Network architecture
Loss function
Vertex | World coordinate |
---|---|
\(P_1^\textrm{proj}\) | \((x_2^{gt} + w_v^\textrm{pred},y_2^{gt},z_2^{gt})\) |
\(P_2^\textrm{proj}\) | \((x_2^{gt},y_2^{gt},z_2^{gt})\) |
\(P_3^\textrm{proj}\) | \((x_2^{gt},y_2^{gt} + l_v^\textrm{pred},z_2^{gt})\) |
\(P_4^\textrm{proj}\) | \((x_2^{gt} + w_v^\textrm{pred},y_2^{gt} + l_v^\textrm{pred},z_2^{gt})\) |
\(P_5^\textrm{proj}\) | \((x_2^{gt} + w_v^\textrm{pred},y_2^{gt},z_2^{gt} + h_v^\textrm{pred})\) |
\(P_6^\textrm{proj}\) | \((x_2^{gt},y_2^{gt},z_2^{gt} + h_v^\textrm{pred})\) |
\(P_7^\textrm{proj}\) | \((x_2^{gt},y_2^{gt} + l_v^\textrm{pred},z_2^{gt} + h_v^\textrm{pred})\) |
\(P_8^\textrm{proj}\) | \((x_2^{gt} + w_v^\textrm{pred},y_2^{gt} + l_v^\textrm{pred},z_2^{gt} + h_v^\textrm{pred})\) |
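
The table above builds all eight vertices from a single anchor, the ground-truth vertex \(P_2\), by adding the predicted width, length, and height along the x, y, and z axes. A minimal Python sketch of that construction follows. The function name `vertices_from_anchor`, the type alias, and the sample numbers are illustrative assumptions and do not come from the paper.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def vertices_from_anchor(p2: Point3D, w: float, l: float, h: float) -> List[Point3D]:
    """Return P1..P8 in table order, built from the anchor vertex P2.

    p2 plays the role of (x_2^gt, y_2^gt, z_2^gt); w, l, h play the role of
    the predicted width, length, and height.
    """
    x, y, z = p2
    # Multipliers of (w, l, h) for P1..P8, read off the table rows.
    offsets = [
        (1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0),  # face at z:     P1..P4
        (1, 0, 1), (0, 0, 1), (0, 1, 1), (1, 1, 1),  # face at z + h: P5..P8
    ]
    return [(x + dx * w, y + dy * l, z + dz * h) for dx, dy, dz in offsets]


# Hypothetical anchor and dimensions, chosen only to make the arithmetic easy to follow.
corners = vertices_from_anchor((0.0, 10.0, 0.0), w=2.0, l=4.0, h=1.5)
for index, corner in enumerate(corners, start=1):
    print(f"P{index}: {corner}")
```

Because every vertex depends on the same anchor, an error in any predicted dimension shifts several vertices at once, which is what lets a vertex-level comparison constrain the dimensions.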
Dataset of 3D vehicle localization
Dataset composition
Scene | \({D_{ry}}\) | \({D_{rx}}\) | Calibration f | Calibration \(\phi\)/rad | Calibration \(\theta\)/rad | Calibration h/mm |
---|---|---|---|---|---|---|
A | 120 | 25 | 2878.13 | 0.17874 | 0.26604 | 10119.08 |
B | 120 | 25 | 3994.17 | 0.15717 | 0.35346 | 8071.00 |
C | 60 | 15 | 3384.25 | 0.26295 | \(-\)0.24869 | 8126.49 |
D | 80 | 10 | 3743.78 | 0.11225 | \(-\)0.07516 | 7353.40 |
E | 60 | 10 | 1142.26 | 0.33372 | 0.14387 | 7166.44 |
Label process
Vertex | World coordinate |
---|---|
\(P_1^{gt}\) | \((x_\textrm{cen}^{gt} + {{w_v^{gt}}/2},y_\textrm{cen}^{gt} - {{l_v^{gt}}/2},z_\textrm{cen}^{gt} - {{h_v^{gt}}/2})\) |
\(P_2^{gt}\) | \((x_\textrm{cen}^{gt} - {{w_v^{gt}}/2},y_\textrm{cen}^{gt} - {{l_v^{gt}}/2},z_\textrm{cen}^{gt} - {{h_v^{gt}}/2})\) |
\(P_3^{gt}\) | \((x_\textrm{cen}^{gt} - {{w_v^{gt}}/2},y_\textrm{cen}^{gt} + {{l_v^{gt}}/2},z_\textrm{cen}^{gt} - {{h_v^{gt}}/2})\) |
\(P_4^{gt}\) | \((x_\textrm{cen}^{gt} + {{w_v^{gt}}/2},y_\textrm{cen}^{gt} + {{l_v^{gt}}/2},z_\textrm{cen}^{gt} - {{h_v^{gt}}/2})\) |
\(P_5^{gt}\) | \((x_\textrm{cen}^{gt} + {{w_v^{gt}}/2},y_\textrm{cen}^{gt} - {{l_v^{gt}}/2},z_\textrm{cen}^{gt} + {{h_v^{gt}}/2})\) |
\(P_6^{gt}\) | \((x_\textrm{cen}^{gt} - {{w_v^{gt}}/2},y_\textrm{cen}^{gt} - {{l_v^{gt}}/2},z_\textrm{cen}^{gt} + {{h_v^{gt}}/2})\) |
\(P_7^{gt}\) | \((x_\textrm{cen}^{gt} - {{w_v^{gt}}/2},y_\textrm{cen}^{gt} + {{l_v^{gt}}/2},z_\textrm{cen}^{gt} + {{h_v^{gt}}/2})\) |
\(P_8^{gt}\) | \((x_\textrm{cen}^{gt} + {{w_v^{gt}}/2},y_\textrm{cen}^{gt} + {{l_v^{gt}}/2},z_\textrm{cen}^{gt} + {{h_v^{gt}}/2})\) |
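
The ground-truth vertices in the table above are the centroid plus or minus half of each dimension. The following Python sketch reproduces that sign pattern and checks that averaging the eight vertices recovers the centroid. The helper names `vertices_from_centroid` and `centroid_of`, and the sample values, are assumptions made for illustration.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def vertices_from_centroid(centroid: Point3D, w: float, l: float, h: float) -> List[Point3D]:
    """Return P1..P8 in table order from the box centroid and its dimensions."""
    xc, yc, zc = centroid
    # Signs applied to (w/2, l/2, h/2) for P1..P8, read off the table rows.
    signs = [
        (+1, -1, -1), (-1, -1, -1), (-1, +1, -1), (+1, +1, -1),  # z_cen - h/2
        (+1, -1, +1), (-1, -1, +1), (-1, +1, +1), (+1, +1, +1),  # z_cen + h/2
    ]
    return [(xc + sx * w / 2, yc + sy * l / 2, zc + sz * h / 2) for sx, sy, sz in signs]


def centroid_of(vertices: List[Point3D]) -> Point3D:
    """Mean of the vertices; for a cuboid this is its centroid."""
    n = len(vertices)
    return (
        sum(v[0] for v in vertices) / n,
        sum(v[1] for v in vertices) / n,
        sum(v[2] for v in vertices) / n,
    )


# Hypothetical centroid and dimensions, not taken from the dataset.
gt_vertices = vertices_from_centroid((1.0, 12.0, 0.75), w=2.0, l=4.0, h=1.5)
recovered = centroid_of(gt_vertices)
print(gt_vertices[1])  # P2, the vertex with the smallest x, y, and z
print(recovered)
```

The sign pattern is listed explicitly rather than generated so that the vertex order stays easy to compare with the table.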
Experimental protocols
Implementation details
Evaluation metrics
Average precision and speed
3D vehicle localization precision and error
3D vehicle dimension precision and error
Results and discussions
Average precision and speed of CenterLoc3D
Method | Scene | Backbone | GPU | \(AP_{3D}(IoU > 0.5)\) \([val_1/val_2]\) Easy | \(AP_{3D}(IoU > 0.5)\) \([val_1/val_2]\) Moderate | \(AP_{3D}(IoU > 0.5)\) \([val_1/val_2]\) Hard | \(AP_{3D}(IoU > 0.7)\) \([val_1/val_2/test]\) Easy | \(AP_{3D}(IoU > 0.7)\) \([val_1/val_2/test]\) Moderate | \(AP_{3D}(IoU > 0.7)\) \([val_1/val_2/test]\) Hard | FPS |
---|---|---|---|---|---|---|---|---|---|---|
MonoGRNet [13] | Onboard | VGG-16 | GTX Titan X | 50.51/54.21 | 36.97/39.69 | 30.82/33.06 | 13.88/24.97/– | 10.19/19.44/– | 7.62/16.30/– | 16.7 |
Deep3DBox [19] | Onboard | VGG-16 | – | 27.04/– | 20.55/– | 15.88/– | 5.85/–/– | 4.10/–/– | 3.84/–/– | – |
GS3D [22] | Onboard | VGG-16 | – | 32.15/30.60 | 29.89/26.40 | 26.19/22.89 | 13.46/11.63/7.69 | 10.97/10.51/6.29 | 10.38/10.51/6.16 | 0.4 |
RTM3D [25] | Onboard | ResNet-18 | GTX 1080Ti \(\times\) 2 | 47.43/46.52 | 33.86/32.61 | 31.04/30.95 | 18.13/18.38/– | 14.14/14.66/– | 13.33/12.35/– | 28.6 |
RTM3D [25] | Onboard | DLA-34 | GTX 1080Ti \(\times\) 2 | 54.36/52.59 | 41.90/40.96 | 35.84/34.95 | 20.77/19.47/13.61 | 16.86/16.29/10.09 | 16.63/15.57/8.18 | 18.2 |
SMOKE [15] | Onboard | DLA-34 | GTX TITAN X \(\times\) 4 | – | – | – | 14.76/19.99/14.03 | 12.85/15.61/9.76 | 11.50/15.28/7.84 | 33.3 |
KM3D [28] | Onboard | ResNet-18 | GTX 1080Ti | 47.23/47.13 | 34.12/33.31 | 31.51/25.84 | 19.48/18.34/12.65 | 15.32/14.91/8.39 | 13.88/12.58/7.12 | 47.6 |
KM3D [28] | Onboard | DLA-34 | GTX 1080Ti | 56.02/54.09 | 43.13/43.07 | 36.77/37.56 | 22.50/22.71/16.73 | 19.60/17.71/11.45 | 17.12/16.15/9.92 | 25.0 |
Lite-FPN [27] | Onboard | ResNet-18 | GTX 2080Ti | – | – | – | 17.04/–/– | 14.02/–/– | 12.23/–/– | 88.57 |
Lite-FPN [27] | Onboard | ResNet-34 | GTX 2080Ti | – | – | – | 18.01/–/15.32 | 15.29/–/10.64 | 14.28/–/8.59 | 71.32 |
Lite-FPN [27] | Onboard | DLA-34 | GTX 2080Ti | – | – | – | 19.31/–/– | 16.19/–/– | 15.47/–/– | 42.37 |
Ours | Roadside | ResNet-50 | GTX 1080Ti | 91.34/– (single value, no difficulty split) | | | 79.36/–/51.30 (single value, no difficulty split) | | | 41.18 |
Method | BBox | Centroid | Dimension | Location |
---|---|---|---|---|
MonoGRNet [13] | \(\checkmark \) | \(\checkmark \) | \(\checkmark \) | |
Deep3DBox [19] | \(\checkmark \) | |||
GS3D [22] | \(\checkmark \) | \(\checkmark \) | ||
RTM3D [25] | \(\checkmark \) | \(\checkmark \) | \(\checkmark \) | |
SMOKE [15] | \(\checkmark \) | \(\checkmark \) | \(\checkmark \) | |
KM3D [28] | \(\checkmark \) | \(\checkmark \) | ||
Lite-FPN [27] | \(\checkmark \) | \(\checkmark \) | ||
Ours | \(\checkmark \) | \(\checkmark \) | \(\checkmark \) | \(\checkmark \) |
3D vehicle localization precision and error of CenterLoc3D
3D vehicle dimension precision and error of CenterLoc3D
Vehicle | Type | \({P_{centroid}}\) | \({{\widetilde{P}}_{centroid}}\) | Precision |
---|---|---|---|---|
1 | Car | 24.927, 88.430, 0.673 | 24.952, 88.519, 0.760 | 0.996 |
2 | Car | 8.611, 68.119, 0.717 | 8.585, 67.729, 0.770 | 0.991 |
3 | Car | \(-\)0.015, 82.595, 0.755 | \(-\)0.015, 82.595, 0.755 | 0.999 |
4 | Car | 8.188, 105.823, 0.780 | 8.217, 105.714, 0.825 | 0.996 |
5 | Car | 11.508, 72.225, 0.731 | 11.434, 71.608, 0.785 | 0.984 |
6 | Car | 18.829, 62.322, 0.727 | 18.791, 61.883, 0.790 | 0.990 |
7 | Truck | 21.315, 43.322, 0.958 | 21.219, 43.156, 0.875 | 0.990 |
8 | Car | 8.538, 38.425, 0.735 | 8.451, 38.100, 0.730 | 0.988 |
9 | Car | 18.144, 67.893, 0.674 | 18.165, 68.067, 0.700 | 0.995 |
10 | Car | 0.336, 43.784, 0.730 | 0.382, 43.959, 0.700 | 0.993 |
11 | Car | 3.812, 46.123, 0.697 | 3.794, 46.043, 0.710 | 0.997 |
12 | Car | 11.035, 65.584, 0.730 | 11.261, 65.935, 0.740 | 0.976 |
13 | Car | 0.142, 81.101, 0.721 | 0.100, 80.344, 0.770 | 0.984 |
14 | Car | \(-\)14.298, 59.759, 0.662 | \(-\)14.297, 59.758, 0.680 | 0.999 |
15 | Car | \(-\)6.232, 39.097, 0.672 | \(-\)6.193, 39.138, 0.750 | 0.993 |
16 | Car | \(-\)9.671, 38.371, 0.705 | \(-\)9.754, 38.703, 0.665 | 0.978 |
17 | Car | \(-\)6.249, 56.957, 0.681 | \(-\)6.275, 56.747, 0.690 | 0.989 |
18 | Car | \(-\)10.033, 64.324, 0.740 | \(-\)10.032, 64.321, 0.740 | 0.999 |
19 | Car | \(-\)1.770, 53.300, 0.756 | \(-\)1.820, 52.683, 0.860 | 0.975 |
20 | Bus | \(-\)5.174, 71.789, 1.452 | \(-\)5.341, 73.016, 1.410 | 0.936 |
21 | Car | \(-\)7.645, 57.673, 0.800 | \(-\)7.593, 57.247, 0.800 | 0.975 |
22 | Car | 0.356, 22.313, 0.739 | 0.340, 22.388, 0.670 | 0.994 |
23 | Car | 0.862, 37.990, 0.735 | 0.804, 37.053, 0.765 | 0.957 |
24 | Car | 1.376, 40.059, 0.825 | 1.412, 40.053, 0.860 | 0.993 |
Vehicle | Type | \({D_v}\) | \({{\widetilde{D}}_v}\) | Precision |
---|---|---|---|---|
1 | Car | 3.60, 1.71, 1.37 | 3.79, 1.70, 1.27 | 0.860 |
2 | Car | 3.26, 1.67, 1.31 | 3.18, 1.61, 1.25 | 0.890 |
3 | Car | 4.05, 1.76, 1.40 | 3.92, 1.80, 1.40 | 0.942 |
4 | Car | 4.51, 1.81, 1.47 | 4.42, 1.88, 1.46 | 0.935 |
5 | Car | 4.43, 1.78, 1.37 | 4.40, 1.78, 1.48 | 0.915 |
6 | Car | 4.74, 1.80, 1.46 | 4.33, 1.77, 1.40 | 0.843 |
7 | Car | 4.58, 1.82, 1.45 | 4.96, 1.86, 1.54 | 0.845 |
8 | Car | 4.50, 1.79, 1.40 | 4.50, 1.70, 1.36 | 0.912 |
9 | Car | 3.74, 1.64, 1.27 | 4.07, 1.68, 1.30 | 0.872 |
10 | Car | 4.55, 1.80, 1.42 | 4.58, 1.68, 1.40 | 0.910 |
11 | Car | 3.57, 1.80, 1.35 | 4.11, 1.80, 1.38 | 0.844 |
12 | Car | 3.71, 1.76, 1.36 | 3.90, 1.80, 1.33 | 0.912 |
13 | Car | 3.34, 1.77, 1.32 | 3.70, 1.76, 1.25 | 0.838 |
14 | Bus | 12.83, 2.71, 2.75 | 12.00, 2.76, 2.82 | 0.886 |
15 | Car | 4.74, 1.87, 1.48 | 4.77, 1.83, 1.53 | 0.939 |
16 | Car | 5.00, 1.89, 1.48 | 4.75, 1.86, 1.56 | 0.880 |
17 | Car | 4.69, 1.84, 1.44 | 4.60, 1.81, 1.37 | 0.914 |
18 | Car | 4.68, 1.85, 1.43 | 4.56, 1.81, 1.34 | 0.885 |
19 | Car | 4.64, 1.84, 1.45 | 4.68, 1.82, 1.50 | 0.947 |
20 | Bus | 12.74, 2.68, 2.62 | 12.00, 2.76, 2.82 | 0.838 |
Ablation study of CenterLoc3D
Method | Length/m | Width/m | Height/m |
---|---|---|---|
3DOP [10] | 0.504 | 0.094 | 0.107 |
Mono3D [34] | 0.582 | 0.103 | 0.172 |
MonoGRNet [13] | 0.412 | 0.084 | 0.084 |
MonoGRK [24] | 0.403 | 0.091 | 0.101 |
Ours | 0.137 | 0.031 | 0.030 |
Model | Weighted-Fusion | Reprojection | IoU | \(AP_{3D}(IoU > 0.7)\) | FPS | Improvement of \(AP_{3D}\) |
---|---|---|---|---|---|---|
\({M_\textrm{base}}\) | | | | 52.52 / 36.20 | 46.73 | – |
\({M_1}\) | \(\checkmark\) | | | 57.07 / 42.84 | 43.23 | 6.64 |
\({M_2}\) | \(\checkmark\) | \(\checkmark\) | | 68.38 / 45.32 | 41.31 | 2.48 |
\({M_3}\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | 79.36 / 51.30 | 41.18 | 5.98 |
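
The ablation table does not say which of the two \(AP_{3D}\) values the improvement column refers to. The short Python check below computes the gain of each model over the previous one for both values and compares the result with the reported column. The variable and function names are illustrative. Reading the second value as the test-set figure is an inference from the comparison table, where CenterLoc3D is listed as 79.36/–/51.30 under \([val_1/val_2/test]\).

```python
# Rows copied from the ablation table:
# (model, first AP value, second AP value, reported improvement)
ablation = [
    ("M_base", 52.52, 36.20, None),
    ("M_1", 57.07, 42.84, 6.64),
    ("M_2", 68.38, 45.32, 2.48),
    ("M_3", 79.36, 51.30, 5.98),
]


def stepwise_gains(rows, column):
    """Gain of each model over the previous row for the chosen AP column (1 or 2)."""
    return [round(cur[column] - prev[column], 2) for prev, cur in zip(rows, rows[1:])]


gains_first = stepwise_gains(ablation, 1)
gains_second = stepwise_gains(ablation, 2)
reported = [row[3] for row in ablation[1:]]

print("gains on first value: ", gains_first)
print("gains on second value:", gains_second)
print("second value matches reported column:", gains_second == reported)
```

Only the gains on the second value reproduce the reported numbers, so the improvement column appears to track that figure rather than the first one.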