1 INTRODUCTION
2 RELATED WORK
3 METHOD
3.1 ERF-SfMLearner Architecture
3.2 Learning Approach
3.3 Receptive Field Extension with Dilated Convolution
3.4 Receptive Field Extension with Deformable Convolution
4 EXPERIMENTS
4.1 Pose Estimation
Method (image resolution) | Seq. 09 \(\mathrm{ATE}_{\text{mean}}\) | Seq. 09 \(\mathrm{ATE}_{\text{std}}\) | Seq. 09 \(\mathrm{RE}_{\text{mean}}\) | Seq. 09 \(\mathrm{RE}_{\text{std}}\) | Seq. 10 \(\mathrm{ATE}_{\text{mean}}\) | Seq. 10 \(\mathrm{ATE}_{\text{std}}\) | Seq. 10 \(\mathrm{RE}_{\text{mean}}\) | Seq. 10 \(\mathrm{RE}_{\text{std}}\) |
---|---|---|---|---|---|---|---|---|
PoseNet 248 × 75 | 0.04 | 0.0419 | 0.0058 | 0.0035 | 0.0218 | 0.0147 | 0.0052 | 0.0035 |
PoseNet 310 × 94 | 0.0272 | 0.0251 | 0.0054 | 0.0033 | 0.0162 | 0.0105 | 0.0052 | 0.0034 |
PoseNet 416 × 128 | 0.021 | 0.0157 | 0.0048 | 0.0029 | 0.0145 | 0.009 | 0.0047 | 0.0034 |
ERFPoseNet (Dilated) 416 × 128 | 0.0187 | 0.0147 | 0.0048 | 0.003 | 0.0141 | 0.0092 | 0.0049 | 0.004 |
ERFPoseNet (Dfc) 416 × 128 | 0.018 | 0.0124 | 0.0042 | 0.0027 | 0.0135 | 0.0094 | 0.0043 | 0.0036 |
ERFPoseNet (Dfcv2) 416 × 128 | 0.0206 | 0.0138 | 0.0049 | 0.0028 | 0.0143 | 0.0091 | 0.0048 | 0.0035 |
PoseNet 620 × 188 | 0.0186 | 0.0117 | 0.0048 | 0.0036 | 0.0137 | 0.0086 | 0.0049 | 0.0042 |
ERFPoseNet (Dilated) 620 × 188 | 0.0182 | 0.0108 | 0.0048 | 0.0033 | 0.0132 | 0.0088 | 0.0047 | 0.0033 |
ERFPoseNet (Dfc) 620 × 188 | 0.0165 | 0.0087 | 0.0043 | 0.0024 | 0.0132 | 0.0095 | 0.0042 | 0.0036 |
ERFPoseNet (Dfcv2) 620 × 188 | 0.0173 | 0.0102 | 0.0048 | 0.0029 | 0.0136 | 0.0096 | 0.005 | 0.0045 |
PoseNet 1241 × 376 | 0.0165 | 0.008 | 0.0056 | 0.005 | 0.0148 | 0.0095 | 0.0056 | 0.0054 |
4.2 Depth Estimation
Method (image resolution) | Scale (PoseNet / GT) | Error: Abs Rel | Error: Sq Rel | Error: RMSE | Accuracy: \(\delta < 1.25\) | Accuracy: \(\delta < 1.25^{2}\) | Accuracy: \(\delta < 1.25^{3}\) | |
---|---|---|---|---|---|---|---|---|
ERFDepthNet (Dfc) 416 × 128 | ✓ | 0.1988 | 1.8269 | 6.6759 | 0.7091 | 0.8866 | 0.953 | |
ERFDepthNet (Dfcv2) 416 × 128 | ✓ | 0.214 | 2.1015 | 6.8433 | 0.6805 | 0.8811 | 0.9503 | |
ERFDepthNet (Dfc) 416 × 128 | ✓ | 0.3097 | 3.9491 | 7.7958 | 0.55 | 0.7944 | 0.8982 | |
ERFDepthNet (Dfcv2) 416 × 128 | ✓ | 0.3303 | 4.6615 | 8.1794 | 0.5344 | 0.7709 | 0.8797 | |
ERFDepthNet (Dfc) 620 × 188 | ✓ | 0.1927 | 1.8134 | 6.3779 | 0.7334 | 0.9082 | 0.9639 | |
ERFDepthNet (Dfc) 620 × 188 | ✓ | 0.2832 | 3.2512 | 7.5411 | 0.5665 | 0.805 | 0.9015 |

Method (image resolution) | Scale (PoseNet / GT) | Error: Abs Rel | Error: Sq Rel | Error: RMSE | Accuracy: \(\delta < 1.25\) | Accuracy: \(\delta < 1.25^{2}\) | Accuracy: \(\delta < 1.25^{3}\) | |
---|---|---|---|---|---|---|---|---|
DepthNet + PoseNet 248 × 75 | ✓ | 0.2375 | 2.2019 | 7.4432 | 0.6201 | 0.8539 | 0.9373 | |
DepthNet + PoseNet 310 × 94 | ✓ | 0.2215 | 2.0839 | 7.1324 | 0.656 | 0.8726 | 0.9463 | |
DepthNet + PoseNet 416 × 128 | ✓ | 0.2135 | 1.8977 | 6.7938 | 0.6789 | 0.875 | 0.946 | |
DepthNet + ERFPoseNet (Dilated) 416 × 128 | ✓ | 0.2041 | 1.7651 | 6.736 | 0.6922 | 0.8872 | 0.9539 | |
DepthNet + ERFPoseNet (Dfc) 416 × 128 | ✓ | 0.2017 | 1.8202 | 6.7059 | 0.704 | 0.8902 | 0.9529 | |
DepthNet + ERFPoseNet (Dfcv2) 416 × 128 | ✓ | 0.2083 | 2.0195 | 6.8569 | 0.695 | 0.884 | 0.9506 | |
DepthNet + PoseNet 416 × 128 | ✓ | 0.354 | 5.0006 | 8.3413 | 0.5026 | 0.7519 | 0.8628 | |
DepthNet + ERFPoseNet (Dilated) 416 × 128 | ✓ | 0.3305 | 3.6844 | 8.1672 | 0.4968 | 0.7378 | 0.8509 | |
DepthNet + ERFPoseNet (Dfc) 416 × 128 | ✓ | 0.3132 | 4.7717 | 7.8997 | 0.5478 | 0.7907 | 0.8957 | |
DepthNet + ERFPoseNet (Dfcv2) 416 × 128 | ✓ | 0.3123 | 4.2249 | 8.0961 | 0.5371 | 0.78 | 0.8903 | |
DepthNet + PoseNet 620 × 188 | ✓ | 0.2034 | 2.1613 | 6.7045 | 0.7147 | 0.8929 | 0.9529 | |
DepthNet + ERFPoseNet (Dilated) 620 × 188 | ✓ | 0.1977 | 1.9366 | 6.5297 | 0.7185 | 0.8948 | 0.9557 | |
DepthNet + ERFPoseNet (Dfc) 620 × 188 | ✓ | 0.2125 | 2.7657 | 6.8724 | 0.7048 | 0.8874 | 0.948 | |
ERFDepthNet (Dfc) + ERFPoseNet (Dfc) 620 × 188 | ✓ | 0.1928 | 1.8521 | 6.4198 | 0.7425 | 0.8983 | 0.9631 | |
DepthNet + PoseNet 620 × 188 | ✓ | 0.3162 | 4.3905 | 7.9923 | 0.5455 | 0.7783 | 0.8821 | |
DepthNet + ERFPoseNet (Dilated) 620 × 188 | ✓ | 0.3083 | 3.8333 | 7.8461 | 0.5499 | 0.7744 | 0.8769 | |
DepthNet + ERFPoseNet (Dfc) 620 × 188 | ✓ | 0.2972 | 4.5935 | 7.9023 | 0.5773 | 0.8033 | 0.8994 | |
ERFDepthNet (Dfc) + ERFPoseNet (Dfc) 620 × 188 | ✓ | 0.2865 | 3.4243 | 7.6243 | 0.5863 | 0.8092 | 0.9001 | |
DepthNet + PoseNet 1241 × 376 | ✓ | 0.2202 | 3.0018 | 7.183 | 0.6976 | 0.8897 | 0.9515 |
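
For readers unfamiliar with the column headers in the depth tables, the sketch below computes Abs Rel, Sq Rel, RMSE, and the three \(\delta\) threshold accuracies using their commonly used definitions in monocular depth evaluation. It is a minimal illustration, not the evaluation code behind the reported numbers: the function names `depth_metrics` and `median_scale` are hypothetical, depths are assumed to be flat lists of valid positive values in metres, and the assumption that "GT" scale means median scaling against ground truth is ours, since the tables do not spell it out.

```python
import math
import statistics


def median_scale(pred, gt):
    """Rescale predictions so their median matches the ground-truth median.

    Assumption: this is one common way to resolve the scale ambiguity of
    monocular depth; the source does not state how its GT scale is applied.
    """
    scale = statistics.median(gt) / statistics.median(pred)
    return [p * scale for p in pred]


def depth_metrics(pred, gt):
    """Return the usual error and accuracy metrics for paired positive depths."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")
    n = len(gt)
    pairs = list(zip(pred, gt))

    # Error metrics: lower is better.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)

    # Accuracy metrics: fraction of pixels whose ratio max(p/g, g/p)
    # stays below 1.25, 1.25^2, and 1.25^3. Higher is better.
    ratios = [max(p / g, g / p) for p, g in pairs]
    acc = {k: sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3)}

    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "delta_1": acc[1],
        "delta_2": acc[2],
        "delta_3": acc[3],
    }


if __name__ == "__main__":
    predicted = [2.0, 4.5, 12.0]
    ground_truth = [2.0, 5.0, 8.0]
    print(depth_metrics(predicted, ground_truth))
    print(depth_metrics(median_scale(predicted, ground_truth), ground_truth))
```

In practice these quantities are averaged over every valid pixel of every test image, typically after masking out pixels with no ground-truth depth and clipping to a maximum range; the lists here stand in for such a flattened, masked array.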