Introduction
Method
Pose estimation
Symbol | Description |
---|---|
\(t\in \mathbb {Z}\) | Time frame index |
\(\mathcal {I}^{(l)}_t \in \mathbb {R}^{X\times Y\times 3}\) | Rectified left stereo image at time t |
\(\mathcal {D}_t\in \mathbb {R}^{X\times Y}\) | Depth map w.r.t. the left image at time t |
\(\mathcal {F}_t\in \mathbb {R}^{X\times Y\times 2}\) | Optical flow from \(\mathcal {I}^{(l)}_t\) to \(\mathcal {I}^{(l)}_{t-1}\) |
\(\mathcal {F}'_t\in \mathbb {R}^{X\times Y}\) | Parallax flow displacement across stereo images |
\(\textbf{x}\in \mathbb {Z}^2\) | Pixel index in 2D Cartesian coordinate system |
\(\textbf{p}_t\in \mathfrak {se}(3)\subset \mathbb {R}^6\) | Relative pose from t to \(t-1\) in Lie algebra space |
\(\textbf{p}_t^\star \in \mathfrak {se}(3)\subset \mathbb {R}^6\) | Relative pose solution in Lie algebra space |
\(\exp (\textbf{p}):\mathfrak {se}(3)\rightarrow \text {SE}(3)\) | Matrix exponential from Lie algebra to Lie group |
\(\pi _{\text {2D}}(\textbf{v}):\mathbb {R}^4\rightarrow \mathbb {R}^2\) | Projection of homogeneous 3D to 2D coordinates |
\(\pi _{\text {3D}}(\mathcal {D}_t,\textbf{x}):\mathbb {R}^{X\times Y}\times \mathbb {Z}^2\rightarrow \mathbb {R}^4\) | Re-projection of 2D to 3D homogeneous coordinates |
\(\mathcal {F}_t(\textbf{x}):\mathbb {Z}^2\rightarrow \mathbb {R}^{2}\) | Optical flow function across temporal domain |
\(\omega _{\text {2D}}(\textbf{x}):\mathbb {Z}^2\rightarrow \left[ 0, 1\right] \) | Learned per-pixel weight for 2D residuals |
\(\omega _{\text {3D}}(\textbf{x}):\mathbb {Z}^2\rightarrow \left[ 0, 1\right] \) | Learned per-pixel weight for 3D residuals |
\(\Vert \cdot \Vert _n: \mathbb {R}^m\rightarrow \mathbb {R}^+\) | \(\ell ^n\) vector norm |
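The notation maps the relative pose \(\textbf{p}_t\) from the Lie algebra \(\mathfrak {se}(3)\) to a rigid transform in \(\text {SE}(3)\) via \(\exp (\cdot )\). The following is a minimal pure-Python sketch of the standard closed-form exponential (Rodrigues' formula for the rotation, the left Jacobian of SO(3) for the translation), not the authors' implementation. Ordering the six coordinates translation first, \(\textbf{p}=(\boldsymbol{\rho },\boldsymbol{\phi })\), is an assumption, as are the helper names `skew`, `matmul`, and `se3_exp`.

```python
import math


def skew(w):
    """Skew-symmetric matrix [w]_x, so that [w]_x v equals the cross product w x v."""
    wx, wy, wz = w
    return [[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]]


def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def se3_exp(p):
    """exp: se(3) -> SE(3) for a twist p = (rho_x, rho_y, rho_z, phi_x, phi_y, phi_z).

    The translation-first ordering is an assumption. Returns a 4x4 nested list.
    """
    rho, phi = p[:3], p[3:]
    theta = math.sqrt(sum(c * c for c in phi))
    K = skew(phi)
    K2 = matmul(K, K)
    if theta < 1e-8:
        a, b, c = 1.0, 0.5, 1.0 / 6.0  # small-angle limits of the coefficients below
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
        c = (theta - math.sin(theta)) / theta ** 3
    eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    # Rodrigues' formula: R = I + a K + b K^2
    R = [[eye[i][j] + a * K[i][j] + b * K2[i][j] for j in range(3)] for i in range(3)]
    # Left Jacobian of SO(3): V = I + b K + c K^2; the translation is t = V rho
    V = [[eye[i][j] + b * K[i][j] + c * K2[i][j] for j in range(3)] for i in range(3)]
    t = [sum(V[i][k] * rho[k] for k in range(3)) for i in range(3)]
    return [R[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]
```

For example, `se3_exp([0, 0, 0, 0, 0, math.pi / 2])` yields a 90° rotation about the z-axis with zero translation. The small-angle branch avoids dividing by \(\theta \approx 0\).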
Learning the weight maps
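The weight maps \(\omega _{\text {2D}}\) and \(\omega _{\text {3D}}\) scale per-pixel residuals, but the residuals themselves are not given in this section. The sketch below assumes one plausible form built only from the symbols in the notation table: a 2D residual \(r_{\text {2D}}(\textbf{x})=\pi _{\text {2D}}(\exp (\textbf{p}_t)\,\pi _{\text {3D}}(\mathcal {D}_t,\textbf{x}))-(\textbf{x}+\mathcal {F}_t(\textbf{x}))\) and a 3D residual \(r_{\text {3D}}(\textbf{x})=\exp (\textbf{p}_t)\,\pi _{\text {3D}}(\mathcal {D}_t,\textbf{x})-\pi _{\text {3D}}(\mathcal {D}_{t-1},\textbf{x}+\mathcal {F}_t(\textbf{x}))\), combined as \(\sum _{\textbf{x}}\omega _{\text {2D}}(\textbf{x})\Vert r_{\text {2D}}(\textbf{x})\Vert _2^2+\omega _{\text {3D}}(\textbf{x})\Vert r_{\text {3D}}(\textbf{x})\Vert _2^2\). The pinhole intrinsics `(fx, fy, cx, cy)`, the squared \(\ell ^2\) norm, the nearest-pixel depth lookup at \(\textbf{x}+\mathcal {F}_t(\textbf{x})\), and all function names are hypothetical choices for illustration.

```python
def reproject(depth, x, intr):
    """pi_3D: lift integer pixel x = (u, v) to homogeneous 3D via depth[u][v] (assumed pinhole)."""
    fx, fy, cx, cy = intr
    u, v = x
    d = depth[u][v]
    return [(u - cx) * d / fx, (v - cy) * d / fy, d, 1.0]


def project(p, intr):
    """pi_2D: project a homogeneous 3D point onto the image plane (assumed pinhole)."""
    fx, fy, cx, cy = intr
    X, Y, Z = p[0] / p[3], p[1] / p[3], p[2] / p[3]
    return [fx * X / Z + cx, fy * Y / Z + cy]


def transform(T, p):
    """Apply a 4x4 homogeneous transform T to a homogeneous point p."""
    return [sum(T[i][k] * p[k] for k in range(4)) for i in range(4)]


def weighted_cost(T, depth_t, depth_prev, flow, w2d, w3d, intr):
    """Sum of per-pixel weighted squared 2D and 3D residuals (hypothetical form).

    T is exp(p_t) as a 4x4 matrix; flow[u][v] = (du, dv) maps frame t to frame t-1.
    """
    X, Y = len(depth_t), len(depth_t[0])
    cost = 0.0
    for u in range(X):
        for v in range(Y):
            tu, tv = u + flow[u][v][0], v + flow[u][v][1]
            nu, nv = round(tu), round(tv)  # nearest pixel for the depth lookup
            if not (0 <= nu < X and 0 <= nv < Y):
                continue  # correspondence falls outside frame t-1
            P = transform(T, reproject(depth_t, (u, v), intr))
            q = project(P, intr)
            r2d = (q[0] - tu) ** 2 + (q[1] - tv) ** 2
            Q = reproject(depth_prev, (nu, nv), intr)
            r3d = sum((P[i] - Q[i]) ** 2 for i in range(3))
            cost += w2d[u][v] * r2d + w3d[u][v] * r3d
    return cost
```

The pose enters as the 4x4 matrix \(\exp (\textbf{p}_t)\); minimizing such a cost over \(\textbf{p}_t\), for instance with Gauss-Newton, would yield \(\textbf{p}_t^\star \), which the sketch does not attempt.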
Experiments
Datasets
- breathing: only depicts breathing deformations and contains no camera or tool motion,
- scanning: includes camera motion in addition to breathing deformations,
- deforming: comprises tissue deformations due to breathing and manipulation or resection of tissue, while the camera is static.
Implementation details
Segmentation of surgical instruments
Training and inference
Metrics and baseline methods
Results
Scenario | Breathing | Scanning | Deforming | Micro avg. | Macro avg. |
---|---|---|---|---|---|
# Sequences | 17 | 60 | 9 | | |
Camera motion | | \(\checkmark \) | | | |
Breathing | \(\checkmark \) | \(\checkmark \) | \(\checkmark \) | | |
Tool interactions | | | \(\checkmark \) | | |
ORB-SLAM2 [1] | \(2.35\pm 1.81\) | \(3.26 \pm 1.65\) | \(4.29 \pm 2.30\) | \(3.19 \pm 1.81\) | \(3.30 \pm 0.97\) |
ElasticFusion [2] | \(1.94 \pm 0.93\) | \(4.04 \pm 3.46\) | \(6.47 \pm 8.64\) | \(3.88 \pm 4.12\) | \(4.15 \pm 2.27\) |
Ours (w/o weight) | \(1.65\pm 0.97\) | \(3.01 \pm 1.60\) | \(4.67 \pm 2.13\) | \(2.91 \pm 1.74\) | \(3.11 \pm 1.51\) |
Ours (only 2D) | \(1.15\pm 0.72\) | \(3.01 \pm 1.66\) | \(2.83 \pm 1.41\) | \(2.62 \pm 1.66\) | \(2.33 \pm 1.03\) |
Ours (only 3D) | \(\mathbf {0.78\pm 2.03}\) | \(7.02 \pm 5.86\) | \(2.72 \pm 1.90\) | \(5.34 \pm 5.64\) | \(3.51 \pm 3.20\) |
Ours (2D & 3D) | \(1.01 \pm 0.59\) | \(\mathbf {2.89\pm 2.33}\) | \(\mathbf {2.23 \pm 1.07}\) | \(\mathbf {2.45 \pm 2.12}\) | \(\mathbf {2.04 \pm 0.95}\) |
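The two averages in the table can be reproduced from the per-scenario means if the micro average weights each scenario by its number of sequences (17, 60, 9) and the macro average is the unweighted mean with the sample standard deviation across the three scenarios. This reading is inferred rather than stated, but it matches every row; the micro-average standard deviation cannot be recovered this way because it requires the per-sequence errors. A short Python sketch (function names are illustrative):

```python
import statistics


def micro_average(scenario_means, n_sequences):
    """Inferred definition: mean over all sequences, i.e. scenario means weighted by counts."""
    total = sum(m * n for m, n in zip(scenario_means, n_sequences))
    return total / sum(n_sequences)


def macro_average(scenario_means):
    """Inferred definition: unweighted mean and sample standard deviation of scenario means."""
    return statistics.mean(scenario_means), statistics.stdev(scenario_means)


# Ours (2D & 3D): breathing, scanning, deforming with 17, 60 and 9 sequences
ours = [1.01, 2.89, 2.23]
counts = [17, 60, 9]
print(round(micro_average(ours, counts), 2))       # 2.45
print([round(v, 2) for v in macro_average(ours)])  # [2.04, 0.95]
```

The same `macro_average` applied to the ATE-RMSE values of Ours on H2, H3, P2, and P3 (10.9, 21.2, 13.8, 8.8) gives 13.7 ± 5.4 after rounding, consistent with the macro average reported in the H2/H3/P2/P3 table.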
Method | H2 | H3 | P2 | P3 | Macro avg. |
---|---|---|---|---|---|
ATE-RMSE (mm) | | | | | |
ORB-SLAM2 [1] | 18.0 | \(\mathbf {9.1}\) | 14.0 | 21.4 | \(15.6 \pm 5.3\) |
ElasticFusion [2] | 30.8 | 72.1 | 33.6 | 37.7 | \(43.6 \pm 19.3\) |
Ours | \(\mathbf {10.9}\) | 21.2 | \(\mathbf {13.8}\) | \(\mathbf {8.8}\) | \(\mathbf {13.7 \pm 5.4}\) |
RPE-trans (mm) | | | | | |
ORB-SLAM2 [1] | \(0.20 \pm 0.43\) | \(0.24 \pm 0.25\) | \(0.35 \pm 0.46\) | \(0.54 \pm 0.47\) | \(0.33 \pm 0.13\) |
ElasticFusion [2] | \(0.87 \pm 1.11\) | \(0.56 \pm 1.03\) | \(0.81 \pm 1.11\) | \(0.71 \pm 0.79\) | \(0.74 \pm 0.12\) |
Ours | \(\mathbf {0.10 \pm 0.27}\) | \(\mathbf {0.10 \pm 0.18}\) | \(\mathbf {0.16 \pm 0.32}\) | \(\mathbf {0.19 \pm 0.31}\) | \(\mathbf {0.14 \pm 0.04}\) |
RPE-rot (deg) | | | | | |
ORB-SLAM2 [1] | \(0.16 \pm 0.36\) | \(0.16 \pm 0.22\) | \(0.19 \pm 0.24\) | \(0.28 \pm 0.27\) | \(0.20\pm 0.05\) |
ElasticFusion [2] | \(0.73 \pm 1.06\) | \(0.41 \pm 0.96\) | \(0.50 \pm 1.11\) | \(0.38 \pm 0.40\) | \(0.51 \pm 0.14\) |
Ours | \(\mathbf {0.04 \pm 0.20}\) | \(\mathbf {0.04 \pm 0.13}\) | \(\mathbf {0.07 \pm 0.14}\) | \(\mathbf {0.05 \pm 0.10}\) | \(\mathbf {0.05 \pm 0.01}\) |
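ATE-RMSE, RPE-trans, and RPE-rot are standard trajectory metrics, but this section does not state the exact evaluation protocol. Below is a minimal sketch of the common definitions, assuming poses are 4x4 camera-to-world matrices, the estimated trajectory is already aligned to ground truth (the usual Horn/Umeyama alignment step for ATE is omitted), and RPE compares consecutive frames (`delta=1`). All function names are illustrative, and `pose_z` is a hypothetical helper that only builds test poses.

```python
import math


def matmul4(a, b):
    """Product of two 4x4 nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def inv_rigid(T):
    """Inverse of a rigid transform [R t; 0 1], i.e. [R^T, -R^T t; 0 1]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [-sum(Rt[i][k] * T[k][3] for k in range(3)) for i in range(3)]
    return [Rt[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def ate_rmse(gt, est):
    """Root-mean-square of per-frame position errors (trajectories assumed pre-aligned)."""
    sq = [sum((g[k][3] - e[k][3]) ** 2 for k in range(3)) for g, e in zip(gt, est)]
    return math.sqrt(sum(sq) / len(sq))


def rpe(gt, est, delta=1):
    """Per-pair relative pose errors: translation (input units) and rotation (degrees)."""
    trans, rot = [], []
    for i in range(len(gt) - delta):
        d_gt = matmul4(inv_rigid(gt[i]), gt[i + delta])
        d_est = matmul4(inv_rigid(est[i]), est[i + delta])
        E = matmul4(inv_rigid(d_gt), d_est)
        trans.append(math.sqrt(sum(E[k][3] ** 2 for k in range(3))))
        # Rotation angle from the trace, clamped against round-off outside [-1, 1]
        cos_angle = (E[0][0] + E[1][1] + E[2][2] - 1.0) / 2.0
        rot.append(math.degrees(math.acos(max(-1.0, min(1.0, cos_angle)))))
    return trans, rot


def pose_z(x, yaw_deg):
    """Hypothetical helper: pose translated by x along the x-axis and rotated about z."""
    c, s = math.cos(math.radians(yaw_deg)), math.sin(math.radians(yaw_deg))
    return [[c, -s, 0.0, x], [s, c, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
```

Per-sequence entries of the form mean ± std, as listed for RPE-trans and RPE-rot, would then be summary statistics of the returned `trans` and `rot` lists.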
Method | d1_k2 | d8_k1 | d9_k1 | d9_k3 | Avg | SCARED avg |
---|---|---|---|---|---|---|
ORB-SLAM2 [1] | 0.91 | 2.97 | 4.33 | 3.79 | 3.00 | \(2.34 \pm 1.24\) |
ElasticFusion [2] | 1.02 | 3.62 | 4.30 | 3.36 | 3.08 | \(2.91 \pm 1.77\) |
Wei et al. [8] | 0.74 | 2.47 | 4.07 | 1.54 | 2.21 | – |
Ours (frame2model) | \(\mathbf {0.37}\) | \(\mathbf {2.08}\) | \(\mathbf {2.04}\) | \(\mathbf {0.84}\) | \(\mathbf {1.33}\) | \(\mathbf {1.38 \pm 0.93}\) |