The pose of a slicing plane with respect to a volume can be estimated with traditional approaches such as
feature-based and
intensity-based slice-to-volume registration or convolutional neural network (CNN)-based methods. In traditional approaches, iterative numerical optimisation maximises intensity-based similarity metrics or minimises the distance between registered point features [
7,
8]. However, the cost functions associated with these metrics are frequently non-convex and require a reliable initialisation. These methods are also computationally costly and, more importantly, require a pre-acquired 3D volume of the subject being scanned, which makes them unsuitable for point-of-care fetal US applications. With the growing interest in DL, new approaches have been proposed to address the ill-posed slice-to-volume registration problem using CNNs [
9,
10]. CNN-based 3D pose estimation methods can be divided into two groups. The first includes models that predict keypoints from which the orientation is recovered [
11,
12]. The second group comprises models predicting the object pose directly from images [
13,
14]. Works like [
11,
15] demonstrated that DL-based similarity metrics slightly outperform the patch features and local image intensities typically employed in slice-to-volume registration. Pose estimation has primarily been approached as a classification problem, with the pose space discretised into bins [
13,
14]. Conversely, Mahendran et al. [
16] modelled 3D object pose estimation as a regression problem, proposing a deep CNN to estimate rotation matrices with a new geodesic distance-based loss function. In fetal magnetic resonance imaging (MRI) [
17] and fetal US [
18], learning-based approaches have also been proposed. Namburete et al. [
18] formulated the alignment of fetal US as a one-coordinate position estimation combined with a 3-class slice-plane classification. They trained their CNN with a negative log-likelihood loss to simultaneously predict slice location and brain segmentation. Hierarchical learning has been proposed for pose estimation in works such as [
11,
19]. In these works, the six-dimensional parameter space was partitioned into three groups so that regression functions could be learned separately, and hierarchically, for in-plane and out-of-plane rotations as well as for out-of-plane translations, speeding up rigid slice-to-volume registration and improving its capture range. However, the pose estimation relied on a 2D-projected image representation of the objects, which limited the range of recoverable rotations. Li et al. [
20] proposed an approach for standard plane detection in 3D fetal US that uses a CNN to iteratively regress a rigid transformation, comparing different transformation representations. In [
21], Salehi et al. used a CNN to estimate the 3D pose (rotation and translation) of arbitrarily oriented MRI slices from their sectional image representations for registration purposes. To this end, they formulated a regression problem based on the angle-axis representation of 3D rotations.
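As an illustration of the geodesic distance-based losses discussed above, the following minimal sketch computes the geodesic distance between two rotation matrices (the function name is ours for illustration, and this is not claimed to be the exact loss of the cited works):

```python
import numpy as np

def geodesic_distance(R_pred, R_gt):
    """Geodesic distance (in radians) between two 3x3 rotation matrices.

    For rotation matrices, trace(R_pred^T @ R_gt) = 1 + 2*cos(theta),
    where theta is the angle of the relative rotation.
    """
    R_rel = R_pred.T @ R_gt
    cos_theta = (np.trace(R_rel) - 1.0) / 2.0
    # Clip to guard against numerical drift outside [-1, 1]
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))
```

Losses of this form penalise the actual rotation angle between prediction and ground truth, rather than a difference measured in representation space.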
Deep learning regression of 6D pose, and in particular 3D rotations, is a widely studied topic beyond the medical field. Different rotation representations have been used in this context. Works like [
22] adopted quaternions for regression; quaternions are free from singularities but suffer from an antipodal ambiguity, since \(q\) and \(-q\) encode the same rotation. This issue is also shown in [
23], where the authors reported a high percentage of errors between 90\(^\circ \) and 180\(^\circ \). Axis-angle representation has also been used [
24] to estimate the 6D pose of object instances starting from RGB images, depth maps or scanned point clouds. However, Zhou et al. [
6] showed that any 3D rotation representation with fewer than five dimensions is discontinuous in the real Euclidean space, which makes it harder to learn: empirically, the network converges but produces large errors for specific rotation angles. To cope with this limitation, they proposed a new continuous representation for the \(n\)-dimensional rotations SO(\(n\)), the “6D-loss”, obtained through projection and normalisation of the first two columns of each rotation matrix and continuous for all elements of SO(3):
\(\mathcal {L}_{6D}~=~\left\| (\tilde{R}_{:,1:2}/\left\| \tilde{R}_{:,1:2} \right\| _2) - (R_{:,1:2}/\left\| R_{:,1:2} \right\| _2) \right\| _2 \). Empirical results suggest that continuous representations (5D, 6D and vector-based) outperform discontinuous ones (Euler angles, quaternions, axis-angle) and are better suited to the regression task.
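To make the construction concrete, the following sketch (function name is illustrative; NumPy is assumed) recovers a rotation matrix from a 6D vector by Gram-Schmidt orthogonalisation of the two encoded columns, as in the continuous representation of Zhou et al.:

```python
import numpy as np

def rotation_from_6d(x):
    """Map a 6D vector to a 3x3 rotation matrix via the continuous
    representation of Zhou et al.: Gram-Schmidt on the two encoded columns."""
    a1, a2 = x[:3], x[3:]
    b1 = a1 / np.linalg.norm(a1)      # first column: normalise a1
    b2 = a2 - np.dot(b1, a2) * b1     # remove the component of a2 along b1
    b2 = b2 / np.linalg.norm(b2)
    b3 = np.cross(b1, b2)             # third column completes the basis
    return np.stack([b1, b2, b3], axis=1)
```

Because every non-degenerate 6D vector maps to a valid rotation, and nearby vectors map to nearby rotations, regressing this representation avoids the discontinuities of Euler angles, quaternions and axis-angle.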