Dynamic movement primitives (DMPs), a robust and efficient framework, have been widely studied for robot learning from demonstration. The classical DMPs framework mainly focuses on movement learning in Cartesian or joint space and cannot properly represent end-effector orientation. In this paper, we present an extended DMPs framework (EDMPs) in both Cartesian space and the 2-dimensional (2D) sphere manifold for quaternion-based orientation learning and generalization. A Gaussian mixture model and Gaussian mixture regression (GMM-GMR) are adopted as the initialization phase of EDMPs to handle multiple demonstrations and obtain their mean and covariance. Additionally, evaluation indicators including reachability and similarity are defined to characterize the learning and generalization abilities of EDMPs. Finally, a real-world experiment was conducted with human demonstrations: the endpoint poses of a human arm were recorded and successfully transferred from the human to the robot. The experimental results show that the absolute errors of the Cartesian and Riemannian space skills are less than 3.5 mm and 1.0°, respectively. The Pearson’s correlation coefficients of the Cartesian and Riemannian space skills are mostly greater than 0.9. The developed EDMPs exhibit superior reachability and similarity for multi-space skill learning and generalization. This research proposes a fused framework of EDMPs and GMM-GMR that is capable of handling multi-space skills from multiple demonstrations.
1 Introduction
Learning from Demonstration (LfD) has played a key role in enabling robots to learn movement and manipulation skills from humans due to its high efficiency [1]. Conventional LfD interfaces, e.g., teach pendants, joysticks, and keyboards, are used for fast programming and focus mainly on endpoint trajectory planning and control. Such interfaces are suitable only for simple tasks and are inadequate for anthropomorphic, skillful operations. In recent years, many LfD approaches have been developed for complicated tasks, of which DMPs [2], stable estimator of dynamical systems (SEDS) [3], Gaussian mixture model/regression (GMM-GMR) [4], probabilistic movement primitives (ProMP) [5], kernelized movement primitives (KMP) [6] and Hidden (Semi-) Markov model (H(s)MM) [7] are outstanding representatives.
As a widespread LfD approach, DMPs were proposed and developed by Ijspeert et al. [8‐10] to describe a trajectory as a series of action units. Such movement primitives are formalized as a stable attractor system that generates the trajectory in either task or joint space [11]. The classical DMPs framework, composed of a canonical system module, a transformation system module, and a locally weighted regression (LWR) module, is developed to encode movements, learn their characteristics, and generalize them to other similar targets.
In recent years, many approaches based on the classical DMPs have been presented to extend its functionality, such as obstacle avoidance [12‐14], stiffness learning [15, 16], and collaborative behavior imitation [17]. As one of the most commonly used skill learning frameworks, the DMPs model exhibits many desirable properties, such as robustness to perturbations, convergence to attractors, and time independence. The approach is extensively applied to learn anthropomorphic skills such as skillful sports movements [18]. Although the classical DMPs framework is widely used, it still has some drawbacks [19]. In this paper, we aim to endow the classical DMPs with the capability to handle multiple demonstrations and Riemannian space skills such as orientations.
In the LfD community, GMM-GMR provides a suitable option for multiple demonstrations because it extracts more demonstrated information, such as the probability distribution of multiple trajectories. GMM-GMR treats the encoding of human skills as a clustering problem by estimating the joint distribution over the state variables and performing regression with the conditional distribution. As a robust learning algorithm, GMM-GMR is widely used for learning and reproducing human skills in kinematics and dynamics. When dealing with multi-demonstrated trajectories, the data are usually projected onto a latent space, and then encoded and reproduced by GMM and GMR successively [20]. Compared with the DMPs approach, GMM-GMR can obtain the mean and probability distribution simultaneously from multiple demonstrations. These parameters help to summarize the regularities of the demonstrations and can even provide guidance for variable impedance controllers [21]. Although GMM-GMR has many merits, it lacks generalization capacity when the target exceeds its distribution range. On that account, TP-GMM [22] was developed to adapt to the context by extracting the relevance between different tasks. Owing to the mutual complementarity between DMPs and GMM-GMR, in Ref. [23] GMM-GMR is introduced into the DMPs framework as the nonlinear terms for multiple trajectories, but this approach was applied in joint space, is only suitable for Cartesian space parameters, and ignores the probability distribution of the demonstrations. Similarly, we incorporate GMM-GMR into DMPs, but we focus on task space and Riemannian space skills such as orientation, and make effective use of the covariance characteristics.
Position and orientation are important for robots to accurately learn movement skills. Many existing works have addressed position learning based on the classical DMPs framework in Cartesian space. Since orientation is a skill on a manifold, the classical DMPs framework is unable to handle it precisely. Therefore, in recent years, many studies have represented the distance between orientations with geodesics on Riemannian manifolds. Such approaches make it possible to properly represent end-effector orientations. In Ref. [24], several concepts of Riemannian manifolds such as geodesics and logarithm/exponential maps are discussed specifically for robotics, and four kinds of manifolds are listed, including the sphere manifold \(\mathcal{S}^{d}\), the special orthogonal group \(SO(d)\), the special Euclidean group \(SE(3)\), and the manifold of SPD matrices \(S_{ + + }^{d}\). In Refs. [25, 26], a modified DMPs framework is proposed to learn orientations in Cartesian space based on the quaternion \(S^{3}\) and rotation matrix \(SO(3)\) with the logarithmic map. These approaches handle robot end-effector orientations effectively, but they lack the ability to handle multi-space skills, such as poses comprising both positions and orientations. Moreover, the methods inherit the inability of the classical DMPs to handle multiple demonstrations. In Ref. [27], skills on the \(S_{ + + }^{d}\) manifold are learned using the geometry of the SPD matrix space. Although the method successfully learns end-point stiffness skills, which have the SPD property, a rotation matrix is in general neither symmetric nor positive definite, which limits the application of this method to orientations.
To this end, we provide a new approach for learning quaternion-based orientations based on the concepts of geodesics and the exponential function on the Riemannian manifold. Different from the above-mentioned publications, our approach focuses on the 2D sphere manifold \(S^{2}\). We decompose the quaternion \(S^{3}\) into a Cartesian term \({\mathbb{R}}\) and a Riemannian term \(S^{2}\), i.e., the rotation angle and axis \({\varvec{q}} = q + \lambda {\varvec{v}}\). Thus, our framework can handle the Cartesian term \(q \in {\mathbb{R}}\) and the Riemannian term \({\varvec{v}} \in S^{2}\) separately. In brief, compared with state-of-the-art research [28], our framework can learn multi-space skills in Cartesian space and on the 2D sphere manifold. The demonstrated human arm endpoint poses, including positions and orientations, can be transferred to robots simultaneously.
The contributions of this paper can be summarized as follows:
(1) We proposed an EDMPs framework to learn and generalize quaternion-based orientations from human to robots by extending the classical DMPs to the 2D sphere manifold.
(2) We combined the GMM-GMR and EDMPs framework according to their mutual complementarity. The fused framework can not only handle multiple demonstrations to obtain more demonstrated information, but also has a good generalization ability.
(3) We proposed several evaluation indicators including reachability and similarity to evaluate the learning results of EDMPs under the determined RBFs and time constants of the algorithms.
The remainder of this paper is organized as follows. Section 2 presents the methodology of data preprocessing, the EDMPs framework, the GMM-GMR algorithm, and the evaluation indicators. In Section 3, a real-world experiment is performed to evaluate the effectiveness of the approach. A discussion is carried out in Section 4. Section 5 concludes the paper.
2 Methodology
The proposed architecture aims at transferring orientation skills from humans to robots and helping robots acquire multi-space skills conveniently and autonomously. As shown in Figure 1, it mainly consists of four layers, i.e., human demonstrations (green), data preprocessing (blue), skills learning (yellow) and robot control (red). We provide a specific description of data preprocessing and the EDMPs framework in Sections 2.1 and 2.2. The methodology of GMM-GMR for multi-space parameters under multiple demonstrations is then introduced in Section 2.3. Additionally, we design several evaluation indicators in Section 2.4 to evaluate the learning and generalization results. For a better understanding, we summarize the key notations and abbreviations in Table 1.
Table 1 Description of key notations and abbreviations

| Notation | Description |
| --- | --- |
| \(\{ \cdot \}\) | Trajectory from one demonstration |
| \(\{ \{ \cdot \} \}\) | Multi-trajectories from multi-demonstrations |
| \({\varvec{p}}\) | Position |
| \({\varvec{q}}\) | Quaternion |
| \(\theta\) | Angle-quaternion |
| \({\varvec{v}}\) | Axis-quaternion |
| \({\varvec{T}}\) | Time |
| \({\varvec{s}}\) | Phase variable |
| \({\varvec{\varPsi}}\) | RBFs |
| \(W\) | Weights of RBFs |
| \(c_{i}\) | Center of the i-th RBF |
| \(h_{i}\) | Width of the i-th RBF |
| \({\varvec{R}}_{O}^{{\hat{O}}}\) | Rotation matrix from \(O\) to \(\hat{O}\) |
| \({\varvec{\gamma}}\) | Vectors from \({\varvec{v}}_{i}\) to \({\varvec{v}}_{i + 1}\) |
| \(f(s)\) | Nonlinear term |
| \(\tau\) | Temporal scaling factor |
| \({\text{d}}({\varvec{x}},{\varvec{y}})\) | Geodesics between x and y |
| M | Number of points in a demonstration |
| K | Number of demonstrations |
| N | Number of Gaussian distributions |
| \({\varvec{\xi}}^{I}\) | Inputs of GMR |
| \({\varvec{\xi}}^{O}\) | Outputs of GMR |
| \(P( * )\) | Probability distribution |
| \(\uppi\) | Probability of Gaussian distributions |
| \({\varvec{\mu}}\) | Mean of Gaussian distributions |
| \({\varvec{\varSigma}}\) | Covariance of Gaussian distributions |
| \(e_{c}\) | Absolute error of Cartesian skills |
| \(\Delta e_{c}\) | Relative error of Cartesian skills |
| \(e_{r}\) | Absolute error of Riemannian skills |
| \(\rho_{c}\) | PCCc of Cartesian skills |
| \(\rho_{r}\) | PCCr of Riemannian skills |
| \(\sigma\) | Standard deviation |

| Abbreviation | Description |
| --- | --- |
| DMPs | Dynamic movement primitives |
| GMM/R | Gaussian mixture model/regression |
| TS | Transformation system |
| EM | Expectation-maximization |
| RBFs | Radial basis functions |
| LWR | Locally weighted regression |
| PCCc | Pearson’s correlation coefficient in Cartesian space |
| PCCr | Pearson’s correlation coefficient on 2D sphere manifold |
| R | Reproduced curve |
| G1-3 | Generalized curves 1-3 |
2.1 Data Processing
2.1.1 Orthogonal Processing
As described in Figure 1, several trajectories of the reference points are recorded from human demonstrations with the VICON motion capture system. From these reference points, we can calculate the positions \(\left\{ {\left\{ {\varvec{p}} \right\}} \right\} \in {\mathbb{R}}^{3}\) and orientations \(\left\{ {\left\{ {{\varvec{o}}_{x} } \right\}} \right\}\), \(\left\{ {\left\{ {{\varvec{o}}_{y} } \right\}} \right\}\), \(\left\{ {\left\{ {{\varvec{o}}_{z} } \right\}} \right\} \in \mathcal{S}^{2}\). The pose matrices can then be constructed from the multi-dimensional orientations, \({\varvec{R}} \in SO(3)\), \({\varvec{R}} = \left[ {{\varvec{o}}_{x}^{\rm {T}} ,{\varvec{o}}_{y}^{\rm {T}} ,{\varvec{o}}_{z}^{\rm {T}} } \right]\). To guarantee the orthogonality of the columns in the pose matrices, we first adopt the Gram-Schmidt orthogonalization approach to fine-tune the demonstrated multi-dimensional orientations.
where \(\left\langle {{\varvec{o}},{\varvec{\xi}}} \right\rangle\) represents the inner product between \({\varvec{o}}\) and \({\varvec{\xi}}\), i.e., \(\left\langle {{\varvec{o}},{\varvec{\xi}}} \right\rangle = {\varvec{o}}^{\rm {T}} {\varvec{\xi}}\). Thus, we can obtain a set of orthogonal basis vectors \(\left\{ {{\varvec{\xi}}_{x} ,{\varvec{\xi}}_{y} ,{\varvec{\xi}}_{z} } \right\}\) as well as their normalized form \(\left\{ {\eta_{x} ,\eta_{y} ,\eta_{z} } \right\}\), wherein \(\eta_{x} = {\varvec{\xi}}_{x} /\left\| {{\varvec{\xi}}_{x} } \right\|\), \(\eta_{y} = {\varvec{\xi}}_{y} /\left\| {{\varvec{\xi}}_{y} } \right\|\) and \(\eta_{z} = {\varvec{\xi}}_{z} /\left\| {{\varvec{\xi}}_{z} } \right\|\). The pose matrices are then constructed from the axes that satisfy the orthogonality constraints, \(\hat{{\varvec{R}}} \in SO(3)\), \(\hat{{\varvec{R}}} = \left[ {\eta_{x}^{\rm {T}} ,\eta_{y}^{\rm {T}} ,\eta_{z}^{\rm {T}} } \right]\).
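The orthogonalization step can be illustrated with a short sketch. It applies the classical Gram-Schmidt procedure to three measured axes and normalizes the result; the function name `orthonormalize_axes`, the sample vectors, and the choice to store the axes as matrix columns are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def orthonormalize_axes(o_x, o_y, o_z):
    """Classical Gram-Schmidt on three measured axis vectors.

    Returns a 3x3 matrix whose columns are the orthonormal axes.
    """
    o_x, o_y, o_z = (np.asarray(o, dtype=float) for o in (o_x, o_y, o_z))
    xi_x = o_x
    # Remove from o_y its component along xi_x.
    xi_y = o_y - (np.dot(o_y, xi_x) / np.dot(xi_x, xi_x)) * xi_x
    # Remove from o_z its components along xi_x and xi_y.
    xi_z = (o_z
            - (np.dot(o_z, xi_x) / np.dot(xi_x, xi_x)) * xi_x
            - (np.dot(o_z, xi_y) / np.dot(xi_y, xi_y)) * xi_y)
    # Normalize each basis vector: eta = xi / ||xi||.
    etas = [xi / np.linalg.norm(xi) for xi in (xi_x, xi_y, xi_z)]
    return np.column_stack(etas)

# Slightly non-orthogonal axes, as might result from noisy marker data (made-up values).
o_x = np.array([1.0, 0.02, 0.0])
o_y = np.array([0.03, 1.0, 0.01])
o_z = np.array([0.0, -0.02, 1.0])
R_hat = orthonormalize_axes(o_x, o_y, o_z)
```

Gram-Schmidt keeps the first axis fixed and only adjusts the other two, so the result depends on the order in which the axes are processed. It also does not by itself guarantee a right-handed frame, which is why a determinant check is worthwhile on real data.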
2.1.2 Continuous Quaternion Solution and Decomposition
In screw theory, every transformation of the robot end-effector with respect to the base coordinate system can be expressed by a screw displacement, which is a translation along an axis \({\varvec{v}} \in S^{2}\) and a rotation by an angle \(\theta \in {\mathbb{R}}\) about the axis. The quaternion-based representation of robot end-effector poses has been widely used owing to its high efficiency and non-singularity. Because a specific pose can be represented by two different quaternions, i.e., \((\theta ,{\varvec{v}})\) and \(( - \theta , - {\varvec{v}})\), we introduce a constraint rule for adjacent quaternions to ensure that the quaternion-based trajectories are continuous.
where the sign of \({\varvec{q}}_{i}\) is determined by \({\varvec{q}}_{i - 1}\). On this basis, we decompose the quaternion into a Cartesian term \(q = \cos (\theta /2)\) and a Riemannian term \({\varvec{v}} = [x, y, z]\). The multi-space parameters \(\theta\) and \({\varvec{v}}\) can then be learned separately with the presented EDMPs framework.
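As an illustration of the continuity rule and the decomposition, the sketch below flips the sign of a quaternion whenever it points into the opposite hemisphere from its predecessor, and then splits a unit quaternion into its rotation angle and axis. The dot-product test is a commonly used rule and is assumed here; the paper's exact constraint equation is not reproduced. The function names and the `(w, x, y, z)` ordering are also assumptions.

```python
import numpy as np

def make_continuous(quats):
    """Flip signs so each quaternion lies in the same hemisphere as its predecessor."""
    out = np.array(quats, dtype=float)
    for i in range(1, len(out)):
        if np.dot(out[i], out[i - 1]) < 0.0:
            out[i] = -out[i]          # q and -q encode the same rotation
    return out

def decompose(q, eps=1e-12):
    """Split a unit quaternion (w, x, y, z) into rotation angle and unit axis."""
    w = np.clip(q[0], -1.0, 1.0)
    theta = 2.0 * np.arccos(w)        # from w = cos(theta / 2)
    s = np.linalg.norm(q[1:])         # equals |sin(theta / 2)|
    # The axis is arbitrary for a zero rotation; a fixed default is returned.
    axis = q[1:] / s if s > eps else np.array([1.0, 0.0, 0.0])
    return theta, axis

# Two rotations about the z-axis (30 and 40 degrees); the second is stored with flipped sign.
a0, a1 = np.deg2rad(15.0), np.deg2rad(20.0)
raw = [[np.cos(a0), 0.0, 0.0, np.sin(a0)],
       [-np.cos(a1), 0.0, 0.0, -np.sin(a1)]]
traj = make_continuous(raw)
theta1, axis1 = decompose(traj[1])
```

After the sign correction, the second sample decomposes into an angle of 40° about the z-axis, so both the angle and the axis vary smoothly along the trajectory.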
2.1.3 Quaternion Dimension Reduction before GMM-GMR
In the initial stage, to obtain the mean and covariance from multiple demonstrations, the dimension of the quaternion-based orientations should first be reduced before the GMM-GMR initialization. As depicted in Eq. (4), quaternions can be written in exponential form.
Based on the above conversion, we can handle the quaternion-based orientations with GMM and GMR in Cartesian space, and finally obtain the mean and covariance in all decoupled dimensions. Hereinafter, we use DR-quaternion to denote the quaternion after dimensionality reduction.
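A minimal sketch of the dimension reduction, assuming that the DR-quaternion is the three-vector \((\theta x,\theta y,\theta z)\) obtained by scaling the unit axis by the rotation angle, as the notation used for the GMM inputs suggests. The helper names `to_dr` and `from_dr` are hypothetical.

```python
import numpy as np

def to_dr(theta, axis):
    """Angle and unit axis -> 3-vector (theta*x, theta*y, theta*z)."""
    return theta * np.asarray(axis, dtype=float)

def from_dr(dr, eps=1e-12):
    """3-vector -> (angle, unit axis); the angle is recovered as the vector norm."""
    dr = np.asarray(dr, dtype=float)
    theta = np.linalg.norm(dr)
    # The axis is undefined for a zero rotation; a fixed default is returned.
    axis = dr / theta if theta > eps else np.array([1.0, 0.0, 0.0])
    return theta, axis

theta = np.deg2rad(50.0)
axis = np.array([0.0, 0.6, 0.8])      # unit vector
dr = to_dr(theta, axis)               # lives in ordinary 3D Cartesian space
theta_back, axis_back = from_dr(dr)
```

Because the recovered angle is a norm, it is always non-negative; a negative angle about an axis comes back as the positive angle about the opposite axis, which represents the same rotation.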
2.2 Methodology of EDMPs
For notational simplicity, in the rest of this paper, we denote the rotation angle and the rotation axis of the quaternion as the angle-quaternion \(\theta\) and the axis-quaternion \({\varvec{v}}\), respectively.
As described in Figure 1, the EDMPs framework is composed of a transformation system module, an LWR updating module, and a canonical system module, wherein the transformation system module includes two components, i.e., the transformation systems in Cartesian space and on the 2D sphere manifold. We use the transformation system in Cartesian space to learn the angle-quaternions and positions, and the extended transformation system on the 2D sphere manifold is developed for the axis-quaternions. LWR is applied to update the nonlinear terms, and the canonical system is used to avoid explicit time dependency.
To be specific, under the proposed EDMPs framework, at the learning stage, the positions and angle-quaternions \(\left\{ {({\varvec{p}},\theta ),(\dot{{\varvec{p}}},\dot{\theta }),(\ddot{{\varvec{p}}},\ddot{\theta })} \right\}\) and the axis-quaternions \(\left\{ {{\varvec{v}},\dot{{\varvec{v}}},\ddot{{\varvec{v}}}} \right\}\) are processed with the transformation systems in Cartesian space and on the 2D sphere manifold, respectively. The target nonlinear terms \(\left\{ {{\varvec{f}}_{{\varvec{p}}} \in {\mathbb{R}}^{3} ,f_{\theta } \in {\mathbb{R}}} \right\}\) and \(f_{{\varvec{v}}} \in {\mathbb{R}}\) are calculated from the input parameters and then encoded as a linear combination of several RBFs. The weights of the RBFs in the nonlinear terms are finally updated with the LWR approach. In the generalization stage, the target position and angle-quaternion \(\hat{{\varvec{p}}}_{g}\), \(\hat{\theta }_{g}\) and the target axis-quaternion \(\hat{{\varvec{v}}}_{g}\) are provided as the unique attractors of the second-order differential equations to calculate the corresponding generalized trajectories.
2.2.1 Transformation System Module
In this section, we take the angle-quaternions \(\theta \in {\mathbb{R}}\) and the axis-quaternions \({\varvec{v}} \in S^{2}\) as the research objects to describe the transformation systems in Cartesian space and on the 2D sphere manifold, respectively.
As depicted in Figure 1, the transformation system in Cartesian space is composed of a simple dynamic system and a nonlinear function. The simple dynamic system relates the position, velocity and acceleration of the angle-quaternions \(\left\{ {\theta ,\dot{\theta },\ddot{\theta }} \right\}\) through a second-order differential equation. The nonlinear term is formed from several nonlinear radial basis functions so that it can fit an arbitrary curve. The mathematical model of the transformation system is defined as Eq. (6).
where \(\theta\), \(z\) and \(\dot{z}\) denote the position, velocity and acceleration of the angle-quaternions, respectively. \(\tau\) is used to adjust the duration of the task. \(\alpha_{\theta }\) and \(\beta_{\theta }\) are time constants that guarantee that the angle-quaternion \(\theta\) finally converges to the target \(\theta_{g}\). In this paper, we set \(\alpha = 4\beta\) for position, angle-quaternion and axis-quaternion learning so that Eqs. (6) and (7) become critically damped; the values are determined by the specific task.
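The behavior of such a critically damped attractor can be sketched numerically. The code below assumes the widely used Ijspeert form, \(\tau \dot{z} = \alpha (\beta (\theta_{g} - \theta ) - z) + f\) and \(\tau \dot{\theta } = z\), which may differ in detail from Eq. (6); the function name `rollout`, the explicit Euler integration, the step size and the duration are illustrative assumptions.

```python
def rollout(theta0, theta_g, alpha=25.0, tau=1.0, dt=0.001, duration=2.0, forcing=None):
    """Integrate the assumed transformation system with explicit Euler steps."""
    beta = alpha / 4.0                  # alpha = 4 * beta gives critical damping
    theta, z = theta0, 0.0
    path = [theta]
    steps = int(round(duration / dt))
    for k in range(steps):
        # Without a learned nonlinear term the system is a plain spring-damper.
        f = forcing(k * dt) if forcing is not None else 0.0
        z_dot = (alpha * (beta * (theta_g - theta) - z) + f) / tau
        theta_dot = z / tau
        z += z_dot * dt
        theta += theta_dot * dt
        path.append(theta)
    return path

# Unforced rollout from 0 towards the target 1.
path = rollout(theta0=0.0, theta_g=1.0)
```

With the nonlinear term switched off, the state approaches the target monotonically and without overshoot, which is the property that the choice \(\alpha = 4\beta\) is meant to provide. The learned nonlinear term only shapes the path taken on the way there.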
The extended unit of the transformation system is developed on the 2D sphere manifold for the axis-quaternions. The distance between two axis-quaternions is represented by geodesics on the 2D sphere manifold, and the modified mathematical model is described as Eq. (7).
where \(\lambda_{i}\), \(\dot{\lambda }_{i} \in {\mathbb{R}}\) denote the velocity and acceleration terms between \({\varvec{v}}_{i}\) and \({\varvec{v}}_{i + 1}\). \({\rm {d}}\left( {{\varvec{v}}_{i + 1} ,{\varvec{v}}_{i} } \right) = \arccos ({\varvec{v}}_{i + 1}^{\rm {T}} {\varvec{v}}_{i} ) \in {\mathbb{R}}\) is the geodesic distance between \({\varvec{v}}_{i}\) and \({\varvec{v}}_{i + 1}\), and dt represents the time interval between them. \({\varvec{v}}_{i}\) represents the axis-quaternion in the i-th state of the trajectory. \(\tau\), \(\alpha_{v}\) and \(\beta_{v}\) are constants.
To account for other situations in which the initial and target axis-quaternions are changed, the rotation matrix \({\varvec{R}}_{O}^{{\hat{O}}} \in {\mathbb{R}}^{3 \times 3}\) is introduced to update the mapping direction \({\varvec{\gamma}} = \log_{{\varvec{v}}_{i}} {\varvec{v}}_{i + 1} \in {\mathbb{R}}^{3 \times 1}\) between neighboring axis-quaternions.
where \({\varvec{o}} \in {\mathbb{R}}^{3 \times 1}\) and \(\hat{{\varvec{o}}} \in {\mathbb{R}}^{3 \times 1}\) represent the vectors from the initial to the target axis-quaternions of the demonstrated and the generalized trajectory, respectively.
The rotation angle \(\theta_{{\varvec{o}}}^{\hat{{\varvec{o}}}} \in {\mathbb{R}}\) and the rotation axis \({\varvec{\omega}} \in {\mathbb{R}}^{3 \times 1}\) between \({\varvec{o}}\) and \(\hat{{\varvec{o}}}\) can be calculated as Eq. (10).
where \({\varvec{I}} \in {\mathbb{R}}^{3 \times 3}\) is the identity matrix and \(\hat{{\varvec{\omega}}} \in {\mathbb{R}}^{3 \times 3}\) is the anti-symmetric matrix of \({\varvec{\omega}}\). \({\varvec{v}}_{i + 1}\) can be calculated from \({\varvec{v}}_{i}\) with the exponential function [24].
where the vector \(\overline{{\varvec{\gamma}}}_{i}\) can be updated from the normalized \(\hat{{\varvec{\gamma}}}_{i}\) and \(\arccos ({\varvec{v}}_{i + 1}^{\rm {T}} {\varvec{v}}_{i} )\), the geodesic distance between \({\varvec{v}}_{i}\) and \({\varvec{v}}_{i + 1}\) calculated with Eq. (7). On this basis, the nonlinear sequences \(\left\{ {f_{\theta } } \right\}\) and \(\left\{ {f_{{\varvec{v}}} } \right\}\) can be calculated with Eqs. (6) and (7), respectively.
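The geometric operations used in this subsection can be sketched with the standard formulas for the unit sphere: the geodesic distance, the logarithmic map (a tangent vector pointing from one axis-quaternion to the next), the exponential map (its inverse), and a Rodrigues rotation that takes one direction to another. The function names are hypothetical, and the formulas are the textbook ones rather than a transcription of Eqs. (7) to (11).

```python
import numpy as np

def geodesic(v_a, v_b):
    """Great-circle distance d(v_a, v_b) = arccos(v_a^T v_b) on S^2."""
    return np.arccos(np.clip(np.dot(v_a, v_b), -1.0, 1.0))

def sphere_log(v_a, v_b, eps=1e-12):
    """Tangent vector at v_a pointing towards v_b, with length d(v_a, v_b)."""
    d = geodesic(v_a, v_b)
    tangent = v_b - np.dot(v_a, v_b) * v_a   # project v_b onto the tangent plane at v_a
    n = np.linalg.norm(tangent)
    return np.zeros(3) if n < eps else d * tangent / n

def sphere_exp(v_a, gamma, eps=1e-12):
    """Move from v_a along the tangent vector gamma by its length, staying on S^2."""
    d = np.linalg.norm(gamma)
    if d < eps:
        return v_a.copy()
    return np.cos(d) * v_a + np.sin(d) * gamma / d

def rotation_between(o, o_hat, eps=1e-12):
    """Rodrigues matrix R = I + sin(a) W + (1 - cos(a)) W^2 that rotates o onto o_hat."""
    o = o / np.linalg.norm(o)
    o_hat = o_hat / np.linalg.norm(o_hat)
    axis = np.cross(o, o_hat)
    s = np.linalg.norm(axis)
    if s < eps:
        return np.eye(3)   # parallel vectors; the antiparallel case is not handled here
    w = axis / s
    angle = np.arctan2(s, np.dot(o, o_hat))
    W = np.array([[0.0, -w[2], w[1]],
                  [w[2], 0.0, -w[0]],
                  [-w[1], w[0], 0.0]])   # anti-symmetric matrix of the unit axis
    return np.eye(3) + np.sin(angle) * W + (1.0 - np.cos(angle)) * (W @ W)

v_a = np.array([1.0, 0.0, 0.0])
v_b = np.array([0.0, 1.0, 0.0])
gamma = sphere_log(v_a, v_b)        # direction and length of the step from v_a to v_b
v_next = sphere_exp(v_a, gamma)     # recovers v_b
R = rotation_between(v_a, v_b)      # rotation that can re-orient tangent directions
```

The logarithmic and exponential maps are inverses of each other for points that are not antipodal, which is what allows a trajectory on the sphere to be stored as a sequence of tangent steps and replayed from a new starting point.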
2.2.2 LWR Updating Module
In this paper, we use a linear combination of several nonlinear RBFs to fit the proposed nonlinear terms. The LWR approach is introduced to update their weights in the linear combination.
where \(c_{i} = \exp ( - \alpha \cdot i \cdot T/N_{1} )\) for \(i = 1,2, \cdots ,N_{1}\), \(h_{i} = 1/(c_{i + 1} - c_{i} )^{2}\) for \(i = 1,2, \cdots ,N_{1} - 1\), and \(h_{N_{1}} = h_{N_{1} - 1}\). Each RBF \(\Psi_{i} (s)\) is weighted by \(W_{i}\), which can be updated by the LWR approach.
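The RBF layout and the weight update can be sketched as follows. The centers and widths follow the expressions given above; the Gaussian basis \(\Psi_{i}(s) = \exp(-h_{i}(s - c_{i})^{2})\), the normalized forcing term scaled by \(s\), and the closed-form LWR solution are the commonly used DMP formulation and are assumed here rather than taken from the paper's equations. All function names and the sample target are illustrative.

```python
import numpy as np

def rbf_params(n_rbf, alpha_s, T=1.0):
    """Centers c_i = exp(-alpha_s * i * T / n_rbf) and widths h_i = 1 / (c_{i+1} - c_i)^2."""
    i = np.arange(1, n_rbf + 1)
    c = np.exp(-alpha_s * i * T / n_rbf)
    h = np.empty(n_rbf)
    h[:-1] = 1.0 / (c[1:] - c[:-1]) ** 2
    h[-1] = h[-2]                       # the last width copies its neighbor
    return c, h

def basis(s, c, h):
    """Matrix of activations Psi[t, i] = exp(-h_i * (s_t - c_i)^2)."""
    s = np.atleast_1d(s)
    return np.exp(-h[None, :] * (s[:, None] - c[None, :]) ** 2)

def fit_lwr(s, f_target, c, h):
    """Independent weighted least-squares fit of each weight W_i (assumed LWR form)."""
    psi = basis(s, c, h)
    num = (psi * (s * f_target)[:, None]).sum(axis=0)
    den = (psi * (s ** 2)[:, None]).sum(axis=0) + 1e-12
    return num / den

def forcing(s, W, c, h):
    """Normalized, phase-scaled combination of the RBFs."""
    psi = basis(s, c, h)
    return (psi @ W) / (psi.sum(axis=1) + 1e-12) * np.atleast_1d(s)

t = np.linspace(0.0, 1.0, 200)
alpha_s = 4.0                           # assumed decay rate of the phase variable
s = np.exp(-alpha_s * t)
c, h = rbf_params(25, alpha_s)

# Sanity check: a target that is linear in s is reproduced by constant weights.
W_lin = fit_lwr(s, 3.0 * s, c, h)

# A made-up nonlinear target, fitted and then evaluated with the learned weights.
f_demo = 40.0 * s * np.sin(2.0 * np.pi * t)
W = fit_lwr(s, f_demo, c, h)
f_fit = forcing(s, W, c, h)
```

Each weight is fitted independently of the others, which keeps the update cheap and is one reason LWR is a popular choice for DMPs. The price is that the fit is only locally linear, so the number of RBFs directly limits how much detail of the demonstration can be captured.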
2.2.3 Canonical System Module
To avoid explicit time dependency during learning and generalization, the phase variable \({\varvec{s}} \in {\mathbb{R}}\) is introduced as the state of a first-order linear dynamic system, i.e., the canonical system.
where \({\varvec{s}} \in [0,1]\), \({\varvec{s}}(0) = 1\), and \(\dot{{\varvec{s}}}\) denotes the derivative of \({\varvec{s}}\); \(\tau\) and \(\alpha_{s}\) are constants. When \({\varvec{s}}\) converges to zero, the nonlinear term \(f(s) = 0\), and \(\theta\) and \({\varvec{v}}\) finally converge to the targets \(\theta_{g}\) and \({\varvec{v}}_{g}\). The whole system depends on the phase variable \({\varvec{s}}\) rather than on time. Thus, the EDMPs framework can be generalized to other situations without changing the trajectories.
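A small sketch of the phase variable, assuming the usual canonical system \(\tau \dot{s} = - \alpha_{s} s\) with \(s(0) = 1\), whose closed-form solution is \(s(t) = \exp ( - \alpha_{s} t/\tau )\). The function name `phase` and the value of \(\alpha_{s}\) are assumptions.

```python
import math

def phase(t, alpha_s=4.0, tau=1.0):
    """Closed-form solution of tau * ds/dt = -alpha_s * s with s(0) = 1."""
    return math.exp(-alpha_s * t / tau)

s_start = phase(0.0)                 # the phase starts at 1
s_end = phase(1.0)                   # and decays towards 0
s_fast = phase(0.5, tau=1.0)         # phase at t = 0.5 with the nominal duration
s_slow = phase(1.0, tau=2.0)         # same phase reached at t = 1.0 when tau is doubled
```

Doubling \(\tau\) makes the same phase value occur at twice the time, so a movement driven by \(s\) is simply stretched in time while its shape is preserved.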
2.3 GMM-GMR Algorithm for Multi-space Parameters
GMM-GMR is applied at the initialization stage to handle the multiple trajectories from human demonstrations. As depicted in Figure 1, \(\left\{ {\left\{ {\varvec{p}} \right\}} \right\}\) and \(\left\{ {\left\{ {(\theta ,{\varvec{v}})} \right\}} \right\}\) are obtained from multiple demonstrations of a human tutor. In the initialization stage, the multi-demonstrated positions \(\left\{ {\left\{ {\varvec{p}} \right\}} \right\}\), DR-quaternions \(\left\{ {\left\{ {(\theta x,\theta y,\theta z) \in {\mathbb{R}}^{3 \times 1} } \right\}} \right\}\) and phase variables \(\left\{ {\varvec{s}} \right\}\) are imported into the GMM unit in Cartesian space to learn the distribution of the trajectories, and the GMR unit is applied to generate a single trajectory and the corresponding probability distribution. After that, the output DR-quaternions \(\left\{ {\left\{ {(\theta x,\theta y,\theta z)} \right\}} \right\}\) are converted back to the quaternion representation \(\left\{ {\left\{ {(\theta ,{\varvec{v}})} \right\}} \right\}\), and the resulting single generated trajectory, including positions and orientations, can be learned by the EDMPs. Moreover, variable impedance control can be realized with the probability distribution of the trajectories. The specific process is as follows.
In this paper, we have K demonstrations, and each demonstration has M discrete points. \(\left\{ {{\varvec{\xi}}^{I} } \right\}\) is the phase variable \(\left\{ {\varvec{s}} \right\}\) in EDMPs, and \(\left\{ {{\varvec{\xi}}^{O} } \right\}\) is composed of the positions \(\left\{ {\varvec{p}} \right\}\) and DR-quaternions \(\left\{ {(\theta x,\theta y,\theta z)} \right\}\).
As depicted in Eq. (16), we have K + M discrete data points, and each data point follows the probability distributions \(P({\varvec{p}}(s))\), \(P(\theta (s))\) and \(P({\varvec{v}}(s))\). Hereinafter, we take the position \({\varvec{p}}(s)\) as an example.
where d denotes the dimension of the output parameters. The posterior probability \(\uppi\), mean \({\varvec{\mu}}\) and covariance matrix \({\varvec{\varSigma}}\) of the N2 Gaussian distribution functions can be determined by the EM algorithm.
To avoid local optima, the K-means algorithm is first used to initialize the clustering centers. The EM algorithm is then applied to update the parameters. The whole process can be divided into an E-step and an M-step. The former optimizes the expectation function, i.e., the sum of posterior probabilities \(E = \sum\nolimits_{k = 1}^{M + K} {P({\varvec{\mu}}_{k} ,{\varvec{\varSigma}}_{k} \,|\,{\varvec{\xi}}_{k} )}\); in this phase, the parameters \(\left\{ {\uppi ,{\varvec{\mu}},{\varvec{\varSigma}}} \right\}\) are held fixed. Conversely, the M-step updates the parameters \(\left\{ {\uppi ,{\varvec{\mu}},{\varvec{\varSigma}}} \right\}\), while the expectation function E is held fixed. For a detailed explanation of the EM algorithm and the parameter updating process, please refer to Ref. [29].
Based on the updated GMM parameters \(\left\{ {{\hat{\uppi }},\hat{{\varvec{\mu}}},\hat{{\varvec{\varSigma}}}} \right\}\), for positions, GMR is applied to calculate the expectation \(E(P({\varvec{p}}|{\varvec{s}}))\) and the covariance \({\text{cov}} \left( {P({\varvec{p}}|{\varvec{s}})} \right)\) of the conditional probability \(P({\varvec{p}}|{\varvec{s}})\). In brief, the conditional probability \(P({\varvec{\xi}}^{O} |{\varvec{\xi}}^{I} )\) with several Gaussian distribution functions can be calculated based on the updated mean and covariance matrix, i.e., \(\hat{{\varvec{\mu}}}_{k} = \left[ {\begin{array}{*{20}c} {\hat{\mu }_{k}^{I} } & {\hat{\mu }_{k}^{O} } \\ \end{array} } \right]^{\rm {T}}\), \(\hat{{\varvec{\varSigma}}}_{k} = \left[ {\begin{array}{*{20}c} {\hat{\Sigma }_{k}^{I} } & {\hat{\Sigma }_{k}^{IO} } \\ {\hat{\Sigma }_{k}^{OI} } & {\hat{\Sigma }_{k}^{O} } \\ \end{array} } \right]\).
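The regression step can be sketched for a scalar input (the phase variable) and a vector output. The code conditions each Gaussian on the query input, weights the components by their responsibilities, and combines them by moment matching, which is one common way to form the GMR covariance and is assumed here. The GMM parameters in the example are made up by hand; in practice they would come from the EM fit described above.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of a scalar Gaussian."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def gmr(s, priors, means, covs):
    """Condition a GMM over [s, xi_O] on the scalar input s.

    priors: (N,), means: (N, 1 + d), covs: (N, 1 + d, 1 + d), input dimension first.
    Returns the conditional mean (d,) and covariance (d, d).
    """
    N = len(priors)
    # Responsibility of each component for the query input.
    h = np.array([priors[k] * gaussian_pdf(s, means[k, 0], covs[k, 0, 0]) for k in range(N)])
    h = h / h.sum()
    local_mu, local_cov = [], []
    for k in range(N):
        gain = covs[k, 1:, 0] / covs[k, 0, 0]                   # Sigma_OI * Sigma_I^{-1}
        local_mu.append(means[k, 1:] + gain * (s - means[k, 0]))
        local_cov.append(covs[k, 1:, 1:] - np.outer(gain, covs[k, 0, 1:]))
    mu = sum(h[k] * local_mu[k] for k in range(N))
    # Moment-matched covariance of the conditional mixture.
    second = sum(h[k] * (local_cov[k] + np.outer(local_mu[k], local_mu[k])) for k in range(N))
    return mu, second - np.outer(mu, mu)

# Two hand-made components over (s, p) with a one-dimensional output p.
priors = np.array([0.5, 0.5])
means = np.array([[0.25, 0.0],
                  [0.75, 1.0]])
covs = np.array([[[0.02, 0.01], [0.01, 0.02]],
                 [[0.02, 0.01], [0.01, 0.02]]])
mu, cov = gmr(0.5, priors, means, covs)
```

At the midpoint between the two components the responsibilities are equal, so the conditional mean lies halfway between the two local predictions, and the covariance grows because the components disagree there.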
After the initialization stage with GMM-GMR, a single trajectory with covariance is obtained. The trajectory can be used to train the EDMPs framework, and the covariance can be applied to estimate the stiffness matrices \({\varvec{K}}_{i} \in {\mathbb{R}}^{6 \times 6}\) of the impedance control loop.
where \({\mathbf{0}} \in {\mathbb{R}}^{3 \times 3}\) is the zero matrix, and \({\varvec{K}}_{{T_{i} }} = {\rm {diag}}(k_{px} ,k_{py} ,k_{pz} ) \in {\mathbb{R}}^{3 \times 3}\) and \({\varvec{K}}_{{R_{i} }} = {\rm {diag}}(k_{rx} ,k_{ry} ,k_{rz} ) \in {\mathbb{R}}^{3 \times 3}\) represent the translational and rotational stiffness, respectively. Each stiffness entry is \(k_{i} = k_{\min } + (k_{\max } - k_{\min } )\frac{{\phi_{i} - \phi_{\min } }}{{\phi_{\max } - \phi_{\min } }}\), where \(\phi\) are the stiffness indicators determined by the inverse expected covariance matrices \((\hat{\Sigma }^{O} )^{ - 1}\) in Eq. (24). \(k_{\min }\) and \(k_{\max }\) are the predetermined minimum and maximum stiffness values, chosen according to the specific application scenario.
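The linear mapping from stiffness indicator to stiffness can be sketched directly from the expression above. The sample variances, the stiffness limits, and the use of the inverse variance of a single axis as the indicator are illustrative assumptions.

```python
def stiffness_from_indicator(phi, phi_min, phi_max, k_min, k_max):
    """Linear interpolation k = k_min + (k_max - k_min) * (phi - phi_min) / (phi_max - phi_min)."""
    return k_min + (k_max - k_min) * (phi - phi_min) / (phi_max - phi_min)

# Made-up variances of one position axis along the trajectory (m^2).
variances = [4e-4, 1e-4, 2.5e-5]
# Indicator: inverse variance, so consistent demonstrations yield a large value.
phis = [1.0 / v for v in variances]

k_min, k_max = 200.0, 1000.0            # assumed stiffness limits
stiffness = [stiffness_from_indicator(p, min(phis), max(phis), k_min, k_max) for p in phis]
```

Where the demonstrations agree closely, the variance is small, the indicator is large, and the controller is made stiff; where they vary, the controller is made compliant. This is the usual rationale for deriving variable impedance from demonstration covariance.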
2.4 Evaluation Indicators of Learning Results
Although DMPs have the merit of convergence to the attractor, their performance within a limited execution time largely depends on the number of RBFs and the constants \(\alpha\) and \(\beta\) in Eqs. (6) and (7). In this section, to properly exhibit the reproduction and generalization capability of our approach under given RBFs and constants, we define evaluation indicators including reachability and similarity for the learning results. In Cartesian space, the reachability is determined by the absolute error \(e_{c}\) between the target and actual position/angle-quaternion in the end state, and by the relative error \(\Delta e_{c}\), calculated as \(e_{c}\) relative to the range of the trajectories. The similarity is determined by the PCCc \(\rho_{c}\) between the scaled demonstration and the actual generalized trajectories, wherein the scaling factor \(\eta = \left| {\hat{\theta }_{g} - \hat{\theta }_{0} } \right| / \left| {\theta_{g} - \theta_{0} } \right|\) is calculated according to the difference between the demonstrated target and the new targets. On the 2D sphere manifold, the reachability is determined by the absolute error \(e_{r}\) between the target and actual axis-quaternions in the end state. The similarity is determined by the PCCr \(\rho_{r}\) between the rotated demonstration and the actual generalized axis-quaternions. The evaluation indicators \(\Delta e_{c}\), \(\rho_{c}\) and \(\rho_{r}\) are dimensionless.
The acceptable reachability and similarity can be determined according to the actual application scenario. In this paper, we regard the generalized results as satisfactory when \(\Delta e_{c}\) is smaller than 0.005, \(e_{c}\) lies between -5° and 5°, and \(\rho_{c}\) and \(\rho_{r}\) are greater than 0.8. Under these criteria, the generalized trajectories converge to the target poses with high accuracy and correlate strongly with the demonstrated trajectory.
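A sketch of how the Cartesian indicators and the sphere reachability might be computed. Several details are assumptions: the relative error is taken with respect to the range of the generalized trajectory, the scaled demonstration is anchored at the start of the generalized trajectory, and \(e_{r}\) is taken as the geodesic angle in degrees. The PCCr is omitted because its exact definition on the sphere is not spelled out in the text. All names and sample data are illustrative.

```python
import numpy as np

def cartesian_indicators(demo, generalized, target):
    """Return (e_c, delta_e_c, rho_c) for one Cartesian dimension."""
    demo = np.asarray(demo, dtype=float)
    gen = np.asarray(generalized, dtype=float)
    e_c = abs(gen[-1] - target)                         # absolute end-state error
    delta_e_c = e_c / (gen.max() - gen.min())           # relative to the trajectory range
    eta = abs(target - gen[0]) / abs(demo[-1] - demo[0])
    scaled_demo = gen[0] + eta * (demo - demo[0])       # demonstration rescaled to the new target
    rho_c = np.corrcoef(scaled_demo, gen)[0, 1]         # Pearson's correlation coefficient
    return e_c, delta_e_c, rho_c

def sphere_reachability_deg(v_end, v_target):
    """Geodesic angle in degrees between the final and target unit axes."""
    c = np.clip(np.dot(v_end, v_target), -1.0, 1.0)
    return np.degrees(np.arccos(c))

# Made-up data: the generalized curve is a scaled and shifted copy of the demonstration.
demo = np.linspace(0.0, 1.0, 50) ** 2
gen = 0.5 + 2.0 * demo
e_c, delta_e_c, rho_c = cartesian_indicators(demo, gen, target=2.51)

one_deg = np.deg2rad(1.0)
e_r = sphere_reachability_deg(np.array([0.0, 0.0, 1.0]),
                              np.array([0.0, np.sin(one_deg), np.cos(one_deg)]))
```

Pearson's correlation is unchanged by a positive rescaling or a shift of either curve, so the scaling step does not alter \(\rho_{c}\) in this sketch. A target on the opposite side of the start point reverses the direction of motion, which a positive scaling factor cannot compensate, and the correlation drops accordingly.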
3 Experiment
In this section, the Franka Panda robot is used as the experimental platform. A pick-up task with different poses was designed to verify the learning and generalization ability of the proposed method both in Cartesian space and on the 2D sphere manifold.
3.1 Multi-space Skills Processing and Learning
Multiple demonstrations of the pick-up task were conducted, as shown in Figure 2. The VICON motion capture system, composed of 10 cameras and 4 optical markers, was used to record the trajectories of the demonstrations. Three of the optical markers were placed at the center of the palm, the radial styloid and the ulnar styloid, respectively, so that the plane formed by these points is approximately parallel to the subject’s palm and determines the z-axis of the palm during movement. The last optical marker was placed between the radial and ulnar styloid to facilitate the determination of the y-axis. The x-axis is determined with the right-hand rule. The trajectories of these points are processed to represent the positions and orientations of the palm.
After multi-demonstrations and data preprocessing, GMM is used to encode their distributed characteristics, and GMR is introduced to generate a single trajectory and the corresponding probability distribution according to the input phase variables. To properly characterize the distributions of multi-trajectories, and generate a suitable trajectory for EDMPs framework, we selected 5 Gaussian distribution functions for multi-space parameters’ learning in our experiment, i.e., N2 = 5. The learning results are depicted in Figure 3.
On this basis, the positions and quaternions of the generated trajectory are imported into the EDMPs framework to learn their characteristics both in Cartesian space and on the 2D sphere manifold. In this scenario, we selected three targets at different positions with different poses to test the generalization ability of the presented approach in multiple spaces. Moreover, to obtain a relatively high learning accuracy, we set \(\alpha_{\theta } = 4\beta_{\theta } = 25\) for position and angle-quaternion, \(\alpha_{{\varvec{v}}} = 4\beta_{{\varvec{v}}} = 25\) for axis-quaternion, and selected 25 RBFs, i.e., N1 = 25, to fit the corresponding nonlinear terms. The reproduced and generalized trajectories, including positions and quaternion-based orientations for the different targets, were successfully obtained, as shown in Figure 4.
Figure 4(a) represents the generalization of the positions, and Figure 4(b) represents the generalization of the decoupled quaternion-based orientations, including the angle-quaternion and the axis-quaternion. To characterize the learning and generalization capability of the EDMPs framework in multiple spaces, the reachability and similarity of the reproduced and generalized trajectories are calculated, as shown in Tables 2 and 3.
Table 2 Reachability and similarity of the generalized quaternion-based orientations

| | Angle-quaternion \(e_{c}\) (\(^\circ\)) | Angle-quaternion \(\Delta e_{c}\) | Angle-quaternion \(\rho_{c}\) | Axis-quaternion \(e_{r}\) (\(^\circ\)) | Axis-quaternion \(\rho_{r}\) |
|---|---|---|---|---|---|
| R | 0.0466 | 0.0041 | 0.9995 | 0.5415 | 0.9999 |
| G1 | 0.0467 | 0.0041 | 0.9986 | 0.3869 | 0.9959 |
| G2 | 0.0466 | 0.0041 | 0.9979 | 0.4465 | 0.9927 |
| G3 | 0.0465 | 0.0041 | 0.9963 | 0.4950 | 0.9870 |
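Table 3 Reachability and similarity of the generalized positions

| | \(e_{c}\) (mm), x | \(e_{c}\) (mm), y | \(e_{c}\) (mm), z | \(\Delta e_{c}\), x | \(\Delta e_{c}\), y | \(\Delta e_{c}\), z | \(\rho_{c}\), x | \(\rho_{c}\), y | \(\rho_{c}\), z |
|---|---|---|---|---|---|---|---|---|---|
| R | 3.2938 | 3.4881 | 2.4581 | 0.0081 | 0.2575 | 0.0110 | 0.9998 | 0.9996 | 0.9998 |
| G1 | 3.2915 | 3.4960 | 2.4576 | 0.0081 | 0.2581 | 0.0110 | 0.9988 | 0.6962 | 0.9999 |
| G2 | 3.2969 | 3.4844 | 2.4576 | 0.0081 | 0.2572 | 0.0110 | 0.9960 | 0.9231 | 0.9999 |
| G3 | 3.2950 | 3.4791 | 2.4570 | 0.0081 | 0.2568 | 0.0110 | 0.9990 | 0.7224 | 0.9996 |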
In Tables 2 and 3, the averages are taken over the reproduced (R) and generalized (G1 to G3) trajectories. The average \(e_{c}\), \(\Delta e_{c}\) and \(\rho_{c}\) of the positions on the x-, y-, and z-axis are 3.2943 mm, 3.4869 mm, 2.4576 mm; 0.0081, 0.2574, 0.0114; and 0.9984, 0.8353, 0.9998, respectively. The average \(e_{c}\), \(\Delta e_{c}\) and \(\rho_{c}\) of the angle-quaternions are 0.0466°, 0.0041, and 0.9981, respectively. The average \(e_{r}\) and \(\rho_{r}\) of the axis-quaternions are 0.4675° and 0.9939. The absolute errors of the positions and the quaternion-based orientations are less than 3.5 mm and 1°, respectively. Except for G1 and G3 of the position on the y-axis, the Pearson’s correlation coefficients between the demonstrated and the generalized trajectories are greater than 0.9. The low correlation of G1 on the y-axis arises because the sign of its target is opposite to that of the demonstrated one, and that of G3 arises because its target is too close to the starting point. Approaches to these problems are discussed in Ref. [19]. Nevertheless, the experimental results reveal that the presented approach has relatively good learning and generalization capabilities both in Cartesian space and on the 2D sphere manifold.
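As a reminder of how the similarity indicator behaves, the following sketch computes Pearson’s correlation coefficient between two sampled trajectories of equal length. The function name is hypothetical, and the sketch covers only the correlation; the reachability indicators are defined in Section 2.4 and are not reproduced here.

```python
import math

def pearson(a, b):
    """Pearson's correlation coefficient between two equally sampled trajectories."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)
```

A generalized trajectory that is a positively scaled and shifted copy of the demonstration gives a coefficient of 1, whereas a mirrored copy gives −1. This illustrates why a target whose sign is opposite to the demonstrated one tends to lower the similarity.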
Based on the definition of the satisfactory region in Section 2.4, we calculate the satisfactory generalization region of axis-quaternions to further verify the generalization capability of our approach on the 2D sphere manifold, as shown in Figure 5.
As shown in Figure 5, the satisfactory generalized region with \(e_{r} \in \left[ { - 5^\circ ,5^\circ } \right]\) and \(\rho_{r} \in [0.8,\;1]\) is determined. The region covers nearly 1/3 of the sphere. All reachable targets are located on the same hemisphere as the demonstrated target. If the generalized target is too close to the starting point, the nonlinear terms may have an unexpected influence on the generalized trajectories, and the reachability and similarity will be unsatisfactory. Moreover, if the vector from the starting point to the generalized target is opposite to that of the demonstration, or the generalized target is located on the other hemisphere, the generalized trajectory will show an undesired correlation with the demonstrated one, and the reachability will also be unsatisfactory. These phenomena are consistent with our experimental results.
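The region test can be written compactly. In the sketch below, the thresholds are the ones stated above, while the function names and the use of the great-circle angle between two unit vectors as the angular error on the sphere are assumptions made for illustration; the exact definition of \(e_{r}\) is the one given in Section 2.4.

```python
import math

def geodesic_deg(u, v):
    """Great-circle angle in degrees between two unit vectors on the sphere."""
    d = sum(a * b for a, b in zip(u, v))
    d = max(-1.0, min(1.0, d))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(d))

def is_satisfactory(e_r_deg, rho_r, e_max=5.0, rho_min=0.8):
    """True if the pair (e_r, rho_r) lies in the satisfactory generalization region."""
    return abs(e_r_deg) <= e_max and rho_min <= rho_r <= 1.0
```

For example, the axis-quaternion entries of Table 2 (such as 0.5415° with 0.9999 for R, or 0.4950° with 0.9870 for G3) all satisfy this test.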
3.2 Experimental Verification on Real Robot
To apply our approach in a real scenario and further verify its effectiveness, we designed a pick-up task with the Panda robot based on the above learning and generalization results. Firstly, the variable stiffness, including the translational and rotational stiffness profiles, is obtained through the GMM-GMR initialization, and distribution-based variable impedance control is realized, as shown in Figure 6. The whole control system is based on the ROS network. Figure 7 shows several typical results of this task; the robot successfully completed the corresponding tasks with trajectory profiles similar to the demonstration.
As shown in Figure 6, the action starts at the initial phase variable \(s(t_{0} ) = 1\) and finishes at \(s(t_{{{\text{end}}}} ) = 0\). According to the probability distribution of the multiple trajectories, the diagonal elements of the translational and rotational stiffness matrices can be obtained from Eq. (25). For the translational stiffness, the stiffness along the x- and y-axis remained low in the initial stage and gradually increased as the task was executed. The stiffness along the z-axis first decreased in the initial stage and then increased to a high level near the targets. A similar trend can be seen in the different dimensions of the rotational stiffness. From these results, it can be concluded that the x- and y-axes have a higher priority than the z-axis in this task.
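The general idea of deriving a stiffness profile from the demonstration variability can be illustrated as follows. This sketch is not Eq. (25): it is a hypothetical stand-in that linearly maps the standard deviation along one axis to a stiffness between assumed bounds `k_min` and `k_max`, so that a small spread across demonstrations (high consistency) yields high stiffness and a large spread yields low stiffness.

```python
import math

def stiffness_profile(variances, k_min, k_max):
    """Map a variance profile along one axis to a stiffness profile (illustrative only)."""
    stds = [math.sqrt(v) for v in variances]
    lo, hi = min(stds), max(stds)
    if hi == lo:
        # No variation in spread: fall back to a constant high stiffness.
        return [k_max for _ in stds]
    # Largest spread -> k_min, smallest spread -> k_max, linear in between.
    return [k_min + (k_max - k_min) * (hi - s) / (hi - lo) for s in stds]
```

Under such a mapping, a variance that shrinks as the hand approaches the target produces a stiffness that rises over the course of the task, which matches the qualitative trend described for the x- and y-axes.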
As shown in Figure 7, four bottles were placed on the desk. The one with the blue cap is the demonstrated target, and the others with yellow caps are the generalized targets, which were placed randomly. The robot was first moved to the initial pose shown in Figure 7(a), which is similar to the demonstrated initial pose in Figure 2(a). The initial homogeneous matrix of the human tutor is mapped to the real initial pose of the robot in Figure 7(a) through a transformation matrix, and the demonstrated trajectory is transformed accordingly. Figure 7(b) and (c) show the reproduced trajectory for the demonstrated target. On this basis, we manually adjusted the joint angles of the robot to reach the corresponding generalized targets with reasonable grasping poses. The obtained end poses were imported into the EDMPs framework, and three trajectories similar to the demonstrated curve were generated successively. Figure 7(d), (f), and (h) show the intermediate process of the generalized movements, and Figure 7(e), (g), and (i) show the end poses of the robot for the generalized targets.
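The alignment of the demonstrated trajectory with the robot’s initial pose can be sketched with 4×4 homogeneous matrices. The sketch assumes that a single rigid transform, computed so that the first demonstrated pose coincides with the robot’s initial pose, is applied by left multiplication to every demonstrated pose; the function names are hypothetical, and the transformation actually used in the experiment may be defined differently.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def inv_rigid(T):
    # The inverse of a rigid transform [R p; 0 1] is [R^T  -R^T p; 0 1].
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    p = [T[i][3] for i in range(3)]
    q = [-sum(Rt[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [Rt[0] + [q[0]], Rt[1] + [q[1]], Rt[2] + [q[2]], [0.0, 0.0, 0.0, 1.0]]

def align_demonstration(poses_human, T_robot_start):
    """Re-express demonstrated poses so that the first one equals T_robot_start."""
    T_align = matmul(T_robot_start, inv_rigid(poses_human[0]))
    return [matmul(T_align, T) for T in poses_human]
```

Because the same transform is applied to all poses, the relative motion between consecutive poses, and hence the shape of the demonstrated trajectory, is preserved.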
4 Discussion
It is worth noting that the learning results of the existing DMPs-based frameworks depend heavily on the number and distribution of the selected RBFs and on the time constants of the transformation system. These parameters are determined empirically for the specific task. To the best of our knowledge, there is still no literature on how to evaluate the algorithm under the selected RBFs and time constants. Therefore, we proposed several evaluation indicators to characterize the performance of EDMPs, and determined the satisfactory generalized region in our application scenario, as shown in Figure 5. If the generalized targets are similar to the demonstrated one or located in the satisfactory generalized region, the EDMPs framework performs well. However, when the difference is too large, especially if the target is located on the other hemisphere of the manifold, the results will be unsatisfactory. This limitation may be overcome by building a knowledge database for the robot that includes different skills for various tasks and covers the whole sphere on the manifold.
The proposed EDMPs framework can be applied to more complex tasks, such as human-robot cooperation scenarios and skillful manipulation, where positions and orientations must be considered simultaneously. The main difference between our contribution and its predecessors is that our approach can handle skills on different kinds of manifolds, including the sphere manifold \(S^{d}\), the special orthogonal group \(SO(d)\), the special Euclidean group \(SE(3)\), and the manifold of SPD matrices \(\mathcal{S}_{ + + }^{d}\), by reducing the dimensions of these skills and combining the classical transformation system with our extended transformation system on the 2D sphere manifold. We use quaternions to represent the Riemannian space skills, and decouple the quaternions into Euclidean space and Riemannian space terms \({(}\theta \in {\mathbb{R}}{,}{\varvec{v}} \in S^{2} {)}\). Thus, the decoupled quaternions, as well as the positions, can be learned simultaneously with our EDMPs framework. The EDMPs provide a new way to learn and generalize multi-space skills.
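The decoupling of a unit quaternion into an angle and a unit axis can be sketched as follows. The sketch assumes the standard axis-angle relation q = (cos(θ/2), sin(θ/2) v) with the scalar part first; the function names, the component ordering, and the fallback axis for the identity rotation are assumptions and may differ from the convention used in the framework.

```python
import math

def decouple(q):
    """Split a unit quaternion (w, x, y, z) into an angle theta and a unit axis v."""
    w, x, y, z = q
    n = math.sqrt(x * x + y * y + z * z)
    theta = 2.0 * math.atan2(n, w)
    if n < 1e-12:
        # The axis is undefined for a zero rotation; return an arbitrary unit axis.
        return theta, (0.0, 0.0, 1.0)
    return theta, (x / n, y / n, z / n)

def recompose(theta, v):
    """Rebuild the unit quaternion from the angle theta and the unit axis v."""
    half = 0.5 * theta
    s = math.sin(half)
    return (math.cos(half), s * v[0], s * v[1], s * v[2])
```

The angle θ is an ordinary real number and can be handled by a classical transformation system, while the axis v has unit norm and therefore lives on the 2D sphere, which is why it requires the sphere-manifold transformation system.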
5 Conclusions
(1) An EDMPs framework both in Cartesian space and on the 2D sphere manifold has been presented for transferring kinematic skills, including positions and orientations, from humans to robots. The quaternion-based orientations can be successfully learned and generalized with the 2D-sphere-manifold-based transformation system of the EDMPs framework.
(2) GMM-GMR algorithms are combined with the presented EDMPs framework, which allows us to obtain not only a smooth regression trajectory but also the corresponding probability distribution. The former can be learned with the EDMPs, and the latter can be used as a reference for designing variable impedance controllers.
(3) Reachability and similarity are defined as evaluation indicators to characterize the learning and generalization capability of the EDMPs framework under the selected RBFs and the constants α and β.
(4) A real-world experiment was implemented with the Panda robot. The experimental results show that the absolute errors of the Cartesian and Riemannian space skills are less than 3.5 mm and 1.0°, respectively. The Pearson’s correlation coefficients of the Cartesian and Riemannian space skills are mostly greater than 0.9. The developed EDMPs exhibit a relatively good learning ability for multi-space skills.
The present study provides a reference for transferring multi-space skills from humans to robots. In the future, we will extend our framework to other industrial applications and various skillful tasks in which position, orientation, force, and stiffness must be considered simultaneously both in Cartesian space and on Riemannian manifolds, such as polishing, scraping, welding, and human-robot cooperation.
Acknowledgements
Not applicable.
Competing Interests
The authors declare no competing financial interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.