In this section, we first introduce the motivation of the proposed embedding model and then formulate the idea mathematically.
4.1 Motivation
Composition tables, which specify role chains of relations
1, have been widely used in traditional qualitative spatial and temporal reasoning methods, and are identified as one of the key reasoning techniques [
5,
8,
19].
An embedding method should also be able to automatically find and take full use of such role chains in its inference and reasoning. One core requirement for such an embedding method is to model
asymmetric role chains in composition tables; namely
\(r1 \circ r2 \ne r2 \circ r1\), where
\(\circ\) denotes the composition operation. For example, if we know geographic entity A is
disconnected to geographic entity B and B is
tangential proper part of geographic entity C, the relation of A to C will fall into one of five possible relations, i.e.,
dc,
ec,
po,
tpp or
ntpp according to the composition table. By contrast, if we first know A is
tangential proper part of B and B is
disconnected to C, then the relation of A to C must be
disconnected. This means the order of relations in role chains matters. In order to take this into account, we use quaternions, an extension of complex numbers, to automatically capture role chains from training data, thanks to the non-commutative law of quaternion multiplication. Additionally, quaternions can be readily used to model varying properties of relations (e.g., symmetric and anti-symmetric relations) and inverse relations, which further contributes to inference and reasoning over spatial and temporal information.
In addition to the need of capturing role chains in composition tables, we notice that 3/8 spatial relations in RCC8, and 9/13 temporal relations in Allen’s temporal intervals [
7] are transitive (see Tables
1 and
2). Geometrically, transitive relations usually induce tree-like structures over entities, in which as the depth of a tree increases, the number of child nodes grows exponentially. As shown in Fig.
2, as the root – the US, branches out, more and more child nodes emerge. Also, although some relations (such as
tpp and
tppi) are not transitive, they may still induce a tree-like structure over entities to some degree.
Thereby, an embedding method for spatial and temporal reasoning should be built on a suitable embedding space, which is able to encode non-Euclidean structures exhibited in data (e.g., hierarchies). Past works have demonstrated that hyperbolic embeddings are more suitable for data exhibiting non-Euclidean geometric properties, such as hierarchy [
38]. This is because hyperbolic space can be naturally viewed as a continuous analogy to hierarchical trees in discrete space and it grows exponentially with an increasing radius, which corresponds to an exponential increase in the number of child nodes with increasing tree depth [
54]. Therefore, given the abundance of transitive spatial/temporal relations, we embed entities and relations in hyperbolic space rather than Euclidean space.
Despite the aforementioned advantages of quaternions and hyperbolic space, the technical bottleneck of the model design rests on how to harmonize quaternions and hyperbolic space while preserving their respective properties. The transformation of quaternions, which are originally defined in Euclidean space, into a hyperbolic space is not trivial, since quaternion-related vector operations (e.g., vector addition, matrix-vector multiplication, and vector inner product over quaternions) and geometric metrics (e.g., the closed form of distance) in Euclidean space is hard to be generalized to hyperbolic space.
In this paper, we propose a hyperbolic embedding model, called HyperQuaternionE, in which this challenge is tackled. In the following, we will first introduce preliminary concepts and notations, then propose our model, and finally analyze which relation properties and composition patterns our model can preserve.
4.2 Preliminaries
Mathematically, 3D rotations can be formalized as (
1). We denote the 3D point (
\(\textbf{v} \in {\mathbb{R}^{3}}\)) to be rotated as a pure quaternion
\(v=[0, \textbf {v}]\), a unitary quaternion
\(q_u=[cos(\theta ), sin(\theta )\textbf {u}]\) (
\(\theta\) is the rotating angle and
\(\textbf {u}\) is the rotating axis) as the rotating vector and the resulting point as
\(v^{\prime }=[0, \textbf {v}^{\prime }]\) (
\(\textbf{v}^{\prime }\in {\mathbb{R}^{3}}\)).
where
\(v_{\Vert }\) is the component of
\(\textbf {v}\) parallel to
\(\textbf {u}\) and
\(v_{\bot }\) the component of
\(\textbf {v}\) perpendicular to
\(\textbf {u}\).
\(q_u=p_u^2\) and
\(p_u=[cos(\frac{\theta }{2}), sin(\frac{\theta }{2})\textbf {u}]\). This theorem can be interpreted as the component of
\(\textbf {v}\) perpendicular to
\(\textbf {u}\) is rotated twice by
\(\frac{\theta }{2}\) around
\(\textbf {u}\). Proofs to this theorem can be found in [
59,
60].
When
c is considered, the hyperbolic distance of two points
\({\textbf{x}},{\textbf{y}} \in {\mathbb{B}^{d}_{c}}\) is defined as its geodesic distance in the space, which has the desirable property of forming a tree-like embedding space (see Fig.
3). It is formulated as follows:
$$\begin{aligned} d^c(\mathbf {x},\mathbf {y})=\frac{2}{\sqrt{c}}arctanh(\sqrt{c}~\Vert (-\mathbf {x}) \oplus _c \mathbf {y}\Vert ) \end{aligned}$$
(4)
where
\(\textit{arctanh}(\cdot )\) denotes the inverse hyperbolic tangent. The
Möbius addition (i.e.,
\(\oplus _c\)) of two points
\(\textbf{x},\textbf{y} \in {\mathbb{B}^{d}_{c}}\) can be expressed as below:
$$\begin{aligned} \mathbf {x} \oplus _c \mathbf {y} = \frac{(1+2c\mathbf {x}^T\mathbf {y}+c \Vert \mathbf {y}\Vert ^2)\mathbf {x} + (1-c\Vert \mathbf {x} \Vert ^2)\mathbf {y}}{1 + 2c\mathbf {x}^T\mathbf {y} + c^2\Vert \mathbf {x} \Vert ^2 \Vert \mathbf {y} \Vert ^2} \end{aligned}$$
(5)
where
\(\Vert \cdot \Vert\) is the Euclidean norm. We can obtain that
\(\textbf {x} \oplus _c (-\textbf {x})\) =
\((-\textbf {x}) \oplus _c \textbf {x}\) =
\(\textbf {0}\). This property helps model inverse relations in the embedding space.
4.2.1 Bridging quaternion and hyperbolic space
For a point
\(\textbf{x} \in {\mathbb{B}^{d}_{c}}\), its tangent space representation (
\(\textbf {x}^E\)) is defined as a
d-dimensional vector, which approximates the hyperbolic space
\(\mathbb {B}^d_c\) around
\(\textbf {x}\) (origin). The two mappings (
\(\text {exp}_0^c(\cdot )\) and
\(\text {log}_0^c(\cdot )\)) at the origin have the following closed-form expressions:
$$\begin{aligned} \text {exp}_0^c(\mathbf {x}^E)= & {} \text {tanh}(\sqrt{c}\Vert \mathbf {x}^E \Vert )\frac{\mathbf {x}^E}{\sqrt{c}\Vert \mathbf {x}^E \Vert }=\mathbf {x}^H \end{aligned}$$
(6)
$$\begin{aligned} \text {log}_0^c(\mathbf {x}^H)= & {} \text {arctanh}(\sqrt{c}\Vert \mathbf {x}^H \Vert )\frac{\mathbf {x}^H}{\sqrt{c}\Vert \mathbf {x}^H \Vert }=\mathbf {x}^E \end{aligned}$$
(7)
where
\(\text {exp}_0^c(\cdot )\) maps
\(\textbf {x}^E\) in the tangent space to
\(\mathbb {B}^d_c\) and conversely,
\(\text {log}_0^c(\cdot )\) maps
\(\textbf {x}^H\) in
\(\mathbb {B}^d_c\) to the tangent space. Note that we use
\(\textbf {x}^H\) to denote
\(\textbf {x}\) in the hyperbolic space while
\(\textbf {x}^E\) being in Euclidean space.
The core idea behind the proposed HyperQuaternionE is to encode relations as 3D rotations, and assumes that for a triple
\(\langle h, r, t\rangle\), the tail entity
t is the result of the head entity
h being rotated by relation
r. This indicates two key steps in our method: rotating the head entity by the relation and measuring the distance between the tail entity and the head entity after being rotated. Despite being similar to the rotation family introduced in Section
3, the main difference is that in our method these two steps are performed in different spaces. The rotating step is performed in the tangent space with the aim to use quaternions in order to capture role chains from data, and the distance measuring step is executed in hyperbolic space so as to form a tree-like embedding space for hierarchical data. Mathematically, for a triple
\(\langle h, r, t\rangle\) in a KG, these two steps can be formalized as follows. Note that for entities and relations, their embeddings are first randomly initialized, denoted as
\(\textbf{h}^{E}, \textbf{r}^{E}, \textbf{t}^{E} \in {\mathbb{R}^{d}}\) (d is the dimension), and are learned automatically through training.
In the first step, a 3D rotation on the head entity
h performed by relation
r is achieved by Theorem 1. Concretely, head entities are modeled as 3D points to be rotated, and tail entities are modeled as results of head entities being rotated by relations (i.e., 3D rotation). In order to utilize quaternions to implement 3D rotation, we convert real value entries in
\(\textbf {h}^E\) and
\(\textbf {r}^E\) into quaternions. Hence each head embedding
\(\textbf{h}^{E} \in {\mathbb{R}^{d}}\) can be expressed as
\(\frac{d}{3}\) pure quaternions. Specifically, it can be written as
\(V^E_h=[h_1, h_2, ..., h_i]^T\), where
\(h_i=[0, \textbf {h}_{i}]\) is a pure quaternion and
\(\textbf{h}_{i} \in {\mathbb{R}^{3}}\) (
\(i \in \{1, 2, ..., \frac{d}{3}\}\)) denotes a 3D point. Similarly, each relation is represented by
\(\frac{d}{3}\) unitary quaternions, whose embedding can be written as
\(Q^E_r=[q_{r,1}, q_{r,2},...,q_{r,\frac{d}{3}}]^T\), where each
\(q_{r,i}\) (
\(i \in \{1, 2, ..., \frac{d}{3}\})\) is a unitary quaternion. According to (
3), 3D rotation in the embedding space is given as follows:
$$\begin{aligned} \mathbf {Rot}_{hr,4}^E= & {} \text {Rot3D}(\mathbf {h}^E,\mathbf {r}^E)= Q^E_r \circledcirc V^E_h \circledcirc (Q^E_r)^* \end{aligned}$$
(8)
$$\begin{aligned} \mathbf {Rot}_{hr}^E= & {} concat(\mathbf {Rot}_{hr,4}^E) \end{aligned}$$
(9)
where
\(\circledcirc\) denotes element-wise quaternion multiplication and
\((Q^E_r)^*=[h_1^*, h_2^*, ..., h_i^*]^T\) denotes the conjugate of
\(Q^E_r\).
\(\textbf {Rot}_{hr,4}^E\) is the rotating result of the head entity and contains
\(\frac{d}{3}\) pure quaternions.
\(concat(\cdot )\) is to concatenate three imagery components of these pure quaternions in order to recover the original dimension
d.
In the second step, to form a tree-like embedding space for hierarchical data, we measure the distance between the resulting head embedding and the tail embedding in hyperbolic space. Since the first step is performed in tangent spaces, we first map Euclidean embeddings into hyperbolic embeddings via exponential maps shown in (
6). However, rather than using a generic curvature
c, a relation-aware learnable curvature
\(c_r\) is introduced for each relation because relations of different kinds may yield hierarchical structures of varying degrees. For example, a graph where only the relation
tangential proper part holds between entities would have a higher hierarchy index than the one induced by the relation
disconnected. The relation-aware exponential maps are shown below.
$$\begin{aligned} \mathbf {Rot}_{hr}^H= & {} \text {exp}_0^{c_r}(\mathbf {Rot}_{hr}^E) = \text {tanh}(\sqrt{c_r}\Vert \mathbf {Rot}_{hr}^E \Vert )\frac{\mathbf {Rot}_{hr}^E}{\sqrt{c_r}\Vert \mathbf {Rot}_{hr}^E \Vert }\end{aligned}$$
(10)
$$\begin{aligned} \mathbf {t}^H= & {} \text {exp}_0^{c_r}(\mathbf {t}^E) = \text {tanh}(\sqrt{c_r}\Vert \mathbf {t}^E \Vert )\frac{\mathbf {t}^E}{\sqrt{c_r}\Vert \mathbf {t}^E \Vert } \end{aligned}$$
(11)
where
\(\textbf {Rot}_{hr}^H\) and
\(\textbf {t}^H\) are embeddings of
\(\textbf {Rot}_{hr}^E\) and
\(\textbf {t}^E\) in hyperbolic space, respectively.
Finally, the distance is calculated by using the following formula:
$$\begin{aligned} d^{c_r}(\mathbf {Rot}_{hr}^H,\mathbf {t}^H)=\frac{2}{\sqrt{c_r}}arctanh(\sqrt{c_r}~\Vert (-\mathbf {Rot}_{hr}^H) \oplus _{c_r} \mathbf {t}^H \Vert ) \end{aligned}$$
(12)
Equation (
12) is originated from (
4), but contains a relation-aware learnable curvature
\(c_r\) to consider the difference of embedding spaces induced by various relations.
Similar to previous work [
42,
48], we optimize the model by minimizing the distance between
\(\textbf {Rot}_{hr}^H\) and a valid tail
t (meaning that
\(\langle h, r, t\rangle\) exists in our KG) and maximizing that to a negative tail. More specifically, for a triple
\(\langle h, r, t\rangle\) in a KG,
t itself is a positive tail and we construct negative tails by replacing
t with another entity (i.e.,
\(t^\prime\)), which is randomly picked from all other entities. It is done by
n times in order to obtain
n negative tails. Finally, the optimizer is to pull the correct
t towards
\(\textbf {Rot}_{hr}^H\) as close as possible while pushing negative ones far away, which can be formalized as:
$$\begin{aligned} \mathcal {L} = -log \; \sigma (\gamma - d^{c_r}(\mathbf {Rot}_{hr}^H,\mathbf {t}^H)) - \frac{1}{n}\sum _{i=1}^{n}log \; \sigma (d^{c_r}(\mathbf {Rot}_{hr}^H,\mathbf {t^\prime _i}^H)) - \gamma ) \end{aligned}$$
(13)
where
\(\sigma\) denotes the sigmoid function and
\(\gamma\) is a hyper-parameter indicating the tolerance of distance between the positive/negative and the resulting entity embedding.
Likewise, with regard to relation inference, for each positive triple
\(\langle h, r, t\rangle\), we corrupt it by replacing
r with other (spatial/temporal) relations
\(n_r\) times so as to generate
\(n_r\) relation-based negative samples. To consider both tasks, we construct a joint loss function and use a scalar
\(\beta\) to adjust their respective contributions:
$$\begin{aligned} \mathcal {L}^\prime = \mathcal {L} - \beta \frac{1}{n_r}\sum _{i=1}^{n_r}log \; \sigma (d^{c_{r_i}}( {Rot}_{hr_i}^H, {t}^H)) - \gamma ) \end{aligned}$$
(14)
Last but not least, we introduce a way of representing relations such that they can be ensured to be unitary quaternions. This is of great importance to achieve 3D rotations based on Theorem 1. Recall that only three values are needed to determine a unitary quaternion. So for any three arbitrary values
\(\alpha , \theta _1, \theta _2 \in [-\pi , \pi ]\), a unitary quaternion can be constructed as follows:
$$\begin{aligned} q_u=cos(\alpha ) + sin(\alpha )cos(\theta _1)cos(\theta _2)i + sin(\alpha )cos(\theta _1)sin(\theta _2)j+ sin(\alpha )sin(\theta _1)k ~ \end{aligned}$$
(15)
Based on the definition of quaternion norm (see Property 3),
\(\Vert q_u \Vert =1\) can be readily ensured (See Appendix
1 for proofs). In what follows, we analyze relation properties and composition patterns that are preserved by using the proposed model.
$$\begin{aligned} \mathbf {t}_i^E= & {} q_{r,i} \mathbf {h}_i^E q_{r,i}^*\end{aligned}$$
(16)
$$\begin{aligned} \mathbf {h}_i^E= & {} q_{r,i} \mathbf {t}_i^E q_{r,i}^* \end{aligned}$$
(17)
Thus, when we plug (
16) into (
17), it yields:
$$\begin{aligned} \mathbf {h}_i^E = q_{r,i} (q_{r,i} \mathbf {h}_i^E q_{r,i}^*) q_{r,i}^*= q_{r,i}^2 \mathbf {h}_i^E (q_{r,i}^*)^2 \end{aligned}$$
(18)
The correspondence of
\(\textbf {h}_i^E\) in hyperbolic space is given by (
6):
$$\begin{aligned} \mathbf {h}_i^H = \text {tanh}(\sqrt{c_{r}}\Vert \mathbf {h}_i^E \Vert )\frac{\mathbf {h}_i^E}{\sqrt{c_{r}}\Vert \mathbf {h}_i^E \Vert } \end{aligned}$$
(19)
When we substitute
\(\textbf {h}_i^E\) in (
19) with (
18), we obtain the following:
$$\begin{aligned} \mathbf {h}_i^H= & {} \text {tanh}(\sqrt{c_r}\Vert \mathbf {h}_i^E \Vert )\frac{q_{r,i}^2 \mathbf {h}_i^E (q_{r,i}^*)^2}{\sqrt{c_r}\Vert \mathbf {h}_i^E \Vert } \\= & {} q_{r,i}^2 \frac{\text {tanh}(\sqrt{c_r}\Vert \mathbf {h}_i^E \Vert ) \mathbf {h}_i^E}{\sqrt{c_r}\Vert \mathbf {h}_i^E \Vert } (q_{r,i}^*)^2 \\= & {} q_{r,i}^2 \mathbf {h}_i^H (q_{r,i}^*)^2 \\\Leftrightarrow & {} q_{r,i}^2 = \pm 1 \end{aligned}$$
It indicates that the sufficient and necessary condition of modeling symmetric relations is that
\(q_{r,i}^2 = \pm 1\) holds. Clearly, in 3D space, a rotation angle of
\(k*180^\circ\) (
\(k \in \{1,3,5,...\}\)) satisfies this condition. Likewise, we can derive that
\(q_{r,i}^2 \ne \pm 1\) is the sufficient and necessary condition for modeling anti-symmetric relations.
If
\(\langle h, r1, t\rangle\) and
\(\langle t, r2, h\rangle\) hold, similarly, according to Theorem 1, in the tangent space for each rotation we have:
$$\begin{aligned} \mathbf {t}_i^E= & {} q_{r1,i} \mathbf {h}_i^E q_{r1,i}^*\end{aligned}$$
(20)
$$\begin{aligned} \mathbf {h}_i^E= & {} q_{r2,i} \mathbf {t}_i^E q_{r2,i}^* \end{aligned}$$
(21)
The correspondence of
\(\textbf {h}_i^E\) in hyperbolic space is given by (
6):
$$\begin{aligned} \mathbf {h}_i^H = \text {tanh}(\sqrt{c_{r2}}\Vert \mathbf {h}_i^E \Vert )\frac{\mathbf {h}_i^E}{\sqrt{c_{r2}}\Vert \mathbf {h}_i^E \Vert } \end{aligned}$$
(22)
Then, we can obtain:
$$\begin{aligned} \mathbf {h}_i^H= & {} \text {tanh}(\sqrt{c_{r2}}\Vert \mathbf {h}_i^E \Vert )\frac{(q_{r2,i}q_{r1,i}) \mathbf {h}_i^E (q_{r2,i}q_{r1,i})^*}{\sqrt{c_{r2}}\Vert \mathbf {h}_i^E \Vert } \\= & {} (q_{r2,i}q_{r1,i}) \frac{\text {tanh}(\sqrt{c_{r2}}\Vert \mathbf {h}_i^E \Vert ) \mathbf {h}_i^E}{\sqrt{c_{r2}}\Vert \mathbf {h}_i^E \Vert } (q_{r2,i}q_{r1,i})^* \\= & {} (q_{r2,i}q_{r1,i}) \mathbf {h}_i^H (q_{r2,i}q_{r1,i})^* \\\Rightarrow & {} q_{r2,i} = \pm q_{r1,i}^* \end{aligned}$$
Clearly, this equation can have multiple solutions. For instance, for a relation
r1 with its quaternion representation in a dimension being
\(q_{r1,i}=[\alpha _1, \textbf {v}_1]\), it inverse relation
r2 at the same dimension can be constructed as
\(q_{r2,i}=[\alpha _1, -\textbf {v}_1]\) or
\(q_{r2,i}=[-\alpha _1, \textbf {v}_1]\).
Non-commutative composition of relations implies that
\(r1 \circ r2 \ne r2 \circ r1\) while commutative composition indicates that
\(r1 \circ r2 = r2 \circ r1\). Here
\(\circ\) refers to quaternion multiplication. According to Theorem 2,
\(r1 \circ r2\) yields another relation
r3, namely
\(r1 \circ r2=r3\), and likewise
\(r2 \circ r1=r4\). Due to the non-commutative law of quaternion multiplication (see (
1)),
\(r3 \ne r4\) can be naturally guaranteed. On the other hand, in special cases, for example, when
r1 and
r2 share the same rotating axis, we can conclude that
\(r1 \circ r2 = r2 \circ r1=r3=r4\) (i.e., commutative composition).
Table
3 summarizes varying properties of relations and patterns of relation composition that different models can preserve. As can be seen, the proposed HyperQuaternionE achieves all. Note that our HyperQuaternionE method can be applied to other KGs where aforementioned properties of relations and patterns of relation composition exist commonly. We leave the investigation as future work.
Table 3
Varying properties and patterns modeled by differing models
Property | Symmetric | ✗ | ✓ | ✓ | ✓ | ✓ |
| Anti-symmetric | ✓ | ✓ | ✓ | ✓ | ✓ |
| Inversion | ✓ | ✓ | ✓ | − | ✓ |
Composition | commutative | ✓ | ✓ | ✓ | − | ✓ |
| non-commutative | ✗ | ✗ | ✓ | − | ✓ |
Hierarchy | induced by transitive relations | ✗ | ✗ | ✗ | ✓ | ✓ |