
An Introduction to Tensors for Path Signatures

  • Open Access
  • 2026
  • OriginalPaper
  • Chapter

Abstract

This chapter delves into the world of tensors, starting with their basic definitions and properties. It explores the distinction between linear and bilinear operators, introducing direct sums and tensor products as ways to create new vector spaces. The chapter also covers the universal property of tensor products and their role in representing multilinear relationships. Additionally, it discusses the tensor algebra and its applications in path signatures, providing exercises and examples to illustrate key concepts. The chapter concludes with solutions to the exercises, ensuring a thorough understanding of tensors and their operations.

1 Introduction

Throughout mathematics, computer science, and physics, the term tensor is used to describe a myriad of similar, but fundamentally different mathematical objects. For amusement, we invite the reader to visit the “100 Questions: A Mathematical Conventions Survey” and check “Question 45: What is a tensor?” The wide distribution of answers hints at a gap between folk knowledge and the mathematical meaning of a tensor (even among the informed). It is thus necessary to be clear and unambiguous with our definitions and language, so that by the end of this chapter the reader will be able to agree with us on the answer. For practical purposes, it often suffices to describe a tensor as a multidimensional array that extends the concept of a matrix [3]. This is surely true, but only after a certain structure on the underlying space is assumed. Tensors are much more, especially when the underlying spaces are infinite-dimensional [1, 2, 4].
For path signatures, seeing tensors as multidimensional arrays is a good starting point, but their power is only fully realized with an understanding of the operations that go with them: that is, with the algebra mixed in. The next sections assume that the reader is largely unfamiliar with tensors (but has heard of multidimensional arrays).
Why do tensors come up in path signatures and how to read this chapter? We saw in chapter “A Primer on the Signature Method in Machine Learning” that path signatures rely heavily on iterated integrals and involve products of path components across different times. The path signature is a sequence of terms encoding information about a path at different levels of complexity: the first-level signature captures linear information via integrals of individual components; the second-level signature captures pairwise interactions, so-called bilinear relationships, like \(\iint \mathrm {d}x^i\mathrm {d}x^j\), meaning they combine two inputs in a way that is linear in each (with \(x^i\) fixed, we have linearity in \(x^j\), and vice-versa) whereas joint linearity does not hold; higher levels of the signature require more iterated integrals and in turn one requires multilinear relationships to describe these complicated interactions. Tensors and multilinear maps are natural and powerful mathematical tools to represent and analyze these interactions.
Section 2 introduces the basic definitions and properties of tensors by guiding the reader to first understand the difference between linear and bilinear operators, and how linear operators can be recovered from bilinear ones via the tensor product operation and the so-called universal property.
Section 3 then goes into a few other deeper properties and connects more explicitly tensors and algebras by introducing the tensor algebra. We will leave to subsequent chapters the development of further concepts like shuffle algebras, and power series to obtain exponentials and logarithms (alluded to in chapter “A Primer on the Signature Method in Machine Learning”).

2 A Brief Introduction to Tensors

This section introduces two operations: direct sums and tensor products, two different ways of making new vector spaces out of old ones. Formally, each is a way of equipping the Cartesian product of vector spaces, \(U\times V\), with a linear structure. The first leads to linear operators while the second leads to bilinear ones. While both are related, in many aspects they behave very differently. In a nutshell, bilinear maps exhibit separate linearity in U and V  while linear maps exhibit global linearity in \(U\times V\). The distinction is particularly important in the algebra of tensors, where bilinear maps give rise to the tensor product structure through the so-called universal property, while linear maps correspond to mappings on the direct sum space.

2.1 Direct Sums

Let us recall that for two sets X and Y , their Cartesian product \(X\times Y\) is defined as
$$\displaystyle \begin{aligned} X\times Y:=\left\{ (x,y):x\in X,y\in Y \right\}, \end{aligned}$$
that is, the set of all ordered pairs where the first is an element of X and the second an element of Y .
Given vector spaces U and V , their Cartesian product does not immediately have a linear structure (i.e. it is not immediately a vector space). In other words, after constructing the set \(U\times V\) it is not clear how to add two ordered pairs, or multiply them by scalars. We must define a way to add and scale the elements of this set, and it turns out there are multiple sensible and useful definitions. Direct sums are the simplest way to equip the Cartesian product of two (or more) vector spaces with a linear structure of its own.
Definition 2.1
Let \(U,V\) be vector spaces. On the Cartesian product \(U\times V\) we define the following linearity operations: for \(\mathbf {u},{\mathbf {u}}_1,{\mathbf {u}}_2\in U\), \(\mathbf {v},{\mathbf {v}}_1,{\mathbf {v}}_2\in V\) and \(\lambda \in \mathbb {R}\)
$$\displaystyle \begin{aligned} ({\mathbf{u}}_1,{\mathbf{v}}_1)+({\mathbf{u}}_2,{\mathbf{v}}_2)&:= ({\mathbf{u}}_1+{\mathbf{u}}_2,{\mathbf{v}}_1+{\mathbf{v}}_2),{} \end{aligned} $$
(1)
$$\displaystyle \begin{aligned} \lambda(\mathbf{u},\mathbf{v})&:=(\lambda\mathbf{u},\lambda\mathbf{v}).{} \end{aligned} $$
(2)
One can check that \(U\times V\), equipped with the operations from (1) and (2), is a vector space, customarily denoted \(U\oplus V\), i.e. the direct sum of U and V .
Exercise 2.2
State the axioms that define a vector space and show that \(U\oplus V\) is indeed a vector space.
It can also be checked that if \(B_U\) and \(B_V\) are bases for U and V  (respectively) with \(0_U\) and \(0_V\) denoting the zero elements of U and V  (respectively) then the set
$$\displaystyle \begin{aligned} B = \{(\mathbf{u},0_V) : \mathbf{u}\in B_U\} \cup\{ (0_U,\mathbf{v}):\mathbf{v}\in B_V \} \end{aligned}$$
is a basis for \(U\oplus V\). It follows immediately that \(\operatorname {dim}(U\oplus V)=\operatorname {dim} (U)+\operatorname {dim} (V)\).
Example 2.3
Let \(U=\mathbb {R}^3\) and \(V=\mathcal {M}_{2\times 2}(\mathbb {R})\) be the space of real 2-by-2 matrices. A generic element of \(U\oplus V\) is an ordered pair \((\mathbf {u},\mathbf {A})\) for a vector \(\mathbf {u}\in \mathbb {R}^3\) and a matrix \(\mathbf {A}\in \mathcal {M}_{2\times 2}(\mathbb {R})\). A concrete example would be letting
$$\displaystyle \begin{aligned} (\mathbf{u},\mathbf{A}) = \left(\begin{bmatrix}3\\-1\\4\end{bmatrix}, \begin{bmatrix}1&-1\\2&3\end{bmatrix}\right) \qquad \text{and}\qquad (\mathbf{v},\mathbf{B}) = \left(\begin{bmatrix}-2\\1\\1\end{bmatrix}, \begin{bmatrix}5&3\\-1&2\end{bmatrix}\right) \end{aligned} $$
then we have
$$\displaystyle \begin{aligned} (\mathbf{u},\mathbf{A})+(\mathbf{v},\mathbf{B}) = \left( \begin{bmatrix}1\\0\\5\end{bmatrix},\begin{bmatrix}6&2\\1&5\end{bmatrix} \right). \end{aligned}$$
A basis for \(U\oplus V\) is (with a slight abuse of notation for what ‘0’ means)
$$\displaystyle \begin{aligned} & \left\{ \left( \begin{bmatrix}1\\0\\0\end{bmatrix},0 \right), \left( \begin{bmatrix}0\\1\\0\end{bmatrix}, 0 \right),\left( \begin{bmatrix}0\\0\\1\end{bmatrix},0 \right), \left( 0, \begin{bmatrix}1&0\\0&0\end{bmatrix} \right), \left( 0, \begin{bmatrix}0&1\\0&0\end{bmatrix} \right), \right. \\ & \qquad \quad \left. \left( 0, \begin{bmatrix}0&0\\1&0\end{bmatrix} \right), \left( 0, \begin{bmatrix}0&0\\0&1\end{bmatrix} \right) \right\}. \end{aligned} $$
We observe that indeed \(\operatorname {dim}(U\oplus V) = 7= \operatorname {dim}(\mathbb {R}^3)+\operatorname {dim} (\mathcal {M}_{2\times 2}(\mathbb {R}))\).
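The arithmetic in Example 2.3 is easy to check numerically. Below is a minimal NumPy sketch; the helper names `ds_add` and `ds_scale` are ours, purely illustrative, and elements of \(U\oplus V\) are represented as Python pairs with the componentwise operations of Definition 2.1.

```python
import numpy as np

# Elements of U (+) V as ordered pairs; addition and scaling act
# componentwise, as in Definition 2.1.
def ds_add(p, q):
    return (p[0] + q[0], p[1] + q[1])

def ds_scale(lam, p):
    return (lam * p[0], lam * p[1])

u = np.array([3.0, -1.0, 4.0])
A = np.array([[1.0, -1.0], [2.0, 3.0]])
v = np.array([-2.0, 1.0, 1.0])
B = np.array([[5.0, 3.0], [-1.0, 2.0]])

s = ds_add((u, A), (v, B))
print(s[0])           # [1. 0. 5.]
print(s[1])

# dim(U (+) V) = dim U + dim V = 3 + 4 = 7
dim = u.size + A.size
print(dim)            # 7
```

The pair representation makes clear that the two components never interact, in contrast with the tensor product introduced next.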
Remark 2.4 (On Notation)
Elements of \(U\oplus V\) can also be written additively, that is \(\mathbf {u}+\mathbf {v}\) denotes the vector \((\mathbf {u},\mathbf {v})\in U\oplus V\). This notation is harmless because of Definition 2.1, as it behaves in the expected way. We can then restate the two linearity equations (1) and (2) in more natural notation:
$$\displaystyle \begin{aligned} {\mathbf{u}}_1+{\mathbf{v}}_1+{\mathbf{u}}_2+{\mathbf{v}}_2 = {\mathbf{u}}_1+{\mathbf{u}}_2+{\mathbf{v}}_1+{\mathbf{v}}_2,\qquad \lambda(\mathbf{u}+\mathbf{v}) = \lambda \mathbf{u}+\lambda \mathbf{v}, \end{aligned}$$
and we note that there is no confusion in the first equality with the two different meanings of \(+\) since due to the associativity and commutativity of addition, the four terms can be rearranged in an arbitrary way to give the same result. Importantly, both the bracket notation, \((\mathbf {u},\mathbf {v})\), and the additive notation, \(\mathbf {u}+\mathbf {v}\), are used interchangeably in the path signature community.
Definition 2.1 generalizes easily to a finite number of summands: if \(U_1,\dotsc ,U_n\) are vector spaces, the set \(U_1\times \dotsm \times U_n\) carries a linear structure given by componentwise addition and multiplication by scalars. The resulting vector space is denoted by \(U_1\oplus \dotsb \oplus U_n\). Extending this concept to infinite families requires some care.
Definition 2.5
Consider an infinite index set I and let \((U_i:i\in I)\) be a family of vector spaces. The direct sum is defined to be the set of all sequences \(({\mathbf {u}}_i:i\in I)\) such that \({\mathbf {u}}_i\neq 0\) for only finitely many indices \(i\in I\). Addition and scalar multiplication are defined componentwise. The resulting vector space is denoted by
$$\displaystyle \begin{aligned} \bigoplus_{i\in I}U_i. \end{aligned}$$
The finiteness constraint in this definition means that the direct sum is a subset of the Cartesian product of the spaces, that is,
$$\displaystyle \begin{aligned} \bigoplus_{i\in I}U_i\subseteq\prod_{i\in I}U_i, \end{aligned}$$
where \(\prod _{i\in I}U_i\) is simply the set of all sequences indexed by I.
For our purposes it will be enough to consider countable families of vector spaces, that is, we will take \(I=\mathbb {N}\). In this case, elements of \(\bigoplus _{n\in \mathbb {N}}U_n\) may be denoted as
$$\displaystyle \begin{aligned} ({\mathbf{u}}_0,{\mathbf{u}}_1,{\mathbf{u}}_2,\dotsc) \end{aligned}$$
with the convention that there is only a finite number of non-zero elements in the sequence. The Cartesian product also carries the same linear structure and is indeed a vector space, whose elements consist of arbitrary \(\mathbb {N}\)-indexed sequences, which are still denoted as above where all entries may be non-zero.
It should be noted that when \(I\subset \mathbb {N}\) is a finite set, say \(I=\{1,\dotsc ,N\}\), the two spaces coincide, but the inclusion becomes strict as soon as I is infinite. In particular, when dealing with finite collections of spaces there is no ambiguity in the notation.
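The finite-support convention can be mirrored in code. The sketch below (plain NumPy; `dsum_add` is an illustrative name of ours) stores an element of \(\bigoplus _{n\in \mathbb {N}}U_n\) as a dictionary holding only its non-zero components, so addition must prune components that cancel in order to keep the support finite and minimal.

```python
import numpy as np

# An element of the direct sum over n in N: a dict holding only the
# finitely many non-zero components (finite support).
def dsum_add(x, y):
    out = {}
    for n in set(x) | set(y):
        s = x.get(n, 0) + y.get(n, 0)
        if np.any(s != 0):     # drop components that cancelled
            out[n] = s
    return out

x = {0: np.array([1.0]), 2: np.array([1.0, 2.0])}
y = {2: np.array([-1.0, -2.0]), 5: np.array([3.0])}
z = dsum_add(x, y)
print(sorted(z))   # [0, 5]: the degree-2 components cancelled
```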
The main application of direct sums in the world of signatures is to decompose a vector space in terms of subspaces of objects sharing similar “shape” properties.
Definition 2.6
A vector space V  is said to be graded if it can be decomposed as a direct sum:
$$\displaystyle \begin{aligned} V = \bigoplus_{n\in\mathbb{N}}V_n. \end{aligned}$$
The subspace \(V_n\) is called the homogeneous component of degree n. For \(v\in V_n\) we write \(|v|=n\) for its degree.
We also note that this definition includes the case of finitely many summands, in which case there is \(N\in \mathbb {N}\) such that \(V_n=\{0\}\) for all \(n>N\).

2.2 Tensor Product and Tensors

We have now seen how the direct sum is one way of equipping the Cartesian product of two vector spaces with a vector space structure. There is another way, the tensor product, similar in many respects, but with a structure compatible with multilinear functions in a sense to be made precise later.
Definition 2.7
Let \(U,V\) be vector spaces. On the set \(U\times V\) define the following bilinearity operations: for \(\mathbf {u}, {\mathbf {u}}_1, {\mathbf {u}}_2\in U\), \(\mathbf {v}, {\mathbf {v}}_1, {\mathbf {v}}_2\in V\) and \(\lambda \in \mathbb {R}\)
$$\displaystyle \begin{aligned} ({\mathbf{u}}_1, \mathbf{v}) + ({\mathbf{u}}_2, \mathbf{v}) &:= ({\mathbf{u}}_1 + {\mathbf{u}}_2, \mathbf{v}), \\ (\mathbf{u}, {\mathbf{v}}_1) + (\mathbf{u}, {\mathbf{v}}_2) &:= (\mathbf{u}, {\mathbf{v}}_1 + {\mathbf{v}}_2), \\ \lambda(\mathbf{u},\mathbf{v}) &:=(\lambda\mathbf{u},\mathbf{v}) \quad \text{and} \quad \lambda(\mathbf{u},\mathbf{v}):= (\mathbf{u},\lambda \mathbf{v}). \end{aligned} $$
As in the direct sum case (Definition 2.1), it can be verified that \(U\times V\) equipped with the bilinearity operations is a vector space. We denote the tensor product of U and V  as \(U\otimes V\). For elements of the tensor product space, we write \(\mathbf {u} \otimes \mathbf {v}:= (\mathbf {u},\mathbf {v})\in U\otimes V\). By a slight abuse of language we also refer to \(\mathbf {u}\otimes \mathbf {v}\) as the tensor product of the vectors \(\mathbf {u}\in U\) and \(\mathbf {v}\in V\). As we will see later in Sect. 3, this name is justified.
Exercise 2.8
Show that \(U\otimes V\) is a vector space (recall Exercise 2.2).
Exercise 2.9
In this exercise we see why the \(\otimes \) notation is an intuitive way of writing the tensor product: (a) Rewrite the bilinearity operations of Definition 2.7 using the notation \(\mathbf {u} \otimes \mathbf {v}\) instead of \((\mathbf {u},\mathbf {v})\); (b) Expand \( ({\mathbf {u}}_1+{\mathbf {u}}_2)\otimes ({\mathbf {v}}_1+{\mathbf {v}}_2)\), where of course the addition \({\mathbf {u}}_1+{\mathbf {u}}_2\in U\) is simply the addition in U, and the same for V ; (c) We have \(\lambda \mathbf {u}\otimes \lambda \mathbf {v} \propto (\mathbf {u}\otimes \mathbf {v})\). What is the constant of proportionality?
Contrary to the direct sum case, for tensor products it is not always possible to write \({\mathbf {u}}_1\otimes {\mathbf {v}}_1+{\mathbf {u}}_2\otimes {\mathbf {v}}_2\) as a single tensor product of a vector in U with a vector in V .
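Both the bilinearity rules and the failure of single-product decompositions can be checked concretely with `np.outer`, which computes the array of products \({\mathbf {u}}^i{\mathbf {v}}^j\), i.e. the coordinates of \(\mathbf {u}\otimes \mathbf {v}\) under the canonical-basis identification discussed later; a minimal sketch:

```python
import numpy as np

u1, u2 = np.array([1.0, 2.0]), np.array([0.0, -1.0])
v = np.array([3.0, 5.0])
lam = 2.5

# Bilinearity of the tensor product of vectors (np.outer computes u (x) v):
add_left = np.outer(u1 + u2, v)
add_right = np.outer(u1, v) + np.outer(u2, v)
scale_left = np.outer(lam * u1, v)
scale_right = np.outer(u1, lam * v)

# e1 (x) e1 + e2 (x) e2 is the 2x2 identity matrix. It has rank 2, so it
# cannot be a single tensor product of two vectors: every outer product
# of vectors is a matrix of rank at most 1.
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
M = np.outer(e1, e1) + np.outer(e2, e2)
rank = np.linalg.matrix_rank(M)
print(rank)   # 2
```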
Exercise 2.10
Show that \(0_U\otimes \mathbf {v}=\mathbf {u}\otimes 0_V=0_{U\otimes V}\) for all \(\mathbf {u}\in U, \mathbf {v}\in V\).
In the same vein as the comment offered just before Definition 2.5, Definition 2.7 admits a straightforward generalization to finitely many vector spaces, while extending it to infinite families requires some care. In particular, we write
$$\displaystyle \begin{aligned} U^{\otimes n} := \underbrace{U\otimes U\otimes \dotsm \otimes U}_{n\text{ times}}. \end{aligned}$$
Proposition 2.11
For\(U,V\)vector spaces with bases\(B_U,B_V\), respectively, the set
$$\displaystyle \begin{aligned} B = \{\mathbf{u}\otimes \mathbf{v}:\mathbf{u}\in B_U,\mathbf{v}\in B_V\} \end{aligned}$$
is a basis for\(U\otimes V\). It follows that\(\operatorname {dim}(U\otimes V)=\operatorname {dim} (U) \cdot \operatorname {dim} (V)\).
We can tell immediately from the dimensions of the spaces that the direct sum and the tensor product produce fundamentally different linear structures on the same set. What distinguishes the tensor product vector space from all other possible linear structures is the following universal property.
Theorem 2.12
Let\(U,V,W\)be vector spaces, and let\(f\colon U\times V\to W\)be a bilinear map. That is, f satisfies for all\(\mathbf {u}, {\mathbf {u}}_1, {\mathbf {u}}_2\in U\), \(\mathbf {v}, {\mathbf {v}}_1, {\mathbf {v}}_2\in V\)and\(\lambda \in \mathbb {R}\)
$$\displaystyle \begin{gathered} f({\mathbf{u}}_1+{\mathbf{u}}_2,\mathbf{v}) = f({\mathbf{u}}_1, \mathbf{v}) + f({\mathbf{u}}_2, \mathbf{v}), \qquad f(\mathbf{u}, {\mathbf{v}}_1+{\mathbf{v}}_2) = f(\mathbf{u}, {\mathbf{v}}_1)+f(\mathbf{u}, {\mathbf{v}}_2), \\ \mathit{\text{and}}\quad f(\lambda \mathbf{u}, \mathbf{v}) = f(\mathbf{u}, \lambda \mathbf{v}) = \lambda f(\mathbf{u},\mathbf{v}). \end{gathered} $$
Then, there exists a uniquelinearfunction\(\hat f\colon U\otimes V\to W\)such that\(f(\mathbf {u},\mathbf {v}) = \hat f(\mathbf {u}\otimes \mathbf {v})\).
We say that bilinear functions factor through the tensor product. In fact, the tensor product is characterized by this property, in the sense that any other vector space Z equipped with a map \(\, \underline {\otimes }\,\colon U\times V\to Z\) factorizing bilinear functions must be isomorphic to \(U\otimes V\). In other words, the tensor product is the unique (up to isomorphism) vector space having this property. For the sake of simplicity we omit the proof of this result, but refer the interested reader to the classical texts [1, 2]. At first sight bilinear maps are not quite as powerful as linear maps; nonetheless, the universal property remedies this, as it allows one to write a bilinear map as a linear map on the tensor product, and thus recovers the neat results of linear maps that were otherwise unavailable (at the cost of using tensor products).
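A minimal numerical illustration of Theorem 2.12 in the case \(W=\mathbb {R}\): every bilinear form \(f(\mathbf {u},\mathbf {v})=\mathbf {u}^\top M\mathbf {v}\) is a linear functional applied to the flattened tensor \(\mathbf {u}\otimes \mathbf {v}\). The NumPy sketch below assumes the canonical-basis identification of tensors with arrays; the matrix \(M\) is random illustrative data.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 4))   # encodes a bilinear map f: R^3 x R^4 -> R
u = rng.standard_normal(3)
v = rng.standard_normal(4)

# f is bilinear: linear in u for fixed v, and linear in v for fixed u.
f_uv = u @ M @ v

# The induced *linear* map f_hat acts on the flattened tensor u (x) v:
# f(u, v) = f_hat(u (x) v) = sum_{ij} M^{ij} u^i v^j.
f_hat = M.reshape(-1) @ np.outer(u, v).reshape(-1)
print(np.isclose(f_uv, f_hat))   # True
```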
The next example highlights the difference between direct sums and tensor products.
Example 2.13 (Direct Sums \(\oplus \) vs. Tensor Products \(\otimes \))
Let \(U=V=\mathbb {R}\). The direct sum \(U\oplus V\) satisfies \(U\oplus V\cong \mathbb {R}^2\). Indeed, elements of \(U\oplus V\) are ordered pairs of real numbers with component-wise addition and scalar multiplication. Moreover, a basis for \(U\oplus V\) is \(\{(1,0),(0,1)\}\) which is the canonical basis of \(\mathbb {R}^2\), so \(\operatorname {dim}(U\oplus V)=2\).
On the contrary, we will now show that \(U\otimes V\) satisfies \(U\otimes V\cong \mathbb {R}\), which is obviously not \(\mathbb {R}^2 \cong U\oplus V\). Consider the map \(\varphi \colon U\times V\to \mathbb {R}\) given by \(\varphi (x,y)=xy\); this map is clearly bilinear. By the universal property, there exists a unique linear map \(\hat {\varphi }\colon U\otimes V\to \mathbb {R}\), given by \(\hat {\varphi }(x\otimes y)=xy\). The map \(\hat {\varphi }\) is injective since the equation \(\hat {\varphi }(x\otimes y)=0\) implies that either \(x=0\) or \(y=0\); in either case \(x\otimes y=0\) by Exercise 2.10. Finally, if \(\lambda \in \mathbb {R}\) then \(\hat {\varphi }(\lambda \otimes 1)=\lambda \), so \(\hat {\varphi }\) is surjective. In particular, \(\mathbb {R}\otimes \mathbb {R}\) is spanned by the vector \(1\otimes 1\), so \(\operatorname {dim}(U\otimes V)=1\).
Exercise 2.14
Show that if U is any vector space, then \(U\otimes \mathbb {R}\) and \(\mathbb {R}\otimes U\) are isomorphic to U. (Hint: generalize Example 2.13.)
The word tensor has many different meanings across different fields [1, 2, 4]. We will mostly be interested in the case where we are taking tensor products of a finite number of finite-dimensional vector spaces, represented as \(\mathbb {R}^d\) for some integer \(d\ge 1\). In this setting, tensors may be represented in a simpler, more concrete way, by working with canonical bases. It is at this specific juncture (assuming a basis) that it is intuitive to define a tensor as a multidimensional array—see Remark 2.16.
Definition 2.15 (Tensor: Order and Shape)
Take \(n\ge 1\), set \(d_1,\dotsc ,d_n\ge 1\) and take the associated vector spaces \(\mathbb {R}^{d_j}\) for \(j=1,\dotsc ,n\). An order n tensor of shape \((d_1,\dotsc ,d_n)\) is an element of the tensor product \(\mathbb {R}^{d_1}\otimes \dotsm \otimes \mathbb {R}^{d_n}\).
It should be clear that elements of, say, \(\mathbb {R}^{2}\otimes \mathbb {R}^{4}\), \(\mathbb {R}^{4}\otimes \mathbb {R}^{2}\), and \(\mathbb {R}^3 \otimes \mathbb {R}^3\), are all order 2 tensors, but their shapes are all very different—and, just like matrices, they cannot be added together as the shapes do not match.
Remark 2.16 (Basis, Vectors and Their Components, and Some Notation)
Denote by \(\{{\mathbf {e}}_1,\dotsc ,{\mathbf {e}}_{d}\}\) the canonical basis of \(\mathbb {R}^{d}\) for some \(d\geq 1\). We have seen that the set
$$\displaystyle \begin{aligned} \left\{{\mathbf{e}}_{i_1}\otimes\dotsm\otimes {\mathbf{e}}_{i_n}:i_j\in \{1,\dotsc,d_j\}\text{ for all }j=1,\dotsc,n\right\} \end{aligned}$$
is a basis for \(\mathbb {R}^{d_1}\otimes \dotsm \otimes \mathbb {R}^{d_n}\), which we call the canonical basis. In particular, an order n tensor \(\mathbf {T}\) is determined by the n-dimensional array of its coefficients in this basis:
$$\displaystyle \begin{aligned} \mathbf{T} = \sum_{i_1=1}^{d_1} \dotsb \sum_{i_n=1}^{d_n} {\mathbf{T}}^{i_1 \dotsb i_n}{\mathbf{e}}_{i_1}\otimes\dotsm\otimes {\mathbf{e}}_{i_n}. \end{aligned}$$
As long as we keep this in mind, the assignment \(\mathbf {T}\mapsto ({\mathbf {T}}^{i_1 \dotsb i_n})\) defines a one-to-one correspondence between elements of the tensor product \(\mathbf {T}\in \mathbb {R}^{d_1}\otimes \dotsm \otimes \mathbb {R}^{d_n}\) and multidimensional arrays \(({\mathbf {T}}^{i_1\dotsm i_n})\in \mathbb {R}^{d_1\times \dotsm \times d_n}\). We remark once again that this isomorphism depends on the fixing of a basis and is, in general, not canonical.
Thus, notation-wise, we shall refer to tensors either by a symbol (e.g. \(\mathbf {T}\)) or in component notation (e.g. \({\mathbf {T}}^{ijk}\), using superscripts for components); for multiple tensors or vectors we use subscript notation, i.e., \({\mathbf {T}}_1,{\mathbf {T}}_2,\dotsc \) or \({\mathbf {u}}_1,{\mathbf {u}}_2,\dotsc \).
For example, \({\mathbf {C}}^{ij}\), \({\mathbf {T}}^{ijk}\) and \({\mathbf {Q}}^{ijkp}\) refer to the components of tensors \(\mathbf {C}\), \(\mathbf {T}\), and \(\mathbf {Q}\) of order 2, 3, and 4 respectively. In particular, the order 1 tensors \({\mathbf {e}}_1,\dotsc ,{\mathbf {e}}_d\) constitute the canonical basis of \(\mathbb {R}^d\). The ith basis vector \({\mathbf {e}}_i\) has components given in the canonical basis by
$$\displaystyle \begin{aligned} {\mathbf{e}}_i^j=\begin{cases}1&j=i\\0&\text{else}\end{cases} \qquad \text{for }j=1,\dotsc,d. \end{aligned}$$
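The expansion of a tensor in the canonical basis can be verified directly. The NumPy sketch below (with an illustrative helper `e` for basis vectors, 0-indexed as is usual in code) rebuilds a random order 3 tensor from its coefficient array via the sum \(\sum_{i,j,k} {\mathbf {T}}^{ijk}\,{\mathbf {e}}_i\otimes {\mathbf {e}}_j\otimes {\mathbf {e}}_k\).

```python
import numpy as np

# An order 3 tensor of shape (2, 3, 2) as its coefficient array T[i, j, k].
rng = np.random.default_rng(1)
T = rng.standard_normal((2, 3, 2))

def e(i, d):
    """Canonical basis vector e_i of R^d (0-indexed here)."""
    v = np.zeros(d)
    v[i] = 1.0
    return v

# Reassemble T from the expansion sum_{ijk} T^{ijk} e_i (x) e_j (x) e_k.
rebuilt = np.zeros_like(T)
for i in range(2):
    for j in range(3):
        for k in range(2):
            basis = np.einsum('a,b,c->abc', e(i, 2), e(j, 3), e(k, 2))
            rebuilt += T[i, j, k] * basis

print(np.allclose(T, rebuilt))   # True
```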
Example 2.17
We see that tensors of order 1 and 2 can be identified with column vectors and matrices, respectively. A scalar is by convention an order 0 tensor. For example, \(\mathbf {u}\), \(\mathbf {A}\), and \(\mathbf {T}\) are tensors of order \(1, \,2,\) and 3 respectively:
$$\displaystyle \begin{aligned} \mathbf{u}=\begin{bmatrix}1\\2\\3\end{bmatrix}\in\mathbb{R}^3,\qquad \mathbf{A}=\begin{bmatrix}1&2\\3&4\end{bmatrix}\in\mathbb{R}^2\otimes\mathbb{R}^2=(\mathbb{R}^2)^{\otimes 2}\cong\mathbb{R}^{2\times 2},\qquad \mathbf{T}\in(\mathbb{R}^2)^{\otimes 3}\cong\mathbb{R}^{2\times 2\times 2}, \end{aligned}$$
(3)
where \(\mathbf {T}\) is a 2-by-2-by-2 array (displayed in the original figure as a 3D array with entries 1 through 8).
Remark 2.18
A tensor is an intrinsic object, in the sense that tensors do not depend on any choice of basis. From Remark 2.16, we see that a tensor can be uniquely described by a multidimensional array of numbers, but this is only true once we have fixed a basis. For many applications it is possible, and practical, to think of tensors only in terms of their components in a particular (e.g. the canonical) basis. Nonetheless, we encourage the reader to be mindful of the language and recognize when they are being loose with the concepts. This is exactly the same idea as thinking of a linear transformation from \(\mathbb {R}^n\) to \(\mathbb {R}^n\) in terms of an n-by-n square matrix, once we have fixed a basis. For example, the linear map \((x_1,x_2)\mapsto (x_1+x_2,x_1-x_2)\) can be represented by the matrix \(\begin {pmatrix}1&1\\1&-1\end {pmatrix}\) in the standard basis, but in the eigenbasis it is represented by the diagonal matrix \(\begin {pmatrix}\sqrt {2}&0\\ 0&-\sqrt {2}\end {pmatrix}\). The characterizing property of tensors is that, given a change of basis, we immediately know how the coordinate representation transforms. In this example we can go from the first to the second matrix representation by conjugation with an invertible matrix (diagonalization).
Writing explicit examples for the tensor product quickly becomes cumbersome; nevertheless, we offer a few simple examples.
Example 2.19 (Tensor Multiplication)
Let \(\mathbf {u}\in \mathbb {R}^2\), \(\mathbf {v}\in \mathbb {R}^3\) be vectors, that is, order 1 tensors of shapes \((2)\) and \((3)\), respectively. By definition, their tensor product is an order 2 tensor of shape \((2,3)\): \(\mathbf {u}\otimes \mathbf {v}\in \mathbb {R}^2\otimes \mathbb {R}^3\). In the canonical basis they are represented by 2 and 3 coefficients, respectively. Namely
$$\displaystyle \begin{aligned} \mathbf{u}={\mathbf{u}}^1{\mathbf{e}}_1+{\mathbf{u}}^2{\mathbf{e}}_2,\quad \mathbf{v}={\mathbf{v}}^1{\mathbf{e}}_1+{\mathbf{v}}^2{\mathbf{e}}_2+{\mathbf{v}}^3{\mathbf{e}}_3, \end{aligned}$$
or in the more traditional column vector notation,
$$\displaystyle \begin{aligned} \mathbf{u}={\mathbf{u}}^1\begin{bmatrix}1\\0\end{bmatrix}+{\mathbf{u}}^2\begin{bmatrix}0\\1\end{bmatrix}=\begin{bmatrix}{\mathbf{u}}^1\\{\mathbf{u}}^2\end{bmatrix}\quad \text{ and }\quad \mathbf{v}=\begin{bmatrix} {\mathbf{v}}^1\\{\mathbf{v}}^2\\{\mathbf{v}}^3\end{bmatrix}. \end{aligned}$$
By using the bilinearity of the tensor product (Definition 2.7) we may obtain the coordinates of \(\mathbf {u}\otimes \mathbf {v}\) in the canonical basis. Indeed, recalling that the components of both vectors are scalars,
$$\displaystyle \begin{aligned} {} \begin{aligned} \mathbf{u}\otimes\mathbf{v} &= \left( {\mathbf{u}}^1{\mathbf{e}}_1+{\mathbf{u}}^2{\mathbf{e}}_2 \right)\otimes\left( {\mathbf{v}}^1{\mathbf{e}}_1+{\mathbf{v}}^2{\mathbf{e}}_2+{\mathbf{v}}^3{\mathbf{e}}_3 \right)\\ &= \begin{array}{l}\displaystyle {\mathbf{u}}^1{\mathbf{v}}^1{\mathbf{e}}_1\otimes{\mathbf{e}}_1+{\mathbf{u}}^1{\mathbf{v}}^2{\mathbf{e}}_1\otimes{\mathbf{e}}_2+{\mathbf{u}}^1{\mathbf{v}}^3{\mathbf{e}}_1\otimes{\mathbf{e}}_3\\\displaystyle +{\mathbf{u}}^2{\mathbf{v}}^1{\mathbf{e}}_2\otimes{\mathbf{e}}_1+{\mathbf{u}}^2{\mathbf{v}}^2{\mathbf{e}}_2\otimes{\mathbf{e}}_2+{\mathbf{u}}^2{\mathbf{v}}^3{\mathbf{e}}_2\otimes{\mathbf{e}}_3. \end{array} \end{aligned} \end{aligned} $$
(4)
Thus, in the canonical basis the order 2 tensor \(\mathbf {u}\otimes \mathbf {v}\) has components given by \((\mathbf {u}\otimes \mathbf {v})^{ij}={\mathbf {u}}^i{\mathbf {v}}^j\). Identifying the canonical basis tensors \({\mathbf {e}}_i \otimes {\mathbf {e}}_j\) with the standard matrix basis (here of 2-by-3 matrices), we may write
$$\displaystyle \begin{aligned} \mathbf{u}\otimes\mathbf{v} = \begin{bmatrix} {\mathbf{u}}^1{\mathbf{v}}^1\ & {\mathbf{u}}^1{\mathbf{v}}^2\ & {\mathbf{u}}^1{\mathbf{v}}^3 \\ {\mathbf{u}}^2{\mathbf{v}}^1\ & {\mathbf{u}}^2{\mathbf{v}}^2\ & {\mathbf{u}}^2{\mathbf{v}}^3 \end{bmatrix} \quad \text{and likewise}\quad \mathbf{v}\otimes\mathbf{u} = \begin{bmatrix} {\mathbf{v}}^1{\mathbf{u}}^1\ & {\mathbf{v}}^1{\mathbf{u}}^2 \\ {\mathbf{v}}^2{\mathbf{u}}^1 &{\mathbf{v}}^2{\mathbf{u}}^2 \\{\mathbf{v}}^3{\mathbf{u}}^1&{\mathbf{v}}^3{\mathbf{u}}^2\end{bmatrix}\in\mathbb{R}^3\otimes\mathbb{R}^2. \end{aligned}$$
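Under the canonical-basis identification, `np.outer` computes exactly these component matrices; a short sanity check:

```python
import numpy as np

u = np.array([2.0, -1.0])        # order 1 tensor of shape (2,)
v = np.array([1.0, 0.5, 3.0])    # order 1 tensor of shape (3,)

uv = np.outer(u, v)              # u (x) v, an order 2 tensor of shape (2, 3)
vu = np.outer(v, u)              # v (x) u, an order 2 tensor of shape (3, 2)

# Components multiply, (u (x) v)^{ij} = u^i v^j, and v (x) u is the transpose.
print(uv.shape, vu.shape)   # (2, 3) (3, 2)
```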
Consider now the matrix
$$\displaystyle \begin{aligned} \mathbf{A}:=\begin{bmatrix}{\mathbf{A}}^{11}\ &{\mathbf{A}}^{12}\\{\mathbf{A}}^{21}&{\mathbf{A}}^{22}\end{bmatrix}\in\mathbb{R}^2\otimes\mathbb{R}^2. \end{aligned}$$
Here, as before, the entries in the matrix notation correspond to the coordinates in the canonical order 2 tensor basis: \(\mathbf {A}= {\mathbf {A}}^{11}{\mathbf {e}}_1\otimes {\mathbf {e}}_1 + {\mathbf {A}}^{12}{\mathbf {e}}_1\otimes {\mathbf {e}}_2 + {\mathbf {A}}^{21}{\mathbf {e}}_2\otimes {\mathbf {e}}_1 +{\mathbf {A}}^{22}{\mathbf {e}}_2\otimes {\mathbf {e}}_2 \). Note that although \(\mathbf {A}\) is an element of \(\mathbb {R}^2\otimes \mathbb {R}^2\), it cannot necessarily be written as \(\mathbf {u}\otimes \mathbf {v}\) for some \(\mathbf {u},\mathbf {v}\in \mathbb {R}^2\). That is, the components \({\mathbf {A}}^{ij}\) are not necessarily of the form \({\mathbf {A}}^{ij}={\mathbf {u}}^i{\mathbf {v}}^j\) for some vectors \(\mathbf {u}\) and \(\mathbf {v}\) of the appropriate dimensions. In the cases where such a decomposition exists, we say that \(\mathbf {A}\) is a rank 1 tensor. On the other hand, \(\mathbf {A}\) can always be written as a sum of tensor products of some \({\mathbf {u}}_i,{\mathbf {v}}_i\in \mathbb {R}^2\), that is, as a sum of rank 1 tensors.
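The rank 1 discussion can be made concrete. In the sketch below (plain NumPy), the identity matrix is an order 2 tensor that is not rank 1, while its singular value decomposition exhibits it as a sum of rank 1 tensors; this SVD-based decomposition is one standard way to produce such a sum, not the chapter's own construction.

```python
import numpy as np

u, v = np.array([2.0, 3.0]), np.array([1.0, -1.0])
A = np.outer(u, v)   # rank 1 by construction: A^{ij} = u^i v^j
B = np.eye(2)        # the 2x2 identity matrix is not a rank 1 tensor

rank_A = np.linalg.matrix_rank(A)
rank_B = np.linalg.matrix_rank(B)

# Every matrix is a sum of rank 1 tensors; the SVD exhibits one such sum:
# B = sum_k s_k * (w_k (x) z_k).
W, s, Zt = np.linalg.svd(B)
recon = sum(s[k] * np.outer(W[:, k], Zt[k]) for k in range(len(s)))
print(rank_A, rank_B)   # 1 2
```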
Continuing with the example, we may compute
$$\displaystyle \begin{aligned} \mathbf{u}\otimes\mathbf{A} = \left[\; \begin{bmatrix} {\mathbf{u}}^1{\mathbf{A}}^{11} & {\mathbf{u}}^1{\mathbf{A}}^{12} \\ {\mathbf{u}}^1{\mathbf{A}}^{21} & {\mathbf{u}}^1{\mathbf{A}}^{22} \end{bmatrix} \;\; \begin{bmatrix} {\mathbf{u}}^2{\mathbf{A}}^{11} & {\mathbf{u}}^2{\mathbf{A}}^{12} \\ {\mathbf{u}}^2{\mathbf{A}}^{21} & {\mathbf{u}}^2{\mathbf{A}}^{22} \end{bmatrix} \;\right]\in\mathbb{R}^2\otimes\mathbb{R}^2\otimes\mathbb{R}^2=(\mathbb{R}^2)^{\otimes 3}, \end{aligned}$$
where the expression on the right-hand side is a matrix-like notation for organizing the components of the order 3 tensor \(\mathbf {u}\otimes \mathbf {A}\).
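The same computation can be carried out with `np.einsum`; the sketch below builds \(\mathbf {u}\otimes \mathbf {A}\) for the concrete values of Example 2.17 and inspects its two matrix slices \({\mathbf {u}}^1\mathbf {A}\) and \({\mathbf {u}}^2\mathbf {A}\).

```python
import numpy as np

u = np.array([1.0, 2.0])
A = np.array([[1.0, 2.0], [3.0, 4.0]])

# u (x) A: an order 3 tensor of shape (2, 2, 2); slice i holds u^i * A.
T = np.einsum('i,jk->ijk', u, A)   # equivalently np.multiply.outer(u, A)
print(T.shape)   # (2, 2, 2)
print(T[0])      # first slice:  u^1 * A
print(T[1])      # second slice: u^2 * A
```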
We will later see, in Sect. 3, that the computation performed in Eq. (4) makes use of a larger structure. The tensor product can be thought of as a non-commutative analogue of polynomial multiplication, in the sense that it is an associative and bilinear operation. It is, however, not commutative: the results \(\mathbf {u}\otimes \mathbf {v}\) and \(\mathbf {v}\otimes \mathbf {u}\) are order 2 tensors of different shapes, at least in this example. In general, the results may differ even though the shapes match.
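Both claims, associativity and non-commutativity, are quick to verify in coordinates:

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])
w = np.array([0.5, 4.0])

# Associativity: (u (x) v) (x) w equals u (x) (v (x) w) as order 3 tensors,
# both having components u^i v^j w^k.
left = np.einsum('ij,k->ijk', np.outer(u, v), w)
right = np.einsum('i,jk->ijk', u, np.outer(v, w))

# Non-commutativity: even with matching shapes, u (x) v and v (x) u differ.
same = np.allclose(np.outer(u, v), np.outer(v, u))
print(same)   # False
```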
Example 2.20 (Connecting to Signatures of Chapter “A Primer on the Signature Method in Machine Learning”)
Suppose \({\mathbf {x}}\colon [0,1]\to \mathbb {R}^d\) is a smooth vector-valued path; for each \(t\in [0,1]\) we may write \({\mathbf {x}}_t=({\mathbf {x}}^1_t,\dotsc ,{\mathbf {x}}^d_t)\in \mathbb {R}^d\) in the canonical basis. In particular, for each \(t\in [0,1]\), \({\mathbf {x}}_t\) is a tensor of order 1.
We may use the tensor product to compactly write the collection of iterated integrals of \(\mathbf {x}\). That is,
$$\displaystyle \begin{aligned} \int_0^t\int_0^s\mathrm{d}{\mathbf{x}}_u\otimes\mathrm{d}{\mathbf{x}}_s\in\mathbb{R}^d\otimes\mathbb{R}^d \end{aligned}$$
is an order 2 tensor (i.e. a d-by-d matrix) with components (for \(\mathbf {x}\) “sufficiently nice”)
$$\displaystyle \begin{aligned} \left( \int_0^t\int_0^s\mathrm{d}{\mathbf{x}}_u\otimes\mathrm{d}{\mathbf{x}}_s \right)^{ij} = \int_0^t\int_0^s\dot{x}^i_u\dot{x}^j_s\,\mathrm{d}u\mathrm{d}s. \end{aligned}$$
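This iterated integral can be computed from path increments; for piecewise-linear paths the level 2 term equals \(\sum _{k<l}\Delta {\mathbf {x}}_k\otimes \Delta {\mathbf {x}}_l+\frac {1}{2}\sum _k\Delta {\mathbf {x}}_k\otimes \Delta {\mathbf {x}}_k\), a standard identity. The sketch below uses it (`level2_signature` is an illustrative helper of ours, not a function from this chapter) and tests it against the closed form \(\frac {1}{2}\,\mathbf {a}\otimes \mathbf {a}\) for the linear path \({\mathbf {x}}_t=t\mathbf {a}\).

```python
import numpy as np

def level2_signature(path):
    """Level 2 signature of a piecewise-linear path given as an (n+1, d)
    array of points: sum_{k<l} dx_k (x) dx_l + (1/2) sum_k dx_k (x) dx_k."""
    dx = np.diff(path, axis=0)            # increments dx_k, shape (n, d)
    before = np.cumsum(dx, axis=0) - dx   # sum of increments strictly before step k
    return (np.einsum('ki,kj->ij', before, dx)
            + 0.5 * np.einsum('ki,kj->ij', dx, dx))

# For the linear path x_t = t * a on [0, 1], the iterated integral is a (x) a / 2.
a = np.array([1.0, -2.0, 0.5])
t = np.linspace(0.0, 1.0, 101)
S2 = level2_signature(t[:, None] * a)
print(np.allclose(S2, 0.5 * np.outer(a, a)))   # True
```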
Before delving into further tensor properties we can go back to “Question 45: What is a tensor?” of the “100 Questions: A Mathematical Conventions Survey”. We hope to have convinced the reader that a tensor is nothing other than “an element of a tensor product of vector spaces”.

3 A Little Bit More on Tensors: The Tensor Algebra

We now venture into a few additional properties of tensors.
Definition 3.1
An associative algebra is a vector space A equipped with a bilinear map \(m\colon A\times A\to A\), called product, satisfying the associativity condition
$$\displaystyle \begin{aligned} m\big(m(\mathbf{x},\mathbf{y}),\mathbf{z}\big)=m\big(\mathbf{x}, m(\mathbf{y},\mathbf{z})\big). \end{aligned}$$
By the universal property of the tensor product (see Theorem 2.12), the product may equivalently be represented by a linear map \(m\colon A\otimes A\to A\), which is the representation we will use from now on.
We say that an algebra A is unital if it has a distinguished element \(1_A\in A\), called the unit, satisfying for all \(\mathbf {x}\in A\)
$$\displaystyle \begin{aligned} m(1_A\otimes \mathbf{x}) = \mathbf{x} = m(\mathbf{x}\otimes 1_A). \end{aligned}$$
It is customary to write the product, denoted \(\cdot _A\), of two elements of A using infix notation,7 that is, \(\mathbf {x}\cdot _A\mathbf {y}:= m(\mathbf {x}\otimes \mathbf {y})\). Oftentimes, when no confusion can arise, we write simply \(\mathbf {x}\cdot \mathbf {y}\) or even omit the symbol completely and just write \(\mathbf {xy}\) instead. As we will mostly work with unital associative algebras, from here on we simply write algebra. In this notation, the condition for \((A,\cdot _A)\) to be an algebra can be written as
$$\displaystyle \begin{aligned} (\mathbf{x}\cdot_A \mathbf{y})\cdot_A \mathbf{z} = \mathbf{x}\cdot_A(\mathbf{y}\cdot_A \mathbf{z}) \end{aligned}$$
for all \(\mathbf {x},\mathbf {y},\mathbf {z}\in A\) and the unit satisfies \(1\cdot _A \mathbf {x}=\mathbf {x}\cdot _A 1=\mathbf {x}\) for all \(\mathbf {x}\in A\). We note that bilinearity of the product translates into the distributivity of \(\cdot _A\) over \(+\), e.g.,
$$\displaystyle \begin{aligned} (\mathbf{x}+\mathbf{y})\cdot_A\mathbf{z} = \mathbf{x}\cdot_A\mathbf{z}+\mathbf{y}\cdot_A\mathbf{z},\quad \lambda(\mathbf{x}\cdot_A\mathbf{y})=(\lambda\mathbf{x})\cdot_A\mathbf{y}=\mathbf{x}\cdot_A(\lambda\mathbf{y}), \end{aligned}$$
and so on. We also remark that we do not require the product to be commutative, that is, we do not enforce that \(\mathbf {x}\cdot _A\mathbf {y}=\mathbf {y}\cdot _A\mathbf {x}\) for every \(\mathbf {x},\mathbf {y}\in A\), although this may hold for some pairs of elements. In case this identity does hold for every \(\mathbf {x},\mathbf {y}\in A\) we say A is commutative (or Abelian).
Example 3.2
Let \(\mathcal {M}_{n\times n}(\mathbb {R})\) be the space of n-by-n square matrices with real entries with its usual vector space structure (entry-wise addition and scalar multiplication). The matrix product \(\mathbf {A}\cdot \mathbf {B}:= \mathbf {A}\mathbf {B}\) equips \(\mathcal {M}_{n\times n}(\mathbb {R})\) with the structure of a (non-commutative) associative algebra with unit \(1_A={\mathbf {I}}_n\), the n-by-n identity matrix.
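As a quick numerical sanity check (an illustration, not part of the original text), one can verify the algebra axioms for matrix multiplication on generic random matrices:

```python
import numpy as np

# Generic 3x3 real matrices; the fixed seed makes the run reproducible.
rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))
I = np.eye(3)

# Associativity: (AB)C = A(BC), up to floating-point error.
assert np.allclose((A @ B) @ C, A @ (B @ C))

# Unitality: the identity matrix is a two-sided unit.
assert np.allclose(I @ A, A) and np.allclose(A @ I, A)

# Non-commutativity: generic matrices do not commute.
assert not np.allclose(A @ B, B @ A)
```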
Example 3.3
Denote by \(\mathbb {R}[x]\) the space of polynomials in a single variable x, and for polynomials \(\mathbf {p}(x)\) and \(\mathbf {q}(x)\) define the multiplication rule by
$$\displaystyle \begin{aligned} (\mathbf{p}\cdot\mathbf{q})(x):=\mathbf{p}(x)\mathbf{q}(x). \end{aligned}$$
It is clear that this multiplication is bilinear in \(\mathbf {p}\) and \(\mathbf {q}\) and satisfies the associativity condition. The unit for this product is the constant polynomial \(1_A(x)=1\). For instance
$$\displaystyle \begin{aligned} (x^3+1)\cdot(x^2+x) = x^5 + x^4 + x^2 + x. \end{aligned}$$
In fact, since the monomials \(\{x^n:n\ge 0\}\) form a linear basis for \(\mathbb {R}[x]\), the four terms on the right-hand side correspond simply to \(x^3\cdot x^2\), \(x^3\cdot x\), and so on, where we use the bilinearity of the product. This is an example of a commutative algebra.
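The product in this example can be verified directly (an illustrative check, not part of the original text), representing polynomials by their coefficient sequences in increasing degree:

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Coefficients in increasing degree: x^3 + 1 -> [1, 0, 0, 1].
p = [1, 0, 0, 1]   # x^3 + 1
q = [0, 1, 1]      # x^2 + x

prod = P.polymul(p, q)

# Expected product x^5 + x^4 + x^2 + x -> [0, 1, 1, 0, 1, 1].
assert np.array_equal(prod, [0, 1, 1, 0, 1, 1])

# Commutativity: R[x] is a commutative algebra.
assert np.array_equal(P.polymul(q, p), prod)
```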
Example 3.4
Many structures with which one is already very familiar are just algebras in disguise; Table 1 gives a few examples.
Table 1
Various examples of algebras. Some authors use the term algebra to refer to a vector space equipped with any bilinear operation, not necessarily associative. In that sense, the cross product is an algebra that is not associative

| Vector space | Bilinear operator | Associative | Commutative | Unital |
| --- | --- | --- | --- | --- |
| \(\mathbb {C}\) | Complex product | Yes | Yes | Yes |
| \(\mathbb {R}^3\) | Vector cross product: \(\mathbf {a} \times \mathbf {b}\) | No | No | No |
| \(\mathbb {R}[x]\) | Polynomial multiplication | Yes | Yes | Yes |
| \(\mathcal {M}_{n\times n} (\mathbb {R})\) | Matrix multiplication | Yes | No | Yes |
The prime example of an algebra in relation to signatures is the tensor algebra.
Definition 3.5
Let V be a finite-dimensional vector space. The tensor algebra over V is the vector space
$$\displaystyle \begin{aligned} T(V):= \bigoplus_{n\ge 0}V^{\otimes n} \quad \text{with}\ V^{\otimes 0}\cong\mathbb{R}\mathbf{1}. \end{aligned}$$
The product is simply the tensor product, and its unit is the vector \(\mathbf {1}\) spanning \(V^{\otimes 0}\).
The tensor algebra is therefore a graded vector space in the sense of Definition 2.6, where order n tensors are placed in degree n. We stress the fact (see Definition 2.5) that elements of \(T(V)\) are finite sequences of tensors of arbitrary order. For this reason, vectors in \(T(V)\) are usually called tensor (or non-commutative) polynomials. Later on we will construct the space of tensor series, which are infinite sequences.
Remark 3.6
When \(V=\mathbb {R}^d\) the product can be written more explicitly in terms of the canonical basis \(\{{\mathbf {e}}_1,\dotsc ,{\mathbf {e}}_d\}\). Introducing the word notation, recall Example 1.5 in chapter “A Primer on the Signature Method in Machine Learning”, \({\mathbf {e}}_{i_1\dotsm i_n}:= {\mathbf {e}}_{i_1}\otimes \dotsm \otimes {\mathbf {e}}_{i_n}\in (\mathbb {R}^d)^{\otimes n}\) for \((i_1,\dotsc ,i_n)\in \{1,\dotsc ,d\}^n\), the product (denoted for now by \(\cdot _{T(V)}\)) is then defined as
$$\displaystyle \begin{aligned} {\mathbf{e}}_{i_1\dotsm i_n}\cdot_{T(V)}{\mathbf{e}}_{j_1\dotsm j_m} = {\mathbf{e}}_{i_1\dotsm i_nj_1\dotsm j_m}. \end{aligned}$$
For this reason it is commonly known as the concatenation product. In this case, it corresponds to the product introduced in Definition 2.7. Common notations for \(\mathbf {x}\cdot _{T(V)}\mathbf {y}\) include \(\mathbf {x}\otimes \mathbf {y}\) and \(\mathbf {x}\mathbf {y}\).
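The concatenation product extends bilinearly from basis words to all tensor polynomials. A minimal sketch (an illustration, not part of the original text) representing a tensor polynomial as a dictionary mapping words, i.e. tuples of letters, to real coefficients:

```python
from collections import defaultdict

def concat_product(x, y):
    """Concatenation product on tensor polynomials, represented as
    dicts {word (tuple of letters): coefficient}. On basis words it is
    e_{i1...in} . e_{j1...jm} = e_{i1...in j1...jm}, extended bilinearly."""
    out = defaultdict(float)
    for w1, c1 in x.items():
        for w2, c2 in y.items():
            out[w1 + w2] += c1 * c2   # tuple addition = word concatenation
    return dict(out)

# (e_1 + 2 e_{12}) . e_3 = e_{13} + 2 e_{123}
x = {(1,): 1.0, (1, 2): 2.0}
y = {(3,): 1.0}
assert concat_product(x, y) == {(1, 3): 1.0, (1, 2, 3): 2.0}

# The empty word () represents the unit 1 spanning V^{\otimes 0}.
one = {(): 1.0}
assert concat_product(one, x) == x and concat_product(x, one) == x
```

Note how non-commutativity is visible already on words: \((1,2)+(3,)\neq (3,)+(1,2)\) as tuples.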
Theorem 3.7
The tensor algebra enjoys the following universal property: given any algebra A and any linear map\(f\colon V\to A\), there exists a unique algebra morphism\(\hat {f}\colon T(V)\to A\)extending f, that is, such that\(\hat {f}(\mathbf {v})=f(\mathbf {v})\)for all\(\mathbf {v}\in V\)and\(\hat {f}(\mathbf {u}\otimes \mathbf {v})=\hat {f}(\mathbf {u})\cdot _A \hat {f}(\mathbf {v})\)for all\(\mathbf {u},\mathbf {v}\in T(V)\).
As is the case with the tensor product, this property actually characterizes the tensor algebra in the sense that any other algebra satisfying this property is necessarily isomorphic to \(T(V)\) for some vector space V .
We note that even though V  is finite-dimensional, \(T(V)\) is always infinite dimensional since, owing to Proposition 2.11,
$$\displaystyle \begin{aligned} \operatorname{dim} V^{\otimes n}=\left( \operatorname{dim} V \right)^n. \end{aligned}$$
For this reason, while the tensor algebra is a neat theoretical construction, it is not very useful for practical purposes. There are a couple of ways of obtaining finite-dimensional versions of \(T(V)\) which preserve its structure. The most common in signature applications is truncation. The basic idea is that we want to preserve “low order” information while still retaining the algebra structure, where the meaning of “order” is in the sense of tensor level. Luckily, the straightforward idea of just discarding high-order information works, with the caveat that the product has to be slightly modified.
Definition 3.8
Given \(N\ge 1\), the level-Ntruncated tensor algebra is the finite-dimensional graded vector space (recall Definition 2.6)
$$\displaystyle \begin{aligned} T^N(V):=\bigoplus_{n=0}^NV^{\otimes n} \quad \text{with product}\quad \mathbf{x}\cdot_N\mathbf{y}=\begin{cases}\mathbf{x}\otimes\mathbf{y}&\text{ if }|\mathbf{x}|+|\mathbf{y}|\le N,\\ 0&\text{ else.}\end{cases} \end{aligned}$$
Following from Definition 2.6, we note that in particular every element of \(T^N(V)\) can be written as a sequence of homogeneous elements, that is every \(\mathbf {v}\in T^N(V)\) is of the form \(\mathbf {v}=({\mathbf {v}}_0,{\mathbf {v}}_1,\dotsc ,{\mathbf {v}}_N) \) with \({\mathbf {v}}_n\in V^{\otimes n}\) (with some of them possibly zero). Hence, the product \(\cdot _N\) is well-defined for all \(\mathbf {x},\mathbf {y}\in T^N(V)\) and not just for homogeneous tensors—elements in \(T^N(V)\) are thus finite sequences of tensors of order up to N, with componentwise addition and multiplication by scalars.
It can be checked that \(T^N(V)\) is an algebra8 and
$$\displaystyle \begin{aligned} \operatorname{dim} T^N(V) = \frac{d^{N+1}-1}{d-1} \quad \text{where} \ \operatorname{dim} V=d. \end{aligned}$$
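The dimension count is the geometric sum \(1+d+\dotsb +d^N\), which the closed form above evaluates (for \(d\neq 1\)). A one-line check (illustrative, not part of the original text):

```python
def dim_truncated_tensor_algebra(d, N):
    """dim T^N(V) = 1 + d + d^2 + ... + d^N, where d = dim V,
    since dim V^{\otimes n} = d^n and the direct sum adds dimensions."""
    return sum(d**n for n in range(N + 1))

# Agreement with the closed form (d^{N+1} - 1) / (d - 1) for d != 1:
for d in (2, 3, 5):
    for N in (1, 2, 4, 8):
        assert dim_truncated_tensor_algebra(d, N) == (d**(N + 1) - 1) // (d - 1)

# E.g. d = 2, N = 2: 1 + 2 + 4 = 7.
assert dim_truncated_tensor_algebra(2, 2) == 7
```

The exponential growth in N is exactly why truncation (and the choice of N) matters in signature computations.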
Example 3.9
Let us take \(N=2\) and \(V=\mathbb {R}^d\). The space \(T^2(V)\cong \mathbb {R}\mathbf {1}\oplus \mathbb {R}^d\oplus (\mathbb {R}^d)^{\otimes 2}\) consists of elements of the form \((a,\mathbf {x},\mathbf {A})\) with \(a\in \mathbb {R}\), \(\mathbf {x}\in \mathbb {R}^d\) and \(\mathbf {A}\in \mathcal {M}_{d\times d}(\mathbb {R})\).9
The product reads
$$\displaystyle \begin{aligned} (a,\mathbf{x},\mathbf{A})\otimes(a',\mathbf{x}',\mathbf{A}') = (aa', a\mathbf{x}'+a'\mathbf{x}, a'\mathbf{A} + a\mathbf{A}' + \mathbf{x}\otimes \mathbf{x}'). \end{aligned}$$
We remark that this product is not commutative, meaning that in general the above expression will be different from that of \((a',\mathbf {x}',\mathbf {A}')\otimes (a,\mathbf {x},\mathbf {A})\).
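The level-2 truncated product can be implemented directly from the formula in Example 3.9, with the order 2 component stored as a d-by-d matrix. A minimal sketch (an illustration, not part of the original text; the helper name `t2_product` is ours):

```python
import numpy as np

def t2_product(u, v):
    """Truncated product on T^2(R^d), elements stored as (a, x, A) with
    a scalar, x a vector in R^d, A a d-by-d matrix:
    (a, x, A) . (a', x', A') = (aa', a x' + a' x, a A' + a' A + x (x')^T)."""
    a, x, A = u
    b, y, B = v
    return (a * b, a * y + b * x, a * B + b * A + np.outer(x, y))

d = 3
u = (1.0, np.arange(1.0, d + 1), np.eye(d))
v = (2.0, np.ones(d), np.zeros((d, d)))

uv = t2_product(u, v)
vu = t2_product(v, u)

# Levels 0 and 1 agree (they only involve commutative operations) ...
assert np.isclose(uv[0], vu[0]) and np.allclose(uv[1], vu[1])
# ... but level 2 differs: x (x')^T versus x' x^T.
assert not np.allclose(uv[2], vu[2])
```

This makes the non-commutativity remark concrete: the two level-2 components differ exactly by the transpose of the rank-one term.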
Exercise 3.10
Let \(N=2\) and \(V=\mathbb {R}^d\). Show that an element \((a,\mathbf {x},\mathbf {A})\in T^2(V)\) is invertible if and only if \(a\neq 0\), and compute its inverse.
Definition 3.11
The extended tensor algebra is the direct product
$$\displaystyle \begin{aligned} T\left(\mkern-3mu\left( V\right)\mkern-3mu\right) := \prod_{n=0}^{\infty}V^{\otimes n}. \end{aligned}$$
We identify \(T\left (\mkern -3mu\left ( V\right )\mkern -3mu\right )\) with the space of infinite sequences \(\mathbf {u}=({\mathbf {u}}_0,{\mathbf {u}}_1,\dotsc )\) with \({\mathbf {u}}_0\in \mathbb {R}\), \({\mathbf {u}}_1\in V\), and so on. The product is induced by the product on \(T(V)\) and is given, for \(\mathbf {u}=({\mathbf {u}}_0,{\mathbf {u}}_1,\dotsc )\) and \(\mathbf {v}=({\mathbf {v}}_0,{\mathbf {v}}_1,\dotsc )\), by \(\mathbf {u}\mathbf {v}=\mathbf {w}=({\mathbf {w}}_0,{\mathbf {w}}_1,\dotsc )\) where
$$\displaystyle \begin{aligned} {\mathbf{w}}_n = \sum_{k=0}^{n}{\mathbf{u}}_k\otimes{\mathbf{v}}_{n-k}\in V^{\otimes n}. \end{aligned}$$
This product mimics polynomial multiplication and is sometimes called the Cauchy product for this reason. Since this space contains arbitrarily long sequences of tensors, its elements are commonly called tensor series. We note that the tensor algebra \(T(V)\) is a strict subspace of \(T\left (\mkern -3mu\left ( V\right )\mkern -3mu\right )\).
For each integer \(N\ge 1\) there is a canonical projection \(\pi _N\colon T\left (\mkern -3mu\left ( V\right )\mkern -3mu\right )\to T^N(V)\), preserving multiplication, given simply by discarding tensors of degree greater than N, that is,
$$\displaystyle \begin{aligned} \pi_N({\mathbf{u}}_0,{\mathbf{u}}_1,\dotsc) = ({\mathbf{u}}_0,{\mathbf{u}}_1,\dotsc,{\mathbf{u}}_N). \end{aligned}$$
In the realm of signatures, this projection is used to produce finite-dimensional versions of the signature (see Example 3.12 just below) that are suitable for its representation in a computer.
Example 3.12
Recall Example 2.20. Our prime example of an element in \(T\left (\mkern -3mu\left ( \mathbb {R}^d\right )\mkern -3mu\right )\) is the signature of a smooth \(\mathbb {R}^d\)-valued path \(\mathbf {x}\colon [0,1]\to \mathbb {R}^d\). Its signature over the interval \([s,t]\subseteq [0,1]\), denoted by \(S(\mathbf {x})_{s,t}\), is the tensor series of iterated integrals:
$$\displaystyle \begin{aligned} S(\mathbf{x})_{s,t}:=\left( 1,\int_s^t\mathrm{d}{\mathbf{x}}_u,\int_s^t\int_s^{u_2}\mathrm{d}{\mathbf{x}}_{u_1}\otimes \mathrm{d}{\mathbf{x}}_{u_2},\dotsc \right). \end{aligned}$$
Projecting to the level-2 truncated tensor algebra we get
$$\displaystyle \begin{aligned} \pi_2S(\mathbf{x})_{s,t}=\left( 1,\int_s^t\mathrm{d}{\mathbf{x}}_u, \int_s^t\int_s^{u_2}\mathrm{d}{\mathbf{x}}_{u_1}\otimes \mathrm{d}{\mathbf{x}}_{u_2}\right). \end{aligned}$$
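For a piecewise-linear path, \(\pi _2S(\mathbf {x})\) can be computed exactly: a straight-line segment with increment \(\Delta \) has level-2 signature \((1,\Delta ,\tfrac 12\Delta \otimes \Delta )\), and signatures of concatenated segments multiply in \(T^2(\mathbb {R}^d)\) (Chen's identity). A minimal sketch under these assumptions (the helper names are ours):

```python
import numpy as np

def sig2_segment(inc):
    """pi_2 of the signature of a straight-line segment with increment
    inc: (1, inc, inc \otimes inc / 2)."""
    return (1.0, inc, np.outer(inc, inc) / 2.0)

def chen2(u, v):
    """Chen's identity at level 2: signatures of concatenated paths
    multiply with the truncated product of Example 3.9."""
    a, x, A = u
    b, y, B = v
    return (a * b, a * y + b * x, a * B + b * A + np.outer(x, y))

# Piecewise-linear path in R^2 through the given points.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])

sig = (1.0, np.zeros(2), np.zeros((2, 2)))   # signature of the empty path
for p, q in zip(pts[:-1], pts[1:]):
    sig = chen2(sig, sig2_segment(q - p))

# Level 1 is the total increment of the path.
assert np.allclose(sig[1], pts[-1] - pts[0])
# Integration by parts: symmetric part of level 2 is determined by level 1.
assert np.allclose(sig[2] + sig[2].T, np.outer(sig[1], sig[1]))
```

Libraries such as `iisignature` implement exactly this kind of computation, to higher truncation levels and for long data streams.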

4 Solutions to Exercises

Solution 4.1 (To Exercise 2.2)
We must check that the operations satisfy the axioms of a vector space, that is, that \(+\) is associative and commutative, and that scalar multiplication distributes over \(+\). Let \(({\mathbf {u}}_1,{\mathbf {v}}_1),({\mathbf {u}}_2,{\mathbf {v}}_2)\in U\oplus V\). Then
$$\displaystyle \begin{aligned} ({\mathbf{u}}_1,{\mathbf{v}}_1)+({\mathbf{u}}_2,{\mathbf{v}}_2) &= ({\mathbf{u}}_1+{\mathbf{u}}_2, {\mathbf{v}}_1+{\mathbf{v}}_2) = ({\mathbf{u}}_2+{\mathbf{u}}_1, {\mathbf{v}}_2+{\mathbf{v}}_1) \\ &= ({\mathbf{u}}_2,{\mathbf{v}}_2) + ({\mathbf{u}}_1,{\mathbf{v}}_1). \end{aligned} $$
Moreover, if \(({\mathbf {u}}_3,{\mathbf {v}}_3)\in U\oplus V\) then
$$\displaystyle \begin{aligned} (({\mathbf{u}}_1,{\mathbf{v}}_1)+({\mathbf{u}}_2,{\mathbf{v}}_2))+({\mathbf{u}}_3,{\mathbf{v}}_3) &= ({\mathbf{u}}_1+{\mathbf{u}}_2,{\mathbf{v}}_1+{\mathbf{v}}_2)+({\mathbf{u}}_3,{\mathbf{v}}_3) \\ & = ({\mathbf{u}}_1+{\mathbf{u}}_2+{\mathbf{u}}_3,{\mathbf{v}}_1+{\mathbf{v}}_2+{\mathbf{v}}_3) \\ &= ({\mathbf{u}}_1,{\mathbf{v}}_1)+({\mathbf{u}}_2+{\mathbf{u}}_3,{\mathbf{v}}_2+{\mathbf{v}}_3) \\ & = ({\mathbf{u}}_1,{\mathbf{v}}_1)+(({\mathbf{u}}_2,{\mathbf{v}}_2)+({\mathbf{u}}_3,{\mathbf{v}}_3)). \end{aligned} $$
Likewise, for any \(\lambda \in \mathbb {R}\) we have
$$\displaystyle \begin{aligned} \lambda(({\mathbf{u}}_1,{\mathbf{v}}_1)+({\mathbf{u}}_2,{\mathbf{v}}_2))&=\lambda({\mathbf{u}}_1+{\mathbf{u}}_2,{\mathbf{v}}_1+{\mathbf{v}}_2)\\ &= (\lambda({\mathbf{u}}_1+{\mathbf{u}}_2),\lambda({\mathbf{v}}_1+{\mathbf{v}}_2)) = (\lambda {\mathbf{u}}_1+\lambda {\mathbf{u}}_2,\lambda {\mathbf{v}}_1+\lambda {\mathbf{v}}_2) \\ &= (\lambda {\mathbf{u}}_1,\lambda {\mathbf{v}}_1) + (\lambda {\mathbf{u}}_2,\lambda {\mathbf{v}}_2) = \lambda({\mathbf{u}}_1,{\mathbf{v}}_1)+\lambda({\mathbf{u}}_2,{\mathbf{v}}_2). \end{aligned} $$
We have used throughout that U and V  are vector spaces.
Additive inverses are given simply by \(-(\mathbf {u},\mathbf {v}) = (-\mathbf {u},-\mathbf {v})\) while the neutral element is \(0_{U\oplus V}=(0_U,0_V)\).
Solution 4.2 (To Exercise 2.8)
We have to check that the operations satisfy the axioms of a vector space. Since addition is defined symmetrically, we only check one side. Let \({\mathbf {u}}_1,{\mathbf {u}}_2\in U\) and \(\mathbf {v}\in V\). Then
$$\displaystyle \begin{aligned} ({\mathbf{u}}_1,\mathbf{v}) + ({\mathbf{u}}_2,\mathbf{v}) = ({\mathbf{u}}_1+{\mathbf{u}}_2,\mathbf{v}) = ({\mathbf{u}}_2+{\mathbf{u}}_1,\mathbf{v}) = ({\mathbf{u}}_2, \mathbf{v}) + ({\mathbf{u}}_1,\mathbf{v}). \end{aligned}$$
Associativity follows in a similar way:
$$\displaystyle \begin{aligned} \left( ({\mathbf{u}}_1,\mathbf{v})+({\mathbf{u}}_2,\mathbf{v}) \right) + ({\mathbf{u}}_3,\mathbf{v}) &= ({\mathbf{u}}_1+{\mathbf{u}}_2,\mathbf{v})+({\mathbf{u}}_3,\mathbf{v}) \\ & = \big(({\mathbf{u}}_1+{\mathbf{u}}_2)+{\mathbf{u}}_3,\mathbf{v}\big) =\big ({\mathbf{u}}_1+({\mathbf{u}}_2+{\mathbf{u}}_3),\mathbf{v}\big) \\ &= ({\mathbf{u}}_1,\mathbf{v})+({\mathbf{u}}_2+{\mathbf{u}}_3,\mathbf{v}) \\ & = ({\mathbf{u}}_1,\mathbf{v})+\left( ({\mathbf{u}}_2,\mathbf{v})+({\mathbf{u}}_3,\mathbf{v}) \right). \end{aligned} $$
Now, for any \(\lambda \in \mathbb {R}\) we see that (using throughout that U and V  are vector spaces)
$$\displaystyle \begin{aligned} \lambda\left( ({\mathbf{u}}_1,\mathbf{v})+({\mathbf{u}}_2,\mathbf{v}) \right) &= \lambda({\mathbf{u}}_1+{\mathbf{u}}_2,\mathbf{v}) = (\lambda({\mathbf{u}}_1+{\mathbf{u}}_2),\mathbf{v}) \\ & = (\lambda{\mathbf{u}}_1+\lambda{\mathbf{u}}_2,\mathbf{v}) = (\lambda{\mathbf{u}}_1,\mathbf{v})+(\lambda{\mathbf{u}}_2,\mathbf{v}) \\ & = \lambda({\mathbf{u}}_1,\mathbf{v}) + \lambda({\mathbf{u}}_2,\mathbf{v}). \end{aligned} $$
Additive inverses are given by \(-(\mathbf {u},\mathbf {v})=(-\mathbf {u},\mathbf {v})=(\mathbf {u},-\mathbf {v})\).
Solution 4.3 (To Exercise 2.9)
(a)
Let \(U,V\) be vector spaces. On the set \(U\times V\) define the following operations, imposing bilinearity: for \(\mathbf {u}, {\mathbf {u}}_1, {\mathbf {u}}_2\in U\), \(\mathbf {v}, {\mathbf {v}}_1, {\mathbf {v}}_2\in V\) and \(\lambda \in \mathbb {R}\)
$$\displaystyle \begin{aligned} ({\mathbf{u}}_1 \otimes \mathbf{v}) + ({\mathbf{u}}_2 \otimes \mathbf{v}) &:= ({\mathbf{u}}_1 + {\mathbf{u}}_2) \otimes \mathbf{v}, \\ (\mathbf{u} \otimes {\mathbf{v}}_1) + (\mathbf{u} \otimes {\mathbf{v}}_2) &:= \mathbf{u} \otimes ({\mathbf{v}}_1 + {\mathbf{v}}_2) \ \text{ and } \ \lambda(\mathbf{u} \otimes \mathbf{v}) := (\lambda \mathbf{u}) \otimes \mathbf{v} =: \mathbf{u} \otimes (\lambda \mathbf{v}). \end{aligned} $$
 
(b)
Set \(\mathbf {u}:= {\mathbf {u}}_1+{\mathbf {u}}_2\). Introducing \(\mathbf {u}\) is not strictly necessary, but it may help to see how the axioms from (a) can be applied. We then have,
$$\displaystyle \begin{aligned} ({\mathbf{u}}_1+{\mathbf{u}}_2)\otimes({\mathbf{v}}_1+{\mathbf{v}}_2) &= \mathbf{u}\otimes ({\mathbf{v}}_1+{\mathbf{v}}_2) =\mathbf{u}\otimes {\mathbf{v}}_1 + \mathbf{u}\otimes {\mathbf{v}}_2 \\ & =({\mathbf{u}}_1+{\mathbf{u}}_2)\otimes {\mathbf{v}}_1 + ({\mathbf{u}}_1+{\mathbf{u}}_2)\otimes {\mathbf{v}}_2 \\ & ={\mathbf{u}}_1\otimes {\mathbf{v}}_1+{\mathbf{u}}_2\otimes {\mathbf{v}}_1 + {\mathbf{u}}_1\otimes {\mathbf{v}}_2+{\mathbf{u}}_2\otimes {\mathbf{v}}_2. \end{aligned} $$
 
(c)
We have \(\lambda \mathbf {u}\otimes \lambda \mathbf {v} =\lambda ^2 (\mathbf {u}\otimes \mathbf {v})\). Compare this with the linear scaling (2) in Definition 2.1.
 
Solution 4.4 (To Exercise 2.10)
It suffices to check that for any \(\mathbf {u}\in U\) and any elementary tensor \(\mathbf {u}'\otimes \mathbf {v}'\in U\otimes V\) it holds that \(\mathbf {u}\otimes 0_V + \mathbf {u}'\otimes \mathbf {v}'=\mathbf {u}'\otimes \mathbf {v}'\).
Indeed, since we can write \(0_V=\mathbf {v}'-\mathbf {v}'\) it follows that
$$\displaystyle \begin{aligned} \mathbf{u}\otimes 0_V + \mathbf{u}'\otimes\mathbf{v}' &= \mathbf{u}\otimes(\mathbf{v}'-\mathbf{v}')+\mathbf{u}'\otimes\mathbf{v}' \\ &= \mathbf{u}\otimes\mathbf{v}'+\mathbf{u}\otimes(-\mathbf{v}')+\mathbf{u'}\otimes\mathbf{v}' = \mathbf{u}'\otimes\mathbf{v}' \end{aligned} $$
where in the last equality we have used that \(\mathbf {u}\otimes (-\mathbf {v})\) is the additive inverse of \(\mathbf {u}\otimes \mathbf {v}\).
The check for \(0_U\otimes \mathbf {v}\) can be done in a similar way.
Solution 4.5 (To Exercise 2.14)
We must find a bijective linear function \(\Psi :\mathbb {R}\otimes U\to U\). It suffices to define \(\Psi (\lambda \otimes \mathbf {u})=\lambda \mathbf {u}\). Linearity follows from the fact that the right-hand side is bilinear in \((\lambda ,\mathbf {u})\) and the properties of the tensor product. Injectivity is immediate since \(\lambda \otimes \mathbf {u}\in \operatorname {ker}\Psi \) if and only if \(\Psi (\lambda \otimes \mathbf {u})=\lambda \mathbf {u}=0_U\), which in turn implies that either \(\lambda =0\) or \(\mathbf {u}=0_U\) by the axioms of vector spaces, and in both cases this means that \(\lambda \otimes \mathbf {u}=0_{\mathbb {R}\otimes U}\). Therefore, it follows that \(\operatorname {ker}\Psi =\{0\}\), i.e., \(\Psi \) is injective.
Surjectivity can be shown by noting that every \(\mathbf {u} \in U\) can be obtained as \(\mathbf {u} = \Psi (1 \otimes \mathbf {u})\) (which in particular implies that the inverse map is \(\Psi ^{-1}(\mathbf {u})=1\otimes \mathbf {u}\)), or by noting that since \(\operatorname {dim}(\mathbb {R}\otimes U)=\operatorname {dim}(\mathbb {R})\cdot \operatorname {dim}(U)=\operatorname {dim}(U)\), by the rank-nullity theorem it follows that
$$\displaystyle \begin{aligned} \operatorname{dim}\big(\operatorname{im}(\Psi)\big) =\operatorname{dim}\big(\mathbb{R}\otimes U\big)-\operatorname{dim}\big(\operatorname{ker}(\Psi)\big) =\operatorname{dim}(U) \qquad \text{so that} \quad \operatorname{im}(\Psi)=U. \end{aligned}$$
Solution 4.6 (To Exercise 3.10)
The unit element in \(T^2(V)\) is \(\mathbf {1}=(1,0,0)\). From the product formula in Example 3.9 we see that the entries of the inverse element \((a',\mathbf {x}',\mathbf {A}'):=(a,\mathbf {x},\mathbf {A})^{-1}\) must satisfy
$$\displaystyle \begin{aligned} aa' = 1,\quad a\mathbf{x}'+a'\mathbf{x} = 0 \quad \text{and}\quad a'\mathbf{A}+a\mathbf{A}'+\mathbf{x}\otimes \mathbf{x}' = 0. \end{aligned}$$
The first equation is solvable if and only if \(a\neq 0\), in which case \(a'=a^{-1}\). Inserting this in the second equation it follows that \(\mathbf {x}' = -\tfrac {1}{a^2}\mathbf {x}.\) Lastly, from the third equation we see that \(\mathbf {A}' = -\tfrac {1}{a^2}\mathbf {A} + \tfrac {1}{a^3}\mathbf {x}\otimes \mathbf {x}.\) Hence, in \(T^2(V)\), we have that
$$\displaystyle \begin{aligned} (a,\mathbf{x},\mathbf{A})^{-1} = (a^{-1},-a^{-2}\mathbf{x}, - a^{-2}\mathbf{A} + a^{-3}\mathbf{x}\otimes \mathbf{x}). \end{aligned}$$
This can also be seen from the more general formula \(\mathbf {u}^{-1} = \sum _{n\ge 0}(\mathbf {1}-\mathbf {u})^{\otimes n}\), valid for \(\mathbf {u}\in T^N(V)\) with degree-zero component equal to 1; the sum is finite since tensors of order greater than N vanish in \(T^N(V)\), and the case of a general nonzero degree-zero component follows by rescaling.
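The inverse formula can be verified numerically (an illustrative check, not part of the original text; the helper names are ours), using the level-2 truncated product of Example 3.9:

```python
import numpy as np

def t2_product(u, v):
    """Truncated product on T^2(R^d), elements stored as (a, x, A)."""
    a, x, A = u
    b, y, B = v
    return (a * b, a * y + b * x, a * B + b * A + np.outer(x, y))

def t2_inverse(u):
    """Inverse from Solution 4.6:
    (a, x, A)^{-1} = (1/a, -x/a^2, -A/a^2 + x \otimes x / a^3), a != 0."""
    a, x, A = u
    return (1.0 / a, -x / a**2, -A / a**2 + np.outer(x, x) / a**3)

d = 2
u = (2.0, np.array([1.0, -1.0]), np.array([[0.5, 2.0], [0.0, 1.0]]))

# Check u . u^{-1} = u^{-1} . u = (1, 0, 0), the unit of T^2(R^d).
for prod in (t2_product(u, t2_inverse(u)), t2_product(t2_inverse(u), u)):
    assert np.isclose(prod[0], 1.0)
    assert np.allclose(prod[1], np.zeros(d))
    assert np.allclose(prod[2], np.zeros((d, d)))
```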

Acknowledgements

We thank Ana Djurdjevac (FU-Berlin) for the helpful comments.
NT acknowledges support from DFG CRC/TRR 388 “Rough Analysis, Stochastic Dynamics and Related Fields”, Project B01.
GdR acknowledges partial support by the Engineering and Physical Sciences Research Council (EPSRC) [grant number EP/R511687/1], from the FCT—Fundação para a Ciência e a Tecnologia, I.P., under the scope of the projects UIDB/00297/2020 and UIDP/00297/2020 (Center for Mathematics and Applications, NOVA Math), and by the UK Research and Innovation (UKRI) under the UK government’s Horizon Europe funding Guarantee [Project UKRI343].
Note The arXiv version of this manuscript—https://arxiv.org/abs/2502.15703 —contains two extra sections discussing the factoring of tensor product expressions to a minimal number of terms. These sections are not needed for the theory presented in this book, but tensor rank and tensor factorizations are an elegant way of becoming familiar with the language of tensors and tensor products that are used throughout the book. A GitHub repository is attached to the arXiv version.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Title
An Introduction to Tensors for Path Signatures
Authors
Jack Beda
Gonçalo dos Reis
Nikolas Tapia
Copyright Year
2026
DOI
https://doi.org/10.1007/978-3-031-97239-3_2
1
Find the survey here: https://cims.nyu.edu/~tjl8195/survey/results.html#q45, at the time this manuscript was written.
 
2
For instance, there is no open mapping theorem for bilinear surjective maps, nor is there a Hahn-Banach theorem for bilinear continuous forms.
 
3
Formally, there is a canonical isomorphism between \(U\oplus V\) and \(V\oplus U\), so that the pairs \((\mathbf {u},\mathbf {v})\) and \((\mathbf {v},\mathbf {u})\) can be identified.
 
4
This last element of the definition is actually imposing the vectors \((\lambda \mathbf {u},\mathbf {v})\) and \((\mathbf {u},\lambda \mathbf {v})\) to be equal in \(U\otimes V\). This could be formalized by the use of quotient spaces, but doing such would involve a level of additional complexity not really necessary at the moment. (This also takes care of the apparent non-uniqueness of \(0_{U\otimes V}\) hinted at in Exercise 2.10.)
 
5
As an aside, it is this property of tensor product spaces that mathematically captures the phenomenon of entanglement of quantum particles. A quantum state like \(|\varphi \rangle = \begin {bmatrix} 1\\ 0 \end {bmatrix} \otimes \begin {bmatrix} 0\\ 1 \end {bmatrix} + \begin {bmatrix} 0\\ 1 \end {bmatrix}\otimes \begin {bmatrix} 1\\ 0 \end {bmatrix}\) is entangled as it cannot be written as \(\begin {bmatrix} a\\ b \end {bmatrix}\otimes \begin {bmatrix} c\\ d \end {bmatrix}\) for any scalars \(a,b,c,d\). This entangled state says “I have two particles, and their spins are always opposite, but I cannot know which one is spin-up, and which one is spin-down”.
 
6
Recall \(\operatorname {dim}(U\oplus V) = \operatorname {dim}(U) + \operatorname {dim}(V)\), whereas \(\operatorname {dim}(U\otimes V) = \operatorname {dim}(U) \cdot \operatorname {dim}(V)\).
 
7
Infix notation is a way of writing mathematical (and logic) expressions where operators are placed between the operands they act upon. This is the most familiar notation to us humans and matches how we (humans) interpret math expressions.
 
8
Technically speaking \(T^N(V)\) is a quotient of \(T(V)\) by a two-sided ideal.
 
9
Note that we are tacitly identifying real-valued order 2 tensors (of shape \((2,2)\)) with real-valued 2-by-2 matrices—looking back at Remark 2.16 and Examples 2.17 and 2.19, we have implicitly made the assumption of working with the canonical basis of \(\mathbb {R}^2\).
 
1.
J. Diestel, J.H. Fourie, J. Swart, The Metric Theory of Tensor Products: Grothendieck's Résumé Revisited (American Mathematical Society, Providence, RI, 2008)
2.
W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus, Springer Series in Computational Mathematics, vol. 56, 2nd edn. (Springer, Cham, 2019)
3.
S. Rabanser, O. Shchur, S. Günnemann, Introduction to tensor decompositions and their applications in machine learning. arXiv:1711.10781 (2017)
4.
M. Reed, B. Simon, Methods of Modern Mathematical Physics I: Functional Analysis, 2nd edn. (Academic Press, New York, 1980)