Feedback shift register (FSR) sequences are used for many cryptographic applications such as pseudorandom number generators for stream cipher cryptosystems, see [
10]. A binary
n-stage feedback shift register (FSR) is a mapping
\(\mathfrak {F}\) from
\(\mathbb {F}_{2}^{n}\) to
\(\mathbb {F}_{2}^{n}\) of the form
$$ \mathfrak{F}: (x_{0},x_{1},\ldots,x_{n-1}) \mapsto (x_{1},x_{2},\ldots,x_{n-1},f(x_{0},x_{1},\ldots,x_{n-1})), $$
where
f is a function from
\(\mathbb {F}_{2}^{n}\) to
\(\mathbb {F}_{2}\). Consider the binary sequence
\(\mathcal {S}=(s_{i})_{i\geq 0}\) whose first
n terms are given and the remaining terms are uniquely determined by the recurrence relation
$$ s_{i+n}=f(s_{i},\ldots,s_{i+n-1}), \quad i\geq 0. $$
The sequence
\(\mathcal {S}\) is called the output sequence of the FSR. An output sequence of a short FSR is considered weak in cryptographic applications. In order to determine this shortness for an infinite sequence
\(\mathcal {S}\), Jansen [
12,
13] introduced the notion of
N th
maximum order complexity of \(\mathcal {S}\), denoted as
\(M(\mathcal {S},N)\), which is the length of the shortest FSR that generates the first
N elements of
\(\mathcal {S}\). If the mapping
\(\mathfrak {F}\) is a linear transformation, then the FSR is called a linear feedback shift register (LFSR). This leads to the notion of the
linear complexity profile of
\(\mathcal {S}\) denoted as
\(L(\mathcal {S},N)\). We have obviously
\(M(\mathcal {S},N)\leq L(\mathcal {S},N)\). The sequence
\((M(\mathcal {S},N))_{N}\) is called
the maximum order complexity profile of
\(\mathcal {S}\) and
\(M(\mathcal {S})=\sup _{N\geq 1} M(\mathcal {S},N)\) is called
the maximum order complexity of
\(\mathcal {S}\).
An explicit construction of sequences with perfect linear complexity profile is given in [
1]. These are known under the name of apwenian sequences (see [
17]). Diem [
7] observed that these sequences and sequences based on function expansion into expansion series can be efficiently computed from relatively short sequences. This leads to the notion of
expansion complexity, denoted as
\(E(\mathcal {S},N)\), see Definition 2 below for more details. It was proved by Mérai, Niederreiter and Winterhof [
16] that the expansion complexity and the linear complexity of an infinite sequence satisfy
\(E(\mathcal {S},N)\leq L(\mathcal {S},N)+1\).
A natural question is to ask about the mutual relationship between the maximum order complexity and the expansion complexity. It is known that these two complexities have expected behavior
\(\log N\) for the maximum order complexity, see [
12,
20], and
\(\sqrt {N}\) for the expansion complexity, see [
17, Theorem 6.2] and [
16]. This suggests that maximum order complexity is smaller than the expansion complexity. However, the Thue–Morse sequence has a large maximum order complexity and a bounded expansion complexity. Thus in order to determine the unpredictability of a sequence, both the expansion complexity and the maximum order complexity are useful and needed.
In recent years, research focused in particular on automatic sequences, i.e. sequences that are generated by a deterministic finite automaton. These sequences are not suitable for cryptographic applications as we will state later. Nevertheless, their pseudorandom behavior changes radically when the sequence is rarefied along special subsequences such as polynomial subsequences. One can extend the definition of automatic to morphic sequences which is an again larger class.
1.1 Measures of complexity
Let us first introduce some measures of complexity that we will focus on in the present paper.
A sequence with small maximum order complexity is not suitable for cryptographic applications. Indeed, such a sequence can be built with relatively short blocks of consecutive terms. However, a sequence with large maximum order complexity can still be predictable, as it is known for the Thue–Morse sequence, see [
15,
28]. Note also that the largest possible order of magnitude of
\(M(\mathcal {S},N)\) is
N and the expected value of
\(M(\mathcal {S},N)\) is
\(\log N\), see [
12,
20].
The maximum order complexity has been studied by several authors, for general results see [
11‐
13,
20,
31] or [
21,
27‐
29] for applications to some particular sequences such as the Thue–Morse sequence or the Rudin–Shapiro sequence. From a computational perspective, Jansen [
12, Proposition 3.17] showed how Blumer’s DAWG (Direct Acyclic Weighted Graph) algorithm [
3] can be used to compute the maximum order complexity in linear time and memory. In the last section of this paper, we use DAWG algorithm to formulate some conjectures (Section
3.2).
Diem [
7] introduced the expansion complexity of a sequence as follows.
A truly random sequence has expected value
\(E(\mathcal {S},N)\) of order
N1/2, see [
17].
Binary, and more generally,
p-ary automatic sequences are not good pseudorandom sequences since their expansion complexity satisfies
\(E(\mathcal {S},N)<+\infty \) by Christol’s theorem, see [
6]. In fact, automatic sequences generated by a finite automaton do not have sufficiently many different factors of a fixed length.
The subword complexity, denoted by
\(p_{\mathcal {S}}(k)\), counts how many different subwords of fixed length
k the sequence contains. It can again serve as a measure of pseudorandomness. A sequence
\(\mathcal {S}\) over {0,1} is
normal if for every
k ≥ 1 and for any block of length (
b0,…,
bk− 1) ∈{0,1}
k, we have
$$ \underset{N \to \infty }{\lim} \frac{1}{N} \text{Card}\{i<N: s_{i}=b_{0},\ldots,s_{i+k-1}=b_{k-1} \}=\frac{1}{2^{k}}. $$
One can associate to a binary sequence a
symbolic dynamical system based on the shift on
\(\{0,1\}^{\mathbb {N}}\). The
topological entropy of such a dynamical system is
\(\lim _{k\rightarrow +\infty } \frac {\log _{2} p_{\mathcal {S}}(k)}{k}\), where
\(\log _{2}\) denotes the base 2-logarithm. A sequence with 0 topological entropy is said to be
deterministic and it should not provide a pseudo-random sequence since such a sequence has a small subword complexity, see [
23].
1.2 Automatic and morphic sequences
Let k ≥ 2 and Σ be a finite alphabet. We denote by Σ∗ the set of all finite or infinite words over Σ. A morphism f is a mapping \(f:{{{\varSigma }}}^{*} \rightarrow {{{\varSigma }}}^{*}\) satisfying f(uv) = f(u)f(v) for all words u and v. A morphism is said to be k-uniform if |f(x)| = k for all x ∈Σ, where |x| is the length of the word x. If there is b ∈Σ such that f(b) starts with b, i.e. f(b) = bu for some u ∈Σ∗, we say that f is b-prolongable. In this case, let us denote fω(b) the fixed point of f given by fω(b) = bf(u)f2(u)…
For equivalent definitions of automatic sequences, we refer to [
2]. An emblematic example of automatic sequence is the Thue–Morse sequence, or Prouhet–Thue–Morse sequence, independently introduced by Thue [
30], Morse [
18] and Prouhet sixty years before [
22]. This sequence appears in many different fields of mathematics, see [
14], and is still an important object of recent work.
The Thue–Morse sequence is 2-automatic and the corresponding morphism is
$$ \begin{array}{@{}rcl@{}} f:\left\{ \begin{array}{lll} 0 \mapsto 01 \\ 1 \mapsto 10. \end{array} \right. \end{array} $$
The Thue–Morse sequence has a large maximum order complexity, see [
28], but this sequence is far from being pseudorandom. Indeed, it is well-known that
\(E(\mathcal {T},N)\leq 5\) for all
N since
h(
x,
y) = (
x + 1)
3y2 + (
x + 1)
2y +
x satisfies
\(h(x,G_{\mathcal {T}}(x))=0\) where
\(G_{\mathcal {T}}(x)\) is the generating function of
\(\mathcal {T}\), see [
2, Example 12.1.12]. Also, its subword complexity is small,
\(p_{\mathcal {T}}(k)\ll k\) (see [
2, Corollary 10.3.2] for a general result for all automatic sequences).
The behavior of this sequence regarding the defined pseudorandomness measures changes when this sequence is rarefied along specific subsequences. Drmota, Mauduit et Rivat [
8] showed that the Thue–Morse sequence along squares is normal and Moshe [
19] showed that the polynomial subsequences of degree
d ≥ 2 of the Thue–Morse sequence have an exponential subword complexity. Sun and Winterhof [
27] and Popoli [
21] showed that the maximum order complexity of the Thue–Morse sequence along polynomial subsequences remains comparatively large. A large maximum order complexity is desired, however, it should be noted that it should be not too large since the correlation measure of order 2 gets large in that case (see [
12, Proposition 3.1]).
Furthermore, these polynomial subsequences are no longer automatic sequences, see [
2, Theorem 6.10.1], and their expansion complexity is no longer bounded by Christol’s theorem. These statements mean that the polynomial subsequences of the Thue–Morse sequence are better candidates for cryptographic applications than the original sequence.
In order to generalize the investigation to other morphic sequences and sum of digits function, let us introduce the Zeckendorf base sum of digits function, which is related to the Fibonacci sequence (see [
2, Theorem 3.8.1], [
32]).
Since \(F_{n}=\lfloor \frac {\varphi ^{n}}{\sqrt {5}}+\frac {1}{2} \rfloor \) for n ≥ 0, the index n(a) of the largest Fibonacci number that is not greater than an integer a > 1 is \(\lfloor \frac {\log (a \sqrt {5}+1/2)}{\log (\varphi )} \rfloor \). We call n(a) − 1 the length of a.
Thus we can define the sum of digits function in Zeckendorf base by
\(s_{Z}(n)={\sum }_{i \geq 0} \varepsilon _{i}(n)\), see [
9,
24,
26] or sequence A007895 in
On-Line Encyclopedia of Integer Sequences (OEIS) for discussions on this sequence. We denote by
\(\mathcal {S}_{Z}\) the sequence defined by
\(\mathcal {S}_{Z}=\left (s_{Z}(n)\mod 2\right )_{n\geq 0}\), an analog of the Thue–Morse sequence. It is known that
\(\mathcal {S}_{Z}\) is a morphic sequence, see for example [
4, p.14] or [
2, Examples 7.8.2, 7.8.4]. An example of the corresponding morphism and coding is
$$f:\left\{ \begin{array}{lll} a \mapsto ab \\ b \mapsto c \\ c \mapsto cd \\ d \mapsto a \end{array} \right. \quad \text{and} \quad g:\left\{ \begin{array}{lll} a \mapsto 0 \\ b \mapsto 1 \\ c \mapsto 1 \\ d \mapsto 0. \end{array} \right. $$
Thus, the sequence modulo 2 is obtained by iterating
f on the letter
a and recoding with
g :
\(\mathcal {S}_{Z}=\texttt {0}\texttt {1}\texttt {1}\texttt {1}\texttt {0}\texttt {1}\texttt {0}\texttt {0}\texttt {1}\texttt {0}\texttt {0}\texttt {0}\texttt {1}\texttt {1}\texttt {0}\texttt {0}\texttt {0}\texttt {1}\texttt {0}\texttt {1}\texttt {1}\ldots \) This sequence is not automatic, see [
9, Remark 1]. Indeed the authors show that the
k-kernel is infinite for every
k, see [
2] for more details on equivalent definitions of automatic sequences. This implies that the expansion complexity of
\(\mathcal {S}_{Z}\) is not bounded, since this sequence is not automatic, and motivates our study of this sequence. In fact, morphic non-automatic sequences are better candidates for cryptographic applications regarding this complexity. However morphic sequences have subword complexity at most bounded by a quadratic function, see [
2, Corollary 10.4.9], and their associated dynamical system has 0 topological entropy, see [
23, Corollary V.20] for an alternative proof.
As the sum of digits function for the usual
q-base,
sZ is a subadditive function, i.e. for any integers
n1,
n2 ≥ 0 we have
sZ(
n1 +
n2) ≤
sZ(
n1) +
sZ(
n2), see [
26, Proposition 1]. Consider the Zeckendorf expansion of
n1 and
n2,
$$ \begin{array}{@{}rcl@{}} n_{i}=\sum \limits_{j=k_{i}}^{K_{i}}\varepsilon_{j}^{(i)}F_{j+2}, \qquad i=1,2; \end{array} $$
with
\(\varepsilon _{k_{i}}^{(i)}=\varepsilon _{K_{i}}^{(i)}=1\). Without loss of generality suppose that
k1 ≤
k2. We say that
n1 and
n2 are
non-interfering if
k2 −
K1 ≥ 2. This means that non-interfering integers have digital blocks that do not overlap and no normalization after the addition has to be executed. We therefore have
$$ s_{Z}(n_{1}+n_{2})=s_{Z}(n_{1})+s_{Z}(n_{2}). $$
(1)
1.3 Main results
In the following we will determine a lower bound for the maximum order complexity of \(\mathcal {S}_{Z}=(s_{Z}(n) \mod 2)_{n}\) and for \(\mathcal {S}_{Z,P}=(s_{Z}(P(n)) \mod 2)_{n}\) with \(P \in \mathbb {Z}[X]\) a monic polynomial such that \(P(\mathbb {N}_{0}) \subset \mathbb {N}_{0}\). In what follows, we use Vinogradov’s notation f ≪ g is defined as |f|≤ c|g| for some c > 0.
Since the correlation measure is bounded below by the maximum order complexity (see [
17]), Theorem 1.1 shows that
\(\mathcal {S}_{Z}\) is
non-random in this specific respect. We mention also the recent result of Shutov [
25] that shows that the autocorrelation of order 2 with lag equal to 1 of
\((-1)^{s_{Z}(n)}\) is large. This serves as motivation to look at subsequences, that we address in following result.
We will discuss the heuristics that supports the real growth of the maximum order complexity in the final part of the paper (Section
3.2). In particular, we conjecture that the lower bound is indeed the actual growth of the maximal order complexity.
The rest of the paper is structured as follows. In Section 2, we prove Theorems 1.1 and 1.2 where we first prove the linear case and the particular case
P(
X) =
Xd for
d ≥ 2 before tackling the general case. We conclude the paper with some final remarks (Section
3) about possible generalizations of this result to other numerations systems and some heuristically supported conjectures.