This chapter introduces Stratified Type Theory (StraTT), an approach to type universes that departs from traditional hierarchy-based systems. Rather than stratifying universes, StraTT, inspired by Leivant's Stratified System F, stratifies the typing judgements themselves by levels. This design aims to improve reusability without the complexity of level polymorphism, which frequently complicates programming and debugging in proof assistants. The chapter presents a subsystem, subStraTT, featuring stratified dependent functions and displacement, and extends it to the full StraTT with floating nondependent functions. Logical consistency of subStraTT is proven in Agda, while type safety of the full StraTT is mechanized in Coq. The chapter also includes a prototype type checker implementation demonstrating the practicality of StraTT for dependently typed programming. Through detailed examples and metatheoretic proofs, the chapter examines the expressivity and consistency of StraTT and surveys its potential for type theory and formal verification.
AI-Generated
This summary of the chapter content was generated with the help of AI.
Abstract
A hierarchy of type universes is a rudimentary ingredient in the type theories of many proof assistants to prevent the logical inconsistency resulting from combining dependent functions and the type-in-type axiom. In this work, we argue that a universe hierarchy is not the only option for universes in type theory. Taking inspiration from Leivant’s Stratified System F, we introduce Stratified Type Theory (StraTT), where rather than stratifying universes by levels, we stratify typing judgements and restrict the domain of dependent functions to strictly lower levels. Even with type-in-type, this restriction suffices to enforce consistency.
In StraTT, we consider a number of extensions beyond just stratified dependent functions. First, the subsystem subStraTT employs McBride’s crude-but-effective stratification (also known as displacement) as a simple form of level polymorphism where global definitions with concrete levels can be displaced uniformly to any higher level. Second, to recover some expressivity lost due to the restriction on dependent function domains, the full StraTT includes a separate nondependent function type with a floating domain whose level matches that of the overall function type. Finally, we have implemented a prototype type checker for StraTT extended with datatypes and inference for level and displacement annotations, along with a small core library.
We have proven subStraTT to be consistent and StraTT to be type safe, but consistency of the full StraTT remains an open problem, largely due to the interaction between floating functions and cumulativity of judgements. Nevertheless, we believe StraTT to be consistent, and as evidence have verified the ill-typedness of some well-known type-theoretic paradoxes using our implementation.
1 Introduction
Every term in a dependent type theory has a type, including types such as \(\textsf{Nat}\). Types are classified by the type universes to which they belong, and as type universes are themselves types, they must each belong to some type universe. In Martin-Löf Type Theory [28], these universes form a hierarchy: universe \( \star _k\) has type \( \star _{k+1}\), thus preventing any universe from classifying itself. Otherwise, the system would be inconsistent.
Many contemporary proof assistants, such as Coq [10], Agda [33], Lean [32], F* [38], and Arend [9], include universe hierarchies. To make these systems easier to use, they often automatically infer the levels of each universe, so programmers can write, for instance,
instead of
. They also include forms of level polymorphism, so that definitions can be reused at multiple universe levels. However, supporting such generality means that the proof assistant must handle level variable constraints, level expressions, or both. As a result, programming with and especially debugging errors involving universe levels can be painful.
So we ask: can type universes and reusability coexist without resorting to level polymorphism?
In this work, we design Stratified Type Theory (StraTT), a new approach for type universes, and evaluate mechanisms for reusability that don’t include level polymorphism. The key idea of our design is that we do not stratify universes into a hierarchy; instead, we stratify typing judgements themselves by levels. This approach is inspired by Leivant’s Stratified System F [23], a predicative variant of System F [16, 34].
Consider the formation rule F-poly for System F’s type polymorphism in Figure 1. The quantification is said to be impredicative because it quantifies over all types including itself. In contrast, the formation rule SF-poly for Stratified System F disallows impredicativity by restricting polymorphic quantification to only types that are well formed at strictly lower stratification levels. The type well-formedness judgement tracks the stratification level with an index \( k \).
Fig. 1.
Select rules from (Stratified) System F
To extend stratified polymorphism to dependent types, there are two ways to read this judgement form. We could interpret
as a type A living in some stratified type universe \( \star _k\), which would correspond to a usual predicative type theory. Alternatively, we could continue to interpret the level k as a property of the judgement and annotate the dependent typing judgement form as \( \Gamma \vdash a :^{ k } A \). Analogously to stratified polymorphic types \(\forall x^j \mathpunct {.} B\), we introduce stratified dependent function types \( \Pi x \!:^{ j }\! A \mathpunct {.} B \). They similarly quantify over arguments at the annotated level \( j \), which must be strictly lower than the overall level of the type. This allows us to remove the level annotation from universes, so we have \( \Gamma \vdash \star :^{ k } \star \) for any \( k \).
Moving levels off of universes and onto judgements and function domains opens up the opportunity to really take advantage of McBride’s crude-but-effective stratification [30]. Following Favonia, Angiuli, and Mullanix [18], we refer to this as displacement to prevent confusion. Given some signature \(\Delta \) of global definitions, we are permitted to use any definition with all of its concrete levels uniformly displaced upwards. Displacement is less effective than level polymorphism in MLTT for types that involve multiple universes, such as \( \star _0 \rightarrow \star _3\), since we’d still be stuck with the relative difference of \(3\) between the two universes. With stratified functions, this type would look like \( \Pi X \!:^{ 0 }\! \star \mathpunct {.} \star \), with only a single level annotation to displace.
However, we find that even with displacement, stratifying all function types is too restrictive and rules out terms that are otherwise typeable in MLTT even without level polymorphism. Going back to Stratified System F, we observe that with respect to the levels, ordinary function types are more flexible than polymorphic function types. Their formation rule SF-fun in Figure 1 allows the level of the domain type to be equal to the overall level of the function type. It is this flexibility we’re missing that would recover some lost expressivity, so we add an analogous separate function type that is nondependent but has no fixed domain level. If the overall level of the nondependent function type is raised, we say that the level of the domain floats to the same level.
We divide our design into two parts. The subsystem subStraTT features only stratified dependent functions and displacement, and the full system StraTT adds floating nondependent functions. We have proven in Agda the logical consistency of the former. Even with type-in-type, the stratification restriction on the domains of dependent functions prevents the kind of self-referential trickery that is needed for the usual paradoxes.
We conjecture, but have not proven, the consistency of the full StraTT. Floating functions permit covariant behaviour of the domain with respect to levels, and our existing Agda proof doesn’t extend to this new feature. That doesn’t mean that the system is inconsistent: it may be sufficiently different from usual predicative type theories to require an entirely different approach or an alternative foundation outside of Agda. Indeed, our experience with the system provides evidence that consistency does hold. We have found it impossible to use StraTT to encode some well-known type-theoretic paradoxes. We also have verified its syntactic metatheory, giving us further insight into its design.
The contributions of our paper are as follows:
A subsystem subStraTT, featuring only stratified dependent functions and displacement, which is then extended to the full StraTT with floating nondependent functions. \(\hookrightarrow \) Section 2
Examples to demonstrate the expressivity of StraTT and especially to motivate floating functions. \(\hookrightarrow \) Section 3
Two major metatheorems: logical consistency for subStraTT, which is mechanized in Agda, and type safety for StraTT, which is mechanized in Coq. Consistency for the full StraTT remains an open problem. \(\hookrightarrow \) Section 4
A prototype implementation of a type checker, which extends StraTT to include datatypes to demonstrate the effectiveness of stratification and displacement in practical dependently-typed programming. \(\hookrightarrow \) Section 5
We discuss potential avenues for proving consistency of the full StraTT and compare the usability of its design to existing proof assistants in terms of working with universe levels in Section 6 and conclude in Section 7. Our Agda and Coq mechanizations along with the prototype implementation are available at https://github.com/plclub/StraTT, which is also archived as a paper artifact [8]. Where lemmas and theorems are first introduced, we include a footnote indicating the corresponding source file and lemma name in the development.
2 Stratified Type Theory
In this section, we present Stratified Type Theory in two parts. First is the subsystem subStraTT, which contains the two core features of stratified dependent function types and global definitions with level displacement. We then extend it to the full StraTT by adding floating nondependent function types. As the system is fairly small with few parts, we delay illustrative examples to Section 3, and begin with the formal description.
2.1 The subsystem subStraTT
The subsystem subStraTT is a cumulative, extrinsic type theory with types à la Russell, a single type universe, dependent functions, an empty type, and global definitions. The most significant difference between subStraTT and other type theories with these features is the annotation of the typing judgement with a level in place of universes in a hierarchy. We use the naturals and their usual strict order and addition operation for our levels, but they should be generalizable to any displacement algebra [18]. The syntax for terms, contexts \(\Gamma \), and signatures \(\Delta \) is given below, with \( x , y , z \) for variable and constant names and \( i , j , k \) for levels.
A context consists of declarations \( x :^{ k } A \) of variables \( x \) of type \( A \) at level \( k \); variables represent locations where an entire typing derivation may be substituted into the term, so they also need level annotations. A signature consists of global definitions \( x :^{ k } A :== a \) of constants \( x \) of type \( A \) definitionally equal to \( a \) at level \( k \); they represent complete typing derivations that will eventually be substituted into the term. The typing judgement
, whose derivation rules are given in Figure 2, states that the term \( a \) is well typed at level \( k \) with type \( A \) under the context \(\Gamma \) and signature \(\Delta \).
Fig. 2.
Typing rules (subStraTT)
Because stratified judgements replace stratified universes, the type of the type universe \( \star \) is itself at any level in rule DT-Type. Stratification is enforced in dependent function types in rule DT-Pi: the domain type must be well typed at a strictly smaller level relative to the codomain type and the overall function type. Similarly, in rule DT-AbsTy, the body of a dependent function is well typed when its argument and its type are well typed at a strictly smaller level, and by rule DT-AppTy, a dependent function can only be applied to an argument at the strictly smaller domain level.
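As a minimal illustration, the level side conditions of rules DT-Pi and DT-AppTy can be phrased as predicates over a toy term encoding; the encoding and function names below are ours, not the paper's implementation.

```python
# Toy encoding (ours): a dependent function type  Pi x :^j A . B  is the
# tuple ("pi", x, j, A, B); levels are plain naturals.

def pi_ok(ty, k):
    """DT-Pi side condition: at judgement level k, the domain of a
    dependent function type must sit at a strictly smaller level j < k."""
    if ty[0] != "pi":
        return False
    _, _x, j, _dom, _cod = ty
    return j < k

def app_ok(fun_ty, arg_level):
    """DT-AppTy side condition: a dependent function may only be applied
    to an argument typed at exactly the annotated domain level j."""
    if fun_ty[0] != "pi":
        return False
    _, _x, j, _dom, _cod = fun_ty
    return arg_level == j
```

For example, `pi_ok(("pi", "X", 0, "star", "star"), 1)` holds, while the same type fails at judgement level 0: the universe itself may sit at any level, but domains must stay strictly below the judgement's level.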
Remark 1
The level annotation on dependent function types is necessary for consistency. Informally, suppose we have some unannotated type \( \Pi X \!:\! \star \mathpunct {.} B \) and a function of this type, both at level 1. By cumulativity, we can raise the level of the function to 2, then apply it to its own type \( \Pi X \!:\! \star \mathpunct {.} B \). In short, impredicativity is reintroduced, and stratification defeated.
Rules DT-Bottom and DT-Absurd give the typings for the uninhabited type and its eliminator, respectively. The eliminator appears to only be able to eliminate a falsehood at the same level, but cumulativity, formally defined shortly, will permit raising the level of a falsehood, which can then be eliminated at that higher level.
Remark 2
More generally, the level of a well-typed term must match that of its type, which we prove later as Regularity (Lemma 9). Intuitively, the level of a typing judgement represents the level of all the subderivations (up to cumulativity) used to construct its derivation tree, which enforces predicativity at the derivation level. Since proving regularity amounts to constructing a derivation for the type out of the subderivations of the term, the level of the type could not possibly be any higher than that of the term.
In rules DT-Var and DT-Const, variables and constants at level \( j \) can be used at any larger level \( k \), which we refer to as subsumption. This permits the following cumulativity lemma, allowing entire derivations to be used at higher levels.
Lemma 1
(Cumulativity).1 If
and \( j \le k \) then \( \Delta ; \Gamma \vdash a :^{ k } A \).
Constants are further annotated with a superscript indicating how much they’re displaced by. If a constant \( x \) is defined with a type \( A \), then \( x ^{ i } \) is an element of type \( A \) but with all of its levels incremented by \( i \). The metafunction \( a ^{+ i } \) performs this increment in the term \( a \), defined recursively with \( ( \Pi x \!:^{ j }\! A \mathpunct {.} B ) ^{+ i } = \Pi x \!:^{ i + j }\! A ^{+ i } \mathpunct {.} B ^{+ i } \) and \( ( x ^{ j } ) ^{+ i } = x ^{ i + j } \). Constants come from signatures and variables from contexts, whose formation rules are given in Figure 3.
Fig. 3.
Signature and context formation rules (excerpt)
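The increment metafunction \( a ^{+ i } \) admits a direct recursive implementation. The following sketch uses our own tuple encoding of terms (displaced constants \( x ^{ j } \) as `("const", x, j)`, dependent function types as `("pi", x, j, A, B)`), not the paper's prototype:

```python
# Sketch (ours) of the displacement metafunction a^{+i}: increment every
# level annotation in the term a by i.

def displace(a, i):
    tag = a[0]
    if tag == "const":                      # (x^j)^{+i} = x^{i+j}
        _, x, j = a
        return ("const", x, i + j)
    if tag == "pi":                         # (Pi x :^j A . B)^{+i}
        _, x, j, dom, cod = a               #   = Pi x :^{i+j} A^{+i} . B^{+i}
        return ("pi", x, i + j, displace(dom, i), displace(cod, i))
    if tag in ("star", "bot", "var"):       # no level annotations inside
        return a
    # remaining forms (applications, abstractions, ...): displace subterms
    return (tag,) + tuple(displace(t, i) if isinstance(t, tuple) else t
                          for t in a[1:])
```

Note that only the level annotations on constants and on dependent function domains change; variables, the universe, and the empty type carry no levels of their own.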
In rule DT-Conv, we use an untyped definitional equality
that is reflexive, symmetric, transitive, and congruent. The full set of rules are given in Figure 4, including \(\beta \)-equivalence for functions (rule DE-Beta) and \(\delta \)-equivalence of constants \( x \) with their definitions (rule DE-Delta). When a constant is displaced as \( x ^{ i } \), we must also increment the level annotations in their definitions by \( i \).
Fig. 4.
Definitional equality rules (subStraTT)
Given a well-typed, locally-closed term \( \Delta ; \varnothing \vdash a :^{ k } A \), the entire derivation itself can be displaced upwards by some increment \( i \). This lemma differs from cumulativity, since the level annotations in the term and its type are displaced as well, not just that of the judgement.
Lemma 2
(Displaceability (empty context)).2 If \( \Delta ; \varnothing \vdash a :^{ k } A \) then \( \Delta ; \varnothing \vdash a ^{+ i } :^{ k + i } A ^{+ i } \).
With \( x :^{ k } A :== a \) in the signature, \( x ^{ i } \) is definitionally equal to \( a ^{+ i } \), so this lemma justifies rule DT-Conv, which would give this displaced constant the type \( A ^{+ i } \) at level \( k + i \).
2.2 Floating functions
As we’ll see in the next section, subStraTT alone is insufficiently expressive, with some examples being unexpectedly untypeable and others being simply clunky to work with as a result of the strict restriction on function domains. The full StraTT system therefore extends the subsystem with a separate nondependent function type, written \( A \mathrel {\rightarrow } B \), whose domain doesn’t have the same restriction.
Fig. 5.
Typing and definitional equality rules (floating functions)
The typing rules for nondependent function types, functions, and application are given in Figure 5. The domain, codomain, and entire nondependent function type are all typed at the same level. Functions take arguments of the same level as their bodies, and are thus applied to arguments of the same level.
This distinction between stratified dependent and unstratified nondependent functions corresponds closely to Stratified System F: type polymorphism is syntactically distinct from ordinary function types, and the former forces the codomain to be a higher level while the latter doesn’t. From the perspective of Stratified System F, the dependent types of StraTT generalize stratified type polymorphism over types to include term polymorphism.
We say that the domain of these nondependent function types floats because unlike the stratified dependent function types, it isn’t fixed to some particular level. The interaction between floating functions and cumulativity is where this becomes interesting. Given a function f of type \( A \mathrel {\rightarrow } B \) at level \( j \), by cumulativity, it remains well typed with the same type at any level \( k \ge j \). The level of the domain floats up from \( j \) to match the function at \( k \), in the sense that f can be applied to an argument of type \( A \) at any greater level \( k \). This is unusual because the domain isn’t contravariant with respect to the ordering on the levels as expected, and is why, as we’ll see shortly, the proof of consistency in Section 4.1 can’t be straightforwardly extended to accommodate floating function types.
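The contrast between the two function types can be summed up in one small (hypothetical, ours) helper: a dependent function's argument level is pinned by the domain annotation, while a floating function's argument level follows the level at which the function is used.

```python
# Toy encoding (ours): ("pi", x, j, A, B) is a stratified dependent
# function type with fixed domain level j; ("arrow", A, B) is a floating
# nondependent function type.

def arg_level(fun_ty, k):
    """Level at which a function of type fun_ty, used at level k, takes
    its argument."""
    if fun_ty[0] == "pi":       # fixed: always the annotated level j
        return fun_ty[2]
    if fun_ty[0] == "arrow":    # floating: rises with k under cumulativity
        return k
    raise ValueError("not a function type")
```

Raising `k` by cumulativity leaves a dependent domain behind at `j`, but drags a floating domain up with it, which is exactly the covariant behaviour that complicates the consistency proof in Section 4.1.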
3 Examples
3.1 The identity function
In the following examples, we demonstrate why floating functions are essential. Below on the left is one way we could assign a type to the type-polymorphic identity function. For concision, we use a pattern syntax when defining global functions and place function arguments to the left of the definition. (The subscript is part of the constant name.)
Stratification enforces that the codomain of the function type and the function body have a higher level than that of the domain and the argument, so the overall identity function \( \mathsf { id_0 } \) is well typed at level 1. While \( x \) and X have level 0 in the context of the body, by subsumption we can use \( x \) at level 1 as required.
Alternatively, since the return type doesn’t depend on the second argument, we can use a floating function type instead, given above on the right. Since we still have a dependent type quantification, the function \( X \mathrel {\rightarrow } X \) is still typed at level 1. This means that \( x \) now has level 1 directly rather than through subsumption.
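Following the description above, the two typings can be written out side by side (a reconstruction from the surrounding text, not a verbatim quotation; the stratified version on the left, the floating version on the right):

$$\begin{aligned} & \mathsf { id_0 } :^{ 1 } \Pi X \!:^{ 0 }\! \star \mathpunct {.} \Pi x \!:^{ 0 }\! X \mathpunct {.} X & \mathsf { id } :^{ 1 } \Pi X \!:^{ 0 }\! \star \mathpunct {.} X \mathrel {\rightarrow } X \\ & \mathsf { id_0 } \; X \; x :==x & \mathsf { id } \; X \; x :==x \end{aligned}$$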
So far, there’s no reason to pick one over the other, so let’s look at a more involved example: applying an identity function to itself. This is possible due to cumulativity, and we’ll follow the corresponding Coq example below.
Here, since
can be assigned type
, it can be applied as the first argument to
. For the second argument, while
itself doesn’t have this type, we can \(\eta \)-expand it to a function that does, since
is a subtype of
, so
can be passed to
.
If we try to write the analogous definition in subStraTT without using floating functions, we find that it doesn’t type check! The problematic subterm is underlined in red below.
After \(\eta \)-expansion, \( \lambda X \mathpunct {.} \lambda x \mathpunct {.} id \; X \; x \) has the correct type \( \Pi X \!:^{ 0 }\! \star \mathpunct {.} \Pi x \!:^{ 0 }\! X \mathpunct {.} X \), but at level 2, the declared level of id itself. Meanwhile, the second argument of id expects an argument of that type but at level 1. We couldn’t just raise the level annotation for that argument to 2, either, since that would raise the level of id to 3.
If we instead use floating functions for the nondependent argument, the analogous definition then does type check, since the second argument of type X can now be at level 2.
This definition of \( \mathsf { idid_1 } \) is now shaped the same as the Coq version, only with level annotations on domains where Coq has the corresponding level annotations on
. If we were to turn on universe polymorphism in Coq, it would achieve the same kind of expressivity of being able to displace \( \mathsf { idid_1 } \) in StraTT.
As an additional remark, even with floating functions, repeatedly nesting identity function self-applications is one way to non-trivially force the level to increase. The following definitions continue the pattern from \( \mathsf { idid_1 } \); the corresponding Coq definitions would similarly require higher universe levels on their
annotations.
In the untyped setting, these correspond to \( \lambda id \mathpunct {.} id \; id \), \( \lambda id \mathpunct {.} id \; ( \lambda id \mathpunct {.} id \; id ) \; id \), and \( \lambda id \mathpunct {.} id \; ( \lambda id \mathpunct {.} id \; ( \lambda id \mathpunct {.} id \; id ) \; id ) \; id \). All of \( \mathsf { idid_1 } \; ( \lambda X \mathpunct {.} \lambda x \mathpunct {.} x ) \), \( \mathsf { idid_2 } \; ( \lambda X \mathpunct {.} \lambda x \mathpunct {.} x ) \), and \( \mathsf { idid_3 } \; ( \lambda X \mathpunct {.} \lambda x \mathpunct {.} x ) \) reduce to \( \lambda X \mathpunct {.} \lambda x \mathpunct {.} x \).
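These untyped erasures can be run directly; below they are transcribed as Python lambdas (our transcription, with type abstractions and applications erased) and applied to the erased identity:

```python
# Untyped erasures of idid_1, idid_2, idid_3 from above, as Python lambdas.
I = lambda x: x                                     # λX. λx. x, erased

idid1 = lambda id_: id_(id_)                        # λid. id id
idid2 = lambda id_: id_(lambda id2: id2(id2))(id_)  # λid. id (λid. id id) id
idid3 = lambda id_: id_(lambda id2: id2(lambda id3: id3(id3))(id2))(id_)
                                 # λid. id (λid. id (λid. id id) id) id

# Each self-application collapses back to the identity:
assert idid1(I) is I and idid2(I) is I and idid3(I) is I
```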
3.2 Decidable types
The following example demonstrates a more substantial use of StraTT in the form of type constructors as floating functions and how they interact with cumulativity. Later in Section 5 we’ll consider datatypes with parameters, but for now, consider the following Church encoding [6] of decidable types, which additionally uses negation defined as implication into the empty type.
The \( \textsf{ yes } \; X \) constructor decides X by a witness, while the \( \textsf{ no } \; X \) constructor decides X by its refutation. We can show that deciding a given type is irrefutable.3
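For concreteness, the encoding just described can be sketched as follows (our reconstruction from the surrounding text; the exact level placement in the original may differ):

$$\begin{aligned} & \textsf{ neg } :^{ 1 } \star \mathrel {\rightarrow } \star & \textsf{ Dec } :^{ 1 } \star \mathrel {\rightarrow } \star \\ & \textsf{ neg } \; X :==X \mathrel {\rightarrow } \bot & \textsf{ Dec } \; X :==\Pi Z \!:^{ 0 }\! \star \mathpunct {.} ( X \mathrel {\rightarrow } Z ) \mathrel {\rightarrow } ( \textsf{ neg } \; X \mathrel {\rightarrow } Z ) \mathrel {\rightarrow } Z \end{aligned}$$

with \( \textsf{ yes } :^{ 1 } \Pi X \!:^{ 0 }\! \star \mathpunct {.} X \mathrel {\rightarrow } \textsf{ Dec } \; X \) and \( \textsf{ no } :^{ 1 } \Pi X \!:^{ 0 }\! \star \mathpunct {.} \textsf{ neg } \; X \mathrel {\rightarrow } \textsf{ Dec } \; X \) selecting the respective continuations.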
As before, if we try to define \( \textsf{ neg } \) and \( \textsf{ Dec } \) using only dependent functions rather than floating functions, then \( \textsf{ irrDec } \) no longer type checks, even if we allow ourselves to use displacement. More interestingly, let’s now compare these definitions to more-or-less corresponding ones in Agda.
Universe polymorphism is required to capture some of the expressivity of floating functions. For instance, to talk about the negation or the decidability of a type at level 1, by cumulativity it suffices to use \( \textsf{ neg } \) and \( \textsf{ Dec } \) respectively (without displacement!) in StraTT, but we must use
and
in Agda. However, since the constructors for \( \textsf{ Dec } \) use the type argument dependently, in StraTT the level of that argument is fixed at 0. The constructors must be displaced to \( \textsf{ yes }^{ 1 } \) and \( \textsf{ no }^{ 1 } \) to construct proofs of \( \textsf{ Dec }^{ 1 } \), just as
and
would construct proofs of
.
3.3 Leibniz equality
Although nondependent functions can often benefit from a floating domain, sometimes we don’t want the domain to float. Here, we turn to a simple application of dependent types with Leibniz equality [22, 27] to demonstrate a situation where the level of the domain needs to be fixed to a strictly lower level even when the codomain doesn’t depend on the function argument.
$$\begin{aligned} & \textsf{ eq } :^{ 1 } \Pi X \!:^{ 0 }\! \star \mathpunct {.} X \mathrel {\rightarrow } X \mathrel {\rightarrow } \star & \textsf{ refl } :^{ 1 } \Pi X \!:^{ 0 }\! \star \mathpunct {.} \Pi x \!:^{ 0 }\! X \mathpunct {.} \textsf{ eq } \; X \; x \; x \\ & \textsf{ eq } \; X \; x \; y :==\Pi P \!:^{ 0 }\! X \mathrel {\rightarrow } \star \mathpunct {.} P \; x \mathrel {\rightarrow } P \; y & \textsf{ refl } \; X \; x \; P \; px :== px \end{aligned}$$
An equality \( \textsf{ eq } \; A \; a \; b \) states that two terms are equal if given any predicate P, a proof of \( P \; a \) yields a proof of \( P \; b \); in other words, \( a \) and \( b \) are indiscernible. The proof of reflexivity should be unsurprising.
We might try to define a nondependent predicate stating that a given type X is a mere proposition, i.e. that all of its inhabitants are equal.
But this doesn’t type check, since the body contains an equality over elements of X, which necessarily has level 1 rather than the expected level 0. We must assign \( \textsf{ isProp } \) a stratified function type, given below on the left; informally, stratification propagates dependency information not only from the codomain, but also from the function body.
Going one further, we define above on the right a predicate \( \textsf{ isSet } \) stating that X is an h-set [40], or that its equalities are mere propositions, by using a displaced \( \textsf{ isProp } \) so that we can reuse the definition at a higher level; here, \( \textsf{ isProp }^{ 1 } \) now has type \( \Pi X \!:^{ 1 }\! \star \mathpunct {.} \star \) at level 2. Once again, despite the type of \( \textsf{ isSet } \) not being an actual dependent function type, we need to fix the level of the domain.
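Concretely, the two definitions described here plausibly take the following shape (our reconstruction from the surrounding text, not a verbatim quotation):

$$\begin{aligned} & \textsf{ isProp } :^{ 1 } \Pi X \!:^{ 0 }\! \star \mathpunct {.} \star & \textsf{ isSet } :^{ 2 } \Pi X \!:^{ 0 }\! \star \mathpunct {.} \star \\ & \textsf{ isProp } \; X :==\Pi x \!:^{ 0 }\! X \mathpunct {.} \Pi y \!:^{ 0 }\! X \mathpunct {.} \textsf{ eq } \; X \; x \; y & \textsf{ isSet } \; X :==\Pi x \!:^{ 0 }\! X \mathpunct {.} \Pi y \!:^{ 0 }\! X \mathpunct {.} \textsf{ isProp }^{ 1 } \; ( \textsf{ eq } \; X \; x \; y ) \end{aligned}$$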
4 Metatheory
4.1 Consistency of subStraTT
We use Agda to mechanize a proof of logical consistency — that no closed inhabitant of the empty type exists — for subStraTT, which excludes floating nondependent functions. For simplicity, the mechanization also excludes global definitions and displaced constants, which shouldn’t affect consistency: if there is a closed inhabitant of the empty type that uses global definitions, then there is a closed inhabitant of the empty type under the empty signature by inlining all global definitions. The proof files are available at https://github.com/plclub/StraTT under the agda/ directory. The only axiom we use is function extensionality.4
The core construction of the consistency proof is a three-place logical relation
among a term, its type, and its level, which we would aspirationally like to define as in Figure 6. Informally, this represents the interpretation of the type \( A \) as a set of closed terms which behave according to that type. For instance, a term f is in the interpretation of a function type if for every term \( y \) which behaves according to the domain, the term \( f \; y \) behaves according to the codomain. Consistency follows from the fact that the interpretation of the empty type is empty. In our working metatheory, we use \(\textbf{0}\) for falsehood, \(\textbf{1}\) for truth, \(\wedge \) for conjunction, \(\longrightarrow \) for implication, and \(\forall \) and \(\exists \) for universal and existential quantification.
Fig. 6.
Ill-formed logical relation between terms and types
However, this definition isn’t necessarily well formed. It isn’t defined recursively on the structure of the terms or the types, because in the cases involving dependent functions, we need to talk about the substituted type
. It isn’t defined inductively, either, because again in the dependent function case, the inductive definition itself would appear to the left of an implication as \(y \in \llbracket A \rrbracket _j\), making the definition non-strictly-positive.
The solution is to define the logical relation as an inductive–recursive definition [14]. This design is adapted from a concise proof of consistency for MLTT in Coq by Liu [25], which uses an impredicative encoding in place of induction–recursion. Liu’s proof is in turn a simplified and pared-down adaptation of a proof of decidability of conversion for MLTT in Coq by Adjedj, Lennon-Bertrand, Maillard, Pédrot, and Pujet [2], which itself uses a predicative encoding to adapt a proof of decidability of conversion for MLTT in Agda by Abel, Öhman, and Vezzosi [1] that uses induction–recursion.
Figure 7 sketches the inductive–recursive definition, which splits the logical relation into two parts: an inductive predicate on types and their levels
, and a relation between types and terms defined recursively on the predicate on the type, which we continue to write as
.
Fig. 7.
Inductive–recursive logical relation between terms and types
In the last inductive rule, in place of
, we instead use parallel reduction
, which is a reduction relation describing all visible reductions being performed in parallel from the inside out. This is justified by the following lemma, where
is the reflexive, transitive closure of \( A \Rightarrow B \).
Lemma 3
(Implementation of definitional equality).5
iff there exists some
such that
, which we write as
.
Even now, this inductive–recursive definition is still not well formed. In particular, in the inductive rule for dependent functions, if \( A \) is \( \star \), then by the recursive case for the universe, \(\llbracket y \rrbracket _j\) could again appear to the left of an implication. However, we know that \(j < k\), which we can exploit to stratify the logical relation just as we stratify typing judgements. We do so by parametrizing each logical relation at level k by an abstract logical relation defined at all strictly lower levels \(j < k\), then at the end tying the knot by instantiating them via well-founded induction on levels. This technique is adapted from an Agda model of a universe hierarchy by Kovács [21], which originates from McBride’s redundancy-free construction of a universe hierarchy [31, Section 6.3.1]. As the constructions are now fairly involved, we defer to the proof file6 for the full definitions, in particular
for the inductive predicate and
for the recursive relation. For the purposes of exposition, we continue to use the old notation.
Because the logical relation only handles closed terms, we deal with contexts and simultaneous substitutions \(\sigma \) separately by relating the two via yet another inductive–recursive definition in Figure 8, with a predicate on contexts
and a relation between substitutions and contexts
.
denotes applying the simultaneous substitution \(\sigma \) to the term \( A \), and \( \sigma [ x ] \) denotes the term which \(\sigma \) substitutes for \( x \).7
Fig. 8.
Inductive–recursive logical relation between substitutions and contexts
The most important lemmas that are needed are semantic cumulativity, semantic conversion, and backward preservation.
Lemma 4
(Cumulativity).8 Suppose \(j < k\). If \(\llbracket A \rrbracket _{j}\) then \(\llbracket A \rrbracket _{k}\), and if \(a \in \llbracket A \rrbracket _{j}\) then \(a \in \llbracket A \rrbracket _{k}\).
Lemma 5
(Conversion).9 Suppose
. If \(\llbracket A \rrbracket _{k}\) then \(\llbracket B \rrbracket _{k}\), and if \(a \in \llbracket A \rrbracket _{k}\) then \(a \in \llbracket B \rrbracket _{k}\).
Lemma 6
(Backward preservation).10 If \( a \Rightarrow ^{*} b \) and \(b \in \llbracket A \rrbracket _{k}\) then \(a \in \llbracket A \rrbracket _{k}\).
We can now prove the fundamental theorem of soundness of typing judgements with respect to the logical relation by induction on typing derivations, and consistency follows as a corollary.
Theorem 1
(Soundness).11 Suppose \(\llbracket \Gamma \rrbracket \) and \(\sigma \in \llbracket \Gamma \rrbracket \). If \( \Gamma \vdash a :^{ k } A \), then \(\llbracket A \{ \sigma \} \rrbracket _{k}\) and \( a \{ \sigma \} \in \llbracket A \{ \sigma \} \rrbracket _{k}\).
Corollary 1
(Consistency). There are no \( b \), \( k \) such that \( \varnothing \vdash b :^{ k } \bot \).
The problem with floating functions. This proof can’t be extended to the full StraTT. While floating nondependent function types can be added to the logical relation directly, with \( a \in \llbracket A \mathrel {\rightarrow } B \rrbracket _{k} \) holding when \( a \; x \in \llbracket B \rrbracket _{k} \) for every term \( x \) with \( x \in \llbracket A \rrbracket _{k} \), cumulativity will no longer hold. In particular, given \( a \in \llbracket A \mathrel {\rightarrow } B \rrbracket _{j} \) with \( j < k \), when trying to show \( a \in \llbracket A \mathrel {\rightarrow } B \rrbracket _{k} \), we have by definition \(\llbracket A \rrbracket _{k}\), a term \( x \), and \( x \in \llbracket A \rrbracket _{k} \), but no way to cast the latter into \( x \in \llbracket A \rrbracket _{j} \) to obtain \( a \; x \in \llbracket B \rrbracket _{j} \), which the induction hypothesis would lift to \( a \; x \in \llbracket B \rrbracket _{k} \) as desired, because such a cast would go downwards from a higher level \( k \) to a lower level \( j \), rather than the other way around as provided by the induction hypothesis. Trying to incorporate the desired property into the relation, perhaps by quantifying the clause for floating functions over arguments at every level, would break the careful stratification of the logical relation that we’ve set up.
The violation of cumulativity due to floating functions is independent of our method of logical relations. If we try to prove consistency via a translation into an existing type theory with a cumulative universe hierarchy, for instance Agda with cumulative universes, a similar direct translation of floating functions would cause the same issue. Concretely, suppose we translate the type \( A \mathrel {\rightarrow } B \) at some level \( k \) into the corresponding Agda function type at universe level \( k \). To prove that the translation preserves StraTT’s cumulativity, we would require a coercion from the translation at a lower level \( j \) into the translation at level \( k \), which has the same problem of needing a downward cast on the domain. Such a translation would still need to be stratified by level to be well defined, so a universe-polymorphic translation over a single level variable wouldn’t be viable either.
4.2 Type safety of StraTT
While we haven’t yet proven its consistency, we have proven type safety of the full StraTT. We use Coq to mechanize the syntactic metatheory of the typing, context formation, and signature formation judgements of StraTT, recalling that this covers all of stratified dependent functions, floating nondependent functions, and displaced constants. We also use Ott [36] along with the Coq tools LNgen [4] and Metalib [3] to represent syntax and judgements and to handle their locally-nameless representation in Coq. The proof scripts are available at https://github.com/plclub/StraTT under the coq/ directory. We begin with some basic common properties of type systems, namely weakening, substitution, and regularity lemmas, as well as a generalized displaceability lemma. Next, we introduce a notion of restriction, which formalizes the idea that lower judgements can’t depend on higher ones, along with a notion of restricted floating, which is crucial for proving that floating function types are syntactically cumulative. Only then are we able to prove type safety.
As we haven’t mechanized the syntactic metatheory of definitional equality \( \Delta \vdash A \equiv B \), we state as axioms some standard, provable properties [5, Section 5.2], which are orthogonal to stratification and only used in the final proof of type safety. The equivalent lemmas for subStraTT, however, have been mechanized in Agda as part of the consistency proof.
Axiom 1
(Function type injectivity). If \( \Delta \vdash A_{{\textrm{1}}} \mathrel {\rightarrow } B_{{\textrm{1}}} \equiv A_{{\textrm{2}}} \mathrel {\rightarrow } B_{{\textrm{2}}} \) then \( \Delta \vdash A_{{\textrm{1}}} \equiv A_{{\textrm{2}}} \) and \( \Delta \vdash B_{{\textrm{1}}} \equiv B_{{\textrm{2}}} \). If \( \Delta \vdash \Pi x \!:^{ j_{{\textrm{1}}} }\! A_{{\textrm{1}}} \mathpunct {.} B_{{\textrm{1}}} \equiv \Pi x \!:^{ j_{{\textrm{2}}} }\! A_{{\textrm{2}}} \mathpunct {.} B_{{\textrm{2}}} \) then \( \Delta \vdash A_{{\textrm{1}}} \equiv A_{{\textrm{2}}} \) and \( j_{{\textrm{1}}} = j_{{\textrm{2}}} \) and \( \Delta \vdash B_{{\textrm{1}}} \equiv B_{{\textrm{2}}} \).
Axiom 2
(Consistency of definitional equality). If \( \Delta \vdash A \equiv B \) then \( A \) and \( B \) do not have different head forms.
Basic properties. We extend the ordering between levels \( j \le k \) to an ordering between contexts \( \Gamma ' \le \Gamma \) that also incorporates weakening in Figure 9. Stronger contexts have higher levels and fewer assumptions.
Fig. 9.
Context subsumption rules
This ordering is contravariant in the typing judgement: we may lower the context without destroying typeability. This result subsumes a standard weakening lemma.
Lemma 7
(Weakening). If \( \Delta ; \Gamma \vdash a :^{ k } A \) and \(\Delta \vdash \Gamma '\) and \( \Gamma ' \le \Gamma \) then \( \Delta ; \Gamma ' \vdash a :^{ k } A \).
The substitution lemma reflects the idea that an assumption \( x :^{ k } B \) is a hypothetical judgement. The variable \( x \) stands for any typing derivation of the appropriate type and level.
Typing judgements themselves ensure the well-formedness of their components: if a term type checks, then its type can be typed at the same level. Because our type system includes the non–syntax-directed rule DT-Conv, the proof of this lemma depends on several inversion lemmas, omitted here.
Lemma 9
(Regularity). If \( \Delta ; \Gamma \vdash a :^{ k } A \) then \(\vdash \Delta \) and \(\Delta \vdash \Gamma \) and \( \Delta ; \Gamma \vdash A :^{ k } \star \).
Generalizing displaceability in an empty context, derivations can be displaced wholesale by also incrementing contexts, written \( \Gamma ^{+ i } \), where \( ( \Gamma , x :^{ k } A ) ^{+ i } = \Gamma ^{+ i } , x :^{ k + i } A ^{+ i } \).
Lemma 10
(Displaceability). If \( \Delta ; \Gamma \vdash a :^{ k } A \) then \( \Delta ; \Gamma ^{+ j } \vdash a ^{+ j } :^{ k + j } A ^{+ j } \).
If we displace a context, the result might not be stronger because displacement may modify the types in the assumptions. In other words, it is not the case that \( \Gamma \le \Gamma ^{+ k } \).
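As a concrete illustration, the displacement operation can be sketched in Haskell (the prototype's implementation language); the `Term` datatype below is a hypothetical toy syntax, not the actual representation, but `displaceCtx` mirrors the equation \( ( \Gamma , x :^{ k } A ) ^{+ i } = \Gamma ^{+ i } , x :^{ k + i } A ^{+ i } \).

```haskell
-- Toy syntax with level annotations (hypothetical, for illustration only).
type Level = Int

data Term
  = Star                        -- the universe
  | Var String                  -- variable
  | Const String Level          -- c^i: a constant displaced by i
  | Pi String Level Term Term   -- Pi x :^j A. B
  deriving (Eq, Show)

-- Displace a term by i: shift every level annotation up by i.
displace :: Level -> Term -> Term
displace _ Star         = Star
displace _ (Var x)      = Var x
displace i (Const c j)  = Const c (j + i)
displace i (Pi x j a b) = Pi x (j + i) (displace i a) (displace i b)

-- A context assigns each variable a level and a type.
type Ctx = [(String, Level, Term)]

-- (Gamma, x :^k A)^{+i} = Gamma^{+i}, x :^{k+i} A^{+i}
displaceCtx :: Level -> Ctx -> Ctx
displaceCtx i = map (\(x, k, a) -> (x, k + i, displace i a))
```

Note that the variable's level and its type are displaced together, which is why the displaced context need not be comparable to the original under the subsumption ordering.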
Restriction. The key idea of stratification is that a judgement at level \( k \) is only allowed to depend on assumptions at the same or lower levels. One way to observe this property is through a form of strengthening result, which allows variables from higher levels to be removed from the context and contexts to be truncated at any level. Formally, we define the restriction operation, written \( \lceil \Gamma \rceil ^{ k } \), which filters out all assumptions from the context with level greater than \( k \). A restricted context may be stronger since it could contain fewer assumptions.
Lemma 11
(Restriction). If \(\Delta \vdash \Gamma \) then \(\Delta \vdash \lceil \Gamma \rceil ^{ k } \) for any \( k \), and if \( \Delta ; \Gamma \vdash a :^{ k } A \) then \( \Delta ; \lceil \Gamma \rceil ^{ k } \vdash a :^{ k } A \).
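To make the restriction operation concrete, here is a minimal Haskell sketch (the context representation is hypothetical): \( \lceil \Gamma \rceil ^{ k } \) simply filters the context by level.

```haskell
type Level = Int

-- A context entry: variable name, level, and an abstract type.
type Ctx ty = [(String, Level, ty)]

-- Restriction: keep only assumptions at level <= k, so that a
-- judgement at level k cannot depend on higher-level assumptions.
restrict :: Level -> Ctx ty -> Ctx ty
restrict k = filter (\(_, j, _) -> j <= k)
```

The order of the remaining assumptions is preserved, so restriction composes with the other context operations as expected.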
Restricted floating. Subsumption allows variables from one level to be made available to all higher levels using their current type. However, when we use this rule in a judgement, it doesn’t change the context that is used to check the term. This can be restrictive: we can only substitute for those assumptions with lower-level derivations.
In some cases, we can raise the level of some assumptions in the context when we raise the level of the judgement without displacing their types or the rest of the context. For example, suppose we have a derivation of a judgement at level \( j \) that applies a function variable f at level \( j \) to an argument variable \( x \) at a strictly lower level \( i \). We could derive the same judgement at a higher level \( k > j \) where we also raise the level of f to \( k \). However, we can’t raise \( x \) from its lower level \( i \) because then it would be invalid as an argument to f. In general, we can only raise the level of variables at the same level as the entire judgement.
To prove this formally, we must work with judgements that don’t have any assumptions above the current level by using the restriction operation to discard them. Next, to raise certain levels, we introduce a floating operation on contexts \( \mathopen { \uparrow _{ j }^{ k } } \Gamma \) that raises assumptions in \(\Gamma \) at level \( j \) to a higher level \( k \) without displacing their types.
Lemma 13
(Restricted floating). If \( \Delta ; \Gamma \vdash a :^{ j } A \) and \( j \le k \) then \( \Delta ; \mathopen { \uparrow _{ j }^{ k } } ( \lceil \Gamma \rceil ^{ j } ) \vdash a :^{ k } A \).
The restricted floating lemma is required to prove cumulativity of judgements.
Lemma 14
(Cumulativity). If \( \Delta ; \Gamma \vdash a :^{ j } A \) and \( j \le k \) then \( \Delta ; \Gamma \vdash a :^{ k } A \).
In the nondependent function case \( \Delta ; \Gamma \vdash \lambda x \mathpunct {.} b :^{ j } A \mathrel {\rightarrow } B \), where we want to derive the same judgement at level \( k \ge j \), we get by inversion the premise \( \Delta ; \Gamma , x :^{ j } A \vdash b :^{ j } B \), while we need \( \Delta ; \Gamma , x :^{ k } A \vdash b :^{ k } B \). Restricted floating and weakening allow us to raise the level of \( b \) together with the single assumption \( x \) from level \( j \) to level \( k \).
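The floating operation admits an equally small sketch (again over a hypothetical context representation): \( \mathopen { \uparrow _{ j }^{ k } } \) raises precisely the level-\( j \) assumptions to level \( k \), leaving their types and all other assumptions untouched.

```haskell
type Level = Int

-- A context entry: variable name, level, and an abstract type.
type Ctx ty = [(String, Level, ty)]

-- Floating: raise assumptions sitting exactly at level j up to level k,
-- without displacing their types or touching other assumptions.
float :: Level -> Level -> Ctx ty -> Ctx ty
float j k = map raise
  where
    raise (x, i, a)
      | i == j    = (x, k, a)
      | otherwise = (x, i, a)
```

In the lemma above, floating is only ever applied to a restricted context \( \lceil \Gamma \rceil ^{ j } \), so no assumption above \( j \) is present when the level-\( j \) ones are raised.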
Type Safety. We can now show that this language satisfies the preservation (i.e. subject reduction) and progress lemmas with respect to call-by-name \(\beta \delta \)-reduction \( \Delta \vdash a \leadsto b \), whose rules are given in Figure 10. For progress, values are type formers and abstractions.
Theorem 2
(Preservation). If \( \Delta ; \Gamma \vdash a :^{ k } A \) and \( \Delta \vdash a \leadsto a' \) then \( \Delta ; \Gamma \vdash a' :^{ k } A \).
Theorem 3
(Progress). If \( \Delta ; \varnothing \vdash a :^{ k } A \) then \( a \) is a value or \( \Delta \vdash a \leadsto b \) for some \( b \).
Fig. 10.
Call by name reduction rules
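For intuition, call-by-name \(\beta \delta \)-reduction can be sketched over a toy untyped term language in Haskell (illustrative only, not the paper's Ott-generated rules): \(\beta \) contracts the head redex without evaluating the argument, and \(\delta \) unfolds constants from a signature of definitions.

```haskell
data Tm
  = V String          -- variable
  | C String          -- constant (unfolded by delta)
  | Lam String Tm     -- \x. b
  | App Tm Tm         -- b a
  deriving (Eq, Show)

type Sig = [(String, Tm)]  -- the signature Delta: constant definitions

-- Capture-naive substitution, adequate for closed examples.
subst :: String -> Tm -> Tm -> Tm
subst x s t = case t of
  V y | y == x    -> s
      | otherwise -> V y
  C c             -> C c
  Lam y b
    | y == x      -> Lam y b
    | otherwise   -> Lam y (subst x s b)
  App f a         -> App (subst x s f) (subst x s a)

-- One call-by-name step: only the head position is reduced.
step :: Sig -> Tm -> Maybe Tm
step sig t = case t of
  C c             -> lookup c sig               -- delta: unfold a constant
  App (Lam x b) a -> Just (subst x a b)         -- beta: argument unevaluated
  App f a         -> (`App` a) <$> step sig f   -- reduce the head
  _               -> Nothing                    -- value: abstraction or variable

-- Iterate to weak head normal form.
whnf :: Sig -> Tm -> Tm
whnf sig t = maybe t (whnf sig) (step sig t)
```

Since arguments are never evaluated, preservation and progress only need to track the head of the term, matching the shape of the two theorems above.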
5 Prototype implementation
We have implemented a prototype type checker, which can be found at https://github.com/plclub/StraTT under the impl/ directory, including a brief overview of the concrete syntax. This implementation is based on pi-forall [41], a simple bidirectional type checker for a dependently-typed programming language.
For convenience, displacements and level annotations on dependent types can be omitted; the type checker then generates level metavariables in their stead. When checking a single global definition, constraints on level metavariables are collected, which form a set of integer inequalities on metavariables (these are apparently called “octagonal” integer constraints). An SMT solver checks that these inequalities are satisfiable by the naturals and finally provides a solution that minimizes the levels. Therefore, assuming the collected constraints are correct, if a single global definition has a solution, then a solution will always be found. However, we don’t know if this holds for a set of global definitions, because the solution for a prior definition might affect whether a later definition that uses it is solvable. Determining what makes a solution “better” or “more general” to maximize the number of global definitions that can be solved is part of future work.
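The least-solution flavor of this inference can be illustrated with a toy fixpoint solver (a hypothetical sketch; the actual implementation delegates to an SMT solver rather than using code like this): constraints of the shapes \( x \ge c \) and \( x \ge y + c \) over the naturals always have a least solution when satisfiable, computable by iteration, with growth past a bound signaling a positive cycle.

```haskell
import qualified Data.Map as M

-- Two illustrative constraint shapes on level metavariables.
data Constraint
  = AtLeast String Int          -- x >= c
  | Above String String Int     -- x >= y + c

-- Iterate from the all-zero assignment to the least solution;
-- give up if any level climbs past a bound (a positive cycle).
solve :: Int -> [String] -> [Constraint] -> Maybe (M.Map String Int)
solve bound vars cs = go (M.fromList [(v, 0) | v <- vars])
  where
    go m
      | any (> bound) (M.elems m) = Nothing
      | m' == m                   = Just m    -- fixpoint: least solution
      | otherwise                 = go m'
      where
        m' = foldl apply m cs
    apply m (AtLeast x c) = M.adjust (max c) x m
    apply m (Above x y c) = M.adjust (max (M.findWithDefault 0 y m + c)) x m
```

For instance, the constraints \( j \ge 1 \) and \( k \ge j + 1 \) yield the minimal assignment \( j = 1, k = 2 \), while a self-referential constraint like \( x \ge x + 1 \) is reported unsatisfiable.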
The implementation additionally features stratified datatypes, case expressions, and recursion, used to demonstrate the practicality of programming in StraTT. Restricting the datatypes to inductive types by checking strict positivity and termination of recursive functions is possible but orthogonal to stratification and thus out of scope for this work. The parameters and arguments of datatypes and their constructors respectively can be either floating (i.e. nondependent) or fixed (i.e. dependent), with their levels following rules analogous to those of nondependent and dependent functions. Additionally, datatypes and constructors can be displaced like constants, in that a displaced constructor only belongs to its datatype with the same displacement.
We include with our implementation a small core library, and all the examples that appear in this paper have been checked by our implementation. In the subsections to follow, we examine three particular datatypes in depth: decidable types, propositional equality, and dependent pairs.
5.1 Decidable types
Revisiting an example from Section 3, we can define \( \textsf{ Dec } \) as a datatype.
The lack of annotation on the parameter indicates that it’s a floating domain, so that \( \lambda X \mathpunct {.} \textsf{ Dec } \; X \) can be assigned type \( \star \mathrel {\rightarrow } \star \) at level 0. Datatypes and their constructors, like variables and constants, are cumulative, so the aforementioned type assignment is valid at any level above 0 as well. When destructing a datatype, the constructor arguments of each branch are typed such that the constructor would have the same level as the level of the scrutinee. Consider the following proof that decidability of a type implies its double negation elimination, which requires inspecting the decision.
By the level annotation on the function, we know that dec and nn both have level 1. Then in the branches, the patterns \( \textsf{ Yes } \; y \) and \( \textsf{ No } \; x \) must also be typed at level 1, so that \( y \) has type X and \( x \) has type \( \textsf{ neg } \; X \) both at level 1.
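Since the listing itself is elided here, the computational content of this proof can be conveyed by an unstratified Haskell rendering (levels omitted, as Haskell has no stratification; the names follow the paper's \( \textsf{ Dec } \), \( \textsf{ Yes } \), and \( \textsf{ No } \)):

```haskell
import Data.Void (Void, absurd)

-- Dec X: either a proof of X or a refutation of X.
data Dec x = Yes x | No (x -> Void)

-- Decidability of X implies double negation elimination for X:
-- inspect the decision; in the No branch, the double negation
-- applied to the refutation yields Void, from which anything follows.
dne :: Dec x -> ((x -> Void) -> Void) -> x
dne (Yes y) _  = y
dne (No f) nn = absurd (nn f)
```

In the stratified version, the branch patterns carry the levels described above, but the computational behavior is the same case analysis.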
5.2 Propositional equality
Datatypes and their constructors, like constants, can be displaced as well, uniformly raising the levels of their types. We again revisit an example from Section 3 and now define a propositional equality as a datatype with a single reflexivity constructor.
This time, the parameter has a level annotation indicating that it’s fixed at 0, while its indices are floating. Displacing \( \textsf{ Eq } \) by 1 would then raise the fixed parameter level to 1, while the levels of \( \textsf{ Eq }^{ 1 } \) itself and its floating indices always match but can be 2 or higher by cumulativity. Its sole constructor would be \( \textsf{ Refl }^{ 1 } \) containing a single argument of type X at level 1. Displacement is needed to state and prove propositions about equalities between equalities, such as the uniqueness of equality proofs.
5.3 Dependent pairs
Because there are two different function types, there are also two different ways to define dependent pairs. Using a floating function type for the second component’s type results in pairs whose first and second projections can be defined as usual, while using the stratified dependent function type results in pairs whose second projection can’t be defined using the first. We first take a look at the former.
Due to stratification, the projections need to be defined at levels 1 and 2 respectively to accommodate dependently quantifying over the parameters at level 0 and the pair at level 1. Even so, the second projection is well typed: by subsumption, P can be used at level 2 and applied to the first projection, which is likewise at level 2 by subsumption, in the return type of the second projection.
As the two function types are distinct, we do need both varieties of dependent pairs. In particular, with the above pairs alone, we aren’t able to type check a universe of propositions \( \textsf{ DPair } \; \star \; \textsf{ isProp } \).
In the second variant of dependent pairs where P is a stratified dependent function type, the domain of P is fixed to level 0, so in the type in \( \textsf{ dsnd } \), it can’t be applied to the first projection, but it can still be applied to the first component by matching on the pair. Now we’re able to type check \( \textsf{ DPair } \; \star \; \textsf{ isProp } \).
In both cases, the first component of the pair type has a fixed level, while the second component is floating, so using a predicate at a higher level results in a pair type at a higher level by subsumption. Consider the predicate \( \textsf{ isSet } \), whose type can only be assigned level 2; then \( \textsf{ DPair } \; \star \; \textsf{ isSet } \) is also well typed at level 2.
Unfortunately, the first projection \( \textsf{ dfst } \) can no longer be used on an element of this pair, since the predicate is now at level 2, nor can its displacement \( \textsf{ dfst }^{ 1 } \), since that would displace the level of the first component as well. Without proper level polymorphism, which would allow keeping the first argument’s level fixed while setting the second argument’s level to 2, we’re forced to write a whole new first projection function.
In general, this limitation occurs whenever a datatype contains both dependent and nondependent parameters. Nevertheless, in the case of the pair type, the flexibility of a nondependent second component type is still preferable to a dependent one that fixes its level, since there would need to be entirely separate datatype definitions for different combinations of first and second component levels, i.e. one with levels 0 and 1 (as in the case of \( \textsf{ isProp } \)), one with levels 0 and 2 (as in the case of \( \textsf{ isSet } \)), and so on.
6 Discussion
6.1 On consistency
The consistency of subStraTT tells us that the basic premise of using stratification in place of a universe hierarchy is sensible. However, as we’ve seen that directly adding floating functions to the logical relation doesn’t work, an entirely different approach may be needed to show the consistency of the full StraTT.
One possible direction is to take inspiration from the syntactic metatheory, especially Restricted Floating (Lemma 13), which is required specifically to show cumulativity of floating functions. Since cumulativity is exactly where the naïve addition of floating functions to the logical relation fails, the key may be to formulate this lemma more semantically.
Another possibility is based on the observation that due to cumulativity, floating functions appear to be parametric in their stratification level, at least starting from the smallest level at which it can be well typed. This observation suggests that some sort of relational model may help to interpret levels parametrically.
Nevertheless, we strongly believe that StraTT is indeed consistent. Restriction (Lemma 11) in particular intuitively tells us that nothing at higher levels could possibly be smuggled into a lower level to violate stratification. As a further confidence check, we have verified that three type-theoretic paradoxes which are possible in an ordinary type theory with type-in-type do not type check in our implementation. These paradoxes are Burali-Forti’s paradox [7] and Russell’s paradox [35] as formulated by Coquand [11], and Girard’s paradox [16] as formulated by Hurkens [20]. In each case, the definitions reach a point where a higher-level term needs to fit into a lower-level position to proceed any further — exactly what stratification is designed to prevent. Appendix A examines these paradoxes in depth.
6.2 On useability
Useability comes down to the balance between practicality and expressivity. On the practicality side, our implementation demonstrates that if a definition is well typed, then its levels and displacements can be completely omitted and inferred, providing a workflow comparable to Coq or Lean. Additionally, constants are displaced uniformly, so StraTT doesn’t exhibit the same kind of exponential blowup in levels and type checking time that can occur when using universe-polymorphic definitions in Coq or Lean.
On the other hand, if a definition is not well typed, debugging it may involve wading through constraints among generated level metavariables in situations normally having nothing to do with universe levels, since stratification now involves levels everywhere, in particular when using dependent function types.
On the expressivity side, the displacement system of StraTT falls somewhere between level monomorphism and prenex level polymorphism; in some scenarios, it works just as well as polymorphism. For instance, to type check Hurkens’ paradox as far as StraTT can, the Coq formulation of the paradox (without type-in-type) requires universe polymorphism, and the Agda formulation of the paradox (without type-in-type) requires definitions polymorphic over at least three universe levels. This is due to types that involve multiple syntactic universes, which involve only a single level in StraTT while the corresponding Agda type requires three. In Hurkens’ paradox, these three Agda levels must vary independently, but StraTT achieves the same effect via displacement and floating.
However, in other scenarios, the expressivity of level polymorphism over multiple level variables is truly needed. In particular, merely having a type constructor with both a dependent domain and a nondependent domain interacts poorly with cumulativity. Suppose we have some type constructor \( \textsf{ T } \) with a dependent domain and a nondependent domain, and a function \( \textsf{ f } \) over elements of this type. By cumulativity, if \( y \) inhabits the nondependent domain at some level, then an application of \( \textsf{ T } \) to \( y \) is still well typed at any higher level, but \( \textsf{ f } \) can no longer be applied to its elements, since the level of \( y \) is now too high. We would like the second argument of \( \textsf{ f } \) to float along with \( \textsf{ T } \), but this isn’t possible due to dependency. Making the level of the second argument polymorphic (subject to the expected constraints) would resolve this issue.
6.3 Related work
StraTT is directly inspired by Leivant’s stratified polymorphism [12, 23, 24], which developed from Statman’s ramified polymorphic typed \(\lambda \)-calculus [37]. Stratified System F, a slight modification of the original system, has since been used to demonstrate a normalization proof technique using hereditary substitution [15], which in turn has been mechanized in Coq as a case study for the Equations package [26]. More recently, an interpreter of an intrinsically-typed Stratified System F has been mechanized in Agda by Thiemann and Weidner [39], where stratification levels are interpreted as Agda’s universe levels. Similarly, Hubers and Morris’ Stratified \(\textrm{R}_\omega \), a stratified System \(\textrm{F}_\omega \) with row types, has been mechanized in Agda as well [19]. Meanwhile, displacement comes from McBride’s crude-but-effective stratification [29, 30], and we specialize the displacement algebra (in the sense of Favonia, Angiuli, and Mullanix [18]) to the naturals.
7 Conclusion
In this work, we have introduced Stratified Type Theory, a departure from a decades-old tradition of universe hierarchies without, we conjecture, succumbing to the threat of logical inconsistency. By stratifying dependent function types, we obstruct the usual avenues by which paradoxes manifest their inconsistencies; and by separately introducing floating nondependent function types, we recover some of the expressivity lost under the strict rule of stratification. Although proving logical consistency for the full StraTT remains future work, we have proven it for the subsystem subStraTT, and we have provided supporting evidence by proving its syntactic metatheory and showing how well-known type-theoretic paradoxes fail.
Towards demonstrating that StraTT isn’t a mere theoretical exercise but could form a viable basis for theorem proving and dependently-typed programming, we have implemented a prototype type checker for the language augmented with datatypes, along with a small core library. The implementation also features inference for level annotations and displacements, allowing the user to omit them entirely. We leave formally ensuring that our rules for datatypes don’t violate existing metatheoretical properties as future work as well.
Given the various useability tradeoffs discussed, as well as the incomplete status of its consistency, we don’t see any particularly compelling reason for existing proof assistants to adopt a system based on StraTT. However, we don’t see any showstoppers either, so we believe it to be suitable for further improvement and iteration. Ultimately, we hope that StraTT demonstrates that alternative treatments of type universes are feasible and worthy of study, and opens up fresh avenues in the design space of type theories for proof assistants.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
A.1 Burali-Forti’s paradox
Burali-Forti’s paradox [7] in set theory concerns the simultaneous well-foundedness and non–well-foundedness of an ordinal. In type theory, we instead consider a particular datatype \( \textsf{ U } \) due to Coquand [11], along with a well-foundedness predicate \( \textsf{ WF } \) for \( \textsf{ U } \).
Note that both of these definitions are strictly positive, so we aren’t using any tricks relying on negative datatypes. We can show that all elements of \( \textsf{ U } \) are well founded. If we ignore stratification and use type-in-type, we can also construct an element \( \textsf{ loop } \) that is provably not well founded.
In the branch of \( \textsf{ nwfLoop } \), by pattern matching on the type of the scrutinee, X is bound to \( \textsf{ U } \) and f to \( \lambda u \mathpunct {.} u \), so \( h \; \textsf{ loop } \) correctly has type \( \textsf{ WF } \; \textsf{ loop } \). Note that this definition passes the usual structural termination check, since the recursive call is done on a subargument from h. Then \( \textsf{ nwfLoop } \; ( \textsf{ wf } \; \textsf{ loop } ) \) is an inhabitant of the empty type.
However, with stratification, \( \textsf{ U } \) with level 1 is too large to fit into the type argument of \( \textsf{ MkU } \), which demands level 0, so \( \textsf{ loop } \) can’t be constructed in the first place. This is also why the level of a datatype can’t be strictly lower than that of its constructors, despite such a design not violating the regularity lemma.
A.2 Russell’s paradox
The \( \textsf{ U } \) above was originally used by Coquand [11] to express a variant of Russell’s paradox [35]. First, an element of \( \textsf{ U } \) is said to be regular if it’s provably unequal to its subarguments; this represents a set which doesn’t contain itself.
The trick is to define a \( \textsf{ U } \) that is both regular and nonregular. Normally, with type-in-type, this would be one that represents the set of all regular sets.
Stratification once again prevents \( \textsf{ R } \) from type checking, since the pair projection returns a \( \textsf{ U } \) and not a \( \textsf{ U }^{ 1 } \) as required by the displaced constructor. The type contained in the pair can’t be displaced to \( \textsf{ U }^{ 1 } \) either, since that would make the pair’s level too large to fit inside the constructor displaced by 2.
A.3 Hurkens’ paradox
Although we’ve seen that stratification thwarts the paradoxes above, they leverage the properties of datatypes and recursive functions, which we haven’t formalized. Here, we turn to the failure of Hurkens’ paradox [20] as further evidence of consistency, which in contrast can be formulated in pure StraTT without datatypes. Below is the paradox in Coq without universe checking.
If we replace unsetting universe checking with universe polymorphism, then the definitions check up to \( \textsf{ M } \).
The corresponding StraTT code, too, checks up to \( \textsf{ M } \), using displacement as needed, and is verified in the implementation.
The next definition \( \textsf{ D } \) doesn’t type check, since \( \textsf{ sig } \) takes a displaced \( \textsf{ U }^{ 1 } \). The type of \( x \) can’t be displaced to fix this either, since p takes an undisplaced \( \textsf{ U } \) and not a \( \textsf{ U }^{ 1 } \). Being stuck trying to equate two different levels is reassuring, as conflating different universe levels is how we expect a paradox that exploits type-in-type to operate.
Note this differs from irrefutability of the law of excluded middle, \( \textsf{ neg } \; ( \textsf{ neg } \; ( \Pi X \!:^{ 0 }\! \star \mathpunct {.} \textsf{ Dec } \; X ) ) \), which cannot be proven constructively.