1 Introduction

Formal representations of EFSMs in Isabelle that can be translated to the SAL input language while retaining important semantics.

A set of operators compatible with Isabelle’s LTL framework that can be translated to the SAL LTL system in such a way that they retain the semantics of the Isabelle statements.

A justification as to the equivalence of our Isabelle and SAL semantics.

Tool support for automatically converting the Isabelle representation into the SAL input language

Tool support for automatically converting the Isabelle and SAL representations into the GraphViz DOT display format

Tool support for automatically converting GraphViz DOT diagrams into the Isabelle EFSM formalism.
2 Background
2.1 Definitions
2.2 Tools
if
conditions then
updates” rules. This is explained in detail in Sect. 3.2.2.3 Limitations of current technologies
3 Representing models
3.1 EFSMs
if
condition then
updates rules, which correspond to EFSM transitions. The main challenge to tackle here is that a state in SAL refers to a program state, i.e. a mapping from variables to their values at a particular point in time. There is no inherent notion of a separate control flow state here, but this is easily resolved by representing states as an additional local variable cfstate
to be checked along with the rest of the transition guards and updated to the new value along with the rest of the updates.3.2 Transitions
record
type such that each component can be easily accessed by its name. The type definitions for these components are shown in Fig. 4.
gexp
(detailed in Sect. 3.5) and the output and update functions are defined using another datatype aexp
(detailed in Sect. 3.4). Outputs are simply a list of expressions to be evaluated. Updates are a list of pairs, the first element being the index of the register to be updated, and the second element being an arithmetic expression to be evaluated.if
condition then
updates rules, each of which represents a “transition”. This maps well to Definition 1, in which transitions also have guards and updates, and most transition fields from Fig. 4 have an obvious mapping.cfstate
) where the Isabelle version is not. This is because in SAL we must encode the control flow as a local variable which the transitions update. This is discussed further in Sect. 5.output_sequence ! insert
operation used in the assignment of o’
is analogous to the Cons
operation on lists except that, as will be discussed in Sect. 3.3, lists must have a fixed maximum length. In the case of our simple drinks machine, a maximum length of one is sufficient to represent every output produced by the model. The insert
construction is not as aesthetically pleasing as the standard comma separated lists used by Isabelle, but SAL does not appear to provide a way of defining such custom syntaxes.INPUT
variable. Our Isabelle formalisation represents transition labels as string literals. This is convenient for the wider inference framework [24, 28], but falls foul of SAL’s requirement for enumerable datatypes. Indeed, SAL has no native support for strings out of the box. To resolve this, we make use of the fact that EFSMs have a finite number of transitions, and thus a finite number of transition labels. This enables us to use a similar trick as for control flow states and simply form an enumeration of the transition labels which occur in the model. For example, the transition labels of our simple drinks machine in Fig. 2 are select, coin, and vend.Arity
field to record the number of inputs the transition expects to receive. This is then checked when the EFSM processes executions. In SAL, we represent this as a guard on each transition with no difference in behaviour between the two representations.o
behaves just like any another variable. A consequence of this is that the outputs of the current transition are not observable until the next state because they are set along with the register updates. This has no significant effect on the functionality of the models, but it is important to bear in mind when specifying LTL properties, as will be discussed in Sect. 4.3.3 Input and data values
value
. This tags its members as either a number (Num
) or a string (Str
) and is defined as follows.
We use a similar representation in both Isabelle and SAL, although, as will be discussed below, our SAL representation must contain an extra “bottom element”.value option
. The Isabelle option
type is used to make partial functions total. It takes a type argument and is defined as being either None
(the bottom element used to represent an undefined value) or Some x
, where x
is an element from the specified type, in this case a value
. Since the number of registers used by any EFSM is known to be finite [24, 27], we make use of the theory of finite functions (FinFun [32]) from the HOL library. Here, a FinFun is a function which is constant except for finitely many points. This corresponds to a map (or dictionary) in conventional programming languages.option
datatype, it is straightforward to define one. Each register can then be defined as a local variable of type value option
. Representing inputs in SAL is more challenging due to the fact that all datatypes need to be enumerable and both integers and strings have an infinite number of inputs. SAL has a native integer type, but models involving this cannot be checked. To tackle this, we define a type BOUNDED_INT
in SAL which takes two parameters, provided by the user at runtime, which specify the minimum and maximum integer values to consider. This means that when SAL declares a property to hold, we can only be certain that it holds for integers within the specified range.BOUNDED_INT
spans the range \(10\ldots 10\) and we have an expression that should evaluate to 12, it actually evaluates to \(9\). To stop SAL producing spurious counterexamples that exploit this, we must explicitly guard against it. This is discussed further in Sect. 3.5.value
type in SAL must have three cases where in Isabelle it has only two.
B_value
, the atomic element ValueBB
represents the bottom element used to pad out sequences. Clearly we do not want this element to occur in the value
type, but we do want the other two. Consequently, we define the value
type as {g : B_value  g /= valueBB}
meaning that value
is a subtype of B_value
which does not include ValueBB
.3.4 Arithmetic and outputs
aexp
) datatype is defined as follows.
L
) and variable values (tagged V
). Here, the variable type is an argument a’
to aexp
. This will be explained in Sect. 4. There are then three recursive cases to support the basic numeric operations of addition, subtraction, and multiplication. Division is not supported as this is intuitively a floating point operation, and we only support integer numeric value
s.Plus
, Minus
, and Times
are only supported over numeric values. Most notably, we do not support the concatenation of strings with the Plus
operator. This is mostly because the Isabelle formalisation was built to support the inference technique in [24, 25], which is numerically focussed, but also because the expression of such an operation in SAL is impossible when representing strings as a finite enumeration.Plus (V (R 1)) (V (I 0))
has no inherent meaning. We therefore define an arithmetic evaluation function (aval
) as follows. This takes an arithmetic expression and a mapping from variables to values and returns a value.
aval
actually returns a value option
rather than just a value
. This is because Definition 1 is not strongly typed, so we must account for badly typed expressions. Consider, for example, the output expression \(r_2 + i_0\) from the coin transition in Fig. 2. As discussed above, addition is only supported for numerical values, but there is nothing to stop us from calling the coin transition with input “hello”, in which case the result of evaluating this expression is undefined. Since Isabelle functions must be total, we define an arithmetic for the value
datatype in terms of option
s to allow the bottom element, None
, to represent undefined values. In general, a binary function \(f: {\mathbb {Z}} \rightarrow {\mathbb {Z}} \rightarrow {\mathbb {Z}}\) is lifted to the optional arithmetic to become \(f^\prime : {\mathbb {Z}}{} \texttt { option} \rightarrow {\mathbb {Z}}{} \texttt { option} \rightarrow {\mathbb {Z}}{} \texttt { option}\). The full details of this are described in [24].aexp
datatype and aval
function over it. Unfortunately, the aexp
datatype is defined recursively so, like with the lists used to represent input sequences, SAL cannot handle models specified this way. Instead, we must use a shallow embedding to define arithmetic. What this means is that arithmetic expressions are represented using SAL’s existing logical and arithmetic operations rather than with a dedicated datatype and evaluation function.aval
during translation instead of during model execution. This means that SAL does not have to deal with the possibility of infinite arithmetic expressions, even though, for obvious reasons, only a finite number could ever actually be used in a real EFSM. To handle arithmetic addition, subtraction, and multiplication in terms of value option
s, we define the auxiliary functions value_plus
, value_minus
, and value_times
, each of which lifts its respective function as above. These are then used in arithmetic expressions in place of their corresponding aexp
constructors in Isabelle.3.5 Guards
gexp
in terms of arithmetic expressions, and a corresponding gval
function is defined similarly to (and on top of) aval
.
Eq
), inequality in the form of the “greater than” operator (Gt
), or set membership (In
). Finally, we have the logical connective Nor
, which is equivalent to “not or”. This is used instead of the more conventional logical operations (conjunction, disjunction, and negation) because it allows all three to be represented with a single operator. This makes the datatype smaller and thus shrinks proofs over it. For ease of expression, we also define functions to express logical and, or, and not in terms of nor, as well as functions for the <, \(\le \), \(\ge \), and \(\ne \) operators.gval
during translation and representing guards with a shallow embedding using functions like maybe_nor
in guard expressions instead of datatype constructs.4 Representing properties
4.1 LTL in SAL
output_vend
, declare that it concerns the drinks
machine, and state our property in LTL. SAL does not support free variables, but the arbitrary drink d can be universally quantified without changing the semantics of the formula.o=output_sequence ! insert(Some(d),output_sequence ! empty)
corresponds to the output [d]
. Recall that we must use an optional semantics for evaluating outputs. Thus, if we get a drink d
, this will be represented as Some(d)
.4.2 LTL in Isabelle
Linear_Temporal_Logic_on_Streams
formalisation [37] as this supports variables of arbitrary datatypes and predicates over them (e.g. \(x> 0\)) in expressions where [40] only supports Boolean valued variables.watch
which takes an EFSM and an infinite execution (a stream of (label, inputs) pairs) and executes it, starting in the initial state, resulting in a stream of 5tuples as defined above. Properties over a particular model m then take the form \(\varphi (\texttt {watch m t})\), where t
represents an arbitrary (infinite) execution. Full details of this can be found in [24].4.2.1 Making EFSMs complete
watch
function. In SAL, we make use of the ELSE
keyword to define a transition which may be taken only when no others can be. This takes the model into the explicit NULL_STATE
mentioned in Example 2.4.3 Translating LTL
value
list and returns \(\textit{true}\) if it equals the input at the head of the stream.value option
list and returns true if this is equal to the output at the head of the stream.check_exp
which is the most interesting. Here, we can supply an arbitrary guard expression as per Sect. 3.5 to be evaluated. Here, though, we may wish to express properties over the outputs of the EFSM as well as over its inputs and registers. We can do this easily by defining a new vname
datatype, ltl_vname
, as follows.
Here, we have inputs represented as Ip
, outputs as Op
, and registers as Rg
. As mentioned in Sect. 3.3, the gexp
datatype takes a type parameter which is used to represent variables, so we simply use ltl_vname gexp
s in place of the vname gexp
s used for transition guards, and we can then define expressions in terms of inputs, outputs, and registers.aand
and impl
operators, which are themselves syntactical constructs that allow us to integrate logical operators into stream expressions. For example, \((p~\texttt {aand}~q) \,s\) is equivalent to \((\lambda s.\ p s \wedge q s)\).5 Implementation
5.1 Translating Isabelle to SAL
STRING
type contains only the dummy string discussed in Sect. 3.3. The largest literal integer in the model is 100 and occurs in the guards of the two vend
transitions. A comment is output at the top of the file to indicate this to the user.cfstate
variable as an extra guard condition, and update it to its new value after firing. For example, the coin transition from Fig. 2 is given the precondition
and an extra postcondition
which leaves cfstate
unchanged. By contrast, the
transition gets the same precondition, but is given the postcondition
to move the model on to the next state.a__A
and have a precondition to check that
and a postcondition to update
The other would be called a__B
and have a precondition
and postcondition
LTLProperty
is an LTL property, EFSMname
is an EFSM, and
is a free variable. The Isabelle source files may contain other additional lemmas which cannot be meaningfully translated to SAL. Lemmas which are not of the above form are simply ignored. The LTLs we retain are semantically equivalent in both languages, and the internal structure can be exported with either SAL or Isabelle syntax. As with EFSM transitions, the order of LTL properties is maintained across both languages.5.2 Translating SAL to Isabelle
5.3 Translation to and from DOT
5.4 Limitations
watch
function as discussed in Sect. 4 over a free execution. Hyperproperties, properties involving multiple models, and arbitrary Isabelle lemmas are not supported by our toolset. This has the knockon effect that, in translating a theory file from Isabelle to SAL and then back to Isabelle, unsupported lemmas are lost as there is no meaningful way to carry them over to SAL.6 Worked examples
6.1 LinkedIn
GET
parameters of the link, so can be edited freely. If \(u_1\) knows or can obtain suitable parameter values, the server can be made to provide the full details of \(u_2\)’s profile both online and as a PDF. The researchers in [20] discovered that modifying some of the parameter values in the URL caused the server to generate valid links to download detailed PDFs of user profiles which were not in their network even though they had not paid to access this information.free
or a paid
user. The view and pdf actions both take three parameters. The first of these is the ID of the target profile. To improve readability, these have been replaced by friendID
(representing the ID of the friend’s profile), and otherID
(representing the ID of the nonfriend’s profile). The second parameter represents whether the target profile is a friend or not. This appears as name
if the target profile is a friend and OUT_OF_NETWORK
if they are not. The third parameter is a pseudorandom authentication token.
otherID
in Fig. 6), they could replace the value of the second parameter (OUT_OF_NETWORK
) with name
. This caused the server to generate a link to the full detailed PDF file and a valid authentication token (4zoF
) which could then be manually inserted as the third argument on a second run to view the full profile online.free
user were published in [20], but the written description of the paid
user functionality was sufficient to infer traces of what would happen if a paid user had attempted the same process. Traces involving the paid
user were added as a means of inserting a differential between free and paid users; however, due to the vulnerability of the LinkedIn system, these traces did not affect the inferred EFSM beyond an additional login transition.
free
user, they should never be able to get the pdf
action to output the detailed report for users who are not their friend”.friendID
and otherID
, this becomes the requirement that after logging in as free
, the pdf action called with the input otherID
should not output detailed_pdf_of_otherID
. Fig. 7 shows this as an Isabelle lemma using the LTL definitions from Sect. 4.
[“detailed_pdf_of_otherID”]
. Unfolding the definition of pdf2
reveals the contradiction since this is exactly its output. This brings the proof state to False
, which reveals that the lemma in Fig. 7 is untrue and that there is a flaw in the system, but is too far removed from the original property to provide much insight into how this flaw might either be fixed by the system administrator or exploited by malicious individuals.
free
. Next, they call the view action with the parameters otherID
, name
, and MNn5
. This takes the model into state \(s_6\) in Fig. 6, from which the pdf action can be called with the same parameters to obtain the detailed PDF of the other user’s profile.paid
or free
, and checked this in the guard expressions of view and pdf transitions. The attempted attack now moves the system to the state from which only the summary_pdf_of_otherID
output can be produced by the pdf action. As mentioned in Sect. 5.3, the use of DOT to visualise this change was useful here.proved
. This provides some level of assurance that our desired property holds, but is not conclusive proof. As discussed in Sect. 2, we can only use SAL to check a finite subset of transition inputs. For the strongest possible assurance, we use our tool to translate our modified SAL back to Isabelle and prove the property there, as shown in Fig. 9.6.2 Lift controller
down43  
down43Stop  
up34Stop  
down32  
down32Stop  
up23  
up23Stop  
down21Stop  
up12  
up12Stop 
proved
. The outer globally operator states that the property must hold true throughout the entire execution of the model. The inner predicate states that if eventually there is a next state in which we successfully open the doors, this does not occur until either the label is motorstop or the output is
. It is this second disjunct which is the key, since this exploits the fact that, when we open the doors, the outputs are the current floor n and the string
. We could make this more explicit by writing
instead, but this does not affect the validity of the property since opendoors is the only transition which could ever produce this output.arbitrary
keyword to achieve this, the coinduction package is still very new and lacks this infrastructure. Phrasing our property as in Fig. 12 is a workaround for this.7 Discussion
7.1 Correctness of translations
check_exp
function, and these translatable features are an important contribution of this work.nitpick
, produces counterexamples for conventional Isabelle theorems that are then automatically checked by the proof assistant. While this automatic checking of counter examples is not currently implemented for our translation system, it would be quite possible and is left for future work.7.2 Scalability
arbitrary
.