This chapter examines the critical area of neural network equivalence verification, focusing on the challenges and advances in ensuring that new neural networks behave equivalently to established reference networks. It begins with the difficulty of specifying neural network behavior and the comparatively simple formalization of equivalence properties, which are crucial for applications such as retraining, pruning, and student-teacher training. The text delves into the theoretical foundations of equivalence verification, highlighting the coNP-completeness of Top-1 equivalence verification, a significant result not previously established. Differential Zonotopes are introduced: a novel abstract domain that improves the efficiency of differential reasoning in equivalence verification. Coupled with a suitable refinement heuristic and generator compression techniques, this method demonstrates superior performance over existing approaches. The chapter further proposes a confidence-based classification equivalence property that enables verification over larger parts of the input space. This property leverages the confidence values provided by classification neural networks and offers a more global and practical approach to verifying equivalence. The evaluation section presents the implementation of these methods in a new tool, VeryDiff, which achieves substantial speedups and outperforms state-of-the-art verifiers across various benchmark scenarios. The chapter concludes with a discussion of the limitations and future directions of differential verification, emphasizing its potential to revolutionize the field of neural network verification.
AI-Generated
This summary of the technical content was generated with the help of AI.
Abstract
When validated neural networks (NNs) are pruned (and retrained) before deployment, it is desirable to prove that the new NN behaves equivalently to the (original) reference NN. To this end, our paper revisits the idea of differential verification which performs reasoning on differences between NNs: On the one hand, our paper proposes a novel abstract domain for differential verification admitting more efficient reasoning about equivalence. On the other hand, we investigate empirically and theoretically which equivalence properties are (not) efficiently solved using differential reasoning. Based on the gained insights, and following a recent line of work on confidence-based verification, we propose a novel equivalence property that is amenable to Differential Verification while providing guarantees for large parts of the input space instead of small-scale guarantees constructed w.r.t. predetermined input points. To this end, we propose an improved approach for (approximately) constraining an NN’s softmax confidence. We implement our approach in a new tool called VeryDiff and perform an extensive evaluation on numerous old and new benchmark families, including new pruned NNs for particle jet classification in the context of CERN’s LHC where we observe median speedups \(>300\times \) over the State-of-the-Art verifier \(\alpha ,\beta \)-CROWN.
1 Introduction
Specifying what an NN is supposed to do is a difficult problem that is at most partially solved. One class of specifications that is comparatively easy to formalize are equivalence properties: Given an “old” reference NN \(f_1\), we aim to prove that a “new” NN \(f_2\) behaves in some way equivalently. For example, \(\varepsilon \) equivalence [17, 22] requires that the numerical outputs of \(f_1\) and \(f_2\) for the same input point differ by at most \(\varepsilon \), while Top-1 equivalence [17] requires that the two NNs’ classifications match. Known applications of equivalence verification are verification after retraining or pruning [33], student-teacher training [17, 28], analysis of sensitivity to NN-based preprocessing steps [21] and construction of quantized NNs [19]. Several publications [9, 14, 17, 22, 23, 28, 33] have proposed methods for the verification of equivalence properties (sometimes calling it “approximate conformance” [14]). While it is known that equivalence verification w.r.t. the \(\varepsilon \) equivalence property (Definition 2) is coNP-complete [28], the complexity-theoretic status of Top-1 equivalence verification (Definition 3) remained unclear so far.
Although the field of equivalence verification has already seen significant improvements in scalability, it has also been critiqued for its shortcomings in practical applicability [20]. On the one hand, further speedups are necessary to make the tooling applicable to realistic case studies. On the other hand, besides greater efficiency, we also require equivalence properties that hold more globally: “Classic” equivalence properties have often only been evaluated w.r.t. small \(\varepsilon \)-balls around individual datapoints. In this work, we take steps in both directions of greater applicability. First, we extend the well-known and efficient Zonotope abstract domain to differential reasoning and demonstrate its superior performance over pre-existing equivalence verification approaches by coupling Differential Zonotopes with a suitable refinement heuristic and generator compression techniques. Secondly, we propose a confidence-based classification equivalence property that is verifiable on larger parts of an input space by constraining the input space w.r.t. the reference NN’s confidence in its outputs.
Contribution. This work encompasses multiple theoretical and practical contributions to the field of equivalence verification:
(C1)
We prove that deciding if two \(\textrm{ReLU} \) NNs are Top-1 equivalent is a coNP-complete decision problem, i.e. it is as hard as \(\varepsilon \)-equivalence verification [28] or the classic NN verification problem [15, 24].
(C2)
We propose Differential Zonotopes: An abstract domain that allows the usage of the differential verification methodology w.r.t. the Zonotope abstract domain by propagating a Zonotope bounding the difference between two NNs in lock-step with a reachability analysis for the individual NNs.
(C3)
We implement the proposed approach in a new tool and evaluate its efficiency. For \(\varepsilon \) equivalence we achieve median speedups \(>10\times \) for 8 of 9 comparisons (\(4.5\times \) in the remaining case).
(C4)
For Top-1 equivalence we demonstrate empirically that Differential Zonotopes do not aid verification. We provide a theoretical intuition for this observation and demonstrate this is a fundamental limitation of Differential Verification in general – independently of the chosen abstract domain.
(C5)
Based on these insights, we propose a new confidence-based equivalence property for classification NNs which is 1. verifiable on larger parts of the input space of NNs; 2. amenable to differential verification. Furthermore, we propose a simpler and more precise linear approximation of the \(\textrm{softmax}\) function in comparison to prior work [2]. In additional experiments, we demonstrate that our tool can certify 327% more benchmark queries than \(\alpha ,\beta \)-CROWN for confidence-based equivalence.
Related Work. Prior work on Zonotope-based NN verification [10, 27] verified non-relational properties w.r.t. a single NN. We extend this work by providing a methodology that allows reasoning about differences between NNs. Equivalence properties can, in principle, be analyzed using classical NN verification techniques such as \(\alpha ,\beta \)-CROWN [25, 35–39] for a single NN by building “product-networks” (similar to product-programs in classical program verification [6]), but early work on NN equivalence verification demonstrated that this approach is inefficient due to the accumulation of overapproximation errors in the two independent NNs [22]. While this view was recently challenged by [14], Section 7 conclusively demonstrates that tailored verification tools still outperform State-of-the-Art “classical” NN verification tools for NNs with similar weight structures.
Prior work suggested using Star-Sets for equivalence verification without analyzing weight differences and heavily relied on LP solving [28]. Prior work on differential verification [22, 23] did not verify the equivalence of classifications and did not make use of the Zonotope abstract domain. In this work, we argue that analyzing classification-based equivalence properties is particularly important, because numerical bounds alone (e.g. the ones provided by prior differential verification [22, 23]) cannot guarantee equivalence in outcome. Section 7 compares to equivalence verifiers [17, 23, 28]. In another line of work, QEBVerif [40] proposes a sound and complete analysis technique tailored to quantized NNs which is not directly applicable to the other kinds of NNs studied in our evaluation.
Another line of research analyzes relational properties w.r.t. multiple runs of a single NN. None of these works verifies equivalence. For example, Banerjee et al. [5] propose an abstract domain for relational properties, but assume that all executions happen on the same NN. This makes their approach incompatible with our benchmarks which require the analysis of multiple different NNs. Another incomplete relational verifier [4] also assumes executions on a single NN and requires tailored relaxations not available for equivalence properties. Encoding relational properties via product NNs has also been explored by Athavale et al. [2]. We compare against Marabou (which they used) and we prove that our approximation of softmax, though simpler, is always more precise.
Overview. Section 2 introduces the necessary background on NN verification via Zonotopes, equivalence verification and confidence-based NN verification. Section 3 proves the coNP-completeness of Top-1 equivalence. Subsequently, we introduce Differential Zonotopes as an abstract domain for differential reasoning via Zonotopes (Section 4) and explain how Differential Zonotopes can be used to perform equivalence verification (Section 5). Section 6 explains why Top-1 equivalence does not benefit from differential reasoning in general and derives a new confidence-based equivalence property that may hold on large parts of the input space and can be verified more efficiently using differential verification. Finally, Section 7 provides an evaluation of our approach. The appendix of our paper can be found in the preprint [29].
2 Background
We deal with the verification of piece-wise linear, feed-forward neural networks (NNs). An NN with input dimension \(I \in \mathbb {N}\), output dimension \(O \in \mathbb {N}\) and \(L\in \mathbb {N}\) layers can be summarized as a function \(f:\mathbb {R}^I \rightarrow \mathbb {R}^O\) which maps input vectors \(\textbf{x}^{(0)} \in \mathbb {R}^I\) to output vectors \(\textbf{x}^{(L)} \in \mathbb {R}^O\). In more detail, each layer of the NN consists of an affine transformation \(\tilde{\textbf{x}}^{(i)} = W^{(i)} \textbf{x}^{(i-1)} + b^{(i)}\) (for a matrix \(W^{(i)}\) and a vector \(b^{(i)}\)) followed by the application of a non-linear function \(\textbf{x}^{(i)} = h^{(i)}\left( \tilde{\textbf{x}}^{(i)}\right) \). Many feed-forward architectures can be compiled into this format [26]. We focus on the case of NNs with \(\textrm{ReLU}\) activations, i.e. \(h^{(i)}\left( \tilde{\textbf{x}}^{(i)}\right) = \textrm{ReLU}\left( \tilde{\textbf{x}}^{(i)}\right) =\max \left( 0,\tilde{\textbf{x}}^{(i)}\right) \) for all \(1 \le i \le L\). \(f^{(i)}\left( \textbf{x}\right) \) is the computation of the NN’s first i layers. We uniformly denote vectors in bold (\(\textbf{v}\)), matrices in capital letters (M), Affine Forms as \(z\) and Zonotopes as \(\mathcal {Z}\).
NN Verification. A well-known primitive in the literature on NN verification is the Zonotope [10, 12, 27]: an abstract domain that allows the efficient propagation of an interval box over the input space through (piece-wise) affine systems:
Definition 1
(Zonotope). A Zonotope with input dimension n and output dimension m is a collection of m Affine Forms of the structure
\[ z\left( \mathbf {\epsilon }\right) = \textbf{g}^T \mathbf {\epsilon } + c \qquad \text {with } \textbf{g} \in \mathbb {R}^n,\ c \in \mathbb {R},\ \mathbf {\epsilon } \in \left[ -1,1\right] ^n. \]
We denote a single Affine Form as a tuple \(\left( \textbf{g},c\right) \). Given an Affine Form \(\left( \textbf{g},c\right) \) and a vector \(\textbf{v}\in \left[ -1,1\right] ^d\) (\(d \le n\)) we denote by \(\left( \textbf{g},c\right) \left( \textbf{v}\right) \) the Affine Form (or value for \(d=n\)) where the first d values of \(\mathbf {\epsilon }\) are fixed to \(\textbf{v}\), i.e. \(\left( \textbf{g}_{d+1:n},\textbf{g}_{1:d}\textbf{v}+c\right) \). For an Affine Form \(z=\left( \textbf{g},c\right) \), we denote the set of points described by z as \(\langle z \rangle = \left\{ \textbf{g}^T\mathbf {\epsilon }+c \mid \mathbf {\epsilon }\in \left[ -1,1\right] ^n \right\} \).
Via \(\langle z\left( \textbf{v}\right) \rangle \) we denote the values reachable given the (input) vector \(\textbf{v}\). To improve clarity, some transformations applied to Zonotopes will be described for the 1-dimensional case, i.e. for a single Affine Form. Nonetheless, a major advantage of Zonotopes lies in their matrix representation: m Affine Forms are then a matrix \(G\in \mathbb {R}^{m \times n}\) and a vector \(\textbf{c} \in \mathbb {R}^m\) (then denoted as \(\mathcal {Z} =\left( G,\textbf{c}\right) \)). A component of \(\textbf{g}\)/a column of G is called a generator. Given a Zonotope \(\mathcal {Z} \) we denote its i-th Affine Form (represented by G’s i-th row and \(\textbf{c}\)’s i-th component) as \(\mathcal {Z} _i\). Similar to the Affine Forms, we define \( \langle \mathcal {Z}\rangle = \left\{ \textbf{x} \in \mathbb {R}^m \mid \mathbf {\epsilon }\in \left[ -1,1\right] ^n,\ \textbf{x}_i = \mathcal {Z}_i\left( \mathbf {\epsilon }\right) \right\} \). Zonotopes are a good fit for analyzing (piece-wise) linear NNs as they are closed under affine transformations [3]:
Proposition 1
(Affine Zonotope Transformation [12]). For some Zonotope \(\mathcal {Z}=\left( G,\textbf{c}\right) \) and an affine transformation \(h\left( \textbf{x}\right) =W\textbf{x}+\textbf{b}\), the Zonotope \(\hat{\mathcal {Z}}=\left( WG,W\textbf{c}+\textbf{b}\right) \) exactly describes the transformation h applied to the points \(\textbf{x} \in \langle \mathcal {Z}\rangle \), i.e. for all \(d \le n\) and \(\textbf{v} \in \left[ -1,1\right] ^d\): \( \left\{ W\textbf{x}+\textbf{b} \mid \textbf{x} \in \langle \mathcal {Z}\left( \textbf{v}\right) \rangle \right\} = \langle \hat{\mathcal {Z}}\left( \textbf{v}\right) \rangle \)
This proposition implies \(\left\{ W\textbf{x}+\textbf{b} \mid \textbf{x} \in \langle \mathcal {Z} \rangle \right\} = \langle \hat{\mathcal {Z} } \rangle \), but it is even stronger as it also guarantees a linear map from an input (\(\textbf{v}\)) to all reachable outputs (\(\langle \hat{\mathcal {Z} }\left( \textbf{v}\right) \rangle \)). Zonotopes also admit efficient computation of interval bounds for their outputs:
Proposition 2
(Zonotope Output Bounds [12]). Consider some Affine Form \(\left( \textbf{g},c\right) \); it holds for all \(\mathbf {\epsilon } \in \left[ -1,1\right] ^n\) that:
\[ c - \left\Vert \textbf{g}\right\Vert _1 \;\le \; \textbf{g}^T\mathbf {\epsilon } + c \;\le \; c + \left\Vert \textbf{g}\right\Vert _1 \]
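To make Propositions 1 and 2 concrete, the following minimal NumPy sketch propagates a Zonotope through one affine layer and computes its interval bounds (the helper names are illustrative and not taken from VeryDiff):

```python
import numpy as np

def affine(G, c, W, b):
    """Proposition 1: exact image of the Zonotope (G, c) under x -> Wx + b."""
    return W @ G, W @ c + b

def bounds(G, c):
    """Proposition 2: per-dimension interval bounds c_i -/+ ||g_i||_1."""
    r = np.abs(G).sum(axis=1)  # L1 norm of each row (one Affine Form each)
    return c - r, c + r

# Example: the unit box [-1,1]^2 as input Zonotope, one affine layer.
G_in, c_in = np.eye(2), np.zeros(2)
W, b = np.array([[1.0, -1.0], [0.5, 2.0]]), np.array([0.0, 1.0])
print(bounds(*affine(G_in, c_in, W, b)))  # -> (array([-2. , -1.5]), array([2. , 3.5]))
```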
Fig. 1.
\(\textrm{ReLU}\) approximation
Zonotopes cannot exactly represent the application of \(\textrm{ReLU}\), but we can approximate the effect by distinguishing three cases: 1. The upper bound \(u = c + \left\Vert \textbf{g}\right\Vert _1\) is negative and thus \(\textrm{ReLU}\left( x\right) =0\) for all \(x \in \langle \left( \textbf{g},c\right) \rangle \); 2. The lower bound \(l = c - \left\Vert \textbf{g}\right\Vert _1\) is positive and thus \(\textrm{ReLU}\left( x\right) =x\) for all \(x \in \langle \left( \textbf{g},c\right) \rangle \); 3. The \(\textrm{ReLU} \)-node is instable and its output is thus piece-wise linear. The first two cases can be represented as an affine transformation and the third case requires an approximation (see Figure 1 for intuition): Interpolation between \(\textrm{ReLU}\left( x\right) =0\) and \(\textrm{ReLU}\left( x\right) =x\) (see the blue line in Figure 1 and \(\lambda \textbf{g}\) in Proposition 3) yields a representation that we can turn into a sound overapproximation (meaning all possible output values of \(\textrm{ReLU}\left( x\right) \) are contained). This is achieved by adding a new generator that appropriately bounds the error of the interpolated function (see the orange lines in Figure 1 and \(\mu \) in Proposition 3). This result is summarized as follows:
Proposition 3
(\(\textrm{ReLU}\) Zonotope Transformation [27]). Consider some Affine Form \(\left( \textbf{g},c\right) \) with bounds \(l \le \textbf{g}^T\mathbf {\epsilon } + c \le u\) (Proposition 2). Define a new Affine Form \(\left( \hat{\textbf{g}},\hat{c}\right) \) such that:
\[ \left( \hat{\textbf{g}},\hat{c}\right) = {\left\{ \begin{array}{ll} \left( \textbf{0},0\right) &{} \text {if } u \le 0\\ \left( \textbf{g},c\right) &{} \text {if } l \ge 0\\ \left( \left( \lambda \textbf{g}\;\Vert \;\mu \right) ,\ \lambda c + \mu \right) &{} \text {otherwise} \end{array}\right. } \]
for \(\lambda = \frac{u}{u-l}\) and \(\mu = -\frac{\lambda l}{2}\), where \(\Vert \) appends \(\mu \) as a new generator. Then \(\left( \hat{\textbf{g}},\hat{c}\right) \) guarantees for all \(d \le n\) and \(\textbf{v} \in \left[ -1,1\right] ^d\) that:
\[ \left\{ \textrm{ReLU}\left( x\right) \mid x \in \langle \left( \textbf{g},c\right) \left( \textbf{v}\right) \rangle \right\} \subseteq \langle \left( \hat{\textbf{g}},\hat{c}\right) \left( \textbf{v}\right) \rangle \]
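The following sketch implements this transformer for a single Affine Form, following the DeepZ-style case split reconstructed in Proposition 3 (illustrative only):

```python
import numpy as np

def relu_transform(g, c):
    """Proposition 3 for one Affine Form (g, c): returns a new Affine Form,
    appending one fresh generator in the instable case."""
    r = np.abs(g).sum()
    l, u = c - r, c + r
    if u <= 0:                  # case 1: always negative -> constant 0
        return np.zeros_like(g), 0.0
    if l >= 0:                  # case 2: always positive -> identity
        return g.copy(), c
    lam = u / (u - l)           # case 3: instable -> interpolation slope
    mu = -lam * l / 2           # center shift = new generator coefficient
    return np.append(lam * g, mu), lam * c + mu
```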
NN verification via Zonotopes typically proceeds as follows: An input set described as a Zonotope is propagated through the NN using the transformers from Propositions 1 and 3. This yields an overapproximation of the NN’s behavior. Depending on the verification property, one can either check the property by computing the Zonotope’s bounds (Proposition 2) or by solving a linear program. If a property cannot be established, the problem is refined by splitting the input space w.r.t. its dimensions (input-splitting, e.g. [27]) or w.r.t. a particular neuron to eliminate the \(\textrm{ReLU} \)’s nonlinearity (neuron-splitting, e.g. [3]).
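Reusing `affine` and `relu_transform` from the sketches above, whole-network propagation then looks as follows (each instable neuron receives its own fresh generator column):

```python
def relu_layer(G, c):
    """Apply relu_transform to every Affine Form of the Zonotope (G, c)."""
    m, n = G.shape
    rows, centers, fresh = [], [], []
    for i in range(m):
        g, ci = relu_transform(G[i], c[i])
        if len(g) > n:                      # instable: remember fresh generator
            fresh.append((i, g[-1])); g = g[:n]
        rows.append(g); centers.append(ci)
    G2 = np.vstack(rows)
    if fresh:                               # one new column per instable neuron
        E = np.zeros((m, len(fresh)))
        for k, (i, coeff) in enumerate(fresh):
            E[i, k] = coeff
        G2 = np.hstack([G2, E])
    return G2, np.array(centers)

def propagate(layers, G, c):
    """Zonotope reachability for a ReLU NN given as [(W1, b1), ..., (WL, bL)]."""
    for i, (W, b) in enumerate(layers):
        G, c = affine(G, c, W, b)
        if i < len(layers) - 1:             # ReLU on all hidden layers
            G, c = relu_layer(G, c)
    return G, c
```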
Equivalence Verification. To show that two NNs \(f_1,f_2\) behave equivalently, we can, for example, verify that the NNs’ outputs are equal up to some \(\varepsilon \):
Definition 2
(\(\varepsilon \) Equivalence [17, 22]). Given two NNs \(f_1,f_2:\mathbb {R}^I \rightarrow \mathbb {R}^O\) and an input set \(X \subseteq \mathbb {R}^I\) we say \(f_1\) and \(f_2\) are \(\varepsilon \) equivalent w.r.t. a p-norm iff for all \(\textbf{x} \in X\) it holds that \(\left\Vert f_1\left( \textbf{x}\right) - f_2\left( \textbf{x}\right) \right\Vert _p < \varepsilon \).
Deciding \(\varepsilon \) equivalence is coNP-complete [28]. Another line of work proposes verification of Top-1 equivalence which is important for classification NNs [17]:
Definition 3
(Top-1 Equivalence [17]). Given two NNs \(f_1,f_2:\mathbb {R}^I \rightarrow \mathbb {R}^O\) and an input set \(X \subseteq \mathbb {R}^I\), \(f_1\) and \(f_2\) are Top-1 equivalent iff for all \(\textbf{x} \in X\) we have \(\textrm{argmax}_i~f_{1, i}\left( \textbf{x}\right) = \textrm{argmax}_i~f_{2, i}\left( \textbf{x}\right) \). More formally, Top-1 equivalence guarantees that for all \(k\in \left[ 1,O\right] ,\textbf{x}\in X\) if for all \(j\ne k\) it holds that \(f_{1,k}\left( \textbf{x}\right) \ge f_{1,j}\left( \textbf{x}\right) \) then it also holds that \(f_{2,k}\left( \textbf{x}\right) \ge f_{2,j}\left( \textbf{x}\right) \) for all \(j\ne k\).
For \(\varepsilon \) equivalence, prior work by Paulsen et al. [22, 23] introduced differential verification: For two NNs of equal depth (i.e. \(L_1=L_2\)), equivalence can be verified more effectively by reasoning about weight differences. To this end, Paulsen et al. used symbolic intervals [30, 31] not only to bound the values of a single NN, but also to bound the difference of values between two NNs \(f_1,f_2\), i.e. at any layer \(1 \le i \le L\) we compute two linear symbolic bound functions \(l^{(i)}_\varDelta \left( \textbf{x}\right) ,u^{(i)}_\varDelta \left( \textbf{x}\right) \) such that \(l^{(i)}_\varDelta \left( \textbf{x}\right) \le f_1^{(i)}\left( \textbf{x}\right) - f_2^{(i)}\left( \textbf{x}\right) \le u^{(i)}_\varDelta \left( \textbf{x}\right) \). Differential bounds are computed by propagating symbolic intervals through two NNs in lock-step. This enables the computation of bounds on the difference at every layer.
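In the notation of Section 2, the identity driving this lock-step propagation is, for the pre-activation values of layer i:
\[ \tilde{\textbf{x}}_1^{(i)} - \tilde{\textbf{x}}_2^{(i)} \;=\; W_1^{(i)}\left( \textbf{x}_1^{(i-1)} - \textbf{x}_2^{(i-1)}\right) \;+\; \left( W_1^{(i)} - W_2^{(i)}\right) \textbf{x}_2^{(i-1)} \;+\; \left( b_1^{(i)} - b_2^{(i)}\right) \]
The first summand scales the difference accumulated in the previous layers, while the remaining summands capture the difference newly introduced by layer i's weights and biases; Section 4 lifts exactly this decomposition to the Zonotope abstract domain.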
Confidence-Based Verification. Many classification NNs provide a confidence for their classification by using the softmax function \(\textrm{softmax}_i\left( \textbf{x}\right) = \frac{e^{x_i}}{\sum _{j=1}^n e^{x_j}}\). A recent line of work [2] proposes to use the confidence values classically provided by classification NNs as a starting point for verification. For example, Athavale et al. [2] propose a global, confidence-based robustness property which stipulates that inputs classified with high confidence must be robust to noise (i.e. we require the same classification for some bounded perturbation of the input). While this introduces a reliance on the NN’s confidence, it enables global specifications verified on the full input space. Moreover, this reliance can be addressed by an orthogonal line of research that aims at training calibrated NNs [1, 13], i.e. NNs that correctly estimate the confidence of their predictions.
3 Complexity of Top-1 Verification
Prior work showed that the classic NN verification problem for a single \(\textrm{ReLU}\)-NN is an NP-complete problem [15, 24]. Similarly, the problem of finding a violation of \(\varepsilon \) equivalence is NP-complete for \(\textrm{ReLU}\)-NNs as the single-NN verification problem can be reduced to this setting [28]. We now show that finding a violation of Top-1 equivalence (Top-1-Net-Equiv) is also NP-complete, implying that proving absence of counterexamples is coNP-complete (proof see [29, Sec. A.1]):
Theorem 1
(Top-1-Net-Equiv is coNP-complete). Let \(X\subseteq \mathbb {R}^I\) be some polytope over the input space of two \(\textrm{ReLU} \)-NNs \(f_1,f_2\). Deciding whether there exists \(\textbf{x} \in X\) and a \(k \in \left[ 1,O\right] \) s.t. \(\left( f_1\left( \textbf{x}\right) \right) _k \ge \left( f_1\left( \textbf{x}\right) \right) _i\) for all \(i \in \left[ 1,O\right] \) but for some \(j\in \left[ 1,O\right] \) it holds that \(\left( f_2\left( \textbf{x}\right) \right) _k < \left( f_2\left( \textbf{x}\right) \right) _j\) is NP-complete.
Proof Sketch. Our proof differs from the proof for \(\varepsilon \) equivalence in the choice of the reduced problem: for \(\varepsilon \) equivalence verification, one reduces from the “classical” NN verification problem [15, 24]. To apply a similar proof technique to Top-1 equivalence, we require an NN verification instance with only strict inequality constraints in the input and output. Therefore, we first prove the NP-completeness of verifying strict inequality constraints (Definition 8 in [29, Sec. A.1]) by adapting prior proofs [15, 24]. Then, we can reduce this problem to Top-1 equivalence verification. The reduction works by constructing an instance of Top-1-Net-Equiv that has a Top-1 violation iff the violating input also satisfies the constraints of the original NN verification problem. \(\square \)
Fig. 2.
Visualization for the construction of a new Affine Form via \(\textsc {ReLU}_\varDelta \)
4 Equivalence Analysis via Zonotopes
Given two NNs \(f_1,f_2:\mathbb {R}^I \rightarrow \mathbb {R}^O\) with L layers each, we follow the basic principle of differential verification, i.e. we bound the difference \(f_1^{(i)}-f_2^{(i)}\) at every layer \(1 \le i \le L\). For our presentation, we assume that all layers of the NN have the same width, i.e. the same number of nodes. In practice, this limitation can be lifted by enriching the thinner layer with zero rows [33]. Our approach to differential verification is summarized in Algorithm 1: First and foremost, we perform a classic reachability analysis via Zonotopes for the two NNs \(f_1,f_2\). To this end, we propagate a given input Zonotope \(\mathcal {Z} _{\text {in}}\) through both NNs resulting in output Zonotopes \(\mathcal {Z} ',\mathcal {Z} ''\). This part of the analysis uses the well-known Zonotope transformers described in Propositions 1 and 3. The individual reachability analysis is complemented with the computation of the Differential Zonotope \(\mathcal {Z} ^\varDelta \) which is initialized with 0 (meaning the NNs’ inputs are initially equal) and computed in lock-step with the computation of \(\mathcal {Z} '\) and \(\mathcal {Z} ''\): At every layer, we overapproximate the maximal deviation between the two NNs. Using the transformers described in the remainder of this Section, we can prove that the Differential Zonotope always overapproximates the difference between the two NNs (proof see [29, Sec. A.2]):
Theorem 2
(Soundness). Let \(f_1,f_2\) be two feed-forward ReLU-NNs, \(\mathcal {Z} _{\text {in}}\) some Zonotope mapping n generators to I dimensions and \(\mathcal {Z} ',\mathcal {Z} '',\mathcal {Z} ^\varDelta \) the output of \(\textsc {Reach}_\varDelta \left( f_1,f_2,\mathcal {Z} _{\text {in}}\right) \). The following statements hold for all \(\textbf{v} \in \left[ -1,1\right] ^n\):
\[ \left\{ f_1\left( \textbf{x}\right) \mid \textbf{x} \in \langle \mathcal {Z} _{\text {in}}\left( \textbf{v}\right) \rangle \right\} \subseteq \langle \mathcal {Z} '\left( \textbf{v}\right) \rangle , \qquad \left\{ f_2\left( \textbf{x}\right) \mid \textbf{x} \in \langle \mathcal {Z} _{\text {in}}\left( \textbf{v}\right) \rangle \right\} \subseteq \langle \mathcal {Z} ''\left( \textbf{v}\right) \rangle , \]
\[ \left\{ f_1\left( \textbf{x}\right) - f_2\left( \textbf{x}\right) \mid \textbf{x} \in \langle \mathcal {Z} _{\text {in}}\left( \textbf{v}\right) \rangle \right\} \subseteq \langle \mathcal {Z} ^\varDelta \left( \textbf{v}\right) \rangle . \]
For the descriptions of transformations, we again focus on a single Affine Form. In this section, we denote the representation of an Affine Form as \(z=\left( \textbf{e},\textbf{a},c\right) \) where \(\textbf{e}\) are the n original (exact) generators present in an Affine Form of \(\left( \mathcal {Z} _{\text {in}}\right) _i\) and \(\textbf{a}\) are the p (approximate) generators added via \(\textrm{ReLU}\) transformations. For Differential Affine Forms we split the approximate generators into \(\textbf{a}'^\varDelta ,\textbf{a}''^\varDelta \) and \(\textbf{a}^\varDelta \), distinguishing their origin (the \(\textrm{ReLU}\) transformations of \(f_1\), the \(\textrm{ReLU}\) transformations of \(f_2\), or generators added to the Differential Affine Form directly). This yields the Differential Affine Form \(z^\varDelta =\left( \textbf{e}^\varDelta ,\textbf{a}'^\varDelta ,\textbf{a}''^\varDelta ,\textbf{a}^\varDelta ,c^\varDelta \right) \). We assume that the corresponding vectors have equal dimensions (i.e. \(\textbf{e}'\), \(\textbf{e}''\) and \(\textbf{e}^\varDelta \) all have dimension \(n_1\); \(\textbf{a}'\) and \(\textbf{a}'^\varDelta \) have dimension \(n_2\) and \(\textbf{a}''\) as well as \(\textbf{a}''^\varDelta \) have dimension \(n_3\)). The points described by two Affine Forms and a Differential Affine Form are then described via common generator values \(\mathbf {\epsilon }_1, \mathbf {\epsilon }_2, \mathbf {\epsilon }_3, \mathbf {\epsilon }_4\) across all three Affine Forms. This definition naturally generalizes to the multidimensional (Zonotope) case by fixing all \(\mathbf {\epsilon }\) values across dimensions and Zonotopes. We also denote the points contained in a tuple \(\left( z',z'',z^\varDelta \right) \) using \(\langle \cdot \rangle \):
Definition 4
(Points Contained by Differential Affine Form). Given two Affine Forms \(z'=\left( \textbf{e}',\textbf{a}',c'\right) \), \(z''=\left( \textbf{e}'',\textbf{a}'',c''\right) \) and a Differential Affine Form \(z^\varDelta =\left( \textbf{e}^\varDelta ,\textbf{a}'^\varDelta ,\textbf{a}''^\varDelta ,\textbf{a}^\varDelta ,c^\varDelta \right) \) with resp. a matching number of columns, we define the set of points described by \(\left( z',z'',z^\varDelta \right) \), where \(n_1,n_2,n_3,n_4\) are resp. the number of columns in \(\textbf{e}^\varDelta ,\textbf{a}'^\varDelta ,\textbf{a}''^\varDelta ,\textbf{a}^\varDelta \) and \(\overline{n}=n_1+n_2+n_3+n_4\), as:
\[ \langle \left( z',z'',z^\varDelta \right) \rangle = \left\{ \left( \begin{array}{l} \textbf{e}'\mathbf {\epsilon }_1+\textbf{a}'\mathbf {\epsilon }_2+c'\\ \textbf{e}''\mathbf {\epsilon }_1+\textbf{a}''\mathbf {\epsilon }_3+c''\\ \textbf{e}^\varDelta \mathbf {\epsilon }_1+\textbf{a}'^\varDelta \mathbf {\epsilon }_2+\textbf{a}''^\varDelta \mathbf {\epsilon }_3+\textbf{a}^\varDelta \mathbf {\epsilon }_4+c^\varDelta \end{array}\right) \,\middle |\, \mathbf {\epsilon }_i\in \left[ -1,1\right] ^{n_i} \right\} \]
Omitting the computation of \(\mathcal {Z} ^\varDelta \), \(\textsc {Reach}_\varDelta \) corresponds to computing reachable outputs of \(f_1,f_2\) w.r.t. a common input Zonotope \(\mathcal {Z} _{\text {in}}\). This algorithm implicitly computes the naive Differential Affine Form \(z'-z''\), i.e. the Affine Form \(\left( \textbf{e}'-\textbf{e}'',\ \textbf{a}',\ -\textbf{a}'',\ \textbf{0},\ c'-c''\right) \).
Affine Transformations. For affine transformations, we construct a transformation based on the insight from Differential Symbolic Bounds [22] that exactly two quantities determine the output difference: First, the difference accumulated so far (represented by the current Differential Affine Form \(z^\varDelta \) and scaled by the layer’s affine transformation); second, the difference between the affine transformations in the current layer. Given two affine transformations \(w^{(l)}_1 x + b^{(l)}_1\) and \(w^{(l)}_2 x + b^{(l)}_2\) and Affine Forms \(z'=\left( \textbf{e}',\textbf{a}',c'\right) \), \(z''=\left( \textbf{e}'',\textbf{a}'',c''\right) \), \(z^\varDelta =\left( \textbf{e}^\varDelta ,\textbf{a}'^\varDelta ,\textbf{a}''^\varDelta ,\textbf{a}^\varDelta ,c^\varDelta \right) \) we propose \(\textsc {Affine}_\varDelta \) which returns a new Affine Form \(\hat{z}^\varDelta =\left( \hat{\textbf{e}}^\varDelta ,\hat{\textbf{a}}'^\varDelta ,\hat{\textbf{a}}''^\varDelta ,\hat{\textbf{a}}^\varDelta ,\hat{c}^\varDelta \right) \) with:
\[ \begin{array}{lll} \hat{\textbf{e}}^\varDelta = w^{(l)}_1\textbf{e}^\varDelta + \left( w^{(l)}_1-w^{(l)}_2\right) \textbf{e}'', &{} \hat{\textbf{a}}'^\varDelta = w^{(l)}_1\textbf{a}'^\varDelta , &{} \hat{\textbf{a}}''^\varDelta = w^{(l)}_1\textbf{a}''^\varDelta + \left( w^{(l)}_1-w^{(l)}_2\right) \textbf{a}'',\\ \hat{\textbf{a}}^\varDelta = w^{(l)}_1\textbf{a}^\varDelta , &{} \hat{c}^\varDelta = w^{(l)}_1 c^\varDelta + \left( w^{(l)}_1-w^{(l)}_2\right) c'' + \left( b^{(l)}_1-b^{(l)}_2\right) &{} \end{array} \tag{1} \]
We scale prior differences by \(w^{(l)}_1\) and add the new weight difference \(\left( w^{(l)}_1-w^{(l)}_2\right) \) to \(\hat{\textbf{e}}^\varDelta \), \(\hat{\textbf{a}}''^\varDelta \) and \(\hat{c}^\varDelta \) (scaled by the reachable values of \(f_2\), i.e. by \(z''\)), as well as the bias difference \(\left( b^{(l)}_1-b^{(l)}_2\right) \) to \(\hat{c}^\varDelta \). The transformation is sound (the proof in [29, Sec. A.2] generalizes to Zonotopes):
Lemma 1
(Soundness ofAffine\(_\varDelta \)). For affine transformations \(\alpha _1\left( x\right) =w^{(l)}_1 x + b^{(l)}_1\) and \(\alpha _2\left( x\right) =w^{(l)}_2 x + b^{(l)}_2\) and Affine Forms
the transformation \(\text {Affine}_\varDelta \) is sound, i.e. if
is the result of \(\textsc {Affine}_\varDelta \), then for all \(d \le n\) and \(\textbf{v} \in \left[ -1,1\right] ^d\) with
and
the outputs of \(\textsc {Affine}\) (Proposition 1) we get:
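A sketch of \(\textsc {Affine}_\varDelta \) for a single neuron, directly encoding the identity \(f_1 - f_2 = w_1\left( x_1-x_2\right) + \left( w_1-w_2\right) x_2 + \left( b_1-b_2\right) \) on the generator representation (tuple layout as reconstructed above; names are illustrative):

```python
def affine_delta(w1, b1, w2, b2, z2, zD):
    """Affine_Delta for scalar weights.  z2 = (e2, a2, c2) is f2's Affine
    Form, zD = (eD, aD1, aD2, aD, cD) the Differential Affine Form."""
    e2, a2, c2 = z2
    eD, aD1, aD2, aD, cD = zD
    dw = w1 - w2
    return (w1 * eD + dw * e2,            # exact generators
            w1 * aD1,                     # approx. generators from f1
            w1 * aD2 + dw * a2,           # approx. generators from f2
            w1 * aD,                      # generators added to Z_Delta itself
            w1 * cD + dw * c2 + (b1 - b2))
```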
\(\textrm{ReLU}\) Transformations. Since \(\textrm{ReLU}\) is piece-wise linear, we cannot hope to construct an exact transformation for this case. However, by carefully distinguishing the possible cases (e.g. both nodes are solely negative, one is negative and one is instable, etc.) we provide linear representations for 6 out of the 9 cases (both nodes stable, or a negative and an instable node) while overapproximating the other three. Given Affine Forms \(z',z''\) and a Differential Affine Form \(z^\varDelta \) as well as \(\hat{z}',\hat{z}''\) (the results of applying \(\textrm{ReLU} \) in the individual NNs), we compute the result \(\hat{z}^\varDelta \) of the \(\textrm{ReLU} _\varDelta \) transformation using the case distinctions from Table 1. The transformation is sound (proof see [29, Sec. A.2]):
Lemma 2
(Soundness of \(\textrm{ReLU} _\varDelta \)). Consider Affine Forms \(z',z''\) and a Differential Affine Form \(z^\varDelta \). Let \(\hat{z}'\)/\(\hat{z}''\) be the result of the \(\textsc {ReLU}\) transformation on \(z'\)/\(z''\) (see Proposition 3). Define \(\hat{z}^\varDelta \) such that it matches the case distinctions in Table 1. Then \(\hat{z}^\varDelta \) overapproximates the behavior of \(\textrm{ReLU} \), i.e. for all \(d \le n\) and \(\textbf{v} \in \left[ -1,1\right] ^d\) with \(\left( x_1,x_2,x_\varDelta \right) \in \langle \left( z',z'',z^\varDelta \right) \left( \textbf{v}\right) \rangle \) we get that:
\[ \left( \textrm{ReLU}\left( x_1\right) ,\textrm{ReLU}\left( x_2\right) ,\textrm{ReLU}\left( x_1\right) -\textrm{ReLU}\left( x_2\right) \right) \in \langle \left( \hat{z}',\hat{z}'',\hat{z}^\varDelta \right) \left( \textbf{v}\right) \rangle \]
Unfortunately, reasoning about the difference of two \(\textrm{ReLU}\)s is not particularly intuitive. The first 6 cases in Table 1 follow from substituting Node 1 and Node 2 values. In the other cases (Positive + Instable and All Instable), the difference is not linear in the input or output Zonotopes. Thus, we append an additional generator to the Differential Zonotope, i.e. to \(\textbf{a}^\varDelta \). The approximation for the case of two instable neurons is plotted in Figure 2: The bounding planes ensure that 0 is always reachable and that the prior difference is within the bound.
Table 1.
Case distinction for the construction of the new Differential Affine Form \(\hat{z}^\varDelta \). Additions of vectors with different lengths signify addition for the components in the common-length prefix. We distinguish the nodes’ different phases by positive (\(+\)), negative (–), and instable (\(\sim \)). The variables \(\lambda \) and \(\mu \) are initialized as in Proposition 3, i.e. \(\lambda ' = \frac{u'}{u'-l'}\) and \(\mu ' = -\frac{\lambda ' l'}{2}\) (accordingly for \(\lambda '',\mu ''\)), and \(\nu \) is the coefficient of the generator appended in the approximated cases.
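While the full case table is best read in the original publication, the stable cases follow directly from the phase semantics (\(\textrm{ReLU}\left( x\right) =x\) on \(+\), \(\textrm{ReLU}\left( x\right) =0\) on –); a sketch covering only these four easy cases (the instable cases, which append the \(\nu \)-generator, are omitted):

```python
import numpy as np

def relu_delta_stable(phase1, phase2, z1, z2, zD):
    """Stable cases of ReLU_Delta.  z1 = (e1, a1, c1), z2 = (e2, a2, c2),
    zD = (eD, aD1, aD2, aD, cD); phases are '+' or '-'."""
    (e1, a1, c1), (e2, a2, c2) = z1, z2
    eD, aD1, aD2, aD, cD = zD
    zero = np.zeros_like
    if phase1 == '-' and phase2 == '-':   # both clamp to 0: difference is 0
        return zero(eD), zero(aD1), zero(aD2), zero(aD), 0.0
    if phase1 == '+' and phase2 == '+':   # both identity: difference unchanged
        return eD, aD1, aD2, aD, cD
    if phase1 == '+' and phase2 == '-':   # ReLU(x1) - ReLU(x2) = x1
        return e1, a1, zero(aD2), zero(aD), c1
    if phase1 == '-' and phase2 == '+':   # ReLU(x1) - ReLU(x2) = -x2
        return -e2, zero(aD1), -a2, zero(aD), -c2
    raise NotImplementedError("instable cases: see Table 1")
```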
5 Verification of Equivalence Properties
To verify equivalence with \(\textsc {Reach}_\varDelta \), we proceed as follows: We propagate a Zonotope \(\mathcal {Z} _{\text {in}}\) through the two NNs \(f_1,f_2\) as explained above. Subsequently, we check a condition on the outputs \(\left( \mathcal {Z} ',\mathcal {Z} '',\mathcal {Z} ^\varDelta \right) \) that implies the desired equivalence property. If this check fails, we refine the Zonotope by splitting the input space. For the naive case, we compute \(\mathcal {Z} ^\varDelta \) as explained in Section 4.
\(\varepsilon \) Equivalence. Checking \(\varepsilon \) equivalence w.r.t. \(\mathcal {Z} ^\varDelta \) works similarly to the approach for symbolic interval-based differential verification [22, 23]: We compute \(\mathcal {Z} ^\varDelta \)’s interval bounds and check for an absolute bound \(> \varepsilon \). If all bounds are smaller, we have proven \(\varepsilon \) equivalence for \(\langle \mathcal {Z} _{\text {in}} \rangle \) (see also Lemma 7 in [29, Sec. A.2]).
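The resulting check is a one-liner on top of Proposition 2 (a sketch for the \(L_\infty \) norm, reusing `bounds` from Section 2):

```python
def check_eps_equivalence(GD, cD, eps):
    """Epsilon equivalence on <Z_in> holds if every interval bound of the
    Differential Zonotope (GD, cD) has magnitude below eps."""
    lo, hi = bounds(GD, cD)
    return bool(np.all(np.maximum(np.abs(lo), np.abs(hi)) < eps))
```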
Top-1 Equivalence. Since Top-1 equivalence considers the order of outputs, it cannot be read off from \(\mathcal {Z} ^\varDelta \)’s bounds directly. We frame the property as an LP optimization problem in Definition 5. The LP has an optimal solution \(\le 0\) if no \(\textbf{x} \in \langle \mathcal {Z} _{\text {in}}\rangle \) classified as k in \(f_1\) is classified as j in \(f_2\). Formally, the LP computes an upper bound for \(\left( f_2\left( \textbf{x}\right) \right) _j-\left( f_2\left( \textbf{x}\right) \right) _k\) (maximized expression) under the condition that \(\left( f_1\left( \textbf{x}\right) \right) _k\) is the maximum of \(f_1\left( \textbf{x}\right) \) (first constraint) and under the additional condition that \(\mathcal {Z} ^\varDelta \) bounds the difference between \(f_1\) and \(f_2\), resp. the difference between the reachable points in \(\mathcal {Z} '\) and \(\mathcal {Z} ''\) (second constraint; see Definition 9 in [29, Sec. A.2]). The first constraint ensures a gap t between the largest and second largest output of \(f_1\). In this section, we only consider \(t=0\).
Definition 5
(Top-1 Violation LP). Given \(\mathcal {Z} '=\left( G',\textbf{c}'\right) =\left( E',A',\textbf{c}'\right) \), \(\mathcal {Z} ''=\left( E'',A'',\textbf{c}''\right) \), \(\mathcal {Z} ^\varDelta =\left( G^\varDelta ,\textbf{c}^\varDelta \right) \), a constant \(t \ge 0\) and \(k,j\in \left[ 1,O\right] \) with \(k \ne j\) the Top-1 Violation LP is defined below. \(E'\) and \(E''\) have \(n_1\) columns, \(A'\) has \(n_2\) columns, \(A''\) has \(n_3\) columns and \(A^\varDelta \) has \(n_4\) columns. \(\textbf{x}\) contains generators for these matrices in order and has dimension \(\bar{n}=n_1+n_2+n_3+n_4\).
\[ \begin{array}{rll} \max \limits _{\textbf{x}\in \left[ -1,1\right] ^{\bar{n}}} &{} \mathcal {Z} ''_j\left( \textbf{x}\right) -\mathcal {Z} ''_k\left( \textbf{x}\right) &{}\\ \text {s.t.} &{} \mathcal {Z} '_k\left( \textbf{x}\right) -\mathcal {Z} '_i\left( \textbf{x}\right) \ge t &{} \text {for all } i\ne k\\ &{} \mathcal {Z} '_i\left( \textbf{x}\right) -\mathcal {Z} ''_i\left( \textbf{x}\right) =\mathcal {Z} ^\varDelta _i\left( \textbf{x}\right) &{} \text {for all } i\in \left[ 1,O\right] \end{array} \]
Here each Zonotope reads the entries of \(\textbf{x}\) that correspond to its generator matrices (\(\mathcal {Z} '\) those of \(E',A'\); \(\mathcal {Z} ''\) those of \(E'',A''\); \(\mathcal {Z} ^\varDelta \) all of \(G^\varDelta \)).
If the LP’s maximum is positive, we check whether the generated counterexample is spurious. Verification requires \(\mathcal {O}\left( O^2\right) \) LP optimizations. In practice, we reuse the same constraint formulation for each k and optimize over all possible \(j \ne k\) admitting warm starts. Our approach is sound (see proof in [29, Sec. A.2]):
Lemma 3
(Soundness for Top-1).
Consider \(\mathcal {Z} ',\mathcal {Z} '',\mathcal {Z} ^\varDelta \) provided by \(\textsc {Reach}_\varDelta \) w.r.t. \(Z_{\text {in}}\): If for all \(k,j \in \left[ 1,O\right] \) (\(k \ne j\)) the Top-1 Violation LP with \(t=0\) has a maximum \(\le 0\), then \(f_1,f_2\) satisfy Top-1 equivalence w.r.t. inputs in \(\langle Z_{\text {in}}\rangle \).
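A sketch of the Top-1 Violation LP using SciPy, under the generator layout \(\textbf{x}=\left( \textbf{x}_1|\textbf{x}_2|\textbf{x}_3|\textbf{x}_4\right) \) and the LP as reconstructed in Definition 5 (helper names are illustrative; VeryDiff's implementation may differ):

```python
from scipy.optimize import linprog

def top1_violation_lp(Ep, Ap, cp, Epp, App, cpp, GD, cD, k, j, t=0.0):
    """Returns the LP's maximum; a value <= 0 certifies that no input
    classified k by f1 (with gap t) is classified j by f2."""
    O, (n1, n2, n3) = len(cp), (Ep.shape[1], Ap.shape[1], App.shape[1])
    n4 = GD.shape[1] - (n1 + n2 + n3)
    # Z'_i(x) and Z''_i(x) as linear functionals over the stacked variable x.
    Zp  = np.hstack([Ep,  Ap, np.zeros((O, n3 + n4))])
    Zpp = np.hstack([Epp, np.zeros((O, n2)), App, np.zeros((O, n4))])
    # (f1)_k maximal with gap t:  (Zp[i] - Zp[k]) x <= cp[k] - cp[i] - t
    A_ub = np.vstack([Zp[i] - Zp[k] for i in range(O) if i != k])
    b_ub = np.array([cp[k] - cp[i] - t for i in range(O) if i != k])
    # Z^Delta ties the outputs together:  Z'_i(x) - Z''_i(x) = Z^Delta_i(x)
    A_eq, b_eq = Zp - Zpp - GD, cD - cp + cpp
    res = linprog(-(Zpp[j] - Zpp[k]), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq,
                  b_eq=b_eq, bounds=(-1, 1), method="highs")
    if res.status == 2:   # infeasible: f1 never predicts k with gap t here
        return -np.inf
    return -res.fun + cpp[j] - cpp[k]
```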
Input Space Refinement. If verification fails, we split the input space in half and solve the verification problems separately. To this end, we use a heuristic to estimate the influence of splits along different input dimensions. Splitting can improve the bounds in two ways: Either the reduced input range directly reduces the computed output bounds, or the reduced range reduces the number of instable neurons and hence the over-approximation error w.r.t. output bounds. Our heuristic works similarly to forward-mode gradient computation, estimating the influence of input dimensions on output bounds. For an analysis with n generators in \(\mathcal {Z} _{\text {in}}\) and m generators in \(\mathcal {Z} '\)/\(\mathcal {Z} ''\) our heuristic requires two matrices (\(n \times m\)) and two additional matrix multiplications per layer. For details see Appendix [29, Sec. B]; we leave a fine-grained analysis of refinement strategies to future work.
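For intuition, a minimal recursive splitting loop (using widest-dimension bisection instead of the paper's gradient-based heuristic, and the naive difference of two separately propagated Zonotopes instead of the full \(\textsc {Reach}_\varDelta \)) could look as follows:

```python
def reach_delta_naive(layers1, layers2, lo, hi):
    """Naive stand-in for Reach_Delta over the box [lo, hi]: shared input
    generators cancel, the ReLU approximation generators of the two NNs
    stay independent (cf. the naive Differential Affine Form, Section 4)."""
    n0 = len(lo)
    G_in, c_in = np.diag((hi - lo) / 2), (hi + lo) / 2
    G1, c1 = propagate(layers1, G_in, c_in)
    G2, c2 = propagate(layers2, G_in, c_in)
    GD = np.hstack([G1[:, :n0] - G2[:, :n0], G1[:, n0:], -G2[:, n0:]])
    return GD, c1 - c2

def verify_eps(layers1, layers2, lo, hi, eps, depth=20):
    GD, cD = reach_delta_naive(layers1, layers2, lo, hi)
    if check_eps_equivalence(GD, cD, eps):
        return True
    if depth == 0:
        return False                      # give up: result unknown
    d = int(np.argmax(hi - lo))           # split the widest input dimension
    mid = (lo[d] + hi[d]) / 2
    hi_l, lo_r = hi.copy(), lo.copy()
    hi_l[d], lo_r[d] = mid, mid
    return (verify_eps(layers1, layers2, lo, hi_l, eps, depth - 1) and
            verify_eps(layers1, layers2, lo_r, hi, eps, depth - 1))
```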
Generator Compression. To increase performance we analyze \(Z_{\text {in}}\). In case an input dimension has range 0, we eliminate the generator in \(Z_{\text {in}}\). This optimization speeds up equivalence verification for, e.g., targeted pixel perturbations [22].
Counterexample Generation. To generate counterexamples we check the Zonotope’s center point for property violation. Additionally, we check the input points maximizing property violation according to our Zonotope approximation.
Completeness. As we only employ axis-aligned input-splitting, we cannot provide a completeness guarantee. However, our evaluation (Section 7) demonstrates that this approach outperforms complete State-of-the-Art solvers.
6 Equivalence Verification for Classification NNs
Fig. 3.
Differential Zonotopes for Top-1 equivalence
Top-1 equivalence is particularly useful when verifying the equivalence of classification NNs. Indeed, there are examples of classification NNs which are \(\varepsilon \)-equivalent, but not Top-1 equivalent w.r.t. some input region (see also Appendix [29, Sec. C.3]). This underlines the importance of choosing the right equivalence property. Unfortunately, as we empirically show in the Appendix [29, Sec. C.2], classic Top-1 equivalence does not benefit from Differential Verification. Moreover, prior work on equivalence verification only provides guarantees for small parts of the input space, e.g. by proving Top-1 equivalence for \(\epsilon \)-balls around given data points. As pointed out in orthogonal work [11], \(\epsilon \)-balls around data points are not necessarily a semantically useful specification. Moreover, proving Top-1 equivalence on large parts of the input space is typically impossible, because pruning NNs invariably will change their behavior. This raises two questions: 1. Why does Top-1 equivalence not benefit from Differential Verification? 2. What equivalence property for classification NNs is verifiable on large parts of the input space while it can benefit from Differential Verification? We will answer these questions in order.
Ineffectiveness for Top-1 equivalence. Our initial intuition would have been that the tighter bounds in \(\mathcal {Z} ^\varDelta \) should also aid the verification of Top-1 equivalence. To refute this intuition, consider the sketch in Figure 3: The light blue area represents a Zonotope for the output space reachable via \(f_1\) (i.e. \(\mathcal {Z} '\)) and the red area describes a Zonotope for the output space reachable via \(f_2\) (i.e. \(\mathcal {Z} ''\)). The orange Zonotope describes \(\mathcal {Z} ^\varDelta \), i.e. it is a bound for the difference between the two outputs. Depending on the nature of \(\mathcal {Z} '\)’s and \(\mathcal {Z} ^\varDelta \)’s generators (i.e. whether they are shared or not), the output region for \(\mathcal {Z} '-\mathcal {Z} ^\varDelta \) could reach as far as the solid black line. Thus, while \(\mathcal {Z} ^\varDelta \) limits the difference for individual input points, it does not necessarily provide effective bounds for the reachable values of \(f_2\) resp. \(\mathcal {Z} ''\). However, if, via an LP formulation, the reachable values from \(\mathcal {Z} '\) were restrained to \(x_1 \ge 0.6\) (dark blue area on the right), then adding \(\mathcal {Z} ^\varDelta \) would yield the black dashed region, which meaningfully constrains the behavior of \(f_2\) beyond the constraints given by \(\mathcal {Z} ''\) (red area). We could then prove that \(f_2\)’s first dimension (\(x_1\)) is positive. Notably, our constraint on \(f_1\) (\(\ge 0.6\)) is stricter than our constraint on \(f_2\) (\(\ge 0\)), making the Differential Zonotope useful even if it has reachable values in negative \(x_1\) direction. Top-1 equivalence, in contrast, imposes equal constraints on the difference between two dimensions in both NNs. Thus, if the difference has a negative bound in \(\mathcal {Z} ^\varDelta \) (a likely outcome), Differential Verification, independently of the considered abstract domain, cannot help for Top-1 equivalence: While the concrete regions would look different for other abstract domains, the outcome would be the same given a negative differential bound for \(x_1\).
Confidence-Based Equivalence. We propose the notion of \(\delta \)-Top-1 equivalence. Our key idea is to integrate the confidence values (computed by \(\textrm{softmax}\)) into the verified property. In contrast to the arbitrary threshold \(x_1 \ge 0.6\), constraints based on confidence values have an intuitive meaning:
Definition 6
(\(\delta \)-Top-1 equivalence). Given two NNs \(f_1,f_2 : \mathbb {R}^I \rightarrow \mathbb {R}^O\), \(\delta \in \left[ \frac{1}{2},1\right) \) and an input region \(Y \subseteq \mathbb {R}^I\), \(f_2\) is \(\delta \)-Top-1 equivalent w.r.t. \(f_1\) iff \(f_1,f_2\) are Top-1 equivalent w.r.t. \(X_{f_1,Y}\left( \delta \right) = \left\{ \textbf{x} \in Y \mid \exists i \in \left[ 1,O\right] : \textrm{softmax}_i\left( f_1\left( \textbf{x}\right) \right) \ge \delta \right\} \).
Fig. 4.
Approximation Errors w.r.t. confidence \(\delta \): Ours (solid) and Athavale et al. [2] (dashed).
We assume that \(f_1,f_2\) are \(\textrm{ReLU} \) NNs with one \(\textrm{softmax}\) function after the last layer. Verifying \(\delta \)-Top-1 equivalence achieves multiple objectives: First, even for larger Y (e.g. intervals over standard deviations for normalized inputs), we may provide guarantees for a suitable \(\delta \). Secondly, we rely on confidence estimates of a component we already trust: The reference NN. Finally, from a technical perspective, constraining the confidence level of \(f_1\) to \(\ge \delta \) while “only” requiring the same classification in \(f_2\) achieves the asymmetry necessary for exploiting Differential Verification. Unfortunately, deciding \(\delta \)-Top-1 equivalence is a coNP-hard decision problem (proof in [29, Sec. A.3]):
Corollary 1
(Complexity of \(\delta \)-Top-1). Let \(Y \subseteq \mathbb {R}^I\) be a polytope, \(f_1,f_2\) be two \(\textrm{ReLU} \)-\(\textrm{softmax}\)-NNs, \(\frac{1}{2} \le \delta < 1\). Deciding whether there exists an \(\textbf{x} \in Y\) and \(k \in \left[ 1,O\right] \) s.t. \(\textrm{softmax}_k\left( f_1\left( \textbf{x}\right) \right) \ge \delta \) but \(\exists j\in \left[ 1,O\right] ~\left( f_2\left( \textbf{x}\right) \right) _k < \left( f_2\left( \textbf{x}\right) \right) _j\) is NP-hard.
Due to the usage of \(\textrm{softmax}\), the previous NP-membership argument does not apply in this case. To verify \(\delta \)-Top-1 equivalence, we depart from prior work on confidence-based verification [2] and propose the following approximation of the set of all vectors \(\textbf{z}\) for which output i has a confidence \(\ge \delta \) (proof in [29, Sec. A.3]):
Lemma 4
(Linear approximation of \(\textrm{softmax}\)). For \(\delta \in \left[ 1/2,1\right) \) the following set relationship holds:
\[ \left\{ \textbf{z}\in \mathbb {R}^n \mid \textrm{softmax}_i\left( \textbf{z}\right) \ge \delta \right\} \;\subseteq \; P_n\left( \delta \right) = \left\{ \textbf{z}\in \mathbb {R}^n \,\middle |\, \forall j\ne i: z_i - z_j \ge \ln \left( \tfrac{\delta }{1-\delta }\right) \right\} \]
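The containment direction of Lemma 4 follows because \(\textrm{softmax}_i\left( \textbf{z}\right) \ge \delta \) forces \(\frac{e^{z_i}}{e^{z_i}+e^{z_j}} \ge \delta \) for every \(j \ne i\), hence \(z_i - z_j \ge \ln \left( \frac{\delta }{1-\delta }\right) \). A small randomized sanity check of this direction:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())               # numerically stable softmax
    return e / e.sum()

def in_polytope(z, i, delta):
    """Membership in P_n(delta): z_i - z_j >= ln(delta/(1-delta)) for j != i."""
    t = np.log(delta / (1 - delta))
    return all(z[i] - z[j] >= t for j in range(len(z)) if j != i)

rng, delta = np.random.default_rng(0), 0.9
for _ in range(10_000):
    z = rng.normal(scale=5.0, size=4)
    i = int(np.argmax(z))
    if softmax(z)[i] >= delta:            # high confidence ...
        assert in_polytope(z, i, delta)   # ... implies polytope membership
```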
We can then prove \(\delta \)-Top-1 equivalence for \(\delta \ge \frac{1}{2}\) by reusing the Top-1 Violation LP with t chosen appropriately (proof in [29, Sec. A.3]):
Corollary 2
(Soundness for \(\delta \)-Top-1). Given \(\mathcal {Z} ',\mathcal {Z} '',\mathcal {Z} ^\varDelta \) from \(\textsc {Reach}_\varDelta \) w.r.t. \(Z_{\text {in}}\), \(\frac{1}{2} \le \delta < 1\). If for all \(k,j \in \left[ 1,O\right] \) (\(k \ne j\)) the Top-1 Violation LPs with \(t=\ln \left( \frac{\delta }{1-\delta }\right) \) have maxima \(\le 0\), then \(f_2\) is \(\delta \)-Top-1 equivalent w.r.t. \(f_1\) on \(\langle Z_{\text {in}}\rangle \).
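In practice this reduces confidence-based equivalence checking to the machinery already in place; a sketch reusing `top1_violation_lp` from Section 5:

```python
import numpy as np

def certify_delta_top1(Ep, Ap, cp, Epp, App, cpp, GD, cD, delta):
    """Corollary 2: reuse the Top-1 Violation LP with t = ln(delta/(1-delta))."""
    t, O = np.log(delta / (1 - delta)), len(cp)  # e.g. delta=0.9 -> t ~ 2.197
    return all(top1_violation_lp(Ep, Ap, cp, Epp, App, cpp, GD, cD, k, j, t=t) <= 0
               for k in range(O) for j in range(O) if j != k)
```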
In comparison to the \(\textrm{softmax}\) approximation by Athavale et al. [2], we approximate via one polytope in the output space (in contrast to a 35-segment piece-wise linear approximation). Additionally, our approximation is parametrized in the confidence threshold \(\delta \) while their approximation is uniform across confidence values. We now analyze the precision of the two approximations. Given a desired confidence level \(\delta \), we consider as its error the maximal deviation below \(\delta \) still encompassed by the approximation. For Athavale et al. [2] this maximal error is given as a function \(\textrm{err}_{\sigma }\left( n,\upsilon \right) \) in the input dimension n and the sigmoid approximation error \(\upsilon \) (see Definition 10 in [29, Sec. A.3] or [2, Thm. 1]). For our approximation, we derive the error via the minimal confidence value that is still part of \(P_n\left( \delta \right) \), i.e. as \(\textrm{err}_{\text {poly}}\left( n,\delta \right) = \delta - \min \left\{ \max _i \textrm{softmax}\left( \textbf{z}\right) _i \mid \textbf{z}\in P_n\left( \delta \right) \right\} \). We can derive the following properties for our approximation error \(\textrm{err}_{\text {poly}}\left( n,\delta \right) \) in relation to the approximation error \(\textrm{err}_{\sigma }\left( n,\upsilon \right) \) incurred by Athavale et al. [2] (proof in [29, Sec. A.3]):
Lemma 5
(Maximal Error for our \(\textrm{softmax}\) approximation). Consider \(n\ge 2\) and \(\delta \ge \frac{1}{2}\), then:
\[ \textrm{err}_{\text {poly}}\left( n,\delta \right) = \delta - \frac{\delta }{\delta + \left( n-1\right) \left( 1-\delta \right) } \;\le \; \textrm{err}_{\sigma }\left( n,\upsilon \right) \]
Note that while \(\textrm{err}_{\text {poly}}\left( n,\delta \right) \) is well defined for \(\delta =1\), the necessary bound \(\ln \left( \frac{\delta }{1-\delta }\right) \) is not, i.e. we can only check for \(\delta <1\). The observations described in Lemma 5 are also visible in Figure 4: By parameterizing the approximation in the confidence threshold \(\delta \) we achieve significant precision gains over prior work – independent of output dimensionality (n) and in particular as we approach \(\delta =1\).
7 Evaluation
We implemented Differential Zonotope verification in a new tool called VeryDiff in Julia [7]. First, we analyze the efficiency of Differential Zonotopes compared to our naive approach for verifying \(\varepsilon \) or (\(\delta \)-)Top-1 equivalence. We also compare the performance of VeryDiff to the previous State-of-the-Art and demonstrate significant performance improvements for \(\varepsilon \) and \(\delta \)-Top-1 equivalence across all benchmark families. Our Appendix [29, Sec. C] contains an extended evaluation.
Fig. 5.
MNIST benchmark queries for which VeryDiff with(out) Differential Zonotopes proves equivalence.
Experimental Setup. We compare six different tools or configurations on old and new benchmark families. A detailed summary of baselines, benchmark families and NN architectures can be found in the Appendix [29, Sec. C.1]. For \(\varepsilon \) and Top-1 equivalence, we evaluate on preexisting and new ACAS and MNIST NNs (airborne collision avoidance and handwritten digit recognition) where the second NN is generated via pruning (and possibly further training). For MNIST, we evaluate w.r.t. input regions generated by Paulsen et al. [23] which encode \(L_\infty \)-bounded perturbations of images (\(L_\infty \) Properties) or targeted pixel perturbations (Pixel Attacks). For \(\delta \)-Top-1 equivalence, we introduce a new NN verification benchmark for particle jet classification at CERN’s Large Hadron Collider (LHC) [8]. We analyze equivalence w.r.t. pruned and further trained NNs. NNs in this context come with strict real-time requirements making pruned NNs highly desirable [8]. We verify equivalence for boxes defined via standard deviations over the normalized input space.
Where Differential Verification helps. We compared VeryDiff with Differential Zonotopes activated against the naive computation without them. A summary of all results can be found in Table 6 [29, Sec. C.2]. We see significant improvements for \(\varepsilon \) equivalence (1430% more instances certified for the MNIST (VeriPrune) benchmark family). Figure 5 breaks down the verified \(\varepsilon \) equivalence queries with/without Differential Zonotopes by input radius across all MNIST benchmark queries and shows that Differential Verification helps to verify larger input regions. On the other hand, differential analysis slows down the verification of Top-1 equivalence. We provide a complementary theoretical analysis of this observation in Section 6 and posit that this is a fundamental limitation of Differential Verification and not specific to our implementation. In contrast, certification of confidence-based equivalence can profit from Differential Verification across two benchmark families. While we see diminishing speedups as we push the confidence threshold \(\delta \) closer to 1 (implicitly reducing the input space with guarantees), realistic thresholds \(\delta > 0.5\) (e.g. 0.9 instead of \(1-10^{-7}\)) profit most from differential verification (up to speedups of \(677\times \) over the naive technique on commonly solved queries for LHC w.r.t. \(\delta =0.99\)). We found queries where \(\varepsilon \) and Top-1 equivalence results differ – underlining the importance of choosing equivalence properties w.r.t. the task at hand.
Table 2.
Verification results for \(\varepsilon \) and \(\delta \)-Top-1 equivalence: Speedups for commonly solved instances; improvements reported w.r.t. the best other tool. (*\(\alpha ,\beta \)-CROWN for MNIST spends 80% of its time on neuron-bound refinement. Speedups may be exaggerated while increases in solved instances are accurate).
\(\boldsymbol{\varepsilon }\)Equivalence. We compare our tool with other \(\varepsilon \) equivalence verification techniques from the literature and summarize the results in Table 2. Across both benchmarks from prior literature (ACAS and MNIST), we significantly outperform equivalence-specific verifiers (NNEquiv [28], MILPEquiv [17], NeuroDiff [23]) as well as general NN verification techniques (\(\alpha ,\beta \)-CROWN [18, 25, 32, 35‐39], Marabou [16, 34]) for certification of equivalence. \(\alpha ,\beta \)-CROWN outperforms VeryDiff in the search for counterexamples. We suspect this is due to its adversarial attack techniques [38]. Due to incompatible differences in the property checked by NeuroDiff’s implementation [29, Sec. C.1], we performed a separate comparison where VeryDiff outperforms NeuroDiff on the same property.
\(\boldsymbol{\delta }\)-Top-1 Equivalence. Differential Verification significantly improves upon the generic State-of-the-Art NN verifier \(\alpha ,\beta \)-CROWN (see Table 2). We concede that \(\alpha ,\beta \)-CROWN’s attack techniques outperform VeryDiff’s counterexample generation (see \(\varepsilon \) equivalence). Hence, our evaluation focuses on equivalence certification. The objective for \(\delta \)-Top-1 equivalence is to provide guarantees for low values of \(\delta \). \(\alpha ,\beta \)-CROWN was only able to provide guarantees for 10 of the NNs in the benchmark set. In each case, VeryDiff was able to prove equivalence for lower (i.e. better) or equal \(\delta \) values. For the 3 NNs where the provided guarantees of \(\alpha ,\beta \)-CROWN and VeryDiff matched, both tools only verified equivalence for \(\delta =1-10^{-7}\), i.e. they provided an extremely limited guarantee. This underlines that VeryDiff is a significant step forward in the verification of \(\delta \)-Top-1 equivalence.
Limitations. Larger weight differences between NNs and accumulating \(\textrm{ReLU} \) approximations may decrease the speedups achievable via Differential Verification (see discussion in Appendix [29, Sec. C.4]). Nonetheless, VeryDiff outperforms alternative verifiers – often by orders of magnitude. Another limitation of confidence-based NN verification is the possibility of vacuous satisfaction for high confidence thresholds, i.e. when (almost) no input reaches the required confidence (for mitigation via calibration [1, 13] see Athavale et al. [2]). However, for equivalence verification, we consider the reference NN \(f_1\) (incl. its confidence) trustworthy.
8 Conclusion
We introduced Differential Zonotopes as an abstract domain for NN equivalence verification. Our extensive evaluation shows that we outperform the Differential Verification tool NeuroDiff [22, 23] as well as State-of-the-Art NN verifiers. Moreover, our paper provides insights into the circumstances where differential reasoning does (not) aid verification. As discussed in Sections 6 and 7, whether Differential Verification helps is not always straightforward for specifications involving classification. We believe that confidence-based equivalence is the way forward to scale equivalence verification beyond tiny input regions such as the \(\epsilon \)-balls criticized in the literature [11]. Finally, we introduced a simpler approximation for \(\textrm{softmax}\) that is provably tighter than prior work [2]. This approximation is also of independent interest as it allows more precise verification of softmax NNs using simple linear constraints.
Future Work. We see potential in extending generic NN verifiers (e.g. Marabou’s network level reasoner [34]) with differential abstract domains to improve their reasoning capabilities for relational properties. Beyond equivalence, we hope to explore the applicability of Differential Verification techniques to other properties such as robustness or fairness.
Acknowledgements
This work was supported by funding from the pilot program Core-Informatics of the Helmholtz Association (HGF). Additional financial support was provided by the program “Invest BW” from the Baden-Württemberg Ministry of Economic Affairs, Labour and Tourism.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.