Abstract
In software-intensive systems, navigating the complexities that emerge from the interaction of design variability and stochastic operational uncertainties presents a daunting challenge. This paper delves into the dynamics between these two dimensions of uncertainty, offering novel insights about how modeling can contribute to the analysis of their combined impact upon system properties. By elevating the abstraction level at which probabilistic models are conceptualized, our approach enables an integrated analysis framework that considers both structural and quantitative dimensions of design spaces. Through the introduction of novel language constructs, our methodology facilitates the direct referencing of structural relationships within probabilistic behavioral specifications. Furthermore, the adoption of novel quantifiers in probabilistic temporal logic enables evaluating complex properties across diverse design variants, thereby streamlining the assessment of guarantees within the solution space. We demonstrate the feasibility of this approach on four case studies, showcasing its potential to offer comprehensive insights into the trade-offs and decision-making processes inherent in managing different types of structural design variability and operational uncertainties in software-intensive systems.
Communicated by Javier Troya and Alfonso Pierantonio.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
1 Introduction
In the evolving landscape of software engineering, uncertainty plays a critical and often challenging role. Software-intensive systems, fundamental in various domains, are increasingly subject to various uncertainties during their life cycle, encompassing design, construction, deployment, and evolution. The inherent complexity of these systems, coupled with their essential roles in supporting crucial tasks, makes managing uncertainty a pivotal concern. Examples are abundant, ranging from mission-critical systems used in disaster relief operations to automatic stock management systems, where the cost of erroneous decisions under uncertainty can lead to severe consequences.
The pursuit to effectively manage and mitigate the effects of uncertainty in software-intensive systems is viewed as a promising avenue for engineering systems that are resilient to runtime changes and the various uncertainties that arise from their execution environment, such as resource availability, human interactions, and system-specific issues like faults or the use of machine learning components [1].
So far, the focus has been on developing methodologies to detect, represent, and mitigate uncertainties originating from diverse sources, each impacting the system’s functional and non-functional requirements in unique ways. However, a critical challenge that persists is the interaction of uncertainties.
Individual uncertainties, whether pertaining to goals, resources, or other factors that concern a software-intensive system and its environment, are seldom independent. They often compound, subtly and unpredictably affecting the achievement of system goals and other properties. This phenomenon, recently described as the Uncertainty Interaction Problem (UIP) [2, 3], highlights the complexity of understanding how different types of uncertainties, arising from various sources, interact with each other and impact the properties of software-intensive systems. Addressing this problem involves representing these combined uncertainties, analyzing their emergent effects on requirement satisfaction, and developing strategies for their mitigation. Despite ongoing research efforts, there is a limited understanding of the specific ways these uncertainties interact and the full extent of their impact on system properties [3].
In this paper, we narrow our focus to a specific class of uncertainty interaction that has a remarkable influence on software design and operation: the interplay between design variability and operational uncertainties. Design variability refers to the spectrum of design choices available during the software development process. Each choice leads to different system behaviors and performance levels across various quality dimensions. This variability, while offering flexibility and customization, introduces uncertainty in predicting system behavior and performance outcomes.
Concurrently, operational uncertainties arise at runtime. These are often stochastic in nature, stemming from unpredictable factors such as user interactions, environmental conditions, and system load variations. Operational uncertainties are particularly challenging to model and mitigate, as they are often outside the control of system designers and emerge, in some cases, only after the system has been deployed.
The interaction between these two types of uncertainty presents a unique challenge. Design choices made under uncertainty can significantly impact how a system responds to operational uncertainties. For instance, a design optimized for certain operational conditions may perform sub-optimally when those conditions change unexpectedly. Conversely, operational uncertainties can reveal limitations or unforeseen behaviors in certain design choices, affecting the system’s overall resilience and adaptability.
This specific intersection of design variability and operational uncertainties requires a nuanced approach to modeling and analysis. Traditional models that treat these uncertainties in isolation (e.g., constraint solvers, probabilistic model checkers) are insufficient for capturing the complex dynamics at play when these uncertainties interact. On the one hand, formal methods used to validate software designs, such as Alloy [4], Z [5], B [6], VDM [7], and OCL [8], share a common conceptual foundation that enables designers to capture design variability by implicitly describing collections of alternative structural designs (e.g., software architectures [9, 10], database object–relational mappings [11], network and security models [12, 13]) as sets of relational constraints and to reason about them systematically; however, they are not equipped for analyzing stochastic behavior. On the other hand, quantitative verification or probabilistic model checking approaches [14‐16] can reason under uncertainty about quantitative guarantees of systems (e.g., on performance or reliability), but they use notations (e.g., PRISM [17], PEPA [18], cpGCL [19]) that do not retain the flexibility of relational methods in describing design variants. Recent product line reliability analysis approaches [20‐23] improve on flexibility by introducing a systematic treatment of variability. However, they are not equipped to synthesize or reason about complex design variants (e.g., those that satisfy a non-trivial set of structural constraints of the style usually captured by, for instance, OCL or Alloy), and they lack languages tailored to checking complex properties across design variants: temporal logics employed so far, like PCTL [14], can capture properties only about a single variant, not about collections of variants that describe the design space.
This situation forces designers to explore design variants independently of stochastic factors in the system’s operational environment, limiting their ability to systematically assess how their mutual interdependence affects relevant system properties.
Our research aims to address this gap by proposing a modeling and formal analysis framework that integrates both design variability and stochastic operational uncertainties. This framework is aimed at enabling a more holistic understanding of how design choices influence system behavior under various operational scenarios and vice versa, compared to current techniques based on relational modeling, probabilistic analysis, and quantitative verification, which present various shortcomings in this area (cf. Sect. 2). We posit that this will contribute to the development of more robust, adaptable, and resilient software systems capable of reliable operation in uncertain and changing environments. To further our vision, we investigate the following research questions:
(RQ1)
How can we jointly model and reason about (non-trivial) design space variability and stochastic aspects in a system’s operational environment?
(RQ2)
If feasible, is such an approach general enough to be applicable to different domains and types of analysis?
(RQ3)
What are the trade-offs of employing this class of approach with respect to existing techniques for analyzing systems that operate under uncertainty?
This paper explores these questions by contributing what is, to the best of our knowledge, the first approach that integrates into a single framework (HaiQ) the specification of complex design spaces with stochastic behavior modeling and analysis. Such integration enables obtaining quantitative guarantees on the satisfaction of system properties in the presence of the emergent effects of design space variability and stochastic operational uncertainties.
From a modeling perspective, the first core feature of HaiQ is its capability to transform configurations, derived from relational constraints, into detailed stochastic behavior models. This transformation process facilitates a comprehensive exploration of various system configurations under uncertainty, using probabilistic model checking to evaluate both average- and extreme-case scenarios.
The second core feature of the framework is manifold probabilistic computation tree logic (M-PCTL), an extension of existing probabilistic temporal logics that enables users to evaluate properties across a collection of probabilistic state machines, each of which captures the behavior of a system design variant. This ability is crucial for assessing the resilience of different system configurations to operational uncertainties.
This article is a revised and extended version of the paper presented at the 8th International Conference on Formal Methods in Software Engineering (FormaliSE) in 2020 [24]. In this updated version, the original paper is expanded to deepen the understanding of how modeling within frameworks like HaiQ can address some of the challenges posed by the uncertainty interaction problem, particularly by capturing the interplay of design variability and operational uncertainties faced by software-intensive systems. The paper now opens with a revised introduction that contextualizes this interaction within the UIP. Moreover, it includes a broadened review of related literature, a novel case study on a cyber-physical system that introduces unique variations and uncertainties, and an enriched discussion incorporating these new findings. Additionally, an appendix detailing the HaiQ modeling language is provided, alongside a publicly released repository containing HaiQ’s binaries, source code, and the models from the presented case studies.
The remainder of this paper starts by providing a discussion of related work (Sect. 2), followed by an overview of the approach (Sect. 3). Then, we give further details about HaiQ’s modeling language (Sect. 4) and M-PCTL (Sect. 5). Next, we illustrate and evaluate the approach on four scenarios in various domains (Sect. 6), discussing research questions and threats to validity. The paper finishes with conclusions and directions for future work.
2 Related work
Related approaches can be categorized into: (i) relational modeling and structural verification, (ii) probabilistic/quantitative verification, and (iii) structural–quantitative optimization.
Relational Modeling and Structural Verification (RMSV). Pioneering formal methods like Z [5] have a very expressive notation that limits automated theorem proving, which often requires guidance from an experienced user. Alloy [4] can be considered a first-order subset of Z with finite models, which makes it automatically analyzable. Other techniques and languages like VDM [7] and OCL [8] are also based on first-order logic and include tools that can support design-time analysis that allows exhaustive search over a finite space of cases, similarly to Alloy. All the aforementioned methods have a strong focus on structure and are limited in terms of reasoning about complex concurrent behaviors. In contrast, methods like DynAlloy [25], Electrum [26], and Event-B [27] include more sophisticated constructs to reason efficiently about behaviors on top of structures. Despite being able to reason about concurrent behaviors, these methods are not equipped to capture or reason about probabilistic aspects of system behavior.
Quantitative/probabilistic analysis and verification (Q/PV). Modern probabilistic model checkers like PRISM [17] and STORM [28] capture models using textual languages like PRISM, PEPA, or low-level matrix-based descriptions. Stochastic variants of UPPAAL [29] employ graphical specification languages. Although these tools are efficient at analyzing complex probabilistic system behaviors, their specification languages and reasoning mechanisms do not separate process instance attributes from process types, hampering reusability. Möbius [30] overcomes this limitation by exploiting domain-specific models. However, none of the tools described is equipped for reasoning about design variants. In contrast, product line reliability analysis approaches [20‐23] can analyze collections of system designs encoded in a feature model individually or collectively. A recent approach to continuous-time probabilistic design synthesis [31] uses a template-based solution to generate and analyze alternative system designs, but places no emphasis on high-level modeling: it assumes an existing encoding of design options in a set of discrete variables and leaves systematic enforcement of structural constraints in the designs out of scope. Compared with the approaches described above, HaiQ’s focus is not only on variability, but also on structure, being able to capture and analyze design alternatives that satisfy complex structural constraints. Moreover, HaiQ includes a temporal logic (M-PCTL) that specifically targets analysis across multiple design variants.
Structural quantitative analysis and optimization (SQA). There is extensive related work on model-based performance prediction [32] and optimization of quantitative aspects of structures (e.g., architectures) [33] that typically use mechanisms like stochastic search and/or Pareto analysis [34‐40]. These and other approaches in systems engineering (e.g., [41]) can optimize quantitative aspects of designs, but they do not support synthesis of configurations. Other approaches [11, 42] combine structural synthesis with simulation and dynamic analysis to provide estimates of quantitative properties of design variants. These approaches share with ours the idea of synthesizing a solution space from a set of constraints and analyzing individual solutions independently. However, they are not compatible with formal verification and hence are not equipped to provide quantitative guarantees under uncertainty, which rely on checking sophisticated properties (typically encoded as temporal logic formulas) via numerical methods and exhaustive state space exploration techniques. An ad hoc combination of architectural synthesis from Alloy specifications with a template engine to generate probabilistic models [43] enables analysis of quantitative properties of configurations, but includes neither its own modeling language nor a property language.
In summary, HaiQ is, to the best of our knowledge, the only existing approach that supports the combined modeling and exploration of structurally complex design variants with stochastic operational uncertainties, enabling streamlined analysis of structural and quantitative/probabilistic guarantees across design spaces (Fig. 1).
3 Overview of the approach
The complementary strengths of relational and quantitative verification approaches can be appreciated in the client–server example of Fig. 2. The top part shows an Alloy specification in which two types or signatures (which represent groups of objects in Alloy) c (clients) and s (servers), extend an abstract signature comp (component), which includes a relation l (i.e., a component is linked to others). A predicate clientserver includes the constraints that describe how instances of c and s relate. (In this case, clients can only be linked to servers, and the relation is symmetric.) This specification implicitly describes the set of structures shown on the right of the figure for a scope of at most one client and two servers. Although this is a trivial example, real system designs include large sets of components and complex structural constraints that result in vast collections of possible system structures that can be automatically synthesized. Despite their structural flexibility, these approaches are not equipped to analyze probabilistic behaviors.
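To make the idea of implicit structure description concrete, the following Python sketch (ours, not part of Alloy or HaiQ) enumerates the link relations that satisfy the clientserver constraints for a scope of one client c0 and two servers s0 and s1. Since every candidate edge already connects a client to a server and edges are kept undirected, every subset of edges is a legal structure by construction; the exact set of instances shown in Fig. 2 may differ, e.g., if the scope also varies the number of servers.

```python
from itertools import chain, combinations

# Hypothetical sketch of Alloy-style instance enumeration for the
# client-server example: one client c0, two servers s0, s1.
# Constraint: clients may only be linked to servers (symmetric relation).

clients = ["c0"]
servers = ["s0", "s1"]

# Candidate undirected edges: every client-server pair. Restricting
# candidates to such pairs enforces the structural constraint up front.
candidate_edges = [(c, s) for c in clients for s in servers]

def powerset(items):
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

# Each subset of candidate edges is a structure satisfying the
# "clients only link to servers" constraint by construction.
structures = [set(edges) for edges in powerset(candidate_edges)]

for s in structures:
    print(sorted(s))
```

Real relational solvers work by constraint solving rather than brute-force enumeration, but the resulting set of instances is the same in spirit.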
In contrast, the bottom of the figure shows a PRISM discrete-time Markov chain (DTMC) [14] model with two processes or modules (client c0 and server s0) that synchronize on a shared request action [r]. Boolean variables x and y encode that the request has been correctly issued and received, respectively. The request is always correctly sent by the client (probability 1), whereas in the server, the action has two possible outcomes specified with different probabilities (0.4/0.6). Probabilistic model checkers are equipped with mechanisms to compose in parallel the behavior of such stochastic processes (Fig. 2 bottom right) and reason about probabilistic and other quantitative guarantees captured in temporal logics like PCTL and CSL [14]. Despite this analytical advantage, in this kind of specification, modules denote processes among which synchronization is “hardwired” via shared action names (i.e., relations among processes are fixed and explicitly encoded in models). This specification style results in rigid structures, compared to the ones that we can specify in Alloy, meaning that if a designer wants to explore quantitative/probabilistic properties in different combinations of a collection of processes arranged in different topologies, she has to generate and analyze different alternatives manually, or use ad hoc solutions that do not provide any structural guarantees about the resulting models. These ad hoc approaches are labor-intensive, error-prone, and simply infeasible in cases in which process behaviors and structural system constraints are non-trivial.
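The effect of the synchronized [r] action can be illustrated with a small Python sketch (ours; which of the 0.4/0.6 outcomes counts as a correct reception is our reading of the figure). In a DTMC, the probability of a synchronized transition is the product of the component modules' update probabilities.

```python
# Hedged sketch of the synchronized DTMC step from the PRISM example:
# client c0 and server s0 synchronize on action [r]; the client's update
# always succeeds (x' = true with probability 1), while the server's
# update has two outcomes: y' = true with probability 0.6 and
# y' = false with probability 0.4 (assumed mapping of the 0.4/0.6 split).

initial = (False, False)  # state: (x, y)

def step(state):
    # Distribution over successor states: the joint probability of a
    # synchronized update is the product of the component probabilities.
    return {(True, True): 1.0 * 0.6, (True, False): 1.0 * 0.4}

dist = step(initial)
p_both = dist[(True, True)]
print(p_both)  # probability that the request is sent and correctly received
```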
To investigate automated joint modeling and analysis of structural design variants, (probabilistic) quantitative guarantees, and their interaction, our approach must: (i) provide simple language constructs, yet expressive enough to capture structure, stochastic behavior, and their dependencies in models, (ii) allow systematic checking of sophisticated properties on such models, and (iii) be general enough to be applied to different domains and probabilistic analyses.
Fig. 4 Client–server HaiQ specification (left), implicit structure collection and corresponding state machines (right)
To address these requirements, we propose an approach that includes: (i) a high-level language (HaiQ) for modeling collections of probabilistic models as relational/probabilistic behavioral specifications and (ii) a language for specifying (and associated mechanisms for checking) quantitative temporal properties (probability- and reward-based) on collections of probabilistic models (manifold probabilistic computation tree logic or M-PCTL).
We embody these languages and analysis mechanisms in a relational probabilistic model checker that combines synthesis of structural design variants from relational constraints with probabilistic model checking of such variants (Fig. 3). Our proposal includes three stages:
(i)
Configuration synthesis, during which topological descriptions of configurations are synthesized from the relational constraints of a HaiQ model,
(ii)
Configuration behavioral model generation, which uses configuration descriptions synthesized in the previous stage as a blueprint to generate a probabilistic behavioral model for every configuration (supported model types are DTMC and MDP),
(iii)
Extended probabilistic model checking, which uses as input properties specified in M-PCTL to provide quantitative results about the set of configuration behavioral models generated. Supported analyses include average probability-based and reward-based guarantees for DTMC and worst-/best-case probability-based and reward-based guarantees for MDP.
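The three stages above can be sketched, at a very high level, as the following Python pipeline. All names are ours and stand in for HaiQ's actual machinery (a relational solver, a probabilistic model generator, and a model checker):

```python
# Illustrative three-stage pipeline (names are ours, not HaiQ's API):
# (i) synthesize configurations from structural constraints,
# (ii) instantiate a behavioral model per configuration,
# (iii) check a property across all generated models and aggregate.

def synthesize_configurations(constraints, scope):
    # Stand-in for relational constraint solving: keep only the
    # candidate topologies that satisfy all constraint predicates.
    return [c for c in scope if all(p(c) for p in constraints)]

def generate_behavioral_model(config):
    # Stand-in for DTMC/MDP generation: here we just pair a topology
    # with a per-configuration probability of satisfying a property.
    return {"config": config, "p": 0.6}

def check_across(models, threshold):
    # M-PCTL-style "allP >= threshold" check: the bound must hold
    # in every generated model.
    return all(m["p"] >= threshold for m in models)

# Toy scope: configurations are server counts; constraint: >= 1 server.
scope = [0, 1, 2]
constraints = [lambda n: n >= 1]

configs = synthesize_configurations(constraints, scope)
models = [generate_behavioral_model(c) for c in configs]
print(configs, check_across(models, 0.6))
```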
The HaiQ Modeling Language. HaiQ specifications juxtapose static (or structural) relational descriptions with blocks of probabilistic behavioral specification that can exploit the information about system structure encoded in relations.
To illustrate the main ideas behind the language, we focus on a simple HaiQ specification that builds on our introductory example (Fig. 4). Here, the possible structures of the system are similar to the ones captured by the Alloy specification in Fig. 2. In contrast to Alloy, HaiQ signatures include blocks of behavior specifications (between “</” and “/>”) that enclose guarded probabilistic commands. This is because every signature in HaiQ is also a process type. Let us emphasize that, in contrast to the languages employed by existing probabilistic model checkers, behavioral descriptions in HaiQ are at a higher level of abstraction and denote process types, not process instances. This means that a HaiQ model implicitly describes a process type hierarchy associated with signatures in which behaviors can be inherited and extended. In the example, this can be observed in the command included in signature comp, which is defined in a generic manner and inherited by signatures c and s. In particular, the synchronization action r in the command is prefixed by the relation l defined in the static part of the specification, indicating that the synchronization matches only processes that correspond to signature instances in the relation (independently of the topology under which they are instantiated), rather than being “hardwired” as we saw in the PRISM example in Fig. 2. The top right of Fig. 4 shows the two possible structure instantiations of the specification, in which process instances s0 and s1 synchronize with c0 on actions generated based on the instantiation of relation l. The corresponding probabilistic state machines that result from the parallel composition of the behaviors of the c-s instances are shown at the bottom right.
The details of the modeling language are described in Sect. 4.
Manifold probabilistic computation tree logic (M-PCTL). Ultimately, a HaiQ model defines a set of probabilistic state machines (DTMC or MDP), where each one of them results from the parallel composition of processes instantiated in accordance with each topology that satisfies the model’s structural constraints. Formulas in existing probabilistic temporal logics like PCTL capture properties of a single probabilistic state machine. Hence, our work extends existing languages to query HaiQ specifications and check properties across entire collections of probabilistic state machines, rather than individual instances. M-PCTL is an extension of PCTL including new operators and quantifiers that enable designers to check properties like “across all configurations, the minimum probability that a service will never time out is greater than 0.95,” and “there exists a robot configuration that requires less than 3.2 whr to achieve its goal.”
In our client–server example, the property \(\mathsf{allP_{\ge 0.6} [ F \; all \; c.x \wedge some \; s.x]}\) can be interpreted as “for all configurations, the probability that all requests will be correctly sent by the client and at least one of them will be correctly received by a server is at least 0.6.” In this case, the property check returns true, because for the two possible configurations (with one and two servers), the probability that \(\mathsf{[F \; all \; c.x \wedge some \; s.x]}\) is satisfied is exactly 0.6 in both cases. Moreover, M-PCTL properties can also return structures that optimize or satisfy some quantitative property. For instance, the property \(\mathsf{SmaxP\; [ F \; all \; c.x \wedge some \; s.x]}\) returns a set including a description of the one-server and the two-server configurations, since both of them have the same probability of satisfying \(\mathsf{[F \; all \; c.x \wedge some \; s.x]}\) and hence maximize the value of the property in this case because there are no other feasible solutions.
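The evaluation of these two queries reduces to simple aggregations over per-configuration model checking results, as the following sketch (ours) shows, using the probability 0.6 reported in the text for both configurations:

```python
# Sketch of the two M-PCTL queries from the example, assuming (as stated
# in the text) that P[F all c.x & some s.x] = 0.6 in both configurations.

path_probs = {"one-server": 0.6, "two-server": 0.6}

# allP_{>=0.6} [ ... ]: the probability bound must hold for every
# configuration in the design space.
all_p = all(p >= 0.6 for p in path_probs.values())

# SmaxP [ ... ]: return the set of configurations attaining the maximum
# probability of satisfying the path formula.
best = max(path_probs.values())
smax = {cfg for cfg, p in path_probs.items() if p == best}

print(all_p, sorted(smax))
```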
The syntax and semantics of M-PCTL are described in Sect. 5.
4 The HaiQ modeling language
In this section, we formalize the modeling language and illustrate its main features by modeling a simple network security scenario introduced by Kwiatkowska et al. [44].1 The scenario models the progression of a virus infecting a network formed by a grid of N\(\times \)N nodes. The virus remains at a node that is infected and repeatedly tries to infect any uninfected neighbors by first attacking the neighbor’s firewall and, if successful, trying to infect the node.
In the network, there are “low” and “high” nodes divided by “barrier” nodes that scan the traffic between them. Initially, only one corner “low” node is infected. A graphical representation of the network when N=3 is given in Fig. 5.
4.1 Language features
4.1.1 Signatures as structural and behavioral building blocks
The modeling language is tailored for the incremental construction of structured probabilistic models. The basic unit of specification is the signature, which contains structural and behavioral code fragments. The structural part of the language is a subset of the static part of Alloy (cf. [45], 2.1), in which every signature field is a set that implicitly defines a binary relation between the instances of signatures in which the field is declared and the elements in the set.2 For example, the field conn declared in the first line of the node definition listing encodes a relation capturing the fact that a node can be connected to one or more neighboring nodes (i.e., conn is implicitly encoding the arcs in the network graph shown in Fig. 5). Constraining fields of signatures to this type of relation suffices to capture sophisticated topologies and limits the complexity of specifications. Note that the declaration states that signature node is abstract, so it can be extended by other signatures:
The second part of the signature encodes its behavior and is surrounded by the process block delimiters “</” and “/>.” Process blocks include a set of local variables that can be bounded integers3 or Booleans, as well as formulas that are used as shorthand for complex expressions to avoid code duplications. In the example, variable s specifies the state of the node (uninfected, breached, infected), and formula detect encodes the probability that the virus is detected by the firewall in the node. Constants node_detect and infect are global constants that encode a default probability value for firewall detection and node infection.
Finally, the process block includes a set of probabilistic guarded commands that describe the behavior of the signature. Each command describes the possible transitions of the process in states that satisfy its guard (specified before “->”), and a set of updates that specify each one of the possible outcomes as sets of new values for (primed or post-) state variables (e.g., s’=infected). Each update is associated with a probability, and unspecified state variables in an update indicate no change in value. In the first command of the example, the guard specifies that when the node is not infected, it can be attacked with two possible outcomes. With probability detect, the firewall will remain unbreached, and with probability 1-detect, the firewall will be successfully breached by the attack.
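A rough operational reading of the first command can be given in Python (ours; the detect value below is an illustrative placeholder, not a constant from the paper):

```python
import random

# Sketch of the first guarded command of the node signature: when the
# node is not infected, an incoming attack breaches the firewall with
# probability 1 - detect and is repelled with probability detect.

node_detect = 0.5  # assumed default firewall detection probability (placeholder)

def attack_update(state, detect=node_detect, rng=random.random):
    # Guard: the command only fires when the node is not infected.
    if state == "infected":
        return state
    # Two probabilistic outcomes of the attack.
    return "uninfected" if rng() < detect else "breached"

random.seed(0)
outcomes = [attack_update("uninfected") for _ in range(10000)]
breach_rate = outcomes.count("breached") / len(outcomes)
print(round(breach_rate, 2))
```

A probabilistic model checker computes such probabilities exactly rather than by sampling; the simulation is only meant to convey the command's semantics.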
Note that the prefixing of the action by conn indicates that the behavior specified by this command is replicated for each of the node instances in the relation. Hence, for an example system structure with a node instance A in which A.conn={B,C}, each command in the process block labeled as [conn:attack] denotes two alternative actions [A.B.attack] and [A.C.attack]4 that can synchronize with actions of the same name in nodes B and C (Fig. 6).
4.1.2 Synchronization
Different signature processes in a specification synchronize on shared action names. The third command of the node behavior definition contains an action labeled attack prefixed by conn, just like in the first command that we saw earlier. In this case, the command models the transition in which the infected node attacks a neighboring node with probability 1 (unspecified probability in a command defaults to 1). Consider an example of two neighbor node instances A and B, in which A.conn={B} and B.conn={A} (Fig. 7). In the scheme, nodes synchronize on the same action name [A.B.attack] both when the node attacks and receives an attack from a neighboring node. However, the commands synchronize pairwise because the guards in commands on the same module are mutually exclusive. Hence, whenever node A attacks, node B receives the attack, and vice versa.
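The pairwise matching described above can be sketched as follows (ours, simplified to the attacker/receiver pairing discussed in the text):

```python
# Sketch of pairwise synchronization on the shared action A.B.attack for
# two neighbor instances A and B (A.conn = {B}, B.conn = {A}). Guards on
# the same module are mutually exclusive: an infected node attacks,
# while an uninfected one receives the attack.

def enabled_command(node_state):
    # Each node contributes exactly one enabled command per shared action.
    return "send_attack" if node_state == "infected" else "receive_attack"

def synchronize(state_a, state_b):
    # Simplification: the shared action fires only when the two enabled
    # commands pair up as one attacker and one receiver.
    roles = {enabled_command(state_a), enabled_command(state_b)}
    return roles == {"send_attack", "receive_attack"}

print(synchronize("infected", "uninfected"))   # A attacks, B receives
print(synchronize("uninfected", "infected"))   # B attacks, A receives
```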
4.1.3 Subtyping and extension
Signatures can be extended to incorporate new structural and behavioral elements. Similarly to Alloy, a signature that extends another one is considered its subtype. However, in this case subtyping also implies inheritance not only of the structural elements of the extended signature, but also of its behavioral definition. The following listing encodes the subtypes of node for the network scenario:
First, signature barrierNode extends signature node to create a special type of node whose attack detection probability differs from that of the rest of the nodes (encoded in constant barrier_detect). This is achieved by overriding the original definition of the formula detect in the abstract signature node.
In contrast, highNode and lowNode are subtypes of node that preserve all of its structural and behavioral elements intact, because we are only interested in formally distinguishing them as special locations and do not need to incorporate any special behavior.
Finally, we incorporate infectedNode as a subtype of lowNode to include an infected node in the scenario. This type is constrained to a singleton (keyword one) and overrides the declaration of the node state variable, which is now initialized as infected.
4.1.4 Topology
The possible topologies of design alternatives that can be generated by a HaiQ model are defined implicitly by a set of structural constraints expressed in relational logic sentences.
In the example, the first constraint imposes that all nodes must be reachable in the network from any other node (operator * represents the reflexive transitive closure, so for a given node n, it contains n itself as well as the elements of n.conn, n.conn.conn, etc.). The second constraint specifies that no node must be connected to itself. The third one forces connections to be symmetric, whereas the fourth constraint imposes that low nodes and high nodes must never be directly connected. Figure 7 shows two of the many legal configurations that can be synthesized from the virus infection scenario model (apart from the one shown in Fig. 5) for a scope of N=3 by running:
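As an illustration of what these four constraints require (this is not HaiQ syntax), they can be mirrored as a plain Python predicate over a candidate topology; the dictionary encoding and the node names are invented for this sketch:

```python
def reachable(conn, start):
    """Nodes reachable from start via the reflexive transitive closure of conn."""
    seen, frontier = {start}, [start]
    while frontier:
        for m in conn.get(frontier.pop(), set()):
            if m not in seen:
                seen.add(m)
                frontier.append(m)
    return seen

def legal(conn, high, low):
    """Check the four topology constraints of the virus scenario."""
    nodes = set(conn)
    return (
        all(reachable(conn, n) == nodes for n in nodes)        # 1: every node reaches all others
        and all(n not in conn[n] for n in nodes)               # 2: no self-connections
        and all(n in conn[m] for n in nodes for m in conn[n])  # 3: symmetric connections
        and all(not (conn[n] & high) for n in low)             # 4: low and high never adjacent
    )

# A legal scope-3 topology: the low and high nodes only meet through a middle node.
ok = legal({"H": {"M"}, "M": {"H", "L"}, "L": {"M"}}, high={"H"}, low={"L"})
```

A topology that links a low node directly to a high node fails the fourth check, matching the constraint in the HaiQ model.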
4.1.5 Rewards
HaiQ can be used to reason not just about the probability that a collection of possible behaviors specified by a model behaves in a certain fashion, but also about a wider range of quantitative measures relating to behavior. This includes properties such as “expected time” or “expected power consumption.” This is achieved by augmenting signatures with rewards, which are real values associated with certain model states or transitions. We include in our example a reward that enables us to quantify the number of attack attempts carried out by the different nodes:
The reward is associated with transitions in which the action attack is executed from any state. Every time the action is executed from a state in which the guard before the colon is satisfied (in this case, the guard is true and is therefore always satisfied), a reward of one unit is accrued for attacks. Prefixing the action with relation conn results in reward accumulation for any of the instances of the action executed between any arbitrary pair of nodes.
A HaiQ specification is formally characterized as a tuple \((\Sigma , \mathcal {C}, \delta )\), where \(\Sigma \) is a set of signatures that implicitly define a hierarchy of types (including their behavior), \(\mathcal {C}\) is a set of structural constraints expressed in first-order predicate logic that determines the allowed topologies when instantiating the hierarchy defined by \(\Sigma \), and the total function \(\delta :\Sigma \rightarrow \mathbb {N}\) is a scope that determines the maximum allowable number of instances of every signature type in \(\Sigma \).
Definition 1
(Signature) A signature \(\sigma \) is a tuple \((\sigma ^{\uparrow },\mathcal {R},\beta )\):
\(\sigma ^{\uparrow } \in \Sigma \cup \{ \bot \}\) references the supertype signature that the current signature extends (if any).
\(\mathcal {R}\) is a set of tuples \((id, \sigma ')\) typed by \( sVarIds \times \Sigma \), where sVarIds is the set of identifiers that form a static variable namespace. Each tuple in the set implicitly declares a relation between every instance of \(\sigma \) and a set of the instances of \(\sigma '\) (possibly constrained by \(\mathcal {C}\)).
\(\beta \) is a behavior specification.
Definition 2
(Behavior specification) A behavior specification is a tuple \((\mathcal {V}, \mathcal {K}, \mathcal {L})\), where \(\mathcal {V}\) is a set of local state variables, \(\mathcal {K}\) is a set of commands that describe the behavior of the signature, and \(\mathcal {L}\) is a set of reward functions for states and transitions.
We define the extension function for a signature set element \(X \in \{\mathcal {R}, \beta .\mathcal {V}, \beta .\mathcal {K}, \beta .\mathcal {L}\}\) as \(ext(\sigma , X) \triangleq ext(\sigma .\sigma ^{\uparrow }, X) \uplus \sigma .X \) if \(\sigma .\sigma ^{\uparrow }\ne \bot \), and \(ext(\sigma , X)\triangleq \sigma .X\) otherwise. We obtain a “flat” set of expanded signature types (denoted \(\Sigma _f\)) by applying the function \(flat(\sigma )\triangleq (\bot , ext(\sigma ,\mathcal {R}), (ext(\sigma ,\beta .\mathcal {V}),ext(\sigma ,\beta .\mathcal {K}),ext(\sigma ,\beta .\mathcal {L})))\) to all signatures in \(\Sigma \). We use the shorthand \(\sigma _f\) for \(flat(\sigma )\).
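As a concrete reading of ext and flat, the flattening can be mimicked in Python with dictionaries standing in for the signature sets. This sketch assumes that \(\uplus \) lets subtype entries override supertype ones (as when barrierNode overrides detect); field names are invented for illustration:

```python
class Sig:
    def __init__(self, parent=None, relations=None, variables=None,
                 commands=None, rewards=None):
        self.parent = parent              # sigma^ (None encodes "no supertype")
        self.relations = relations or {}  # R: relation id -> target signature
        self.variables = variables or {}  # beta.V: variable -> initial value
        self.commands = commands or {}    # beta.K, keyed by label so overriding works
        self.rewards = rewards or {}      # beta.L

def ext(sig, field):
    """Recursively merge a field up the extension chain; subtype entries override."""
    own = getattr(sig, field)
    if sig.parent is None:
        return dict(own)
    merged = ext(sig.parent, field)
    merged.update(own)  # subtype overrides inherited entries
    return merged

def flat(sig):
    """A 'flat' signature with no supertype and all inherited elements expanded."""
    return Sig(None, ext(sig, "relations"), ext(sig, "variables"),
               ext(sig, "commands"), ext(sig, "rewards"))
```

For instance, flattening an infectedNode-like subtype keeps the inherited conn relation while its overridden initialization of the state variable wins.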
Each command in \(\mathcal {K}\) is a tuple \((\alpha , r, g, \mathcal {U})\), where:
\(\alpha \in actIds \cup \{\bot \}\) is a label drawn from an action namespace. If \(\alpha =\bot \), the command encodes an internal action.
\(r \in \sigma _f.\mathcal {R} \cup \{\bot \}\) is a relation that constrains synchronization on action \(\alpha \) to instances in r. If \(r=\bot \), the scope of the command labeled by \(\alpha \) is global and the action can synchronize with any other process that includes a command with the same label.
g is a guard built over the variables in \(\sigma _f.\mathcal {V}\).
\(\mathcal {U}\) is a set of probabilistic updates. Each update is a couple \((\lambda , u)\), where \(\lambda \in [0,1]\) is a probability and u is a set of assignments of fresh values to local state variables \((v,\nu )\) (\(v \in \sigma _f.\mathcal {V}\), and \(\nu \) is a value typed by v’s data type).
All commands \([r:\alpha ]\; g \; \rightarrow \lambda _1: u_1 + \ldots + \lambda _n: u_n\) must satisfy the invariants \(\sum _{i:\{1\dots n\}} \lambda _i=1\), and \(\alpha =\bot \Rightarrow r=\bot \).
4.2.1 Configurations
The specification \((\Sigma , \mathcal {C}, \delta )\) implicitly defines a set of configurations or structures. Each configuration is composed of a set of signature instances, connected in a topology that satisfies \(\mathcal {C}\).
An instance of a signature \(\sigma \) is just a labeled process with fresh variables that corresponds to the behavior definition of \(\sigma _f\).
Definition 3
(Instance) An instance of a signature \(\sigma \) is a couple \((l, \beta )\), where \(l \in procIds\) is a label drawn from a process namespace, and \(\beta =\sigma _f.\beta _\nu \) is a process specification with fresh variables. We denote the type of an instance n of signature \(\sigma \) by \(type(n)\triangleq \sigma _f\).
The nodes in a configuration take values in the set of possible \(\Sigma \)-signature instances, denoted by \(\mathcal {I}_\Sigma \).
Definition 4
(Configuration) A configuration is a graph \(\mathcal {G} = (\mathcal {N}, \mathcal {E})\) satisfying the constraints imposed by \(\mathcal {C}\) and \(\delta \), where \(\mathcal {N} \subseteq \mathcal {I}_\Sigma \) is a set of nodes, and \(\mathcal {E}\) is a set of labeled arcs typed by \(sVarIds \times \mathcal {I}_\Sigma \times \mathcal {I}_\Sigma \) that capture relations between instances.
4.2.2 From configurations to stochastic behavior models
A HaiQ specification implicitly describes a collection of probabilistic models \(\mathcal {M}\). Each model in \(\mathcal {M}\) is obtained as the parallel composition of the signature behaviors in one possible instantiation of \(\Sigma \) that satisfies the constraints imposed by \(\mathcal {C}\) and \(\delta \).
Algorithm 1 describes the generation of a probabilistic state machine from a configuration c. The algorithm starts by extending every process n in the configuration to produce the wiring for communication with the rest of the processes. The new set of commands for each process \(\mathcal {K}_\nu \) is initialized in line 3 with the commands that do not contain any references to relations (i.e., both commands for internal actions and global events). Then, the algorithm substitutes every command k that refers to a relation (\(k.r \ne \bot \)) by a set of commands \(\mathcal {K}^*\), where every command contains the resolved reference to every element of the relation (lines 4–8). Resolution of references is done via function uid in line 7, which generates a unique label id for a pair of labels. The operation is commutative, so that synchronization on the generated id is possible on both ends of the communication (n and \(n'\)). Finally, the algorithm returns the probabilistic state machine that corresponds to the parallel composition of the extended processes for the instances in the configuration.5
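The wiring step of Algorithm 1 can be sketched in Python as follows; the tuple encoding of commands and the concrete uid scheme are assumptions made for this illustration:

```python
def uid(a, b):
    """Commutative unique label for a pair of process names (an assumed encoding)."""
    return ".".join(sorted((a, b)))

def wire(instance, edges):
    """Expand relation-prefixed commands of one process into resolved-label copies.

    `instance` is (name, commands); each command is (relation, action, guard, updates),
    with relation None for internal/global commands. `edges` maps (name, relation)
    to the set of instance names connected through that relation.
    """
    name, commands = instance
    wired = [c for c in commands if c[0] is None]  # keep relation-free commands as-is
    for rel, act, guard, upd in commands:
        if rel is not None:
            for other in sorted(edges.get((name, rel), ())):
                # one copy per element of the relation; uid's commutativity makes the
                # label match the copy generated on the peer's side
                wired.append((None, uid(name, other) + "." + act, guard, upd))
    return name, wired
```

For A with A.conn={B,C}, a [conn:attack] command yields copies labeled A.B.attack and A.C.attack, matching the expansion described for Fig. 6.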
5 Manifold probabilistic computation tree logic (M-PCTL)
Probabilistic computation tree logic (PCTL) [47] is used to quantify properties related to probabilities and rewards in single system specifications described as a probabilistic state machine (e.g., DTMC, MDP, or probabilistic timed automata, PTA). This section introduces manifold PCTL, which in contrast targets quantification across collections of design alternatives that correspond, in this case, to the state machines generated from the set of structures that satisfy the constraints of a HaiQ specification (cf. Sect. 4.2).
This section first overviews a version of PCTL extended with a reward quantifier targeted at checking properties over DTMC and MDP extended with reward structures [48] and then builds on it to introduce M-PCTL.
5.1 PCTL
In the syntax definition below, \(\Phi \) and \(\phi \) are, respectively, formulas interpreted over states and paths of a probabilistic state machine M extended with rewards, i.e., (M, \(\rho \)). Properties in PCTL are specified exclusively as state formulas (\(\Phi \)). Path formulas (\(\phi \)) have an auxiliary role on probability and reward quantifiers \({\textsf { P}}\)/\({\textsf { R}}\):
In this definition, a is an atomic proposition, \(\sim \in \{<, \le , \ge , > \}, pb \in [0,1], rb \in \mathbb {R}^{+}_{0}\), and \(r \in \rho \).
Intuitively, \({\textsf { P}}_{\sim pb}[\phi ]\) is satisfied in a state s of M if the probability of choosing a path starting in s that satisfies \(\phi \) (denoted as \(Pr_s(\phi )\)6) is within the range determined by \(\sim \!pb\), where pb is a probability bound. Quantification of properties based on \({\textsf {R}}^r_{\sim rb}\) works analogously, but considering rewards, instead of probabilities. The intuitive meaning of path operators \({\textsf {X}}\) and \({\textsf { U}}\) is analogous to the ones in other standard temporal logics. Additional Boolean and temporal operators are derived in the standard way (e.g., \({\textsf {F}} \Phi \equiv {\textsf { true}} \; {\textsf { U}}\; \Phi \)).
5.2 M-PCTL
The main idea behind M-PCTL is to extend checking of probability- and reward-based properties to collections of models. Hence, quantification occurs over a pair \((\mathcal {M}, \rho )\), where \(\mathcal {M}\) is a set of models and \(\rho \) is a set of reward functions.
M-PCTL includes three types of formula. Similarly to PCTL, it includes path (\(\phi \)) formulas (which are the same as in PCTL) and state (\(\Phi \)) formulas, but also an additional type of set formula (\(\Psi \)) that returns the collection of models that satisfy a particular quantitative constraint. The syntax of M-PCTL is:
Concerning state formula quantifiers, \({\textsf { allP}}\) and \({\textsf { someP}}\) determine whether the evaluation of \(Pr(\phi )\) on all or some model in \(\mathcal {M}\) satisfies \(\sim pb\), whereas \({\textsf { maxP}}\) determines whether the maximum probability evaluated across elements of \(\mathcal {M}\) satisfies \(\sim pb\). We define their semantics as:
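In symbols, a tentative formalization consistent with this description is:

```latex
\begin{aligned}
{\textsf{allP}}_{\sim pb}\,[\phi] &\;\equiv\; \forall M \in \mathcal{M}\,.\; Pr_M(\phi) \sim pb\\
{\textsf{someP}}_{\sim pb}\,[\phi] &\;\equiv\; \exists M \in \mathcal{M}\,.\; Pr_M(\phi) \sim pb\\
{\textsf{maxP}}_{\sim pb}\,[\phi] &\;\equiv\; \Bigl(\max_{M \in \mathcal{M}} Pr_M(\phi)\Bigr) \sim pb
\end{aligned}
```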
where \(Pr_{M}(\phi )\) denotes the evaluation of the probability \(Pr(\phi )\) on model M. The analogous reward-based quantifiers \({\textsf { someR}}^r\), \({\textsf { allR}}^r\), \({\textsf { maxR}}^r\), and \({\textsf { minR}}^r\) are defined over the expected reward measure of PCTL, instead of the probabilistic one Pr (cf. [14]). The use of maxP/minP and maxR/minR quantifiers without a bound implies the quantification of the actual maximum/minimum probability or reward for the path formula \(\phi \), e.g.,
In set formulas, U denotes the universe of models in \(\mathcal {M}\), whereas \(\Psi ^C\) is the standard complement operator of set algebra. The semantics of the main quantifiers in set formulas is:
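Consistent with this description, the filtering quantifier can be read as:

```latex
{\textsf{SallP}}_{\sim pb}\,[\phi] \;\triangleq\; \{\, M \in U \;:\; Pr_M(\phi) \sim pb \,\}
```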
Quantifier \({\textsf { SsomeP}}\) returns a singleton with an element drawn non-deterministically from \({\textsf { SallP}}_{\sim pb} [\phi ]\) if the set is not empty, and \(\emptyset \) otherwise. Set subtraction is derived as \(\Psi _1 \backslash \Psi _2 \equiv \Psi _1 \cap \Psi _2^C\).
In the virus scenario example, we can write for instance a property to check which network structures have a probability below 0.4 of having all high nodes infected after 50 time units:
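One way to phrase such a property with M-PCTL's set quantifiers is the following (an assumed formulation, using the derived bounded operator \({\textsf{F}}^{\le t}\)):

```latex
{\textsf{SallP}}_{<0.4}\,\bigl[\,{\textsf{F}}^{\le 50}\; \textsf{all highNode.s=infected}\,\bigr]
```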
The checking of formulas that employ probability and reward quantifiers in M-PCTL can also be constrained to subsets of \(\mathcal {M}\) by employing a scope operator \(\langle \Psi \rangle \) as a prefix for formulas, e.g.,
Hence, the use of a quantifier without the scope operator in a state formula \(\Phi \) is equivalent to \(\langle U \rangle \; \Phi \).
The use of the scope operator enables composition of formulas to check complex properties on the space of design alternatives, e.g., \( \langle \textsf {resilient} \rangle \; \textsf {maxR}^{\textsf {attacks}} [\textsf {F all highNode.s=infected}]\).
The property above determines the expected maximum number of attacks required by the virus to infect all high nodes across all possible network structures that satisfy the resilient property.
Best-/worst-case scenario. The quantifiers P/R described above are based on the average probabilistic and reward measures of PCTL [14]. However, PCTL can also be used to analyze maximum/minimum probabilities/rewards over probabilistic formalisms that feature non-determinism, like MDP or PTA, using the quantifiers \(\mathsf{P_{max}}\)/\(\mathsf{P_{min}}\) and \(\mathsf{R^r_{max}}\)/\(\mathsf{R^r_{min}}\) [14]. We define analogous versions of all probability- and reward-based quantifiers for M-PCTL to enable best- and worst-case scenario analyses. For instance, property \(\mathsf{SminP_{max} [F^{\le t}\; all\; highNode.s=infected]}\) identifies the best configuration in terms of worst-case scenario in the network virus infection example, i.e., the configuration that minimizes the maximum probability of high node infection across all feasible configurations.
6 Evaluation
In this section, we evaluate our approach in case studies from different domains to determine its applicability. We embodied our approach in the HaiQ analyzer7, a prototype tool that implements the generation of design collections from HaiQ specifications as described in Sect. 4.2, as well as checking of M-PCTL properties on them. The tool is implemented in Java, and its back-end employs Alloy’s and PRISM’s APIs for synthesizing configurations and model checking properties on their behavior models, respectively.
6.1 Case studies
We describe the application of our approach to a service-based system, a distributed self-protecting system, and a mobile robotics scenario. The scenarios were chosen because they instantiate different forms of structural variability (architecture, network, physical space) and sources of stochastic operational uncertainties. We finish this section with a discussion of our results.
Figure 8 shows both worst-case and average-case scenario probabilistic guarantees (resulting from MDP-based and DTMC-based analysis, respectively) of the best and worst legal network configurations in our running example (in black and red, respectively). Dashed lines represent average case, and solid lines represent worst-case scenarios.
The plot on the top shows the probability over time that all high nodes in the network will become infected. The average-case DTMC analysis is much more optimistic than the actual worst case, which gives a much more realistic approximation of the minimum infection probability that the system can guarantee. Note that probabilistic estimates in average-case analysis can be approximated with some degree of precision using statistical model checking and Monte Carlo trace-based simulation methods, whereas worst-case analysis requires a technique like probabilistic model checking that performs exhaustive state space exploration.
The plot on the bottom shows a different property in which the probability analyzed is that of having some high node infected, instead of all of them. If we focus for instance on the best configurations in terms of worst-case scenario guarantees, the property \(\mathsf{minP_{max} [F^{\le t} \; some \; highNode.s=infected]}\) (solid black) minimizes across all legal configurations, the maximum probability within the configuration of having some high node infected after \(\textsf{t}\) seconds (we assume a time discretization parameter of one second).
6.1.2 Tele-assistance system (TAS)
The goal of the TAS exemplar system [49] is to track a patient’s vital parameters, adapting drug type or dose when needed, and to take action in case of emergency. TAS combines three service types in a workflow (Fig. 9).
When TAS receives a request that includes the vital parameters of a patient, its Medical Service analyzes the data and replies with instructions to: (i) change the patient’s drug type, (ii) change the drug dose, or (iii) trigger an alarm for first responders in case of emergency. When changing the drug type or dose, TAS notifies a local pharmacy using a Drug Service, whereas first responders are notified via an Alarm Service.
The following excerpt shows the TAS workflow modeled as a signature in which static fields correspond to service bindings, and its behavior specification defines a set of local variables that keep track of the workflow status:
Calls to service operations are prefixed by the service binding relation (e.g., analyzeData is prefixed by MSBinding), so that the actual binding between the workflow and the services will be automatically created by the tool when configurations are generated.
The functionality of each service type in TAS is provided by third parties with different levels of performance, reliability, and cost (Table 1). The metrics employed for the quality attributes are the percentage of service failures for reliability, and service response time for performance. Service providers can be created as abstract signatures that encode these attributes as formulas and include a constraint establishing a binding on the service side to the workflow:
Service providers are subtyped by the types of service involved in the composition. Service invocation includes two probabilistic outcomes that model the possibility of service failure:
Every concrete service extends a service type encoded with a set of attribute values. The use of the quantifier lone indicates that the use of every instance is optional, giving flexibility to use alternative services of the same type in the composition:
Finding an adequate design for the system entails understanding the trade-off space by finding the set of system configurations that satisfy: (i) structural constraints, e.g., the Drug Service must not be connected to an Alarm Service, (ii) behavioral correctness properties (e.g., the system is eventually going to provide a response—either by dispatching an ambulance or notifying a change to the pharmacy), and (iii) quality requirements, which can be formulated as a combination of quantitative constraints and optimizations, e.g., (R1) The average failure rate must not exceed fr %, (R2) The average response time must not exceed rt ms, and (R3) Subject to R1 and R2, the cost should be minimized.
We can automatically search the design space to find the best legal configurations with respect to these requirements by checking the composite M-PCTL property constrained_cost:
The formulas labeled as reliable and performant obtain the set of configurations that satisfy the reliability and performance requirements R1 and R2, respectively. Then, we can quantify the minimum cost entailed by these joint requirements by scoping the quantification of the third property mincost to the subset of designs that satisfy the first two properties. For obtaining the configuration(s) that minimize cost for the specified performance and reliability levels, we substitute the quantifier in mincost by SminR.
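The scoped composition can be mirrored in a short Python sketch; the configuration data and attribute names are invented, and the functions only mimic the shape of the reliable/performant/mincost chain rather than performing actual model checking:

```python
def constrained_mincost(configs, fr, rt):
    """Mirror of the scoped M-PCTL composition: restrict to the reliable and
    performant subset, then take the minimum expected cost.

    `configs` maps a configuration id to (failure_rate, response_time, cost);
    the values used below are illustrative, not TAS data. Returns the minimum
    cost and the set of configurations achieving it (the SminR analogue), or
    (None, set()) when no configuration satisfies both constraints.
    """
    scoped = {c: v for c, v in configs.items()
              if v[0] <= fr and v[1] <= rt}  # reliable ∩ performant scope
    if not scoped:
        return None, set()
    mincost = min(v[2] for v in scoped.values())
    best = {c for c, v in scoped.items() if v[2] == mincost}
    return mincost, best
```

Tightening fr or rt shrinks the scope and can raise the minimum cost, which is the trade-off visible in Fig. 10.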
Figure 10 shows analysis results. The plot on the left shows the minimized cost of configurations for different levels of constraints on response time and reliability. It was computed by checking property \(\mathsf{constrained\_mincost}\) in the region of the state space in which \(\texttt { fr} \in [0.98,1]\) and \(\texttt { rt} \in [15,35]\). As expected, higher response times and lower reliability correspond to lower costs, whereas peaks in cost are reached with lowest failure rates and response times.
The plot on the right is a map that shows which configurations best satisfy design criteria. Out of the set of 90 configurations that can be generated for TAS, only 24 satisfy the criteria in some subregion of the state space. If we consider that designers are interested, e.g., in systems with response times \(\le \)26ms, and with a reliability of \(\ge \)99%, we can determine which are the configurations that best satisfy constraints by checking \(\mathsf{confs\_mincost}\) with \(\texttt {rt}=26\) and \(\texttt { fr}=0.01\) (highlighted in red in the figure).
Designers can take these results and make informed design decisions based, for instance, on the available budget for the project and legal constraints on the level of reliability and timeliness demanded of systems for first-aid response.
6.1.3 Distributed self-protecting system
Some large-scale systems are composed of federated entities that use self-adaptation to improve their behavior with respect to defined quality standards. For example, Netflix’s software infrastructure includes deployments in multiple regions controlled by a local manager (Scryer) to provision the resources required for maintaining resilience against changing customer traffic [50, 51].
We apply our approach on a model of a collective adaptive system (CAS) with similar cooperative systems that defend against an external attack (e.g., DoS). We analyze the resilience that results from the selection of different communication topologies to disseminate security information for preemptive adaptation, quantified as the probability that all members of the CAS survive the attack.
In the scenario [52], an external attacker uses a defined amount of available resources to attempt to breach members of the CAS (e.g., by placing a high number of requests). Each CAS member has the ability to detect the attack and defend itself against it by employing a fixed set of defense resources, and has the ability to notify other members of the CAS of the attack. Once a CAS member is notified, it will adapt and become invulnerable to the attempted breach.
The excerpt above shows the basic encoding of a local self-adaptive manager (signature sam). Environmental and system conditions, such as the reliability of communication channels and the sensitivity of detection mechanisms, can play an important role in the emergent behavior of the CAS and are explicitly captured by parameters that affect the outcome of certain actions (e.g., constant CHANNEL_RELIABILITY limits the ability of a local manager to be notified of an attack by other managers, whereas formula detect encodes the probability that a sam will detect the attack).
We can create models with different communication topologies by encoding a basic set of constraints that impose a connected network without self-loops and with symmetric relations:
We then add extra constraints, like the ones shown on the right of Fig. 11, for each of the topologies to encode.
We can reason about the resilience of the CAS by encoding a property that determines the topologies with the highest chance of attack survival: \(\mathsf{resilience \equiv maxP [F \; all\; sam.status=adapted]}\).
The left-hand side of Fig. 11 shows the resilience of the different topologies for values of channel reliability in the set {0.2, 0.4, 0.6, 0.8, 1}. As expected, all topologies present increased chances of survival with higher channel reliability, except for the star topology, for which resilience is always zero for the amount of sam and attacker resources employed in the experiments. (The central node is a weak link that hampers internode communication if compromised.) To obtain the optimal resilience (and its corresponding topology) for our scenario, we check the M-PCTL resilience property, and an analogous property that employs the SmaxP quantifier, on a model that does not include any extra constraints.
6.1.4 Mobile robotics
Physical interactions in software-intensive systems are also a source of uncertainty that impacts system goals. We illustrate this in a scenario in which a drone navigates autonomously in a building from a starting location A to target location E with limited battery, and without bumping into obstacles (Fig. 12). The arcs in the map graph designate drone trajectories of distance d, where pc captures the probability of drone collision when completing a trajectory. Locations in the map among which the drone can move are captured by signature loc:
In the static part of the signature, field conn encodes a relation capturing connections among neighboring locations. In the behavior block, variable droneIn specifies whether the drone is positioned in the location, and formula pc encodes the probability of drone collision with an obstacle when moving toward the location.
Locations synchronize on action moveTo both when the drone leaves and when it arrives at a location. When the drone leaves location A (A.droneIn’=false), it arrives at location B, and vice versa. Note that the guard to receive the robot refers to the value of the counterpart process variable (conn.droneIn), which must be true (i.e., “contain the drone”) to trigger the action. Reward energyRew is accrued when the drone arrives at a location and is computed from the distance traveled by the drone (d), the SPEED constant, and a BDRATE constant that encodes the battery discharge rate.
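One natural reading of this computation, assuming that battery drain is proportional to travel time, is:

```latex
\textit{energyRew} \;=\; \frac{d}{\textsf{SPEED}} \cdot \textsf{BDRATE}
```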
To encode location subtypes, we first extend loc in signature startLoc to create a starting location in which the original initialization value of variable droneIn is overridden and set to true:
In contrast, targetLoc is a subtype of loc that preserves all its structural and behavioral elements intact. Both start and target locations are defined as abstract signatures, so we incorporate subtypes of them constrained as singletons to instantiate map locations.
We can consider mission design variability in terms of path alternatives for the drone by writing a set of general relational constraints that guarantee non-cyclic paths in which a target location is reachable from the starting location:
where feasible paths are those in which energy does not exceed the drone battery capacity e, and maxsafety identifies paths that maximize the probability of reaching a target location.
Figure 13 shows results for safety without the battery constraint on a 30-location map in which the locations highlighted in red have collision probabilities PCOL and 1-PCOL, respectively. As expected, higher values of PCOL decrease the drone's chances of survival, down to a minimum of \(\simeq \)0.14 at PCOL=0.5. From that point on, the best configuration shifts from path A to path B (highlighted in red).
Based on our results, in this section we discuss the research questions posed in the introduction and threats to validity.
6.2.1 (RQ1) Feasibility
We have shown that raising the level of abstraction at which probabilistic models are described and queried can effectively enable joint exploration of design space variability and stochastic operational uncertainties by reasoning about structural and quantitative guarantees of design spaces. In terms of modeling, this is achieved by incorporating language constructs that allow structural relations to be referenced from elements of (probabilistic) behavioral specifications. In terms of analysis, incorporating novel quantifiers to check properties on collections of models in probabilistic temporal logic enables streamlined specification of sophisticated properties to study guarantees across the solution space, including worst- and best-case analysis that considers both variability in design and operational uncertainties that are stochastic in nature.
Our results show that our approach is applicable to systems in different domains, in which design variability as well as operational uncertainties are introduced by disparate sources. Our approach is particularly suited to problems in which structure/topology is relevant and multiple instances of similar, but different, components that exhibit probabilistic behavior (i.e., with different parameters or slight variations in behavior) interact in complex ways. Moreover, we have shown that the approach can be applied to different analyses (average- and best-/worst-case probabilities and rewards) and probabilistic formalisms (MDP and DTMC). This is effectively enabled by the fact that HaiQ operates one level higher on the abstraction ladder than existing description languages in probabilistic model checking, and by the fact that the target language (PRISM) in which configuration behavioral models are generated (Step 2 in Fig. 3, implementing Algorithm 1) supports alternative underlying semantics for MDP and DTMC (cf. [46]). Similarly, M-PCTL extends PCTL, which can be naturally used to check properties both in DTMC and MDP.
With respect to alternative approaches to analyze probabilistic behavior, HaiQ comes with trade-offs in terms of effort, reusability, computational cost, and analytical capabilities:
Effort. In HaiQ, structure and probabilistic behavior are expressed in a compact manner, and their combined analysis is streamlined. This contrasts with ad hoc solutions that demand developing specific infrastructure and are error-prone. The analysis of a CAS scenario similar to the one in Section 6.1.3 [52] required combining the PRISM preprocessor [53] with scripts that demanded topologies to be encoded as matrices in separate text files, leading to multiple trial-and-error rounds (due to errors in matrix encodings and script tuning). For TAS, the problem has also been solved employing a custom template engine and a Python script that generates probabilistic models based on analysis of Alloy specifications [43]. We also built a planner that solves the class of problem described in the robotics scenario (Sect. 6.1.4) using an algorithm that generates paths and encodes them into PRISM models to analyze solutions in a probabilistic space [54]. All these solutions required weeks of work, in contrast with the single-day specification effort (at most) required to solve the problem with HaiQ in all cases. Other approaches that do not include structural synthesis (e.g., direct specification in a model checker) require encoding explicitly all elements of every design alternative (e.g., topology), making them more error-prone and orders of magnitude more demanding than HaiQ in terms of modeling effort for non-trivial problems.
Reusability. Compared to the ad hoc solutions developed for the case studies presented, which required specific infrastructure, HaiQ provides an infrastructure that can be reused across a range of different domains. Furthermore, specifications in HaiQ are also more reusable than those of existing probabilistic model checkers, in which behavior types and communication topology are intertwined with the specific instances of processes. In contrast, behavioral type hierarchies can be reused as “libraries” across the same problem class in HaiQ, since the specifics of instances are defined separately. This can be observed, e.g., in the specification of service types in TAS, or the map and path constraints in the robotics scenario, which are reusable in other service-based and mobile robotics systems, respectively. To the best of our knowledge, this level of reusability existed in relational modeling approaches, but not in quantitative verification.
Computational cost and analytical capabilities. Prototype analysis performance behaves differently depending on the problem type. In TAS, checking the compositional property constrained_mincost entailed exploring 90 configurations in an overall time of 6s (≈2% of which was used for configuration synthesis). Checking the resilience property for a fixed topology in CAS took ≤40s in the worst case. However, checking for the optimal configuration required exploring 254 configurations (≈1hr, of which ≈0.01% was used for synthesis). These numbers are explained by the low complexity of synthesis (few structural constraints to consider) relative to the possibly large number of configurations whose behavior has to be checked individually. In contrast, the mobile robotics scenario showed that an increased number of map locations shifted the configuration synthesis/model checking time ratio to ≈50–50% with 80 locations (overall time 5s). This is explained by the relatively low number of feasible configurations, compared to the cost required to model check their behavior. Although HaiQ solution times are acceptable, the PRISM-based robotics planner outperforms HaiQ for maps with 50+ locations, indicating that for this problem type our approach is valid for prototyping and design-time analysis, but runtime infrastructure requires investing in optimized solutions. In any case, HaiQ itself introduces a negligible overhead in analysis, and is as efficient as the underlying verification technology that it employs (back-end engines capable of checking PCTL on PRISM models, like STORM [28], work out of the box). It is also worth noting that the most computationally demanding use cases of the approach correspond to analytical capabilities that did not exist in other approaches, such as obtaining the configurations that best satisfy a set of quantitative guarantees for a given set of assumptions about stochastic uncertainties in the environment.
The approach is inspired by a specific style of relational description (Alloy) and by specific behavioral formalisms (DTMC and MDP). However, the constructs employed to formalize structures are fairly standard, and synthesis of configurations is adaptable to other languages/models (e.g., OCL). Concerning behavior descriptions, the fact that the approach was successfully instantiated for different probabilistic formalisms and analyses hints at the feasibility of adapting the approach to other formalisms, such as continuous-time Markov chains (CTMC) for finer-grained time analysis. Focusing on internal validity, the degree of formal assurance on configurations provided by the approach is computationally expensive, and entails risks concerning the cost of both configuration synthesis and behavior analysis (derived from exploring potentially large state spaces of individual configuration behavior). These risks could be mitigated by exploiting the hierarchical relations that are naturally present in software designs, in which components interact in a structured way [55]. Hence, synthesis of different subsystems with local constraints could be done independently and the results then composed, reducing the cost of configuration synthesis. This mitigation would also allow parallelism in the analysis to be exploited, in which the behavior of configurations of subsystems can be independently analyzed in parallel [56, 57]. Although these extensions remain out of the scope of this paper, we plan to investigate the exploitation of hierarchical relations and parallelism to shift the practical applicability of the approach from the highly abstracted versions of software-intensive systems presented in this paper to real-world scenarios. Another risk derived from structural synthesis is that Alloy can generate additional isomorphic configurations that can add unnecessary computation time in some situations, although this does not affect the soundness of the results.
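The parallelization opportunity mentioned above rests on the observation that, once configurations are synthesized, each configuration's behavior can be checked independently. A minimal sketch of this idea (all names hypothetical; `check_configuration` merely stands in for a call to a probabilistic model checker such as PRISM or Storm, and a real implementation would dispatch CPU-bound checker processes rather than threads):

```python
from concurrent.futures import ThreadPoolExecutor

def check_configuration(config):
    # Placeholder for model checking one configuration's behavior
    # (hypothetical stand-in for invoking a probabilistic model checker).
    return config, sum(config) % 2 == 0

def analyze_configurations(configs, workers=4):
    # Configurations' behaviors are independent, so their checks can run
    # concurrently, as discussed in the text.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return dict(ex.map(check_configuration, configs))

results = analyze_configurations([(1, 2), (2, 2), (3, 5)])
```

This is only an illustration of the fan-out structure; composing per-subsystem synthesis results, as proposed above, would additionally require merging locally consistent configurations.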
7 Conclusions and future work
This paper introduces what is, to the best of our knowledge, the first approach that enables the joint modeling and exploration of design variability and stochastic operational uncertainties in a way that is able to provide quantitative and structural guarantees across the solution space.
This is enabled by combining the advantages of relational modeling, structural synthesis, and quantitative verification. Our experience applying it in different domains shows that: (RQ1) raising the level of abstraction by (i) incorporating modeling constructs that allow structural relations to be referenced from elements of (probabilistic) behavioral specifications, and (ii) incorporating novel quantifiers to check properties across collections of models in probabilistic temporal logics, enables automated joint reasoning about structural and (probabilistic) quantitative guarantees across spaces of alternative system designs; (RQ2) our approach is general enough to be applied to different probabilistic formalisms (DTMC and MDP), types of analyses (average and best/worst probability- and reward-based analysis), and domains where uncertainty is introduced by disparate sources; and (RQ3) the approach brings new analytical capabilities to the validation of designs of software-intensive systems that operate under uncertainty, compared to existing quantitative verification approaches (e.g., automated identification of structural variants that optimize probability/reward-based guarantees). With respect to (RQ3), our experience and further works that have employed HaiQ since its inception (e.g., [58]) also point to a reduction of specification effort and improved reusability with respect to existing probabilistic model checking techniques (cf. Sect. 6.2.3).
In future work, we plan to expand the range of supported formalisms to allow reasoning about continuous time (e.g., PTA) and game-theoretic quantitative verification (e.g., stochastic multiplayer games, SMG). Moreover, we plan to extend our approach to support further sources and types of uncertainty, as well as hierarchical specification [55] and compositional verification [56] techniques (cf. Sect. 6.2.4) to improve modularity and scalability.
Acknowledgements
Work partially funded by the Spanish Government (FEDER, Ministerio de Ciencia e Innovación–Agencia Estatal de Investigación) under projects TED2021-130523B-I00 and PID2021-125527NB-I00.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Javier Cámara
is an Associate Professor of Computer Science at the University of Málaga, Spain, and Honorary Visiting Fellow at the Department of Computer Science, University of York, UK. His current research interests include self-adaptive and autonomous systems, software architecture, formal methods, as well as cyber-physical and AI systems. He received his European PhD with honors from the University of Málaga, Spain. For more information, contact him at jcamara@uma.es or visit https://javier-camara.github.io/.
Figure 14 shows the grammar of HaiQ. The definitions in black correspond to the static part of the language, while definitions in blue correspond to the behavioral aspects of the language.
A.0.1 Static/structural
The static part of HaiQ excludes the following from its syntax, with respect to Alloy:

1. Functions and assertions.
2. Imports.
3. Non-unary relations (i.e., only sets are allowed).
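As an illustrative sketch of what this restricted structural fragment might look like (signature and field names are invented here, and the syntax is assumed to stay close to Alloy's), a specification could declare:

```
-- Hypothetical sketch (names invented): only set-valued fields, no arrow
-- products, single inheritance, and no functions, assertions, or imports.
sig Node { conn : set Node }        -- set-valued field relating Nodes
sig Gateway extends Node {}         -- extends exactly one signature
fact Connected { all n : Node | some n.conn }
```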
Every model starts with a declaration of its type (lines 1–2). If the model type is not declared, the default is dtmc. Then, the model can include arbitrary sets of signature, fact, predicate, and command declarations (line 3). For fact and predicate declarations, the syntax is the standard one used in Alloy, with the exceptions noted above, which eliminate the arrow operator, as well as the domain and range restriction operators (limitation 3). Command declarations (lines 25–27) are standard.
With respect to signature declarations, inheritance in Alloy can be multiple; this is not allowed in HaiQ, so a signature can be the subtype of (and therefore extend) only one signature (line 4).
A.0.2 Behavioral
Every signature declaration includes a behavioral block that defines a probabilistic process type. These probabilistic process types form the behavioral part of the language (highlighted in blue). The behavioral block of a signature (line 5) includes enumerated type, variable, command, formula, and reward declarations.
The main body of the behavioral block consists of sets of guarded probabilistic commands (cf. line 9) that encode process types (either DTMC or MDP) and follow the pattern:

[\(\langle\)event\(\rangle\)] guard \(\rightarrow\) \(p_1\) : \(update_1\) + \(\cdots\) + \(p_n\) : \(update_n\);
Here, event (cf. \(\langle \)event\(\rangle \), line 10) can be a simple label that is “hardwired” to synchronize with any other event of the same name in a different process (e.g., “send”), or a label prefixed by a relation (e.g., “conn:send”), indicating that the event can synchronize with any event named “send” in related processes via “conn.” If no event is specified, the command encodes an internal action (“[ ]”).
After the specification of the event, guard is a predicate (line 12) over the behavioral model variables (lines 7–8), which determines whether the precondition to execute the command is satisfied.
The command also encodes a set of alternative postconditions for its execution, defined as probabilistic updates (lines 14–15). Each probabilistic update consists of a probability (\(p_i\)), which can be encoded as a simple rational number or as a complex expression (cf. line 13), and the variable update itself, \(update_i\), which is specified as a set of primed variable updates (see \(\langle \)behUpdateExpr\(\rangle \) in line 15), e.g., “(x’=1) & (y’=true).” An update with no specified probability defaults to probability 1.
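As an illustrative sketch (the concrete HaiQ surface syntax may differ; conn, send, x, and y are hypothetical names), a DTMC command following this pattern might read:

```
[conn:send] (x=0) & !y -> 0.9 : (x'=1) & (y'=true) + 0.1 : (x'=0);
```

Here the event conn:send synchronizes with send events of processes related via conn, the guard requires x to be 0 and y to be false, and the two probabilistic updates are taken with probabilities 0.9 and 0.1, respectively.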
In the MDP model type, multiple commands with overlapping guards introduce local non-determinism, whereas in DTMC the overlapping transitions result in a probabilistic choice in which probabilities of the different branches are normalized (similarly to the standard semantics described for reactive modules-style languages like PRISM).
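The DTMC normalization semantics described above can be sketched operationally as follows (an illustrative model, not HaiQ's implementation): collect the updates of all commands whose guards hold in the current state and rescale their probabilities to sum to 1.

```python
def dtmc_branch_distribution(commands, state):
    # `commands` is a list of (guard_fn, [(probability, update), ...]) pairs.
    # Collect the updates of all commands enabled in `state`.
    branches = []
    for guard, updates in commands:
        if guard(state):
            branches.extend(updates)
    # Normalize so the enabled branches form a probability distribution
    # (DTMC semantics; an MDP would instead keep a nondeterministic choice).
    total = sum(p for p, _ in branches)
    return [(p / total, u) for p, u in branches]

cmds = [
    (lambda s: s["x"] == 0, [(0.5, {"x": 1}), (0.5, {"x": 2})]),
    (lambda s: s["x"] == 0, [(1.0, {"x": 3})]),  # overlapping guard
]
dist = dtmc_branch_distribution(cmds, {"x": 0})
# branch probabilities after normalization: 0.25, 0.25, 0.5
```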
The syntax of these labels is used only for illustration purposes. Internally, the HaiQ compiler generates a unique identifier using a commutative function (the identifier for inputs (A,B) is the same as for (B,A)), resulting in a label that can be employed on both endpoints of process synchronization (c.f. Sect. 4.2.2).
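One way to obtain such a commutative identifier (a hypothetical sketch; HaiQ's actual function is not specified here) is to sort the two endpoint names before hashing, so both endpoints derive the same synchronization label:

```python
import hashlib

def sync_label(a, b):
    # Commutative by construction: sorting makes (A, B) and (B, A)
    # hash to the same label, usable on both synchronization endpoints.
    lo, hi = sorted((a, b))
    return "sync_" + hashlib.sha1(f"{lo}|{hi}".encode()).hexdigest()[:8]
```

For example, sync_label("A", "B") and sync_label("B", "A") yield the same label, while a different pair of endpoints yields a different one (up to hash collisions).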
The source code for the HaiQ analyzer, the example models found in this paper, and OSX binaries for a ready-to-use version of the tool are available at https://github.com/javier-camara/HaiQ.
Experiments were run on OSX10.1.5, Java 1.8.0_111, 2.8GHz Core i7, 16GB RAM.
1. Weyns, D., Calinescu, R., Mirandola, R., Tei, K., Acosta, M., Bennaceur, A., Boltz, N., Bures, T., Cámara, J., Diaconescu, A., Engels, G., Gerasimou, S., Gerostathopoulos, I., Yaman, S.G., Grassi, V., Hahner, S., Letier, E., Litoiu, M., Marsso, L., Musil, A., Musil, J., Rodrigues, G.N., Perez-Palacin, D., Quin, F., Scandurra, P., Vallecillo, A., Zisman, A.: Towards a research agenda for understanding and managing uncertainty in self-adaptive systems. ACM SIGSOFT Softw. Eng. Notes 48(4), 20–36 (2023)
2. Cámara, J., Troya, J., Vallecillo, A., Bencomo, N., Calinescu, R., Cheng, B.H.C., Garlan, D., Schmerl, B.R.: The uncertainty interaction problem in self-adaptive systems. Softw. Syst. Model. 21(4), 1277–1294 (2022)
3. Cámara, J., Calinescu, R., Cheng, B.H.C., Garlan, D., Schmerl, B.R., Troya, J., Vallecillo, A.: Addressing the uncertainty interaction problem in software-intensive systems: challenges and desiderata. In: Proceedings of the 25th International Conference on Model Driven Engineering Languages and Systems, MODELS. ACM, pp. 24–30 (2022)
Spivey, J.M.: Z Notation: A Reference Manual. Prentice Hall International Series in Computer Science, 2nd edn. Prentice Hall, Hoboken (1992)
6. Abrial, J., Lee, M.K.O., Neilson, D., Scharbach, P.N., Sørensen, I.H.: The B-Method. In: VDM ’91: Formal Software Development, LNCS, vol. 552. Springer, pp. 398–405 (1991)
7. Bjørner, D.: The Vienna development method (VDM): software specification and program synthesis. In: Mathematical Studies of Information Processing, Proceedings of the International Conference, LNCS, vol. 75. Springer, pp. 326–359 (1978)
8. Warmer, J., Kleppe, A.: The Object Constraint Language: Getting Your Models Ready for MDA. Addison-Wesley, Boston (2003)
9. Maoz, S., Ringert, J.O., Rumpe, B.: Synthesis of component and connector models from crosscutting structural views. In: European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE’13. ACM, pp. 444–454 (2013)
10. Wong, S., Sun, J., Warren, I., Sun, J.: A scalable approach to multi-style architectural modeling and verification. In: 13th IEEE International Conference on Engineering of Complex Computer Systems (ICECCS 2008), pp. 25–34 (2008)
11. Bagheri, H., Tang, C., Sullivan, K.J.: TradeMaker: automated dynamic analysis of synthesized tradespaces. In: 36th International Conference on Software Engineering. ACM, pp. 106–116 (2014)
12. Zave, P.: A formal model of addressing for interoperating networks. In: FM 2005: Formal Methods, International Symposium of Formal Methods Europe, LNCS, vol. 3582. Springer, pp. 318–333 (2005)
Kwiatkowska, M.Z., Norman, G., Parker, D.: Stochastic model checking. In: Formal Methods for Performance Evaluation, 7th International School on Formal Methods for the Design of Computer, Communication, and Software Systems, SFM, LNCS, vol. 4486. Springer, pp. 220–270 (2007)
15. Calinescu, R., Ghezzi, C., Kwiatkowska, M.Z., Mirandola, R.: Self-adaptive software needs quantitative verification at runtime. Commun. ACM 55(9), 69–77 (2012)
16. Filieri, A., Ghezzi, C., Tamburrelli, G.: Run-time efficient probabilistic model checking. In: Proceedings of the 33rd International Conference on Software Engineering, ICSE. ACM, pp. 341–350 (2011)
17. Kwiatkowska, M.Z., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Computer Aided Verification: 23rd International Conference, CAV, vol. 6806. Springer, pp. 585–591 (2011)
18. Gilmore, S., Hillston, J.: The PEPA workbench: a tool to support a process algebra-based approach to performance modelling. In: Computer Performance Evaluation, Modeling Techniques and Tools, 7th International Conference, LNCS, vol. 794. Springer, pp. 353–368 (1994)
19. Jifeng, H., Seidel, K., McIver, A.: Probabilistic models for the guarded command language. Sci. Comput. Program. 28(2), 171–192 (1997)
20. Ghezzi, C., Sharifloo, A.M.: Model-based verification of quantitative non-functional properties for software product lines. Inf. Softw. Technol. 55(3), 508–524 (2013)
Cámara, J.: HaiQ: synthesis of software design spaces with structural and probabilistic guarantees. In: FormaliSE@ICSE 2020: 8th International Conference on Formal Methods in Software Engineering. ACM, pp. 22–33 (2020)
25. Frias, M.F., Galeotti, J.P., Pombo, C.L., Aguirre, N.: DynAlloy: upgrading Alloy with actions. In: 27th International Conference on Software Engineering (ICSE). ACM, pp. 442–451 (2005)
26. Macedo, N., Brunel, J., Chemouil, D., Cunha, A., Kuperberg, D.: Lightweight specification and analysis of dynamic systems with rich configurations. In: Zimmermann, T., Cleland-Huang, J., Su, Z. (eds.) Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2016, Seattle, WA, USA, November 13–18, 2016. ACM, pp. 373–383 (2016)
27. Abrial, J.: Modeling in Event-B: System and Software Engineering. Cambridge University Press, Cambridge (2010)
28. Hensel, C., Junges, S., Katoen, J., Quatmann, T., Volk, M.: The probabilistic model checker Storm. Int. J. Softw. Tools Technol. Transf. 24(4), 589–610 (2022)
29. David, A., Jensen, P.G., Larsen, K.G., Mikučionis, M., Taankvist, J.H.: In: Tools and Algorithms for the Construction and Analysis of Systems, LNCS, vol. 9035, pp. 206–211. Springer, Berlin (2015)
30. Courtney, T., Gaonkar, S., Keefe, K., Rozier, E., Sanders, W.H.: Möbius 2.3: an extensible tool for dependability, security, and performance evaluation of large and complex system models. In: Proceedings of the 2009 IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2009, Estoril, Lisbon, Portugal, June 29–July 2, 2009. IEEE CS, pp. 353–358 (2009)
31. Calinescu, R., Ceska, M., Gerasimou, S., Kwiatkowska, M., Paoletti, N.: Designing robust software systems through parametric Markov chain synthesis. In: 2017 IEEE International Conference on Software Architecture, ICSA 2017, Gothenburg, Sweden, April 3–7, 2017. IEEE, pp. 131–140 (2017)
Grunske, L., Aleti, A.: Quality optimisation of software architectures and design specifications. J. Syst. Softw. 86(10), 2465–2466 (2013)
34. Esfahani, N., Malek, S., Razavi, K.: GuideArch: guiding the exploration of architectural solution space under uncertainty. In: 35th International Conference on Software Engineering, ICSE. IEEE CS, pp. 43–52 (2013)
35. Aleti, A., Bjornander, S., Grunske, L., Meedeniya, I.: ArcheOpterix: an extendable tool for architecture optimization of AADL models. In: Model-Based Methodologies for Pervasive and Embedded Software, MOMPES ’09, ICSE Workshop, pp. 61–71 (2009)
36. Meedeniya, I., Moser, I., Aleti, A., Grunske, L.: Architecture-based reliability evaluation under uncertainty. In: 7th International Conference on the Quality of Software Architectures, QoSA 2011 and 2nd International Symposium on Architecting Critical Systems, ISARCS. ACM, pp. 85–94 (2011)
37. Martens, A., Koziolek, H., Becker, S., Reussner, R.: Automatically improve software architecture models for performance, reliability, and cost using evolutionary algorithms. In: International Conference on Performance Engineering, WOSP/SIPEW. ACM, pp. 105–116 (2010)
38. Bondarev, E., Chaudron, M.R.V., de Kock, E.A.: Exploring performance trade-offs of a JPEG decoder using the Deepcompass framework. In: 6th Workshop on Software and Performance, WOSP. ACM, pp. 153–163 (2007)
39. Becker, S., Koziolek, H., Reussner, R.H.: The Palladio component model for model-driven performance prediction. J. Syst. Softw. 82(1), 3–22 (2009)
MacCalman, A.D., Beery, P.T., Paulo, E.P.: A systems design exploration approach that illuminates tradespaces using statistical experimental designs. Syst. Eng. 19(5), 409–421 (2016)
42. Dwivedi, V., Garlan, D., Pfeffer, J., Schmerl, B.: Model-based assistance for making time/fidelity trade-offs in component compositions. In: 11th International Conference on Information Technology: New Generations, ITNG 2014. IEEE CS (2014)
43. Cámara, J., Garlan, D., Schmerl, B.R.: Synthesis and quantitative verification of tradeoff spaces for families of software systems. In: Software Architecture: 11th European Conference, ECSA, LNCS, vol. 10475. Springer, pp. 3–21 (2017)
44. Kwiatkowska, M., Norman, G., Parker, D., Vigliotti, M.: Probabilistic mobile ambients. Theor. Comput. Sci. 410(12–13), 1272–1303 (2009)
45. Jackson, D.: Software Abstractions: Logic, Language, and Analysis. MIT Press, Cambridge (2006)
Hansson, H., Jonsson, B.: A logic for reasoning about time and reliability. Form. Asp. Comput. 6(5), 512–535 (1994)
48. Andova, S., Hermanns, H., Katoen, J.: Discrete-time rewards model-checked. In: Formal Modeling and Analysis of Timed Systems: First International Workshop, FORMATS, LNCS, vol. 2791. Springer, pp. 88–104 (2003)
49. Weyns, D., Calinescu, R.: Tele assistance: a self-adaptive service-based system exemplar. In: 10th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, SEAMS 2015. IEEE CS, pp. 88–92 (2015)
Glazier, T.J., Cámara, J., Schmerl, B.R., Garlan, D.: Analyzing resilience properties of different topologies of collective adaptive systems. In: IEEE International Conference on Self-Adaptive and Self-Organizing Systems Workshops, SASO Workshops. IEEE CS, pp. 55–60 (2015)
Cámara, J., Schmerl, B.R., Garlan, D.: Software architecture and task plan co-adaptation for mobile service robots. In: Honiden, S., Nitto, E.D., Calinescu, R. (eds.) SEAMS ’20: IEEE/ACM 15th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, Seoul, Republic of Korea, June 29–July 3, 2020. ACM, pp. 125–136 (2020)
55. Kang, E., Milicevic, A., Jackson, D.: Multi-representational security analysis. In: Proceedings of the 24th Symposium on Foundations of Software Engineering, FSE (2016)
56. Johnson, K., Calinescu, R., Kikuchi, S.: An incremental verification framework for component-based software systems. In: Proceedings of the 16th International ACM SIGSOFT Symposium on Component-Based Software Engineering, CBSE ’13. ACM (2013)
57. Stevens, C., Bagheri, H.: Parasol: efficient parallel synthesis of large model spaces. In: Roychoudhury, A., Cadar, C., Kim, M. (eds.) Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022, Singapore, November 14–18, 2022. ACM, pp. 620–632 (2022)
58. Skandylas, C., Khakpour, N., Cámara, J.: Security countermeasure selection for component-based software-intensive systems. In: 22nd IEEE International Conference on Software Quality, Reliability and Security, QRS 2022, Guangzhou, China, December 5–9, 2022. IEEE, pp. 63–72 (2022)