In the following, we identify four sources for attributing reliability to a computational process such as a computer simulation. It is important to note that each source offers a different ‘degree of reliability’ to computer simulations. For instance, expert knowledge by itself is a rather weak source of reliability for most computer simulations. The reason for this is that it can be idiosyncratic in several ways, and therefore not reliable in the epistemic sense required. Verification and validation methods, on the other hand, are stronger sources of reliability, for they depend on mathematical machinery and are thus epistemically more secure. This is why the latter, and not expert knowledge, are on many occasions decisive for attributing reliability to computer simulations. Having said this, we are unable to offer here a measurement of the degree of reliability of each source. Instead, we offer an analysis of each individual source.
1.
Verification and validation methods
2.
Robustness analysis for computer simulations
3.
A history of (un)successful implementations
4.
Expert knowledge
4.1 Verification and Validation
Verification and
validation are the general names given to a host of methods used for increasing the reliability of scientific models as well as of computer simulations. Understanding their role, then, turns out to be essential for attributing reliability to computer simulations.
In
verification, it is standard that formal methods are central to securing the reliability of computer software, whereas in
validation, benchmarking is responsible for confirming the outcomes (Oberkampf and Roy
2010, Preface). In verification methods, then, the relationship of interest is between the specification of a model and the computer software, whereas in validation methods the relationship of interest is between computation and the empirical world. Here are two standard definitions largely accepted and used by the community of researchers:
Verification: the process of determining that a computational model accurately represents the underlying mathematical model and its solution.
Validation: the process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended uses of the model. (Oberkampf et al.
2003)
In recent philosophical studies, these definitions have been adapted to include computer simulations. Eric Winsberg, for instance, takes it that “
verification, [...] is the process of determining whether or not the output of the simulation approximates the true solutions to the differential equations of the original model.
Validation, on the other hand, is the process of determining whether or not the chosen model is a good representation of the real-world system for the purpose of the simulation” (Winsberg
2010, 19–20). Another example of a philosopher discussing verification and validation in computer simulations is Margaret Morrison. Although she agrees with Winsberg that verification and validation are two methods not always clearly divisible, she nevertheless downplays the need for verification methods, claiming that validation is the more crucial method for assessing the reliability of computer simulations (Morrison
2009, 43).
The scientific and computational communities, in contrast, have a more diverse set of definitions to offer, all tailored to the specificities of the simulation under study. In
verification studies, for instance, the literature provides two methods particularly important for computer simulations. These are
code verification and
calculation verification. Their importance lies in the fact that both methods focus on the correctness of the discretization procedure, a key element in implementing mathematical models as computer simulations.
William Oberkampf and Timothy Trucano have further argued that it is useful to segregate code verification into two activities, namely,
numerical algorithm verification and
software quality engineering. The purpose of numerical algorithm verification is to address the mathematical correctness of the implementation of all the numerical algorithms that affect the numerical accuracy of the results of the simulation. The goal of this verification method is to demonstrate that the numerical algorithms implemented as part of the simulation model are correctly implemented and performing as intended (Oberkampf and Trucano
2002, 720). Software quality engineering, on the other hand, places the emphasis on determining whether the simulation model is reliable and produces, most of the time, trustworthy results. The purpose of software quality engineering is to verify the simulation model and the results of the simulation on specific computer hardware and in a specified software environment—including compilers, libraries, I/O, etc. These verification procedures are primarily in use during the development, testing, and maintenance of the simulation model (Oberkampf and Trucano
2002, 721).
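To give a sense of what numerical algorithm verification can look like in practice, here is a minimal sketch in Python (our own illustration, not code from Oberkampf and Trucano): a toy forward-Euler solver is run against a problem with a known analytic solution, and the observed order of convergence under grid refinement is compared with the theoretical order of the scheme. All function names and numerical values are illustrative assumptions.

```python
import numpy as np

def solve_decay(rate, y0, t_end, n_steps):
    """Toy forward-Euler solver for dy/dt = -rate * y (first-order accurate)."""
    dt = t_end / n_steps
    y = y0
    for _ in range(n_steps):
        y = y + dt * (-rate * y)
    return y

def observed_orders(rate=1.0, y0=1.0, t_end=1.0):
    """Grid-refinement study: halving the step size should roughly halve the error."""
    exact = y0 * np.exp(-rate * t_end)
    errors = [abs(solve_decay(rate, y0, t_end, n) - exact) for n in (100, 200, 400)]
    return np.log2(errors[0] / errors[1]), np.log2(errors[1] / errors[2])

if __name__ == "__main__":
    p1, p2 = observed_orders()
    print(f"observed convergence orders: {p1:.2f}, {p2:.2f} (theory: ~1 for forward Euler)")
```

If the observed orders deviate from the theoretical one, this signals a defect in the implementation of the numerical algorithm rather than in the underlying mathematical model.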
As for
calculation verification, it is generally depicted as the method that prevents three kinds of errors: human error in the preparation of the code, human error in the analysis of the results, and numerical errors resulting from computing the discretized solution of the simulation model. A definition for calculation verification is “the process of determining the correctness of the input data, the numerical accuracy of the solution obtained, and the correctness of the output data for a particular simulation” (Oberkampf et al.
2003, 34).
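A hedged sketch of what such checks might involve for a particular run: the input data are plausible and the output satisfies elementary correctness conditions, here conservation of a population across compartments. The function, parameters, and SIR-style trajectory below are hypothetical and only illustrate the definition quoted above.

```python
import numpy as np

def check_run(params, trajectory, total_population, tol=1e-6):
    """Hypothetical calculation-verification checks for a single simulation run."""
    assert all(v >= 0 for v in params.values()), "negative input parameter"
    assert np.all(np.isfinite(trajectory)), "non-finite values in the solution"
    assert np.all(trajectory >= -tol), "negative compartment sizes"
    # Every time step should account for the whole population (here S + I + R).
    assert np.allclose(trajectory.sum(axis=1), total_population, atol=tol), \
        "population not conserved"

if __name__ == "__main__":
    # Toy SIR-style output: three compartments over four time steps.
    traj = np.array([[990.0, 10.0, 0.0],
                     [985.0, 12.0, 3.0],
                     [978.0, 14.0, 8.0],
                     [970.0, 15.0, 15.0]])
    check_run({"beta": 0.3, "gamma": 0.1}, traj, total_population=1000.0)
    print("calculation-verification checks passed")
```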
The process of
validation consists in showing that the results of the simulation correspond, more or less accurately and precisely, to those obtained by measurement and observation of the target system. Oberkampf and Trucano highlight three key aspects of validation methods. These are “i) quantification of the accuracy of the computational model by comparing its responses with experimentally measured responses, ii) interpolation or extrapolation of the computational model to conditions corresponding to the intended use of the model, and iii) determination if the estimated accuracy of the computational model, for the conditions of the intended use, satisfies the accuracy requirements specified” (Oberkampf and Trucano
2008, 724).
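As an illustration of aspects (i) and (iii), the following sketch (our own, with invented numbers) compares simulated responses against measured ones and checks whether the worst relative error satisfies a stated accuracy requirement.

```python
import numpy as np

def validate(simulated, measured, max_relative_error=0.10):
    """Compare simulated responses with measured responses and decide whether the
    estimated accuracy satisfies a stated requirement (worst-case relative error)."""
    rel_err = np.abs(simulated - measured) / np.abs(measured)
    worst = float(rel_err.max())
    return worst, worst <= max_relative_error

if __name__ == "__main__":
    # Invented numbers: e.g. weekly case counts from a simulation vs. surveillance data.
    sim = np.array([120.0, 340.0, 610.0, 480.0, 250.0])
    obs = np.array([115.0, 360.0, 590.0, 500.0, 270.0])
    worst, ok = validate(sim, obs)
    print(f"worst relative error: {worst:.1%}; accuracy requirement met: {ok}")
```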
It is important to mention that, with the introduction of computer simulations in experimental contexts, validation does not exclusively depend on contrasting results against empirical data. Ajelli and colleagues have shown how it is possible to run different computer simulations and use their results to assert their mutual reliability – in this case, there is not merely a convergence of results, but also of key variables (Ajelli et al.
2010), as we argue in Sect.
4.2.
The role of verification and validation methods in attributing reliability to computer simulations is rather straightforward: on the one hand, they make sure that the implementation of well-established theories is correctly carried out and that not much information is lost; on the other, they provide good reasons to trust the results of the simulations because they match, with more or less accuracy, empirical data.
4.2 Robustness Analysis for Computer Simulations
When the systems under study are inherently too complex, and particular degrees of precision and accuracy are required of idealized models but not delivered by fundamental theories, then
robustness analysis becomes a suitable alternative method for determining the trustworthiness of results (Weisberg
2013, 156).
Robustness analysis, as presented by Richard Levins (
1966) and further elaborated by Michael Weisberg (
2013), allows researchers to learn whether the results of a given model are an artifact of it (e.g., due to a poor idealization) or whether they are related to core features of the model (Weisberg
2013, 156). At its heart, robustness analysis consists of two steps: the first examines a group of models to determine whether they all predict a common result—called the
robust property; the second analyzes the models for those structures that generate the sought robust property. The results of these two steps are combined in order to formulate the
robust theorem, “a conditional statement linking common structure to robust property, prefaced by a
ceteris paribus clause” (Weisberg
2013, 158). It is important to emphasize that robust theorems do not make claims about the frequency with which the robust property occurs in target systems. Rather, they make a conditional claim about what happens if a model is instantiated in a specific way (Weisberg
2013, 169).
Following Weisberg, the ideal case of robustness analysis requires researchers to examine a group of similar but distinct models in search of a robust behavior. The aim of such an examination is to formulate sufficiently diverse models in such a way that the discovery of a robust property is not due to mere luck in the way the models were analyzed but rather because the property is actually there (Weisberg
2013, 158). The question now is how to formulate such diverse models. Weisberg suggests a list of possibilities, none of which consists of mere changes in the parametrization of the model or in its initial and boundary conditions; rather, all involve significant modifications to the structure of the model. Reinterpreting these possibilities in terms of modifications to computer simulations, they include varying the regularity of the grid, varying the number of attributes of a process, and varying the heterogeneity of the utility function, among others.
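The following Python sketch illustrates, in a deliberately simplified way, what such an analysis could look like for a computer simulation: several structurally different toy epidemic models are run, and a candidate robust property is checked across all of them. The variants, parameters, and the property itself are our own illustrative assumptions, not Weisberg's examples.

```python
import numpy as np

def sir_variant(contact_fn, n_steps=600, dt=0.1, beta=0.6, gamma=0.2):
    """Toy SIR-style simulation; structural variation enters only via contact_fn,
    a stand-in for the deeper structural changes discussed in the text."""
    s, i, r = 0.99, 0.01, 0.0
    prevalence = []
    for step in range(n_steps):
        new_inf = beta * contact_fn(s, i, step) * dt
        new_rec = gamma * i * dt
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        prevalence.append(i)
    return np.array(prevalence)

# Structurally different variants (illustrative only).
variants = {
    "mass action":      lambda s, i, step: s * i,
    "saturating":       lambda s, i, step: s * i / (1.0 + 2.0 * i),
    "seasonal forcing": lambda s, i, step: s * i * (1.0 + 0.2 * np.sin(step / 30.0)),
}

def robust_property(prev):
    """Candidate robust property: an outbreak occurs and eventually recedes."""
    return prev.max() > 5 * prev[0] and prev[-1] < 0.5 * prev.max()

if __name__ == "__main__":
    for name, fn in variants.items():
        print(f"{name}: robust property holds = {robust_property(sir_variant(fn))}")
```

If the property holds across all variants, one has the ingredients of a robust theorem: a common structure (here, the transmission-recovery core) linked to a robust property, prefaced by a ceteris paribus clause.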
Let us note that Weisberg’s analysis of robustness relies on the number of (heterogeneous) models that researchers are able to create. The more models available, the more likely it is that the robust property identified across models can actually be found in a real-world system (Weisberg
2013, 160ff). In computer simulations, the computational power allows researchers to produce a large number of heterogeneous models at a relatively low cost (e.g., in terms of human resources, money, time, etc.). In this sense, inferring that a robust property is present in the simulation models, and therefore that the core structure is giving rise to such a property, is a much simpler task with computer simulations.
Now, the core assumption in robustness analysis is that if a sufficiently heterogeneous set of models gives rise to a property, then it is very likely that the real-world phenomenon also shows the same property. Furthermore, robustness analysis allows researchers to infer that, when the robust property is observed in a real-world system, it is very likely that the core structure of the computer simulation corresponds to the causal structure giving rise to the real-world phenomenon. Robustness analysis, therefore, is a key player in the process of attributing reliability to computer simulations.
Consider the following example of robustness analysis in computer simulations. Ajelli et al. provide a side-by-side comparison of two computer simulations, a stochastic agent-based model and a structured meta-population stochastic model (GLobal Epidemic and Mobility—GLEaM). The agent-based model includes an explicit representation of the Italian population through highly detailed data on its socio-demographic structure. In addition, for determining the probability of commuting from municipality to municipality, Ajelli et al. employ a gravity model of the kind used in transportation theory. The epidemic transmission dynamics, in turn, is based on an ILI (Influenza-like Illness) compartmentalization, which builds on stochastic models that integrate susceptible individuals, latent individuals, asymptomatic infections, and symptomatic infections (Ajelli et al.
2010, 5). The authors define their agent-based model as “a stochastic, spatially-explicit, discrete-time, simulation model where the agents represent human individuals [...] One of the key features of the model is the characterization of the network of contacts among individuals based on a realistic model of the socio-demographic structure of the Italian population.” (Ajelli et al.
2010, 4) The authors also mention that both GLEaM and the agent-based model are dynamically calibrated in that they share exactly the same initial and boundary conditions (Ajelli et al.
2010, 6).
On the other hand, GLEaM is based on a multiscale mobility network built from high-resolution population data, which estimates the population with a resolution given by cells of 15 × 15 minutes of arc. Balcan et al. explain that a typical GLEaM model consists of three data layers. A first layer contains the population and mobility data, which allow the partition of the world into geographical regions. This partition defines a second layer, the subpopulation network, where the inter-connections represent the fluxes of individuals via transportation infrastructures and general mobility patterns. Finally, superimposed onto this layer is the epidemic layer, which defines the disease dynamics inside each subpopulation (Bruno et al.
2009). In the study by Ajelli et al., GLEaM likewise relies on a grid-like partition where each cell is assigned to the closest airport. The subpopulation network uses geographic census data, and the mobility layers obtain data from different databases, including the International Air Transport Association database, consisting of a list of airports worldwide connected by direct flights.
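To make this layered architecture easier to picture, here is a schematic data-structure sketch (not the actual GLEaM code; all class and field names are our own assumptions): a population layer of grid cells assigned to their closest airport, a subpopulation layer with mobility fluxes, and an epidemic layer holding the disease state of each subpopulation.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Cell:
    """Population layer: one grid cell, assigned to its closest airport."""
    lat: float
    lon: float
    population: int
    airport: str

@dataclass
class Subpopulation:
    """Subpopulation and mobility layer: cells grouped around an airport, with
    fluxes of individuals towards other subpopulations (keyed by airport code)."""
    airport: str
    cells: List[Cell]
    fluxes: Dict[str, float] = field(default_factory=dict)

@dataclass
class EpidemicState:
    """Epidemic layer: disease dynamics inside one subpopulation."""
    susceptible: int
    latent: int
    infectious: int
    recovered: int

def total_population(subpop: Subpopulation) -> int:
    return sum(cell.population for cell in subpop.cells)
```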
By increasing the spatial resolution, changing the grid size, the topology of the network, internal functions, and several other structures – depending on what each model allows to be altered – Ajelli et al. are able to identify a series of robust properties and thus elaborate a series of robust theorems. To illustrate just one case, Ajelli et al. report having found that the two computer simulations “display a very good agreement in the timing of the epidemic, with a very limited variation in the time of the simulated epidemic activity peaks. In the metapopulation approach the fraction of the population affected by the epidemic is larger (by 5–10%) than in the agent-based approach. This difference is due to the assumption of homogeneity and thus the lack of detailed structure of contacts (besides the age structure) in the metapopulation approach with respect to the agent-based approach” (Ajelli et al.
2010, 11). In this case, robustness analysis provides good reasons to believe that core structures in GLEaM and in the agent-based simulation correspond very well to the actual timing of the epidemic. Researchers are thus justified in believing claims about the results of these simulations – and about those derived from these two simulations.
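A minimal sketch of the kind of comparison reported here: given two simulated incidence curves, one can compute the shift in the timing of the epidemic peak and the relative difference in overall epidemic size. The curves and numbers below are invented for illustration; they are not Ajelli et al.'s data.

```python
import numpy as np

def compare_runs(incidence_a, incidence_b):
    """Shift in epidemic peak timing and relative difference in epidemic size
    between two simulated incidence curves."""
    peak_shift = int(np.argmax(incidence_b)) - int(np.argmax(incidence_a))
    size_a, size_b = incidence_a.sum(), incidence_b.sum()
    return peak_shift, (size_b - size_a) / size_a

if __name__ == "__main__":
    days = np.arange(120)
    agent_based = 1000.0 * np.exp(-((days - 60) / 12.0) ** 2)      # toy epidemic curve
    metapopulation = 1080.0 * np.exp(-((days - 62) / 12.0) ** 2)   # slightly larger, later
    shift, size_diff = compare_runs(agent_based, metapopulation)
    print(f"peak shift: {shift} days; epidemic size difference: {size_diff:+.1%}")
```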
4.3 A History of (Un)successful Implementations
The history of science offers a long record of successes and accomplishments, as well as failures and incompetence. What does such a mixed history tell us about the scientific enterprise? In the context of experimental practice, Hacking (
1988) and Galison (
1997) have argued that mature science has been, by and large, cumulative since the seventeenth century. Such a claim builds on the idea that the (un)successful implementation of a theory, a model, or even of two chemicals in a laboratory setup is as much part of the corpus of knowledge as the theory, the model, and the two chemicals in question.
Something very similar can be said about the success, failure and cumulative nature of computer simulations. The simulation model as a whole is conceptualized, designed, programmed and executed in a series of stages that do not remain constant over time (Durán
2018). In each stage, the knowledge relied upon to devise each method comes from a wide range of domains, including mathematics, logic and computer theory, sociology and cognitive psychology. Over time, techniques are improved upon, reconfigured, and radically revised when the technology changes or a new method is envisaged. For instance,
design prototyping is a sub-field of software engineering that helps developers assess alternative design strategies and decide which is best for a particular project. There are no standard methods for choosing the best strategy; rather, the designers may address the requirements of the simulation with several different design approaches to see which has the best properties. For example, a simulation involving networking may be built as a ring in one prototype and as a star in another, and performance characteristics evaluated to see which structure is better at meeting performance goals or constraints (Pfleeger and Atlee
2009, Chapter 5). In this respect, in some cases the best option will be to draw from a body of successful implementations (e.g., successful implementations of ring-network simulations); in other cases, a new strategy will populate such a body (e.g., failures in communication protocols, and the success of a new networking topology). In both cases, these episodes become part of a history of (un)successful implementations.
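As a hedged illustration of such prototyping, the sketch below builds the two topologies as plain adjacency structures and compares one crude performance characteristic, the average number of hops between nodes. The functions and the chosen metric are our own assumptions, not a standard from Pfleeger and Atlee.

```python
from collections import deque
from itertools import combinations

def ring(n):
    """Ring prototype: each node talks to its two neighbours."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def star(n):
    """Star prototype: node 0 acts as the hub for all other nodes."""
    adj = {i: {0} for i in range(1, n)}
    adj[0] = set(range(1, n))
    return adj

def average_hops(adj):
    """Mean shortest-path length: a crude stand-in for a performance characteristic."""
    def bfs(src):
        dist, queue = {src: 0}, deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist
    pairs = list(combinations(adj, 2))
    return sum(bfs(a)[b] for a, b in pairs) / len(pairs)

if __name__ == "__main__":
    n = 16
    print(f"ring prototype: {average_hops(ring(n)):.2f} hops on average")
    print(f"star prototype: {average_hops(star(n)):.2f} hops on average")
```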
This is, we believe, part of what Massimi and Bhimji (
2015) have in mind when they claim that the epistemic reliability of computer simulations comes from the credentials supplied by well-grounded scientific knowledge. Although we agree with this claim, we must keep in mind that the methodology of computer simulations is dynamic and non-hierarchical. That is to say that researchers make constant changes to their simulations, rather than merely implementing a well-grounded theory once and for all. It is also to say that well-grounded scientific knowledge is, by today's scientific standards, also knowledge generated by computer simulations. In this vein, well-grounded scientific knowledge depends as much on computer simulations as the latter depend on scientific knowledge. Naturally, such dynamism in the methodology might introduce sources of unreliability (e.g., carrying a method that has been historically successful in one domain over to a completely different domain). However, the simulation model itself is, at some point, methodologically stabilized rather than subject to constant tinkering.
In this respect, we follow Eric Winsberg who, borrowing in turn from Hacking, claimed that building techniques have a life of their own, for “they carry with them their own history of prior (un)successes and accomplishments, and, when properly used, they can bring to the table independent warrant for belief in the models they are used to build” (Winsberg
2003, 122). We include such a history of (un)successful implementations as an important source for attributing reliability to computer simulations.
4.4 Expert Knowledge
The last source we offer here for computer reliabilism can be found in the different disciplines that constitute Science and Technology Studies. There, a great deal of attention is paid to understanding the notion and role of experts in science and engineering. Harry Collins and Robert Evans argue that standard theories of expertise [e.g., the
relational theory of expertise, which takes expertise to be a matter of the experts’ relations with other experts (Collins and Evans
2007, 2)] fall short in a series of respects. They usually provide no guidance on how to legitimize and identify the experts, nor on how to choose between competing experts [see the
periodic table of expertises (Collins and Evans
2007, 14)]; furthermore, they leave out of consideration the analysis of the citizen’s role in technological decision-making and, if the proper measures are not in place, they can be dangerously idiosyncratic. Collins and Evans propose as an alternative the
realist theory, which takes expertise to be some sort of attribute or possession that groups of experts have and that individuals acquire through their membership of those groups. “Acquiring expertise,” Collins and Evans conclude, “is therefore a social process – a matter of socialization into the practices of an expert group – and expertise can be lost if time is spent away from the group” (Collins and Evans
2007, 3).
To us, the expert is interpreted in the realist mode proposed by Collins and Evans, with the condition that having membership of a given group does not mean strict participation in that group. Thus, the mathematician or physicist who knows very well the underlying theory that will be implemented as a simulation, but knows nothing about the implementation itself, is as much an expert in the computer simulation as the computer scientist who knows how to implement the theory but knows little or nothing about the theory itself.
As Claus Beisbart indicates, scientists believe the results of their simulations because they trust the assumptions upon which such simulations are built (Beisbart
2017). These assumptions are here interpreted as being suggested and approved by the relevant actors, that is, the experts. Furthermore, by and large scientists believe the results of their simulations because they fall within an expected range. Marco Ajelli et al. provide us with a good example of the interplay between the assumptions built into the simulation model and what experts typically anticipate. To Ajelli et al. “[t]he epidemic size profile shows an expected overall mismatch of 5–10% depending on the reproductive rate, which is induced by the homogeneous assumption of the metapopulation strategy” (Ajelli et al.
2010, 2).
With these ideas in mind, it is possible to argue that the expert is a key contributor to the reliability of computer simulations:
the theory and assumptions built into the simulation, along with the implicit theory supporting the computation, largely depend on the experts, and/or the experts determine the range within which results can be accepted.
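In the simplest terms, this role can be pictured as an acceptance check: results are believed when they fall within the range the experts anticipate. The snippet below is a trivial, purely illustrative sketch of such a check, with the 5–10% figure borrowed from the Ajelli et al. example above and the observed value invented.

```python
def within_expected_range(value, low, high):
    """Accept a result only if it falls inside the range anticipated by the experts."""
    return low <= value <= high

if __name__ == "__main__":
    observed_mismatch = 0.08  # invented value for illustration
    print(within_expected_range(observed_mismatch, 0.05, 0.10))  # experts expect 5-10%
```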
Expert knowledge also plays an important role in determining the robustness of a simulation, as well as in forming part of a history of (un)successful implementations. In the latter case, because experts are the main actors in creating such an (un)successful history; in the former case, because the expert’s ability to identify and judge relevantly similar structures is paramount for claims about robust properties. According to Weisberg, there are occasions where researchers rely on judgment and experience, not mathematics or simulation, to determine whether a common structure gives rise to the robust behavior, as well as to judge whether the common structure contains important mathematical similarities as opposed to just intuitive qualitative similarities (Weisberg
2013, 159). Ajelli et al. again offer an interesting assertion that combines claims about robustness and the modeling assumptions advanced by experts: “[t]he good agreement of the two approaches [i.e., the agent-based simulation and the GLEaM simulation] reinforces the message that computational approaches are stable with respect to different data integration strategies and modeling assumption” (Ajelli et al.
2010, 2).