1 Introduction
Propositional model counting (#SAT) determines the number of satisfying assignments for a given propositional formula. As the translation of feature models to propositional logic is well-researched (Benavides et al. 2010; Batory 2005), #SAT solvers can be applied out of the box to compute the cardinality of feature models. However, #SAT is at least as hard as SAT: after computing #SAT (i.e., the number of satisfying assignments), it is trivial to determine whether a formula is satisfiable (i.e., whether there is at least one satisfying assignment). In general, #SAT is assumed to be harder (Burchard et al. 2015; Valiant 1979). While it is widely accepted that regular SAT is typically easy for industrial feature models (compared to randomly generated formulas (Pett et al. 2019; Mendonça et al. 2009)), this has not been explored for #SAT.

#SAT solvers for the analysis of feature models. Analyses based on feature-model counting can only be applied in practice if available #SAT solvers scale to industrial feature models under the time restrictions of typical use cases, such as interactive settings (Fritsch et al. 2020; Sprey et al. 2020; Krieter et al. 2017; Acher et al. 2013; Benavides et al. 2007) or continuous integration environments (Pett et al. 2021). We thus evaluate the runtimes of analyzing feature models with publicly available #SAT solvers. Furthermore, we provide recommendations on which solvers to use for analyzing feature models to reduce runtimes.

#SAT solvers rely on a variety of techniques to compute the number of satisfying assignments. While some solvers only report the number of satisfying assignments (Bayardo Jr and Pehoushek 2000; Sang et al. 2004; Thurley 2006; Burchard et al. 2015; Biere 2008), other solvers apply knowledge compilation to different target languages, such as binary decision diagrams (BDDs) (http://buddy.sourceforge.net/manual/main.html, accessed 02 Mar 2020; https://github.com/vscosta/cudd, accessed 13 Jun 2020; Toda and Soh 2016), deterministic decomposable negation normal forms (d-DNNFs) (Darwiche 2004; Lagniez and Marquis 2017; Muise et al. 2010), sentential decision diagrams (SDDs) (Oztok and Darwiche 2015), and extended affine decision trees (Koriche et al. 2013). The compiled target languages may be reused for further feature-model analyses. We analyze the benefits of the different techniques to identify promising classes of #SAT solvers.

#SAT solvers. In particular, we analyze the correlation between the runtimes and a variety of structural metrics.

Approximate #SAT solvers, which estimate the number of satisfying assignments for a given formula, may be beneficial. We inspect the benefits of approximate #SAT solvers when applied to industrial feature models.

#SAT solvers which are publicly available. For our empirical evaluation, we consider 15 subject systems with overall 130 feature models. We provide the framework and data used for the empirical evaluation on Zenodo.1 In particular, our work provides the following contributions:
- #SAT technology on 130 industrial feature models.
- #SAT solvers out of 21 off-the-shelf tools.
- #SAT technologies.
- #SAT solvers and structural metrics of the feature model.
- #SAT solvers.

#SAT solvers. Second, we examine the runtimes of two approximate #SAT solvers. Third, we consider four additional subject systems. Fourth, we analyze the correlation between the runtime and 12 structural metrics of the feature models. Fifth, we improve the accuracy of our results by repeating the measurements and applying statistical tests to study the significance of our results. Overall, the evaluation subsumes the previous evaluation (Sundermann et al. 2020) except for analyzing the evolution of systems. We consider a more thorough analysis of the evolution (compared to the previous evaluation (Sundermann et al. 2020)) as out of scope for this work.

2 Motivating example
#SAT solver, the developer finds that the cardinality is 42 before the change and 25 afterwards. This immense decrease in the cardinality, if unexpected, may already indicate a design problem, because the set of available cars is almost halved. Further, the cardinality can be combined with domain knowledge for more sophisticated insights as follows. In the old version, there are 21 cars with an Automatic gearbox and 21 cars with a Manual gearbox. The newly introduced constraint has no impact on cars with Manual. Thus, there are still 21 cars with Manual in the new version, which implies that only 25 − 21 = 4 valid configurations with Automatic remain. Due to the tree hierarchy, the introduced constraint (Automatic ⇒ USA) requires each automatic car to also have Radio, Navigation, DigitalMaps, and USB. This side effect was probably unintended and can be fixed by changing the constraint to Automatic ∧ DigitalMaps ⇒ USA. While the original constraint (Automatic ⇒ USA) induces an immense and possibly unintended reduction in variability, it introduces no traditional anomalies (e.g., core, false-optional, or dead features; cf. Benavides et al. (2010) for more details). Hence, in such cases it is hard to detect the side effects with traditional SAT-based analyses. In the provided scenario, we used the variability reduction of a feature-model update to detect side effects, one of 21 applications of #SAT we identified in previous work (Sundermann et al. 2021).

#SAT.

3 The need for feature-model counting
#SAT to feature models for multiple aspects, such as detecting design errors, economic estimations, and guidance for developers. Overall, we found 21 applications, gathered from the literature or inspired by industry projects, one of which we exemplified in the last section. In the following, we present some exemplary applications that depend on computing the number of valid configurations, as provided in the original work (Sundermann et al. 2021). Each of the exemplary applications is inspired by insights from our industry projects.

#SAT solver (Sundermann et al. 2021).

#FM. Then, by randomly selecting an integer within the range [1,…,#FM], each configuration has the same probability to be included in the sample. The bijection can be achieved using #SAT by recursively assigning the features (Munoz et al. 2019). Following the algorithm of Munoz et al. (Oh et al. 2019), the number of valid configurations needs to be computed for each assignment in the worst case. This requires an efficient #SAT solver, especially for large systems (Munoz et al. 2019).

4 Propositional model counting
#SAT problem, and different strategies employed by the evaluated solvers. Note that this section is not necessary to understand the empirical evaluation. Hence, it can be skipped by readers who are content to treat the evaluated solvers as black boxes.

Propositional model counting (#SAT) is defined as the problem of computing the number of satisfying full assignments of a propositional formula (Gomes et al. 2006; Kübler et al. 2010). #F = |{α | F(α) = 1}| corresponds to the number of satisfying full assignments of formula F. In the following, we present three popular model-counting methods employed by the majority of solvers in our empirical evaluation, namely Davis-Putnam-Logemann-Loveland (DPLL)-based (Bayardo Jr and Pehoushek 2000; Sang et al. 2005; Thurley 2006; Burchard et al. 2015; Biere 2008), d-DNNF-based (Darwiche 2004; Lagniez and Marquis 2017; Muise et al. 2010), and BDD-based (Toda and Soh 2016) counting.
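As a baseline for DPLL-based counting, the following minimal sketch (our own illustration; the actual solvers add component decomposition, caching, and clause learning) branches on each variable and sums the model counts of both branches:

```python
# Minimal DPLL-style model counter over DIMACS-like clauses
# (lists of non-zero integer literals). Illustrative only.

def simplify(clauses, lit):
    out = []
    for clause in clauses:
        if lit in clause:
            continue                         # clause satisfied, drop it
        reduced = [l for l in clause if l != -lit]
        if not reduced:
            return None                      # empty clause: conflict
        out.append(reduced)
    return out

def count_dpll(clauses, variables):
    if clauses is None:
        return 0                             # conflicting branch
    if not clauses:
        return 2 ** len(variables)           # remaining variables are free
    v, rest = variables[0], variables[1:]
    return (count_dpll(simplify(clauses, v), rest)
            + count_dpll(simplify(clauses, -v), rest))

# (x1 or x2) and (not x1 or x2): satisfied exactly when x2 is true
print(count_dpll([[1, 2], [-1, 2]], [1, 2]))  # 2
```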
d-DNNF-based #SAT solving corresponds to compiling a propositional formula to d-DNNF and then retrieving the number of satisfying assignments by traversing the d-DNNF. After the compilation, computing the model count takes linear time w.r.t. the number of d-DNNF nodes (Darwiche 2004).
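For a d-DNNF that is additionally smooth (all disjuncts of an or-node mention the same variables), the traversal is a single bottom-up pass; a sketch with an illustrative node encoding:

```python
# Model counting on a d-DNNF, assumed smooth and deterministic
# (disjoint disjuncts). The tuple-based node encoding is a stand-in.

def count_dnnf(node):
    kind = node[0]
    if kind == "lit":
        return 1                      # a literal has exactly one model
    child_counts = [count_dnnf(c) for c in node[1]]
    if kind == "and":                 # decomposable: disjoint variable sets
        result = 1
        for c in child_counts:
            result *= c
        return result
    return sum(child_counts)          # "or": deterministic, so counts add up

# (x1 AND x2) OR (NOT x1 AND x2) has exactly two models over {x1, x2}
f = ("or", [("and", [("lit", 1), ("lit", 2)]),
            ("and", [("lit", -1), ("lit", 2)])])
print(count_dnnf(f))  # 2
```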
#SAT may also be computed using a binary decision diagram BDD(F) representing the propositional formula F. A binary decision diagram is a rooted directed acyclic graph with two terminal nodes ⊥ and ⊤. Every non-terminal node x is associated with a variable v ∈ vars(F) and has precisely two outgoing edges, named low (setting v to false) and high (setting v to true). Typically, one considers reduced ordered binary decision diagrams (BDDs) (Bryant 1986, 2018; Mendonça 2009). A BDD is ordered if, for any two variables vi and vj, nodes associated with vi always precede nodes associated with vj on every path, or vice versa. A BDD is reduced if (1) it contains no node whose low and high edges are incident to the same node and (2) no two nodes associated with the same variable have the same nodes incident to their low and high edges. Every satisfying assignment for F corresponds to a path P from the root node to ⊤ (1-path) in BDD(F). Let xv be a node associated with the variable v. If the low edge of xv is contained in P, then v is set to false. Analogously, v is set to true if P contains the high edge of xv. If no node associated with v is contained in P, then v can be assigned an arbitrary value. Consequently, every 1-path induces \(2^{n - \lvert P\rvert }\) satisfying assignments, where \(\lvert P\rvert \) is the number of edges in P and n the number of variables, and it suffices to iterate over all 1-paths in BDD(F) to compute #F: \(\texttt {\#}F = {\sum }_{P} 2^{n - \lvert P\rvert }\), summing over all 1-paths P of BDD(F).
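The 1-path summation can be implemented directly on a BDD node table; the representation below is a toy stand-in (real libraries such as BuDDy or CUDD expose this through their own APIs):

```python
# Toy ROBDD for F = x0 or x1 over N = 3 variables (x2 unconstrained).
# Non-terminal nodes: id -> (var_index, low_child, high_child);
# 0 and 1 denote the terminals false/true. All names are illustrative.
N = 3
nodes = {
    2: (0, 3, 1),  # x0: low -> node 3, high -> true
    3: (1, 0, 1),  # x1: low -> false, high -> true
}

def count_bdd(node, level):
    """Sum 2^(N - |P|) over all 1-paths below `node`; `level` is the
    next variable index to decide (skipped variables are free)."""
    if node == 0:
        return 0
    if node == 1:
        return 2 ** (N - level)              # remaining variables are free
    var, low, high = nodes[node]
    skipped = 2 ** (var - level)             # variables jumped over by the edge
    return skipped * (count_bdd(low, var + 1) + count_bdd(high, var + 1))

print(count_bdd(2, 0))  # 6: x0 or x1 holds in 3 of 4 cases, x2 is free
```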
If multiple #SAT computations are required, compiling into d-DNNF or BDD might be beneficial even if the compilation takes longer than a DPLL-based computation.

5 Experiment design
#SAT solvers on industrial feature models. We provide the required information for the presentation (Section 6) and discussion (Section 7) of the results. The replication package for our empirical evaluation is publicly available.2

In Section 5.2, we present the evaluated #SAT solvers and the methodology we used to collect them. In Section 5.3, we discuss the selection of subject feature models and provide information (e.g., number of features and domain) on the underlying product lines. In Section 5.4, we describe the setup of the experiments regarding the overall procedure of the measurements, considered solvers, considered systems, and applied statistical tests. In Section 5.5, we provide details on the technical setup for the evaluation.

5.1 Research questions
#SAT technology, the performance of exact #SAT solvers and solver classes, the correlation between structural metrics of the feature model and the runtime of solvers, and the performance of approximate #SAT solvers. Typically, feature-model analyses, such as counting, are applied in interactive settings or in continuous integration. As those settings mandate short runtimes, we consider an analysis to be scalable if it requires at most a few minutes of runtime.

#SAT solvers perform on industrial feature models?

#SAT solvers regarding the runtime when computing the cardinality of industrial feature models. Furthermore, we aim to find the most efficient #SAT solvers to provide recommendations on which #SAT solvers to use. Here, we consider the runtime required to compute #SAT for given feature models as the efficiency of a solver.

#SAT solvers perform on industrial feature models?

#SAT solvers (cf. Section 5.3). With RQ2, we analyze the runtimes of different categories of solvers, namely (1) DPLL-based, (2) algebraic-based, and knowledge compilers translating to (3) BDD, (4) d-DNNF, and (5) other formats (i.e., EADT and SDD). We use the insights to discuss the benefits of different solver categories and to give recommendations on promising techniques for counting-based analyses.

#SAT solvers correlate to structural metrics of the feature model?

#SAT solvers. In particular, we examine whether there is a correlation between the performance of the #SAT solvers and structural metrics related to the size and complexity of feature models. The insights can be used as an indicator of the scalability of existing #SAT solvers for a given feature model depending on its structure. Furthermore, we identify metrics that have a high impact on performance to find promising candidates for more accurate performance predictions in the future. In Table 1, we provide the list of metrics we examined. For each metric, we provide a description and instructions on how to compute the respective metric for a given feature model. The metrics were collected by Bezerra et al. (2014) and Bagheri and Gasevic (2011). Those metrics are based on structural properties related to size (e.g., number of features) or complexity (e.g., cyclomatic complexity).
Number of features | |
Description | Number of features in the feature model overall |
Formula | |Features|, with Features being the set of features |
Number of leaf features | |
Description | Number of features in the feature model without children |
Formula | |Leaves|, with Leaves being the set of leaf features |
Number of top features | |
Description | Number of children of the root feature |
Formula | |Top|, with Top being the set of children of the root feature |
Number of cross-tree constraints | |
Description | Number of cross-tree constraints in the feature model |
Formula | |CTC|, with CTC being the set of cross-tree constraints |
Number of clauses | |
Description | Number of clauses in the CNF representing the feature model |
Formula | |Clauses|, with Clauses being the set of clauses in the CNF |
Number of literals | |
Description | Number of overall literals appearing in the CNF |
Formula | |Literals|, with Literals being the set of literals in the CNF |
CTC-Density | |
Description | Ratio of unique features appearing in CTC compared to overall number of features |
Formula | \(\frac {|Features_{CTC}|}{\texttt {\#}\text {Features}}\), with FeaturesCTC being the set of features appearing in a cross-tree constraint. |
Depth of tree | |
Description | Depth of the feature tree at the longest path |
Formula | |FeaturesLP|, with FeaturesLP being the set of features in the longest path from the root to a leaf feature. |
Flexibility of configuration | |
Description | Ratio of optional features compared to overall number of features |
Formula | \(\frac {|Features_{Opt}|}{|Features|}\) with FeaturesOpt being the set of features that appear in some but not all valid configurations. |
Ratio of variability | |
Description | Average number of children |
Formula | \(\frac {{\sum }_{f \in Features} |children_f|}{|Features \setminus Leaves|}\), with childrenf being the set of children of feature f. |
Coefficient of connectivity-density | |
Description | Number of edges between features compared to the number of features |
Formula | \(\frac {{\sum }_{f \in Features} |edges_{f}|}{2 \cdot |Features|}\), with edgesf being the set of distinct edges connecting features. For a clause (f1 ∨… ∨ fn), there is an edge between every pair of feature f1,…,fn. |
Cyclomatic complexity | |
Description | Number of distinct cycles between features |
Formula | |cyclesf|, with cyclesf being the set of independent cycles spanned by edgesf |
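Most of the size-related metrics in Table 1 are straightforward to compute from the feature tree; a sketch for depth of tree and CTC-density on a toy feature model (data structures and names are our own illustration):

```python
# Toy feature tree: feature -> list of child features; cross-tree
# constraints (CTCs) given as the features they mention.
tree = {
    "Car": ["Gearbox", "Radio"],
    "Gearbox": ["Automatic", "Manual"],
    "Radio": ["Navigation"],
    "Automatic": [], "Manual": [], "Navigation": [],
}
ctcs = [["Automatic", "Navigation"]]  # e.g., Automatic => Navigation

def depth_of_tree(feature):
    # Number of features on the longest path from `feature` to a leaf.
    return 1 + max(map(depth_of_tree, tree[feature]), default=0)

def ctc_density():
    # Unique features appearing in any CTC, relative to all features.
    in_ctc = {f for ctc in ctcs for f in ctc}
    return len(in_ctc) / len(tree)

print(depth_of_tree("Car"))  # 3
print(ctc_density())         # 2 of 6 features appear in a CTC
```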
#SAT solvers perform on industrial feature models?

In addition to exact #SAT solvers, we examine the performance of approximate #SAT solvers, which estimate the number of satisfying assignments for a given formula. There are applications which require exact results, such as uniform random sampling, where approximate results would violate uniformity. However, for a multitude of applications, an estimated cardinality may often be sufficient. For example, consider prioritizing features that appear in many valid configurations for two features A and B with #A = \(10^{65}\) and #B = \(10^{60}\). An approximation that ensures that the result is at most 20 times larger or smaller than the exact count would result in the same prioritization as exact results, as \(\frac {1}{20} \cdot 10^{65} > 20 \cdot 10^{60}\). We examine the performance of approximate #SAT solvers to provide insights on their benefits.

5.2 Evaluated #SAT solvers
#SAT solvers used in our empirical evaluation. First, we describe our methodology of gathering the solvers. Second, we list the identified solvers, group them by their type of computing #SAT, and provide pointers where to find them.

#SAT solvers. Such a coverage should allow for conclusive results on (1) the current scalability of #SAT technologies when applied to feature models and (2) recommendations on which solvers should be used to analyze feature models at hand.

#SAT solvers (Biere 2008; Bayardo Jr and Pehoushek 2000; Sang et al. 2004, 2005; Darwiche 2002, 2004; Toda and Soh 2016). Third, the solver can be used as a standalone black-box tool, in contrast to tools that require further setup (e.g., a client-server architecture (Lagniez et al. 2018)). Furthermore, we excluded the tool GPUSAT (Fichte et al. 2019), which performs #SAT on a GPU, as we expect issues with comparability if a solver uses different hardware. We identified #SAT solvers with the following three approaches.

#SAT tools and respective publications used in product-line analysis. Then, we performed (forward and backward) snowballing from the identified publications. For backward snowballing, we employed data from Google Scholar. Second, we used a list of publicly available #SAT solvers from the report of the model counting 2020 competition as comparison (Fichte et al. 2021). Both lists are similar, with eleven shared #SAT solvers. The list of the model counting competition contained one solver in addition to the eleven shared solvers, while the list from product-line analysis contained four additional solvers. Due to the similarity in the identified solvers, we argue that our list of #SAT solvers provides a reasonable representation of current #SAT technology. Third, we added three #SAT solvers that entered the model counting 2020 competition but were not published beforehand.3

#SAT solvers. Table 2 provides an overview of the exact solvers. The exact #SAT solvers can be separated into three main categories: DPLL-based solvers, algebraic solvers, and knowledge compilers. We consider a knowledge compiler to be a #SAT solver if the compiled target language and the compiler support computing the number of satisfying assignments in polynomial time in the size of the target formula.
Evaluated exact #SAT solvers

Solver | Type | Target format | Reference |
---|---|---|---|
Cachet | DPLL | − | |
countAntom | DPLL | − | (Burchard et al. 2015) |
Ganak | DPLL | − | (Sharma et al. 2019) |
PicoSAT | DPLL | − | (Biere 2008) |
Relsat | DPLL | − | (Bayardo Jr and Pehoushek 2000) |
SharpCDCL | DPLL | − | (Klebanov et al. 2013) |
sharpSAT | DPLL | − | (Thurley 2006) |
McTW | Algebraic | − | MC Competition (Fichte et al. 2021) |
SUMC1 | Algebraic | − | MC Competition (Fichte et al. 2021) |
ADDMC | Algebraic | − | MC Competition (Fichte et al. 2021) |
c2d | Compiler | d-DNNF | |
d4 | Compiler | d-DNNF | (Lagniez and Marquis 2017) |
dSharp | Compiler | d-DNNF | (Muise et al. 2010) |
BuDDy | Compiler | BDD | http://buddy.sourceforge.net/manual/main.html. Accessed: 02 Mar 2020 |
CNF2OBDD | Compiler | BDD | (Toda and Soh 2016) |
Cudd | Compiler | BDD | https://github.com/vscosta/cudd. Accessed: 13 Jun 2020 |
CNF2EADT | Compiler | EADT | (Koriche et al. 2013) |
MiniC2D | Compiler | SDD | (Oztok and Darwiche 2015) |
SDD | Compiler | SDD | (Darwiche 2011) |
countAntom is the only solver which internally supports multi-threading (Burchard et al. 2015). We evaluated countAntom with one and with four available threads separately to examine the impact of multi-threading on the runtime. We consider evaluating countAntom with four threads (opposed to only a single thread) the more sensible option for the following reasons: First, it is reasonable to assume that multiple threads would be used in industrial settings. Second, to allow multi-threading, the developers of countAntom made several adjustments which may put countAntom at a disadvantage when using a single thread. Third, a larger number of threads may nevertheless result in a too large advantage for countAntom. During the remainder of the evaluation, we refer to countAntom with four threads if not stated otherwise.

Neither BuDDy (http://buddy.sourceforge.net/manual/main.html, accessed 02 Mar 2020) nor CUDD (https://github.com/vscosta/cudd, accessed 13 Jun 2020) support parsing DIMACS directly. In previous work, we implemented a Python-based wrapper called ddueruem4 (Heß et al. 2021) which uses the ctypes library5 to interface with their shared libraries and construct the BDD using the API described in their respective manuals. As suggested in the manuals, we enabled automatic variable reordering in both BuDDy and CUDD, using the converging variant of the sift algorithm (Rudell 1993). We decided to use our own wrapper ddueruem due to limitations of existing bindings, namely the missing support for CUDD 3.0.0 and frequent crashes when using BuDDy in the JavaBDD6 framework, which is often used for product-line analysis (Mendonça 2009; Mendonca and Cowan 2010; Pohl et al. 2011).
In addition to the exact #SAT solvers, we also identified two approximate #SAT solvers, namely ApproxMC (Chakraborty et al. 2013) and ApproxCount (Wei and Selman 2005). The solver ApproxCount iteratively assigns variables to reduce the complexity of a formula. For each assigned variable, the solver estimates the resulting reduction in the number of satisfying assignments. After a user-specified number of assigned variables, the exact #SAT solver Cachet is executed with the simplified formula as input. The estimated reduction is then applied to the result of the simplified formula to derive an approximated number of satisfying assignments for the original formula. In our previous work (Sundermann et al. 2020), every feature model with fewer than 1,000 features was successfully evaluated by most #SAT solvers. Following this insight, we directed ApproxCount to start the exact computation at 1,000 remaining variables. For ApproxMC, we used the default parameters.

5.3 Subject systems
#SAT solvers for analyzing feature models. We argue that the applicability mainly depends on the scalability on industrial feature models, as artificial models might not be representative for industrial usage, as observed in other domains (Ansótegui et al. 2009; Heß et al. 2021). Therefore, we only use industrial feature models as subject systems. We consider a feature model to be industrial if it fulfills the following two criteria: (1) it specifies the variability of a product line used in the real world and (2) it does not vastly simplify the complexity (in terms of features and constraints) of the product line. Note that we only consider variability of the problem space (i.e., which valid configurations exist) as opposed to variability in the solution space (i.e., how and where to implement variability).

#SAT solvers on feature models taken from industrial product lines from the automotive, operating-system, database, and financial-services domains. Table 3 provides an overview of the considered feature models, sorted by the number of features, including name, number of features, number of constraints, and the work they were originally published in. The index i indicates the position of the subject system in the diagrams in Section 6.
Subject systems (sorted by # Features)

Subject systems | i | # Features | # Constraints | Orig. source |
---|---|---|---|---|
BerkeleyDB | 1 | 76 | 20 | |
axTLS | 2 | 96 | 14 | (Knüppel et al. 2017) |
uClibc | 3 | 313 | 56 | (Knüppel et al. 2017) |
uClinux-base | 4 | 380 | 3,455 | (Knüppel et al. 2017) |
Automotive04 | 5 | 531 | 623 | Confidential |
Automotive03 | 6 | 588 | 1,184 | Confidential |
BusyBox | 7 | 631 | 681 | (Pett et al. 2021) |
FinancialServices | 8 | 771 | 1,080 | |
Embtoolkit | 9 | 1,179 | 323 | (Knüppel et al. 2017) |
CDL (116 Models) | 10 | 1,178–1,408 | 816–956 | (Knüppel et al. 2017) |
uClinux-distribution | 11 | 1,580 | 197 | (Knüppel et al. 2017) |
Automotive05 | 12 | 1,663 | 10,321 | Confidential |
Automotive01 | 13 | 2,513 | 2,833 | |
Linuxv2.6.33.3 | 14 | 6,467 | 3,545 | (Knüppel et al. 2017) |
Automotive02 | 15 | 18,616 | 1,369 | (Knüppel et al. 2017) |
#SAT solvers. If we evaluated the different CDL models as distinct systems, 89.2% of the overall 130 (14 other + 116 CDL) evaluated feature models would be CDL models, which would heavily bias the results. Therefore, we consider the median of runtimes over the 116 different CDL models as the runtime of the CDL subject system in every experiment if not stated otherwise. To show the similarity in the performance of #SAT solvers, we also present the runtimes on the different CDL models in Section 6.

5.4 Experimental setup
#SAT solvers, we translate each feature model into conjunctive normal form (CNF) and store it in DIMACS format. We invoke the #SAT solvers with the DIMACS file as input. As the translation to CNF typically requires only a few milliseconds and is identical for each solver, we do not include the translation time in the overall runtime. Furthermore, we set a timeout of ten minutes (cf. RQ1 in Section 5.1) for evaluating a single feature model as the baseline for the experiment. The threshold is motivated by applying counting-based analyses in interactive settings and continuous-integration environments, which should not exceed a few minutes of runtime to provide fast feedback to developers after changing a feature model.

Experiment | Solvers | # Reps. | RQ1 | RQ2 | RQ3 | RQ4 |
---|---|---|---|---|---|---|
Experiment 1a | All Exact # SAT | 1 | × | × | ||
Experiment 1b | Remaining Exact # SAT | 50 | × | × | × | |
Experiment 1c | Remaining Exact # SAT | 1 | × | × | ||
Experiment 2a | All Approximate # SAT | 1 | × | |||
Experiment 2b | Remaining Approximate # SAT | 50 | × |
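The translation to CNF/DIMACS described above can be illustrated on the gearbox example from Section 2; a minimal sketch of the standard Boolean encoding (Batory 2005), with arbitrary feature indices:

```python
# Toy translation of a feature tree to CNF in DIMACS format.
# The model and the variable indices are illustrative.
features = {"Car": 1, "Gearbox": 2, "Automatic": 3, "Manual": 4}
idx = features

clauses = [
    [idx["Car"]],                          # root is always selected
    [-idx["Gearbox"], idx["Car"]],         # child implies parent
    [-idx["Car"], idx["Gearbox"]],         # mandatory child: parent implies child
    [-idx["Automatic"], idx["Gearbox"]],
    [-idx["Manual"], idx["Gearbox"]],
    # alternative group: Gearbox implies exactly one of Automatic/Manual
    [-idx["Gearbox"], idx["Automatic"], idx["Manual"]],
    [-idx["Automatic"], -idx["Manual"]],
]

dimacs = [f"p cnf {len(features)} {len(clauses)}"]
dimacs += [" ".join(map(str, c)) + " 0" for c in clauses]
print("\n".join(dimacs))
```

The resulting CNF has exactly two valid configurations (Automatic or Manual), matching the feature tree.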
Exact #SAT solvers

#SAT solvers (cf. Section 5.2) on the considered feature models (cf. Section 5.3). For the solvers based on knowledge compilation, we consider the overall runtime to compile and compute a result. We use the insights of this experiment to answer RQ1, RQ2, and RQ3. The experiment is separated into three stages.

#SAT solvers. Here, we measure the runtime of each of the 19 exact #SAT solvers on the 15 subject systems. The idea of Experiment 1a is to remove slow solvers from the following two stages, as they significantly increase the overall runtime of the experiments. We consider a solver to be slow if the solver requires more than 50% of additional runtime compared to the following solver when ordered by overall runtime for the experiment. We refer to the solvers that are not excluded as remaining solvers. In the second stage (Experiment 1b), we perform the measurements with the remaining #SAT solvers with 50 repetitions for each feature model for more robust results. In the third stage (Experiment 1c), we further evaluate the runtimes of the remaining solvers on subject systems for which no solver computed a result in Experiments 1a and 1b. Here, we perform one repetition and increase the timeout to 24 hours to examine whether an increased timeout allows a successful computation.
Approximate #SAT solvers

#SAT solvers on each feature model to provide insights for RQ4. Furthermore, we give a comparison to the exact #SAT solvers to evaluate the benefits. Analogous to Experiment 1a, we also perform an initial experiment, referred to as Experiment 2a, with only one repetition per measurement to exclude slow solvers from the following experiments. Then, we repeat the measurements with 50 repetitions on the remaining approximate #SAT solvers (Experiment 2b), analogous to Experiment 1b.

Use case | Sample | Statistical tests | RQ1 | RQ2 | RQ3 | RQ4 |
---|---|---|---|---|---|---|
Comparison solvers system | Unpaired | Mann-Whitney (McKnight and Najab 2010) | × | × | × | |
Comparison solvers overall | Paired | Friedman test (Conover and Iman 1981) | × | × | × | |
Post-Hoc Conover (Conover and Iman 1981) | ||||||
Correlation solver/metric | Paired | Spearman (Zar 1972) | × |
We use the scipy v1.7.2 implementation of Mann-Whitney.13
#SAT solvers on all 15 subject systems. For the comparison, each data point corresponds to the median of runtimes over the 50 repetitions for a combination of subject system and solver. Here, we apply a Friedman test followed by a post-hoc Conover test on the samples of all solvers, as we have paired samples (pairs of subject systems) and again do not assume a normal distribution (Conover and Iman 1981). For the tests, we assume a significance level α = 5%. We use the scipy v1.7.2 implementation of Friedman14 and the scikit_posthocs implementation of post-hoc Conover.15

#SAT solvers and structural metrics (cf. RQ3). Here, we use Spearman's correlation coefficient rs (Zar 1972) to evaluate the strength of the correlation. For two variables, rs describes their correlation with a value between -1 and 1. The values 1 and -1 describe a very strong positive or negative correlation, respectively. rs = 0 indicates that the variables have no correlation at all. We expect that Spearman's coefficient provides more sensitive results (compared to Pearson's coefficient) for the following reasons (Artusi et al. 2002). First, Spearman's coefficient is suitable to detect non-linear relationships between the variables, and it is possible that the correlation between a metric and the runtimes is not linear. Second, there may be significant outliers, which tend to impair the expressiveness of Pearson's coefficient. Table 6 shows the levels of correlation strength for Spearman's coefficient rs that we use. We use the scipy v1.7.2 implementation of Spearman to compute the correlation coefficients.16
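Spearman's rs is Pearson's correlation computed on ranks, which is why it captures any monotone (also non-linear) relationship; a dependency-free sketch (the evaluation itself uses the scipy implementation):

```python
def ranks(xs):
    # 1-based ranks with average ranks for ties.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    # Pearson correlation of the rank vectors.
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Monotone but non-linear relation -> perfect rank correlation
print(spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # 1.0
```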
Correlation | Value range |
---|---|
Very weak | 0 ≤|rs| < 0.2 |
Weak | 0.2 ≤|rs| < 0.4 |
Moderate | 0.4 ≤|rs| < 0.6 |
Strong | 0.6 ≤|rs| < 0.8 |
Very strong | 0.8 ≤|rs|≤ 1.0 |
Effect size | Value range |
---|---|
Very small | 0 ≤ d < 0.2 |
Small | 0.2 ≤ d < 0.5 |
Medium | 0.5 ≤ d < 0.8 |
Large | 0.8 ≤ d < 1.3 |
Very large | 1.3 ≤ d |
5.5 Technical setup
#SAT solvers. For the measurements, we implemented a Python framework to (1) call the solver binaries and provide the input, (2) measure the runtimes with the timeit module,17 and (3) limit the memory usage. For reproducibility, the framework, solvers, and input data are publicly available.18 Hyper-threading, turbo boost, and caching of the file system were disabled during all measurements to reduce computational bias. Furthermore, no other major computations were run on the system during the experiments.

6 Results

6.1 Experiment one: Exact #SAT solvers
#SAT solvers on each subject system. Each point on the x-axis corresponds to one of the 15 subject systems. The systems are sorted by the number of features in ascending order (cf. Table 3). The y-axis shows the runtime of the different solvers on a logarithmic scale. The categories are indicated by the colors of the markers (DPLL, algebraic, d-DNNF, BDD, and knowledge compilers to other formats). The red line indicates that a solver hit the timeout. The blue line indicates that an error occurred or a solver exceeded the memory limit. CDL Median corresponds to the median over all 116 CDL feature models (cf. Section 5.3). The majority of systems (13/15) was successfully evaluated within 10 minutes by at least one solver. For each of the 13 solved systems, the fastest solver required less than one second. However, none of the solvers was able to compute the cardinality of the other two systems, namely Automotive05 and Linux.
Figure 3 shows the overall runtime of each #SAT solver on all 15 subject systems in Experiment 1a. Each bar corresponds to the sum of runtimes for one solver. Note that this sum only includes the median of runtimes for the 116 CDL models instead of the overall sum (cf. Section 5.3). Considering a timeout of 10 minutes per system, the maximum runtime is 150 minutes (hitting the timeout for all 15 systems), which is indicated by the red line. The solvers are sorted by the overall sum of runtimes (ascending). If a subject system could not be evaluated due to a timeout, the memory limit, or an arbitrary error, we added the timeout (10 minutes) to the overall runtime. Table 8 gives an overview of the performance of the different solvers. Eight of the solvers, namely sharpSAT, Ganak, countAntom (both with four and with one thread), d4, Cachet, dSharp, MiniC2D, and c2d, evaluated 13 out of 15 (86.7%) subject systems within eleven minutes of runtime. The eleven slower solvers, namely McTW, Relsat, ADDMC, CNF2EADT, SDD, CNF2OBDD, SUMC1, BuDDy, CUDD, SharpCDCL, and PicoSAT, successfully evaluated at most 73.3% of the 15 subject systems with a timeout of ten minutes for each model. Furthermore, the fastest of the slower solvers (McTW) requires around 60% more runtime than the slowest of the faster solvers (c2d). Overall, the eleven slower solvers took 96.8% of the total runtime (10.2 days) required for Experiment 1a. Performing the 50 repetitions with all solvers would result in around 1.4 years of continuous computation just for Experiment 1b. For all following experiments, we only include the eight fastest #SAT solvers. In Table 8, the excluded solvers are marked in the column Remaining. In Fig. 3, the dashed violet line marks the cut for the excluded solvers. Each solver on the right side of the line is excluded from the following experiments.
Solver | Solved | % Solved | Ov. Runtime (s) | Remaining |
---|---|---|---|---|
sharpSAT | 13 | 87 | 1,202.6 | ✓ |
Ganak | 13 | 87 | 1,203.4 | ✓ |
countAntom4t | 13 | 87 | 1,207.8 | ✓ |
countAntom1t | 13 | 87 | 1,210.2 | ✓ |
dSharp | 13 | 87 | 1,212.7 | ✓ |
d4 | 13 | 87 | 1,230.1 | ✓ |
Cachet | 13 | 87 | 1,339.9 | ✓ |
MiniC2D | 13 | 87 | 1,733.6 | ✓ |
c2d | 13 | 87 | 1,818.9 | ✓ |
McTW | 11 | 73 | 2,892.0 | ✗ |
Relsat | 9 | 60 | 3,602.4 | ✗ |
ADDMC | 9 | 60 | 3,625.0 | ✗ |
CNF2EADT | 8 | 53 | 4,504.1 | ✗ |
SDD | 8 | 53 | 4,705.7 | ✗ |
CNF2OBDD | 6 | 40 | 5,558.1 | ✗ |
SUMC1 | 6 | 40 | 5,721.8 | ✗ |
BuDDy | 3 | 20 | 7,206.6 | ✗ |
Cudd | 3 | 20 | 7,206.8 | ✗ |
SharpCDCL | 0 | 0 | 9,000.0 | ✗ |
PicoSAT | 0 | 0 | 9,000.0 | ✗ |
Every BDD-based #SAT solver successfully evaluated at most 6 of the 15 systems and required at least 92 minutes overall. Every d-DNNF-based solver needed less than 31 minutes for all subject systems and only failed to evaluate Linux and Automotive05.

We also inspected the runtimes of the #SAT solvers for the 116 CDL feature models individually. In the corresponding plot, each point on the x-axis corresponds to one CDL model. The y-axis shows the runtime of the different solvers in seconds on a logarithmic scale. For all solvers but Cachet, the median runtime over all 116 feature models is smaller than two times the minimum (i.e., the shortest runtime required for any of the 116 feature models by that solver). Also, the maximum is always smaller than two times the median. These results support our claim in Section 5.3 that the CDL models are highly similar and that handling them as separate subject systems would bias the measured runtimes.
The three solvers with the shortest overall runtimes are sharpSAT (2.5 seconds), Ganak (3.3 seconds), and countAntom (7.8 seconds). Over the 13 systems, sharpSAT is significantly (p < 0.03) faster than every #SAT solver but Ganak, Cachet, and dSharp. However, each effect size is small (d < 0.47), which matches our expectations, as the large differences in runtime between the subject systems for each solver result in a large standard deviation.
sharpSAT is significantly (p < 0.004) faster than all other solvers on 6 of the 13 systems, mostly (88.1%) with very large effect sizes (d > 1.53). Ganak is significantly (p < 10^−11) faster than all other #SAT solvers, with very large effect sizes (d > 1.61), for Automotive03. countAntom is significantly (p < 10^−17) faster than all other solvers, with very large effect sizes (d > 1.73), on all 116 CDL models, but for no other system. Cachet is significantly (p < 10^−7) faster than all other solvers for three smaller systems (less than 1,200 features), namely axTLS, embToolkit, and BerkeleyDB, with all effect sizes being very large (d > 1.38) but one medium (d = 0.78).

We also compared countAntom with four threads and with one thread. countAntom with four threads is significantly (p = 0.028) faster than with one thread over the 13 systems. Still, countAntom with one thread is significantly (p < 0.036) faster for three subject systems, namely embToolkit (d = 0.60), uClinux-base (d = 0.82), and Automotive03 (d = 0.21). Overall, countAntom with four threads and with one thread requires 7.8 and 9.2 seconds of runtime, respectively.
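The effect sizes reported above follow Cohen's d with a pooled standard deviation; a self-contained sketch (the runtime vectors below are illustrative, not our measurement data):

```python
from statistics import mean, stdev

def cohens_d(a, b):
    """Cohen's d with pooled standard deviation for two runtime samples."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5

# hypothetical repeated runtimes (seconds) of two solvers on one model
solver_a = [1.00, 1.10, 0.90, 1.05]
solver_b = [2.00, 2.10, 1.90, 2.05]
```

With these values, |d| far exceeds the conventional threshold of 0.8 for a "very large" effect, because the runtime difference dwarfs the per-solver variance.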
The table below summarizes the correlations between structural metrics of the feature models and the runtimes of the #SAT solvers. 'Coefficient fastest' shows the correlation between each metric and the runtime of the fastest solver for each instance. Note that the fastest solver varies depending on the evaluated feature model. There is a very strong positive correlation (rs > 0.8) between the runtime and the following metrics: number of features, number of leaf features, number of cross-tree constraints, number of literals, and number of clauses. Consequently, the runtime of #SAT solvers tends to increase, for instance, if the number of features increases. Also, there is a strong correlation between the cyclomatic complexity and the runtime. Every other metric correlates either weakly (0.2–0.39) or very weakly (rs < 0.2) with the runtime of the fastest solver. 'Coefficient range' shows the minimum and maximum correlation between a metric and any single solver. For every metric that correlates strongly with the runtime of the fastest solver, each solver shows an at least strong correlation. This observation is analogous for the weakly correlated metrics, with one exception (countAntom has a moderate correlation with the connectivity density).
Correlations between structural metrics and the runtimes of the #SAT solvers

Metric | Coefficient fastest | Coefficient range |
---|---|---|
Number of literals | 0.89 (very strong) | 0.82 (very strong)–0.89 (very strong) |
Number of clauses | 0.87 (very strong) | 0.80 (very strong)–0.87 (very strong) |
Number of features | 0.85 (very strong) | 0.73 (strong) –0.94 (very strong) |
Number of leaf features | 0.82 (very strong) | 0.68 (strong) –0.92 (very strong) |
Number of constraints | 0.81 (very strong) | 0.67 (strong)–0.84 (very strong) |
Cyclomatic complexity | 0.77 (strong) | 0.64 (strong)–0.79 (strong) |
Tree depth | 0.34 (weak) | 0.29 (weak)–0.39 (weak) |
Connectivity density | 0.20 (weak) | 0.11 (very weak)–0.57 (moderate) |
Ratio of variability | 0.11 (very weak) | -0.01 (very weak)–0.27 (weak) |
Number of top features | 0.06 (very weak) | -0.01 (very weak)–0.17 (very weak) |
Flexibility of configuration | 0.07 (very weak) | -0.01 (very weak)–0.18 (very weak) |
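The coefficients in the table are Spearman rank correlations. A dependency-free sketch of how such a coefficient is computed (ranking with average ranks for ties, then Pearson correlation of the ranks):

```python
def _ranks(xs):
    """1-based ranks of xs, with tied values receiving their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation coefficient rs between two metric vectors."""
    rx, ry = _ranks(xs), _ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)
```

Because Spearman correlation works on ranks, any monotonically increasing relation between a metric and the runtime yields rs = 1, even when the growth is non-linear, which suits the heavily skewed runtimes observed here.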
Despite the correlation between the runtime of #SAT solvers and both metrics (i.e., number of features and number of constraints), a feature model with more features or constraints does not necessarily induce a longer runtime. The two systems that reached the timeout, namely Linux and Automotive05, contain 6,467 and 1,663 features, respectively. Automotive02, which contains 18,616 features, was evaluated within 0.5 seconds. It is important to note that Automotive02 contains only 1,369 constraints, while Linux and Automotive05 contain 3,545 and 10,321 constraints, respectively. Also, uClinux-base contains 3,455 constraints, but its fastest solver is about 50 times faster than the fastest solver for Automotive02.
Cardinalities of the subject systems (sorted by number of #Features)

Subject systems | Number of valid configurations |
---|---|
BerkeleyDB | 4.1 ⋅ 10^9 |
axTLS | 8.3 ⋅ 10^11 |
uClibc | 1.7 ⋅ 10^40 |
uClinux-base | 2.6 ⋅ 10^22 |
Automotive04 | 2.5 ⋅ 10^21 |
Automotive03 | 2.5 ⋅ 10^31 |
BusyBox | 2.1 ⋅ 10^201 |
FinancialServices | 9.7 ⋅ 10^13 |
Embtoolkit | 5.1 ⋅ 10^96 |
CDL (116 Models) | 2.6 ⋅ 10^118 – 3.0 ⋅ 10^136 |
uClinux-distribution | 4.1 ⋅ 10^409 |
Automotive05 | unknown |
Automotive01 | 5.4 ⋅ 10^217 |
Linux | unknown |
Automotive02 | 1.7 ⋅ 10^1534 |
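The "number of valid configurations" in the table is exactly the model count of the feature model's propositional formula. For intuition, a naive counter on a hypothetical three-feature toy model (a root with two optional children and one cross-tree constraint), in CNF with DIMACS-style signed integers:

```python
from itertools import product

# Toy feature model (illustrative only): 1 = Root, 2 = A, 3 = B
clauses = [
    [1],        # the root is always selected
    [-2, 1],    # child A implies its parent
    [-3, 1],    # child B implies its parent
    [-2, 3],    # cross-tree constraint: A implies B
]

def count_models(clauses, n_vars):
    """Naive #SAT by enumerating all 2^n assignments (exponential!)."""
    count = 0
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            count += 1
    return count
```

The enumeration is hopeless beyond a few dozen variables, which is precisely why the dedicated #SAT solvers benchmarked here exist: the table's counts of up to 10^1534 configurations are astronomically beyond explicit enumeration.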
6.2 Experiment two: Approximate #SAT solvers
We evaluated both approximate #SAT solvers on each feature model. ApproxMC hit the timeout of ten minutes for all but the two smallest models, namely axTLS (96 features) and BerkeleyDB (76 features). Consequently, ApproxMC was excluded from Experiment 2b. ApproxCount hit the timeout for four subject systems.
In Experiment 2b, we compared the remaining approximate #SAT solver (ApproxCount) with the exact #SAT solver that required the least time overall, namely sharpSAT. ApproxCount hit the timeout for four subject systems, while sharpSAT hit the timeout for two. For all 13 feature models that were successfully evaluated by at least one solver, sharpSAT is significantly (p < 0.0002) faster than ApproxCount, with very large effect sizes (d > 5.1) for every single feature model. Furthermore, ApproxCount needs 5 minutes for 11 out of 15 systems, while sharpSAT requires less than 2 seconds for those.
7 Discussion
RQ1: How do exact #SAT solvers perform on industrial feature models? Our results indicate that the scalability of the #SAT solvers depends on the evaluated feature model. Based on our results, we expect that most industrial feature models can be evaluated within minutes or even seconds by the faster #SAT solvers we identified. Overall, 13 of the 15 analyzed feature models were successfully evaluated within 10 minutes. In addition, the fastest solver for each of those feature models required less than one second, which we consider scalable, as it satisfies typical time restrictions of interactive environments and continuous-integration environments. Nevertheless, there are systems for which no available #SAT solver scales. In our experiment, two systems, namely Automotive05 and Linux, could not be evaluated by any solver, not even within a timeout of 24 hours. Our results indicate that the hardness of both systems lies in their high number of features and constraints (cf. the answer to RQ3).

The solvers sharpSAT, Ganak, countAntom, dSharp, d4, Cachet, MiniC2D, and c2d successfully evaluated 13 of the 15 systems within 10 minutes of overall runtime. sharpSAT requires the least time to evaluate the 13 subject systems (2.6 seconds overall), closely followed by Ganak (3.4 seconds). While some solvers performed better overall than others, none of the solvers is superior to the others on every feature model. The results indicate that some solvers are inferior for the task of computing the cardinality of feature models, namely PicoSAT, Relsat, SharpCDCL, McTW, SUMC1, CNF2OBDD, BuDDy, Cudd, CNF2EADT, and SDD. Those solvers hit the timeout for at least four subject systems, some even for all systems, while also being substantially slower for the systems they successfully evaluated.
RQ2: How do different classes of #SAT solvers perform on industrial feature models? For single #SAT invocations, as performed in the experiment design at hand, we recommend using the fastest DPLL-based solvers. The three best-performing solvers, namely sharpSAT, Ganak, and countAntom, are based on exhaustive DPLL.
For applications requiring multiple #SAT invocations, reusing d-DNNFs seems promising. All d-DNNF compilers are among the eight fastest solvers. For each feature model that was successfully evaluated by at least one solver, the fastest d-DNNF-based solvers, dSharp and d4, require at most a few seconds in total for compilation and model counting. For each follow-up computation, the compiled d-DNNF can be reused (e.g., for computing the number of valid configurations containing certain features). Hence, we expect d-DNNF solvers to be faster when performing multiple computations, which is required for the majority of counting-based analyses (Sundermann et al. 2021). SDDs can also be reused and are thus a considerable candidate, but the best-performing SDD-based solver (MiniC2D) was substantially slower than dSharp (42 times slower) and d4 (18 times slower).
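The reason a compiled d-DNNF is cheap to reuse is that its model count falls out of one bottom-up pass. A minimal sketch over a hypothetical node encoding (this is not the file format emitted by dSharp or d4): ("L", lit) is a literal leaf, ("A", children) a decomposable AND, ("O", children) a deterministic OR.

```python
def count(node):
    """Return (model count, set of variables mentioned below this node)."""
    kind = node[0]
    if kind == "L":
        return 1, {abs(node[1])}
    results = [count(child) for child in node[1]]
    if kind == "A":
        # decomposable: children mention disjoint variables, so multiply
        total, variables = 1, set()
        for c, vs in results:
            total *= c
            variables |= vs
        return total, variables
    # deterministic OR: branches describe disjoint model sets, so add;
    # variables missing from a branch are free and each contributes a
    # factor of 2 ("smoothing")
    variables = set().union(*(vs for _, vs in results))
    total = sum(c * 2 ** len(variables - vs) for c, vs in results)
    return total, variables
```

For the d-DNNF of A ⇒ B, written as an OR of "A false" and "A and B", the pass visits each node once, which is why follow-up counting queries on an already compiled d-DNNF are linear in its size.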
The remaining classes of #SAT solvers, namely algebraic solvers, BDDs, and other knowledge compilation formats, performed substantially worse than the eight fastest solvers. Both algebraic solvers, namely ADDMC and SUMC1, overall successfully evaluated only nine (60%) of the subject systems. Hence, we excluded both solvers after Experiment 1a, and we cannot recommend using these solvers for the #SAT-based analysis of feature models. The three BDD libraries overall successfully evaluated only six (40%) subject systems. BuDDy and Cudd even hit the timeout for 12 of the 15 subject systems. Therefore, we do not recommend using current BDD libraries for computing the cardinality of feature models. Nevertheless, BDDs are tractable (i.e., have polynomial time complexity w.r.t. the size of the BDD) for additional computation types, such as existential quantification (Bryant 2018). Using BDDs for other feature-model analyses may thus still be beneficial.
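Model counting is one of the operations that is tractable on a BDD: one recursive pass suffices, doubling the count for every variable level a reduced edge skips. A minimal sketch with a hypothetical tuple encoding (var, low, high) and boolean terminals; this is not the BuDDy or Cudd API:

```python
def bdd_count(node, n_vars):
    """Count satisfying assignments of a reduced ordered BDD over n_vars."""

    def var(nd):
        # terminals sit conceptually below the last variable level
        return n_vars if isinstance(nd, bool) else nd[0]

    def models_below(nd):
        # number of satisfying assignments to variables var(nd)..n_vars-1
        if nd is True:
            return 1
        if nd is False:
            return 0
        v, lo, hi = nd
        # levels skipped by a reduced edge contribute a factor of 2 each
        return (models_below(lo) * 2 ** (var(lo) - v - 1)
                + models_below(hi) * 2 ** (var(hi) - v - 1))

    # variables above the root node are unconstrained
    return models_below(node) * 2 ** var(node)
```

The catch, reflected in the results above, is not this counting pass but constructing the BDD in the first place, whose size can explode for large constraint-heavy models.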
RQ3: How does the runtime of #SAT solvers correlate to structural metrics of the feature model? The runtime required to compute the cardinality of a feature model generally increases if the feature model grows in size or complexity. There is a strong or very strong positive correlation between the runtime of #SAT solvers and several structural metrics related to size and complexity, namely the number of features, number of leaf features, number of constraints, number of clauses, and number of literals.

Still, neither the number of features nor the number of constraints alone determines the runtime of the #SAT solvers. Each subject system with fewer than 1,000 features was evaluated within one second by at least one solver, independent of the number of constraints. Analogously, all subject systems with fewer than 1,000 constraints were evaluated by at least one solver within one second, independent of the number of features. Although Automotive02 contains 18,616 features, sharpSAT still evaluated it in less than a second. The reason probably lies in its comparatively low number of constraints (1,369), while Linux and Automotive05 contain 3,545 and 10,321 constraints, respectively. Furthermore, uClinux-base contains 3,455 constraints, but its fastest solver is about 50 times faster than the fastest solver for Automotive02, which contains only 1,369 constraints.
For every subject system, either several solvers computed a result or no #SAT solver did. Thus, we expect that if the fastest solvers do not scale to a feature model, the others will also fail. Nevertheless, there are exceptions: countAntom is slower than sharpSAT for 12 out of the 13 (successfully evaluated) subject systems, but significantly faster for all 116 CDL feature models. Applying a meta #SAT solver that selects a suitable #SAT solver for a given feature model should thus yield runtime benefits. Our insights on the correlations between structural metrics and runtime may be a useful starting point for future work on predicting performance. In particular, the metrics showing a strong correlation are promising indicators for performance prediction.
RQ4: How do approximate #SAT solvers perform on industrial feature models? Approximating the results with the evaluated approximate #SAT solvers yields no benefits, as we can acquire exact results with shorter runtimes. In particular, the fastest exact solver, sharpSAT, is significantly faster than both approximate #SAT solvers for every single successfully evaluated feature model. The slower solver, ApproxMC, computed a result only for the two smallest considered feature models. While ApproxCount computed a result for the majority of the models, it scaled to fewer feature models than the fastest exact #SAT solver, sharpSAT.

A reason for the poor performance of the approximate #SAT solvers may be that they were evaluated on (and eventually optimized for) formulas from different domains with generally fewer satisfying assignments. The largest formulas evaluated induce up to 10^12 (Chakraborty et al. 2013) and up to 10^33 (Wei and Selman 2005) satisfying assignments, respectively (compared to up to 10^1534 in our evaluation). Optimizing those approximations for formulas representing feature models may be beneficial.

8 Threats to validity
For each feature model with a known cardinality, the #SAT solvers computed the expected result. The translation of the feature models to CNF does not bias the comparison of the #SAT solvers, as they all use the same CNF. Furthermore, for all feature models but Linux, the translation required only a few milliseconds.

We implemented a wrapper for BuDDy and Cudd due to their inability to process DIMACS input directly. The implementation of our wrapper for BuDDy and Cudd could be erroneous or inefficient, with a negative impact on their performance. While parsing of the input is handled by the wrapper, the BDDs are constructed entirely by BuDDy and Cudd, using the parameters and techniques suggested in their respective manuals. For each successful computation of BuDDy and Cudd, the returned number of satisfying assignments was correct. Furthermore, for every feature model, the parsing time of the wrapper was less than one second and at most 10% of the overall runtime. Note that each time BuDDy or Cudd required more than one second of runtime, the relative share of the parsing time was even lower (at most 2%). Thus, we consider the parsing time for the DIMACS input to be negligible compared to the overall runtime and do not expect an impact on our conclusions on the performance of BuDDy or Cudd. Furthermore, we decided against using the widely used (Mendonça 2009; Mendonca and Cowan 2010; Pohl et al. 2011) library JavaBDD, as it lacks support for the latest version of Cudd (3.0.0) and frequently crashes when using BuDDy.
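The feature-model-to-CNF translation mentioned above follows well-known rules (Batory 2005). A toy sketch for a hypothetical three-feature model (Root with mandatory child Engine, optional child Radio, and one cross-tree constraint), serialized to the DIMACS format that the solvers consume:

```python
# Variable numbers are arbitrary DIMACS identifiers for this toy model.
ROOT, ENGINE, RADIO = 1, 2, 3

clauses = [
    [ROOT],            # the root feature is always selected
    [-ENGINE, ROOT],   # child implies parent
    [-ROOT, ENGINE],   # mandatory child: parent also implies child
    [-RADIO, ROOT],    # optional child implies parent only
    [-RADIO, ENGINE],  # cross-tree constraint: Radio implies Engine
]

def to_dimacs(clauses, n_vars):
    """Serialize clauses in the DIMACS CNF format read by most solvers."""
    lines = [f"p cnf {n_vars} {len(clauses)}"]
    lines += [" ".join(map(str, c)) + " 0" for c in clauses]
    return "\n".join(lines)
```

Because every solver in the benchmark receives this identical DIMACS file, the translation step cannot favor one solver over another.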
A warm-up phase could affect the runtimes of the #SAT solvers. However, each invocation of a #SAT solver is performed in a separate execution of the binary. Thus, the solvers are in the same state at the start of each computation. It may be possible that hardware optimizations induce a warm-up; we disabled turbo boost and the file-system cache to reduce a potential bias.

Randomness may influence the runtimes of the #SAT solvers. Some #SAT solvers are affected by random effects. For example, c2d (Darwiche 2002, 2004) randomly chooses cuts to create a decomposition tree of the formula at the start of the computation. To reduce the bias resulting from randomness, we performed 50 repetitions in Experiments 1b and 2b and applied statistical tests on the significance of the results.
We may have missed some #SAT solvers. For instance, Kübler et al. (2010) developed their own tool to compute the cardinality of feature models. Their tool is not publicly available and, thus, we could not evaluate it and compare it to other solvers. Nevertheless, we evaluated a large variety of different #SAT solvers. To the best of our knowledge, we included every publicly available #SAT solver in our benchmark.

The subject systems cover a broad range of runtimes for the #SAT solvers (between a few milliseconds and hitting a timeout of 24 hours). Therefore, we expect that our results represent a reasonable indicator for the scalability of #SAT solvers on other product lines.

9 Related work
We structure the related work into three categories: (1) application of #SAT to feature models, (2) usage of #SAT technology in feature-modelling tools, and (3) computing the cardinality with tools that are not based on propositional logic.

Application of #SAT to feature models

Kübler et al. (2010) evaluated two #SAT solvers, Cachet (Sang et al. 2004) and c2d (Darwiche 2004), on three different versions of an automotive product line. We evaluated both solvers, and they were outperformed by newer solvers on most instances. However, the authors also proposed their own model counter, which was not based on conjunctive normal form and performed better than Cachet and c2d. As their solver and their evaluated product lines are not publicly available, we could not directly compare the results. Overall, we evaluated 21 solvers on 130 formulas, while Kübler et al. evaluated 3 solvers on 3 formulas.
Oh et al. (2019) applied #SAT for uniform random sampling with their tool Smarch. Their results indicate that #SAT can be used to create a uniformly distributed sample for a variety of industrial feature models. However, their evaluation is limited to one application (uniform random sampling) and to one solver (sharpSAT). Sharma et al. (2018) proposed using #SAT technology for uniform random sampling and provided an algorithm exploiting d-DNNFs. However, their empirical evaluation is also limited to uniform random sampling and to two solvers (d4 and dSharp). We evaluated 21 solvers, including the three solvers considered by Oh et al. (2019) and Sharma et al. (2018).
Usage of #SAT technology in feature-modelling tools

The tool of Thüm et al. (2011) uses a regular SAT solver (SAT4J (Le Berre and Parrain 2010)) to compute the number of valid configurations. The tool realizes counting with a regular SAT solver using blocking clauses (Toda and Soh 2016): after finding a valid assignment α, the negation of α is added as a clause to the formula. Thus, α is no longer a valid assignment for the resulting formula, and the next run of the solver returns another assignment, until no new satisfying assignments are left. For each satisfying assignment (i.e., valid configuration), one invocation of the SAT solver is required. Our results indicate that industrial feature models induce up to 10^1500 valid configurations. Therefore, this algorithm cannot be expected to scale to larger systems.
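The blocking-clause algorithm described above can be sketched as follows. The embedded DPLL solver is a deliberately tiny stand-in for illustration; real tools delegate to an industrial SAT solver such as SAT4J.

```python
def dpll(clauses, assignment, n_vars):
    """Return one satisfying total assignment as a dict, or None."""
    if any(len(c) == 0 for c in clauses):
        return None  # conflict
    if len(assignment) == n_vars:
        return dict(assignment)
    var = next(v for v in range(1, n_vars + 1) if v not in assignment)
    for value in (True, False):
        lit = var if value else -var
        reduced, ok = [], True
        for c in clauses:
            if lit in c:
                continue  # clause satisfied by this branch
            rest = [l for l in c if l != -lit]
            if not rest:
                ok = False  # clause falsified
                break
            reduced.append(rest)
        if ok:
            assignment[var] = value
            model = dpll(reduced, assignment, n_vars)
            if model is not None:
                return model
            del assignment[var]
    return None

def count_by_blocking(clauses, n_vars):
    """Count models by repeatedly solving and blocking each found model."""
    clauses = [list(c) for c in clauses]
    count = 0
    while True:
        model = dpll(clauses, {}, n_vars)
        if model is None:
            return count
        count += 1
        # block this exact assignment: at least one variable must flip
        clauses.append([-v if val else v for v, val in model.items()])
```

Each found model adds one blocking clause and costs one full SAT call, so the total work grows linearly with the model count, which is exactly why this approach is hopeless for feature models with 10^1500 configurations.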
Other comparisons of feature-model analysis tools consider none of the #SAT solvers evaluated in our experiment (Pohl et al. 2011). Munoz et al. (2019) examined counting the number of valid configurations of feature models with numerical features for uniform random sampling. The authors evaluated an SMT solver, a CP solver, and the #SAT solver sharpSAT. The numerical values were translated to propositional logic using bit-blasting (Munoz et al. 2019). In their experiment, sharpSAT outperformed the CP and the SMT solver. This indicates that #SAT solvers are also a reasonable choice for computing the number of valid configurations of feature models with numerical values, and our results (e.g., the recommendations of solvers) could also be useful for non-propositional model counting.

10 Future work
In the following, we discuss directions for future work on applying #SAT solvers to industrial feature models.

#SAT during the evolution of systems

Due to the correlation between the runtime of #SAT and both metrics (i.e., number of features and number of constraints), the evolution of a system may increase the runtime required to evaluate the underlying feature model with a #SAT solver. This is also indicated by the preliminary results of our previous work (Sundermann et al. 2020). If a product line evolves over time, even product lines for which #SAT solvers currently scale may become infeasible to analyze in the future, or vice versa.

dSharp, d4, and c2d were part of the eight fastest solvers. If we require multiple computations on a single feature model (e.g., to compute the cardinalities for multiple features or partial configurations), exploiting a compiled d-DNNF may be beneficial. However, research on exploiting an existing d-DNNF is very limited (Sharma et al. 2018), as most work focuses on the compilation process (Darwiche 2002, 2004; Lagniez and Marquis 2017; Oztok and Darwiche 2014; Muise et al. 2010; Huang and Darwiche 2005). While SDDs and BDDs are also considerable target formats for knowledge compilation, all compilers based on these formats performed significantly worse than dSharp and d4.

Parameterization of #SAT solvers
We evaluated the #SAT solvers using the default parameters, with a few exceptions (e.g., some solvers require specific parameters to perform #SAT instead of SAT). Other parameterizations (e.g., selecting strategies for variable ordering) may improve the performance of #SAT solvers. In particular, the runtime of approximate #SAT solvers depends on the given parameters. However, identifying effective parameters is not trivial. Using #SAT solvers to their full potential requires finding suitable parameters that result in efficient and effective computations.

None of the evaluated #SAT solvers and knowledge compilers is optimized for feature models. Optimizing the computations specifically for feature models may improve the performance of the solvers. One improvement may be deriving beneficial variable orders from structural information of the feature model. The performance of each considered type of solver is highly dependent on the variable ordering (Sang et al. 2005; Muise et al. 2010; Thurley 2006; Wei and Selman 2005; Darwiche 2002; Toda and Soh 2016).

11 Conclusion
In this work, we evaluated #SAT solvers on the task of computing the cardinality of industrial feature models. Overall, we evaluated the #SAT solvers on 130 feature models from 15 subject systems.
Our results show that #SAT solvers scale to many, but not to all, systems. Out of the 15 evaluated systems, eight solvers computed the cardinality of 13 (86.7%) within 10 minutes per system. The solver with the overall shortest runtime is sharpSAT, requiring less than three seconds for all 13 models in total. However, for the two remaining systems, namely Linux and Automotive05, none of the solvers was able to compute a result within 24 hours of runtime.
We derived recommendations on which #SAT solvers to use for the task of computing the cardinality of feature models. For single #SAT computations on feature models, we recommend using the DPLL-based solvers sharpSAT, countAntom, and Ganak. For applications requiring multiple #SAT invocations, reusing d-DNNFs seems promising. All three considered d-DNNF compilers, namely dSharp, d4, and c2d, were among the fastest eight solvers. Surprisingly, each approximate #SAT solver we evaluated is significantly slower than the fastest exact #SAT solver for every considered feature model and, thus, yields no benefits over the exact solvers.
The runtime of #SAT solvers tends to increase for feature models with a larger number of constraints or features. Each feature model with either fewer than 1,000 features or fewer than 1,000 constraints was evaluated within one second by the solver with the shortest runtime for that feature model. Nevertheless, the results indicate that a higher number of constraints or features does not necessarily result in longer runtimes.