3.2 Fuzzy-Set Qualitative Comparative Analysis
To derive DT strategy configurations, we employed fsQCA, a set-theoretic configurational approach. FsQCA is well suited both for small to medium-sized samples (11–50 cases) and for large samples (>50 cases), although its aims and potential contributions vary with sample size (Greckhamer et al. 2018). When performed on large samples, it can be used for both theory building and theory testing, with the possibility of drawing statistical inferences (Greckhamer et al. 2013). Small samples, on the other hand, are particularly well suited for inductive reasoning and theory building due to a higher familiarity with the cases (Greckhamer et al. 2013). It should furthermore be noted that fsQCA uses an approach known as “modest generalization” (Berg-Schlosser et al.
2009, p. 12). This means that a researcher can build propositions based on an fsQCA and then apply them to cases sharing similar characteristics (Berg-Schlosser et al.
2009). On the one hand, this is a more limited approach than that of regression-based methods, since generalizing to a whole population is more difficult. On the other hand, it is more robust than drawing generalizations from multiple-case studies with even smaller datasets. In our study, we opted for a small sample since we were mainly interested in theory building, given the scarcity of previous research. FsQCA consists of three successive steps: (1) assignment of fuzzy-set membership scores to cases (also known as calibration), (2) identification of necessary conditions, and (3) identification of sufficient configurations (Ragin 2009). We used the QCA package for R to complete all three steps (Duşa 2019). Table 1 provides an overview of our causal conditions and the outcome, along with definitions and selected key sources, based on the framework derived in the preceding section. Although the threat of digital disruption cannot be actively controlled by an organization, we include it as an element of potential configurations since we expect configurations to vary with the degree of the threat.
Table 1
Overview of coding elements
Causal conditions | Structural separation | Separation of innovation-related activities into distinct organizational units | |
| Centralization of decision-making | “Decision-making power resides in the hands of a selected few at the upper levels of an organization” (Wong et al. 2011, p. 1210) | Jansen et al. (2006), Mihalache et al. (2014), Wong et al. (2011), Guadalupe et al. (2014) |
| Strategic outsourcing | Reliance of an organization on external partnerships to carry out service innovation | Hottenrott and Lopes-Bento (2016), Teece (1996), Vial (2019), Bouncken and Fredrich (2016) |
| Threat of digital disruption | Threat to the core business of an organization posed by new or established market entrants using digital technologies | Skog et al. (2018), Matt et al. (2015), Leonhardt et al. (2018) |
Outcome | Digital service innovation | Successful introduction of new services based on digital technologies | Barrett et al. (2015), Goduscheit and Faullant (2018) |
FsQCA uses fuzzy-set membership scores ranging between 0 and 1 to determine the degree to which a case is a member of a set (Ragin
2008). For each case and each dimension/outcome, a fuzzy-set membership score is assigned during the
calibration phase (Ragin
2008). Procedures for calibration typically vary with the sample size. Analyses with large samples are most prevalent in IS and business and management research and are typically combined with questionnaire-based surveys or other quantitative data (Soto Setzke et al.
2020; Wagemann et al.
2016). Calibrating such data is often straightforward and involves choosing appropriate thresholds for Likert scales or other quantitative measures. Smaller samples, on the other hand, typically involve a considerable amount of qualitative, unstructured data. Calibrating such data is challenging since few guidelines exist and the results may suffer from subjectivity (de Block and Vis 2019). Therefore, several methodological articles providing guidelines for the calibration of qualitative data have been published in recent years (see, e.g., Basurto and Speer (2012); Tóth et al. (2017); Nishant and Ravishankar (2020)).
For this paper, we adopted the methodological guidelines proposed by Basurto and Speer (
2012) and closely followed an exemplary application by Iannacci and Cornford (
2018). To calibrate data collected through interviews, they suggest the use of “theoretical ideals” as “the best imaginable case in the context of the study that is logically and socially possible” (Basurto and Speer
2012, p. 166). We defined two ideal cases per condition: a “fully in” case that represents definite full membership in the set (fuzzy value of 1) and a “fully out” case that represents definite non-membership (fuzzy value of 0). Based on these ideal types, we defined a threshold condition that served as our indicator for deciding for or against inclusion in the set. Lastly, we defined how much a case could deviate from a “fully in” or “fully out” case without crossing the threshold. Based on these definitions, we assigned the intermediate fuzzy values 0.33 (“more out than in”) and 0.66 (“more in than out”), thus using a four-value fuzzy scheme (Tóth et al. 2017). Based on the summary statements of the cases, each case can be calibrated according to the previously defined ideal types. In the following, we explain our rationale for creating the “fully in” and “fully out” cases as well as the threshold conditions.
Structural separation: As our ideal “fully in” case, we defined an organization that completely separated its innovation activities into one or more spin-off organizations. To distinguish the relationship between the main organization and its spin-offs from partnerships with external organizations, we account for the fact that these innovation activities may still be partly coordinated by the main organization. For our “fully out” case, no new structures should have been created, neither in the form of spin-offs nor internal units. As a threshold, we chose the creation of spin-offs, since they mark a major structural separation from the core business (Corley and Gioia 2004). Therefore, smaller structural changes such as creating new digital business units were coded as more out than in, while spin-offs that were still mainly controlled by the main organization were coded as more in than out.
Centralization of decision-making: In our ideal “fully in” case, decision-making is entirely centralized in one executive at the highest management level, i.e., the “C-suite”. Our “fully out” case is characterized by a team lead or no specific role at all. Building on these cases, we defined the threshold to indicate whether decision-making takes place in the C-suite or at a lower management level (Guadalupe et al. 2014). Therefore, cases where a manager or a business unit leader is responsible were coded as more out than in. Accordingly, cases where a team of different C-level executives and, potentially, managers is responsible were coded as more in than out, since these teams are part of the C-suite but represent a lower degree of centralization.
Strategic outsourcing: Our “fully in” case represents organizations that rely completely on external partnerships, while our “fully out” case represents organizations that do not rely on external partnerships at all. Since partnerships are very common for implementing DT strategies (Vial 2019), we concluded that, apart from the “fully out” case, partnerships would very likely be part of the majority of DT cases. Thus, we decided to let the threshold indicate to what degree partnerships are used. We coded cases as more out than in if partnerships were used only to implement certain key aspects but the main effort was still carried out by the main organization. Accordingly, if external partners carried the larger share of the effort, we coded the case as more in than out.
Threat of digital disruption: In the “fully in” case, organizations face an imminent threat of being disrupted, while in the “fully out” case, they do not face any considerable threat of disruption in the foreseeable future. We decided to use the timeframe of potential disruption as a threshold: organizations that may face disruption only in the long term (5–10 years) were coded as more out than in, while organizations for which disruption may become relevant in the short term (3–5 years) were coded as more in than out.
Digital service innovation: Our “fully in” case represents radical service innovations that are new to the respective industry, while our “fully out” case represents cases where, ultimately, no new services were launched. Since we are interested in radical innovation, we used the notion of radicalness as our threshold. If new services had been introduced but represented mostly incremental improvements of existing service concepts, they were coded as more out than in. If, on the other hand, the organization introduced rather radical services, we coded them as more in than out. Accordingly, for our outcome, we define radical innovation as successful and incremental innovation as unsuccessful.
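The four-value scheme described above can be sketched as a simple mapping from qualitative assessments to fuzzy-set membership scores; the category labels and the example below are our own illustrative shorthand, not part of any fsQCA toolkit:

```python
# Minimal sketch of the four-value calibration scheme described above.
# Each qualitative assessment of a case is mapped to one of four
# fuzzy-set membership scores; the labels are illustrative shorthand.
FUZZY_SCHEME = {
    "fully_out": 0.0,          # matches the "fully out" ideal case
    "more_out_than_in": 0.33,  # below the threshold condition
    "more_in_than_out": 0.66,  # above the threshold condition
    "fully_in": 1.0,           # matches the "fully in" ideal case
}

def calibrate(assessment: str) -> float:
    """Return the fuzzy-set membership score for a qualitative assessment."""
    return FUZZY_SCHEME[assessment]

# Hypothetical example: an organization that created a digital business
# unit but no spin-off would sit below the structural-separation
# threshold and thus be coded as more out than in.
print(calibrate("more_out_than_in"))  # 0.33
```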
To facilitate the coding process, we prepared summary statements for each case along with relevant quotes for each dimension. It should be noted that some distinctions may seem subjective and difficult to code, particularly the fine-grained edge cases between “fully out” and “more out than in” as well as between “fully in” and “more in than out.” To mitigate the potential imprecision introduced by subjectivity, two authors and another researcher independently calibrated each condition and the outcome for each case, using the ideal types and the respective fuzzy values. Afterward, we assessed interrater reliability for each dimension across all cases using Krippendorff’s alpha, a measure that accounts for chance agreement (Krippendorff 2018). After coding, interrater reliability exceeded the most conservative threshold of 0.8 for all dimensions. Still, differences in assigned membership scores remained, which the researchers resolved through discussion (Krippendorff 2018). For the case Kappa and the condition “centralization of decision-making,” for example, two researchers assigned a fuzzy value of 0.33 and one assigned a value of 0.66. The discussion revolved around a quote in which the project lead of the DT strategy stated that he reports to the executive board of Kappa to ensure support for the strategy. During coding, the third researcher had concluded that, therefore, at least one C-level executive was responsible for the strategy (i.e., a fuzzy value of 0.66). However, the two other researchers argued that the project lead was merely reporting to the board to secure resources for strategy implementation, while the main responsibility remained with the project lead (i.e., a fuzzy value of 0.33). Eventually, the third researcher was convinced, and all three agreed on a fuzzy value of 0.33.
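As a rough illustration of the reliability check (not the implementation used for the study), interval-metric Krippendorff's alpha can be computed from a coincidence matrix over all pairable rater values; the rating data below are invented:

```python
# Minimal sketch of interval-metric Krippendorff's alpha for
# rater-by-case fuzzy values (illustrative only).
from collections import Counter
from itertools import permutations

def krippendorff_alpha(units):
    """units: list of lists, one inner list of rater values per case."""
    # Build the coincidence matrix over all pairable values.
    coincidences = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue  # unpaired units carry no reliability information
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)
    n_c = Counter()
    for (a, _b), w in coincidences.items():
        n_c[a] += w
    n = sum(n_c.values())
    # Interval-metric disagreement: squared difference between values.
    observed = sum(w * (a - b) ** 2 for (a, b), w in coincidences.items())
    expected = sum(n_c[a] * n_c[b] * (a - b) ** 2
                   for a in n_c for b in n_c) / (n - 1)
    return 1.0 - observed / expected

# Three hypothetical raters scoring four hypothetical cases on the
# four-value scheme (one disagreement in the first case):
ratings = [[0.33, 0.33, 0.66], [1.0, 1.0, 1.0],
           [0.0, 0.0, 0.0], [0.66, 0.66, 0.66]]
print(round(krippendorff_alpha(ratings), 3))  # ≈ 0.94, above the 0.8 bar
```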
To provide transparency, we provide additional information on the coding process in the appendix. Appendix Table
6 provides a detailed overview of our ideal cases and the conditions that were used to assign fuzzy values based on extant literature along with the value of Krippendorff’s alpha for each dimension. An illustrative example of how fuzzy-set membership scores were assigned to the condition “strategic outsourcing” is shown in Appendix Table
7. Furthermore, Appendix Table
8 shows an example of how case Rho was calibrated. A full overview of membership scores for all cases and dimensions can be found in Appendix Table
9. All other data is available upon request from the authors.
Necessary condition analysis reveals conditions that are present in every case exhibiting a specific outcome. More specifically, this means that in each case, the fuzzy-set membership score of the outcome does not exceed the score of the necessary condition (Schneider and Wagemann 2012). To be considered necessary, a condition should reach a consistency threshold of at least 0.9 (Schneider and Wagemann 2012). Consistency refers to the degree to which cases with the same conditions share the same outcome (Ragin 2008). Furthermore, the coverage value (i.e., the proportion of the outcome covered by a specific condition) should be assessed for each necessary condition to determine its empirical relevance (Schneider and Wagemann 2012). While a necessary condition is always present when the outcome occurs, the condition may also be present when the outcome does not occur (Ragin 2008). We therefore proceeded to identify sufficient configurations.
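The two necessity measures just described can be sketched in a few lines using the standard fuzzy-set formulas; the membership scores below are invented for illustration and do not come from the study's cases:

```python
# Minimal sketch of consistency and coverage for a necessity relation
# (condition X necessary for outcome Y), following the standard
# fuzzy-set formulas.

def necessity_consistency(x, y):
    """Degree to which outcome memberships are subsets of the condition."""
    return sum(min(xi, yi) for xi, yi in zip(x, y)) / sum(y)

def necessity_coverage(x, y):
    """Empirical relevance: share of the condition 'used' by the outcome."""
    return sum(min(xi, yi) for xi, yi in zip(x, y)) / sum(x)

# Hypothetical membership scores for six cases; the outcome never
# exceeds the condition, so the condition is perfectly necessary:
condition = [1.0, 0.66, 0.66, 1.0, 0.33, 0.66]
outcome   = [0.66, 0.66, 0.33, 1.0, 0.33, 0.33]

print(round(necessity_consistency(condition, outcome), 2))  # 1.0 (≥ 0.9)
print(round(necessity_coverage(condition, outcome), 2))     # 0.77
```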
Sufficiency analysis reveals configurations of conditions that consistently produce a specific outcome when present in a case (Ragin 2008). Unlike a necessary condition, however, a specific configuration does not always have to be present for the outcome to occur; there can be multiple configurations leading to the same outcome. We first constructed two truth tables showing all 16 (2^k, where k = 4 is the number of conditions) possible configurations of conditions for both outcomes (see Tables 10 and 11 in the appendix). Afterward, we reduced the tables by applying thresholds of frequency, raw consistency, and PRI consistency. Since our sample of 17 cases can be classified as medium-sized, we employed a frequency threshold of one (Greckhamer et al. 2013). Thus, configurations that are represented by at least one empirical observation were kept in the truth table. For the raw consistency threshold, we chose a value of 0.85, exceeding the widely accepted conservative threshold of 0.75 (Schneider and Wagemann 2012). As described before, raw consistency assesses how reliably a configuration results in the outcome and can roughly be compared to the notion of significance in regression analysis (Park et al. 2017). PRI consistency is an alternative consistency measure that “eliminates the influence of cases that have simultaneous membership in both the outcome and its complement” (Park et al. 2017). While there is currently no widely accepted threshold for PRI consistency, we followed the guidelines of Schneider and Wagemann (2012) and applied a threshold of 0.65. Having reduced the truth tables, we applied the Quine–McCluskey algorithm to further minimize the remaining configurations. This left us with the configurations of conditions that lead to the outcome in question (Ragin 2008).
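The two sufficiency measures used for truth-table reduction can be sketched as follows; the configuration and outcome memberships are invented for illustration:

```python
# Minimal sketch of raw consistency and PRI consistency for a
# configuration X with respect to outcome Y, using the standard
# fuzzy-set formulas.

def raw_consistency(x, y):
    """Sufficiency consistency: sum(min(x, y)) / sum(x)."""
    return sum(min(xi, yi) for xi, yi in zip(x, y)) / sum(x)

def pri_consistency(x, y):
    """PRI: discounts cases simultaneously consistent with Y and not-Y."""
    min_xy = sum(min(xi, yi) for xi, yi in zip(x, y))
    # Membership in X, Y, and the complement of Y at the same time:
    min_xy_ny = sum(min(xi, yi, 1 - yi) for xi, yi in zip(x, y))
    return (min_xy - min_xy_ny) / (sum(x) - min_xy_ny)

# Hypothetical configuration membership (e.g., the minimum across the
# four conditions) and outcome membership for six cases:
config  = [0.66, 0.66, 0.66, 1.0, 0.33, 0.66]
outcome = [1.0, 0.33, 0.66, 1.0, 0.66, 1.0]

print(round(raw_consistency(config, outcome), 2))  # 0.92, above 0.85
print(round(pri_consistency(config, outcome), 2))  # 0.89, above 0.65
```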
Finally, researchers should test for predictive validity, which “examines how well the model predicts the outcome in additional samples” (Pappas et al. 2017, p. 674; Woodside 2014). While a model may exhibit high consistency and coverage values for a given sample, this does not necessarily mean that it also makes good predictions. To perform the test, the sample is first divided into a subsample and a holdout sample. The researcher then runs the analysis on the subsample and recodes each resulting configuration as a new variable. Each configuration variable is then plotted against the outcome of interest using the holdout sample. To indicate high predictive validity, the resulting consistency and coverage values should not contradict those of the original solution (Pappas et al. 2017).
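Under the assumption of a simple random split, this procedure can be sketched as follows; the case names, membership scores, and split are invented placeholders, not the study's data:

```python
# Minimal sketch of the predictive validity check described above:
# evaluate a configuration derived from the subsample against the
# holdout cases using sufficiency consistency and coverage.

def consistency(x, y):
    """Sufficiency consistency: sum(min(x, y)) / sum(x)."""
    return sum(min(xi, yi) for xi, yi in zip(x, y)) / sum(x)

def coverage(x, y):
    """Coverage: sum(min(x, y)) / sum(y)."""
    return sum(min(xi, yi) for xi, yi in zip(x, y)) / sum(y)

# Per case: (membership in a configuration found on the subsample,
# recoded as a new variable; membership in the outcome).
cases = {
    "Alpha": (0.66, 1.0), "Beta": (0.33, 0.33), "Gamma": (1.0, 1.0),
    "Delta": (0.66, 0.66), "Epsilon": (0.33, 0.66), "Zeta": (1.0, 0.66),
}
holdout = ["Delta", "Epsilon", "Zeta"]  # e.g., a random split

config_m = [cases[c][0] for c in holdout]
outcome_m = [cases[c][1] for c in holdout]

# Values that do not contradict the subsample solution would support
# predictive validity.
print(round(consistency(config_m, outcome_m), 2))  # 0.83
print(round(coverage(config_m, outcome_m), 2))     # 0.83
```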