In the following, we provide an overview of the theoretical background and our research methodology. First, we discuss the scientific and application-oriented literature on smart factory networks and specify the associated role of IT systems. We then substantiate the significance of the related IT availability risks and define central requirements for an adequate approach to assessing these risks in smart factory networks. Second, we examine the corresponding literature and identify the research gap. Third, we outline the methodological approach applied to address this research gap.
2.1 Smart factory networks and corresponding IT availability risks
Given the advancements of smart manufacturing technologies and the innovative nature of smart factory networks, the scientific literature is diverse and constantly evolving (e.g., see Haller et al. 2009; Iansiti and Lakhani 2014; Turber and Smiela 2014; Strozzi et al. 2017). Further, there are numerous studies and application-oriented examples of research institutes exploring and describing the implementation of smart manufacturing technologies (e.g., see Hessman
2013; Lucke et al.
2008; Radziwon et al.
2014; Yoon et al.
2012; Zuehlke
2010; Shariatzadeh et al.
2016; Zhong et al.
2017). In corporate practice, IoT-based technological solutions such as radio frequency identification (RFID) are widely implemented, enabling, for example, the real-time acquisition of data and the real-time monitoring of objects within production processes (Lucke et al.
2008; Fleisch and Thiesse
2007; Zhong et al.
2017). However, the comprehensive and holistic implementation of smart manufacturing technologies in production facilities serving as test beds remains confined to laboratory research facilities, such as SmartFactoryKL, or pilot facilities, such as the Siemens Electronic Works Facility or the WITTENSTEIN bastian’ Production Facility (Hessman 2013; Zuehlke 2010; Schlick et al. 2014). This was also found in a dynamic literature review performed by Strozzi et al. (2017). To structure the diverse body of literature on smart factories, they combined a systematic literature review with a bibliographic network analysis. They revealed that the largest literature stream focuses on RFID technology and agent-based intelligent decision support system architecture, both of which concern the monitoring and scheduling of production processes. Further, they found that research focuses on “models, frameworks, and architectures related to the implementation of the Smart Factory […], along with high-level ‘landscape’ analyses.” A recent example of such research is the work of Jung et al. (
2017), in which a reference factory design and improvement activity model is introduced for designing new and improving existing factories. The model highlights interrelationships of implemented technologies and provides an indication for further improvements through sensors, software tools, or gathered data. Another finding of the study by Strozzi et al. (
2017) is that research focuses more on topics related to the development and adoption of software tools and cloud applications instead of topics related to the adoption of new technologies in manufacturing processes. For instance, Shariatzadeh et al. (
2016) develop an IoT platform-based system architecture and a generic framework for communication interfaces between the digital factory and the smart factory. Other researchers address the potential of the digital twin concept with regard to near-real-time data acquisition and analysis (e.g., see Uhlemann et al. 2017; Borodulin et al. 2017; Qi and Tao 2018). In summary, it can be concluded that scientific contributions “propose conceptual works and experiments, and rarely actual test-beds and lessons learned from the practice are described and discussed” (Strozzi et al.
2017).
Another shortcoming of the current literature is the lack of a common definition of the term smart factory, although it is widely used in both scientific literature and practice (Radziwon et al.
2014). Based on a collection of different definitions, Radziwon et al. (
2014) define the smart factory as a “manufacturing solution that provides such flexible and adaptive production processes that will solve problems arising on a production facility […].” Hermann et al. (
2015) define the smart factory as a “factory where CPS communicate over the IoT and assist people and machines in the execution of their tasks”. They further describe that “within the modular structured Smart Factories […], CPS monitor physical processes, create a virtual copy of the physical world and make decentralized decisions”. Based on SmartFactoryKL and adopting the idea of IoT, Zuehlke (
2010) describes that a “factory-of-things will be composed of smart objects which interact based on semantic services.” Yoon et al. (
2012) describe a smart factory as a “factory system in which autonomous and sustainable production takes place”. And Lucke et al. (
2008) envision the smart factory as a “real-time, context-sensitive manufacturing environment that can handle turbulences in production using decentralized information and communication structures for an optimum of production processes.”
These definitions reflect the specific characteristics of smart factory networks, such as their modular design, which enables functionalities like flexibility, reconfigurability, and adaptability (Brettel et al.
2014; Radziwon et al.
2014; Zuehlke
2010). These functionalities enable smart factory networks to respond in real time to changing circumstances and turbulences in production, such as the non-availability of single production components (Lucke et al.
2008). Further, smart factory networks attempt to offer increased productivity, optimized processes, improved capacity utilization, and reduced lead times, as well as enhanced energy and resource efficiency (Brettel et al.
2014; Chui et al.
2010; Radziwon et al.
2014; Schuh et al.
2014; Yoon et al.
2012; Shrouf et al.
2014). These benefits contribute to the ability to produce highly individualized products in low batch sizes in a considerably short time-to-market, at costs comparable to those of mass production (Lasi et al.
2014). This is of central importance for future competitiveness in all manufacturing industries, as customer expectations shift toward mass customization, shorter innovation cycles, and customer participation models (Lasi et al.
2014; Yoon et al.
2012; Iansiti and Lakhani
2014; Turber and Smiela
2014).
The characteristics of smart factory networks are facilitated through concepts such as IoT and production-oriented CPSs, which involve smart objects, such as intelligent machinery and products. CPSs integrate computing and communication capabilities into physical production processes to combine the cyber and physical worlds (Lee et al.
2015; Wang et al.
2016). Smart objects are connected over the Internet, or other network infrastructures, to form dynamic, intelligent, and self-controlling networks (Broy et al.
2012; Schuh et al.
2014). Within these networks, smart objects control and monitor the production process collaboratively through machine-to-machine communication and exchange information to optimize themselves and the production process (Brettel et al.
2014; Hessman
2013; Schuh et al.
2014; Yoon et al.
2012). Hence, smart objects represent elementary components of the collaborative production infrastructure (Zuehlke
2010; Yoon et al.
2012). Although smart objects control and optimize themselves autonomously on a workflow level, central IT systems are required for an overarching planning and coordination of decentralized smart objects. For example, central IT systems must provide parameters and framework conditions to define a possible course of action for the autonomous control and optimization of smart objects (Schuh et al.
2014). These IT systems are connected with other internal and external networks to facilitate information exchange and collaboration within the supply network. The necessary infrastructure is typically company-specific and can be on-premise, cloud-based, or a hybrid form of both (Zuehlke
2010; Yoon et al.
2012; Karnouskos and Colombo
2011; Colombo et al.
2013; Shrouf et al.
2014; Haller et al.
2009).
Due to the high level of interconnectedness between production and IT components, the operation of the physical production process depends on the flawless operation of IT services. Consequently, smart factory networks face new IT security threats that concern the four dimensions of IT security risks: availability, access, accuracy, and accountability (Westerman and Hunter 2009). Thereby, the threats stem from four channels: (1) software bugs and hardware malfunctions, (2) open Internet protocols and shared networks, (3) the numerous parties involved, and (4) a large number of field devices that can be accessed (Amin et al.
2013). IoT and smart manufacturing technologies change requirements on IT security (Wegner et al.
2017) and “the concept of Industry 4.0 generates new categories of risks […] because of the increase of vulnerabilities and threats” (Tupa et al.
2017). Tupa et al. (
2017) argue that “the connection of cyber-space, sophisticated manufacturing of technologies and elements, and using outsourcing of services [are] the main factors increasing vulnerability” and that “the implementation of Industry 4.0 has shown that the connections between humans, systems and objects have become a more complex, dynamic and real-time optimized network”. For instance, central components of an IT infrastructure like an on-premise server are no longer the only critical components of an information network. In fact, all components, including remote manufacturing equipment and internal and external sensors, become critical as “industrial control systems are becoming the target for malicious cyber intrusions” (Wegner et al.
2017). Further, SCADA systems, which control manufacturing processes, were initially designed to operate on closed networks. With IoT applications, SCADA systems are increasingly based on cloud technology, resulting in increased interconnectivity and, ultimately, vulnerability (Eden et al. 2017). Therefore, “the challenge to maintain availability will increase as manufacturing evolves from a centralized system supported by external suppliers to a distributed system in which production occurs closer to the point of use”, stretching potential points of failure (Wegner et al.
2017).
Given this increasing dependency of the production infrastructure on the reliable functioning of IT services and the real-time constraint of smart factory networks, non-availability in particular, that is, the non-usability of an on-demand service, is becoming one of the most critical threats in smart factory networks (Amiri et al.
2014; Cardenas et al.
2008; Lee
2008). Non-availabilities can be caused by events including intentional attacks, such as denial-of-service attacks, simple human errors, random technical failures, or incorrect capacity planning (Amin et al.
2013). Further, the smart factory’s interconnectivity and IT-based integration with its supply network, aside from the benefits incurred through improved collaboration, increase IT availability risks because former protective barriers are at least partially removed and the number of potential entry points increases (Eden et al.
2017; Smith et al.
2007). For example, modern industrial control systems are connected to office networks and external systems for information exchange, and are no longer isolated through air gaps (Byres 2013). A study by Byres and Lowe (2004) emphasizes this increased vulnerability and reveals that security incidents increasingly stem from external sources (70%), compared to internal sources (30%). Among other reasons, they cite the increasing interconnection of critical systems and the resulting interdependencies as a cause of this development. In the highly interconnected information network of a smart factory, the non-availability of one component can spread through the entire network, resulting in cascading failures (Amin et al.
2013). These reinforce the initial failure and can lead to the loss of the operational capability of the entire smart factory network (Danziger et al.
2016). Consequently, IT availability risks play a major role in smart factory networks, and companies must apply corresponding IT security measures.
In this context, comprehensive IT availability risk management in smart factory networks requires economically well-founded analyses and a structured, methodological approach to identify and quantify existing IT availability risks and to lay the groundwork for corresponding IT security investments. For this purpose, the most critical components of the IT system must be identified based on the effects of a component’s non-availability on the production process. An adequate risk assessment approach must take into account the specific characteristics of smart factory networks. In particular, modeling the corresponding dependency structures is an essential requirement for analyzing the resulting cascading failures in the production process. Thus, we formulate the following requirements for an appropriate risk assessment approach for smart factory networks that is able to support investment decisions regarding IT security measures: (R1) The network structures of the IT system, including dependencies between IT components, must be considered. (R2) The production system’s interdependencies and network structures must be considered. (R3) Losses in the production process caused by IT non-availabilities must be quantified and assigned to the responsible IT components, while considering the production infrastructure’s dependencies on the IT system.
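The interplay of requirements R1–R3 can be illustrated with a small sketch. The following Python example is only an illustration under stated assumptions and not the model developed in this paper: the component names, the dependency edges, and the hourly loss figures are hypothetical, and a component is assumed to fail as soon as any component it depends on fails. The sketch propagates the non-availability of an IT component through IT dependencies (R1) and production dependencies (R2) and attributes the resulting production loss to the failed IT component (R3).

```python
from collections import deque

# Hypothetical dependency edges: key -> components that depend on the key.
# IT-to-IT edges illustrate R1, production-to-production edges illustrate R2,
# and IT-to-production edges link both layers (needed for R3).
DEPENDENTS = {
    "server": ["mes", "scada"],
    "mes": ["machine_a"],
    "scada": ["machine_a", "machine_b"],
    "machine_a": ["assembly"],
    "machine_b": ["assembly"],
    "assembly": [],
}

# Hypothetical loss per hour of downtime, defined for production components only.
LOSS_PER_HOUR = {"machine_a": 100.0, "machine_b": 150.0, "assembly": 400.0}


def affected_components(failed):
    """Return all components that become unavailable when `failed` fails.

    Simplifying assumption: no redundancy and no buffers, so a failure
    always propagates to every direct and indirect dependent.
    """
    seen = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in DEPENDENTS.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


def attributed_loss(failed, downtime_hours):
    """Production loss attributed to the non-availability of `failed`."""
    hit = affected_components(failed)
    return downtime_hours * sum(LOSS_PER_HOUR.get(c, 0.0) for c in hit)


if __name__ == "__main__":
    for component in ("server", "mes", "scada"):
        print(component, attributed_loss(component, downtime_hours=2.0))
```

Ranking IT components by their attributed loss gives a first indication of which components are most critical in such a toy network; a realistic assessment would additionally need failure probabilities and recovery times.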
2.2 Approaches regarding the assessment of IT availability risks
Risk assessment is an elementary step within the risk management cycle that can be structured along the four phases of (1) identification, (2) assessment, (3) control, and (4) monitoring (Hallikas et al.
2004; Harland et al.
2003). The goal of risk assessment is to identify and evaluate risks in order to decide on appropriate security measures. For this, companies engaged in smart factory networks require appropriate structured approaches for the evaluation of IT availability risks that fulfill the stated requirements R1–R3 due to the aforementioned, specific challenges of smart factory networks (Tupa et al.
2017). For risk assessment within information systems, a multitude of different approaches exists in the literature. While some suggest frameworks and approaches for information systems in general, others place a special focus on the characteristics of their respective application field, as vulnerabilities and accompanying losses are highly specific due to characteristics such as the IT architecture or business operations’ varying dependencies on IT services.
Based on a structured review of 125 risk assessment approaches for information systems, Shameli-Sendi et al. (2016) develop a taxonomy that structures risk assessment approaches along the four categories appraisement, perspective, resource valuation, and risk measurement. Thereby, appraisement differentiates risk assessment approaches from a methodological perspective into quantitative, qualitative, and hybrid approaches (Shameli-Sendi et al. 2016). Quantitative methods deploy mathematical functions, objective measurements, and quantitative data to evaluate risk (Karabacak and Sogukpinar
2005; Suh and Han
2003; Sun et al.
2006). For example, the risk assessment framework developed by Jaisingh and Rees (
2001) uses the quantitative risk measure VaR to assess IT security risks. The derived information can then be used to analyze the relationship between the cost of security measures and the risk reduction effects achieved. Niesen et al. (
2016) develop a conceptual framework for data-driven risk assessment based on real-time operational data that becomes available in smart factory environments. By means of their approach, live monitoring of different types of risk becomes feasible. However, their approach does not allow the consideration of specific types of IT-related threats, especially availability risks, as appropriate data and relevant indicators are missing. This shows that quantitative approaches often lack the necessary detailed data. Further disadvantages include time-consuming and expensive calculation processes, the complex implementation in practice, and the difficult interpretation of results (Shameli-Sendi et al. 2016). In contrast, qualitative methods use descriptive variables to evaluate the likelihood of occurrence and the impact of IT non-availability (Caralli et al. 2007; Aagedal et al. 2002). As they do not rely on accurate historical data and are much easier to understand and implement than quantitative methods, they are widely used in practice (Shameli-Sendi et al.
2016). For instance, Silva et al. (
2014) develop a multi-dimensional risk management model based on Failure Mode and Effect Analysis (FMEA) and fuzzy theory that analyses five dimensions of information security risks: access to information and systems, communication security, infrastructure (hardware and networks), security management, and secure information systems development. Thereby, FMEA provides a structured approach for assessing failure modes according to the three risk factors occurrence, severity, and detection, which are assessed through expert estimations. The derived results provide information regarding the criticality of the investigated failures that produce vulnerabilities in the company’s information system. Eom et al. (2007) develop a risk assessment approach for the evaluation of assets regarding their degree of contribution to related business processes. For this, they apply a qualitative risk analysis method using Delphi teams. Despite the merits of qualitative approaches, they often lack the measurable detail and monetary results needed to support cost-efficient investment decisions, and their results are often subjective and prone to errors and imprecision (Shameli-Sendi et al. 2016). To overcome the weaknesses of purely quantitative or purely qualitative approaches, hybrid methods combine both types to enable a simple and fast qualitative assessment as well as a detailed quantitative analysis of more critical aspects (Yadav and Dong
2014; Rainer et al.
1991; Shameli-Sendi et al.
2016). For example, the initial quantitative risk assessment method developed by Zambon et al. (
2007) considers the IT architecture and dependencies between IT constituents, based on a time-dependent model for business processes. Based on this, they extend their model to a qualitative model for the analysis of availability risks in IT architectures, requiring only commonly available input data (Zambon et al.
2011).
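As an illustration of the FMEA-based assessment discussed above, the following Python sketch ranks failure modes by the conventional risk priority number (RPN), that is, the product of the occurrence, severity, and detection ratings. It is a minimal sketch under stated assumptions: the 1 to 10 rating scale is the common FMEA convention, the failure modes and their ratings are hypothetical, and the fuzzy extension used by Silva et al. (2014) is not reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One failure mode with expert ratings (hypothetical example data)."""

    name: str
    occurrence: int  # 1 (rare) to 10 (frequent)
    severity: int    # 1 (negligible) to 10 (critical)
    detection: int   # 1 (almost surely detected) to 10 (hardly detectable)

    def rpn(self) -> int:
        """Conventional risk priority number: occurrence * severity * detection."""
        for rating in (self.occurrence, self.severity, self.detection):
            if not 1 <= rating <= 10:
                raise ValueError("ratings must lie between 1 and 10")
        return self.occurrence * self.severity * self.detection


def rank_by_rpn(modes):
    """Return the failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda mode: mode.rpn(), reverse=True)


# Hypothetical expert estimations for three IT-related failure modes.
FAILURE_MODES = [
    FailureMode("server outage", occurrence=3, severity=9, detection=2),
    FailureMode("network congestion", occurrence=6, severity=5, detection=4),
    FailureMode("sensor misreading", occurrence=4, severity=4, detection=7),
]

if __name__ == "__main__":
    for mode in rank_by_rpn(FAILURE_MODES):
        print(mode.name, mode.rpn())
```

The example also shows a typical limitation of such qualitative scores: the ranking depends entirely on subjective ratings and yields no monetary result.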
Another category for risk assessment approaches introduced by Shameli-Sendi et al. (2016) is risk measurement, which differentiates approaches into the two types non-propagated and propagated. While approaches of the non-propagated type neglect the propagation of an attack impact on dependent nodes, risk assessment approaches of the propagated type consider impact propagation in networks to obtain a more precise picture of damage potential (Shameli-Sendi et al. 2016). Regarding non-propagated types, Zhong et al. (2017) develop a quantitative approach based on RFID and laser scanners to visualize the manufacturing environment for the real-time observation of production and detection of risks and disturbances. Although their model enables real-time monitoring, it does not allow analyzing the causes of failure propagation and, thus, cannot be used to analyze dependency structures. Further, it cannot quantify the damages resulting from failures and disturbances within the production process. In contrast, there are some approaches that consider propagation effects within information systems. For instance, Fenz et al. (
2011) develop a software-based risk management methodology that supports investment decision making while considering the business criticality of information assets based on their involvement in business processes. Ackermann and Buxmann (
2010) develop a risk assessment model for IT-based service networks that supports IT security investment decisions. This model quantifies IT security risks in relation to different IT security measures, and considers dependencies between different services of the network (i.e., transferred data). Finally, Papa et al. (
2011) develop a qualitative risk assessment model for Supervisory Control and Data Acquisition (SCADA) embedded systems, focusing on availability risks. Their model calculates corresponding risk scores for each SCADA element, considers effects for the entire system, and determines protection measures to reduce risk. Despite these examples, Shameli-Sendi et al. (
2016) state that there are only few risk assessment approaches that consider propagation effects, although these are essential to assess the entire damage potential caused by attacks and errors in complex network environments to provide a profound basis for economically sound investment decisions.
Further, there is no assessment approach, thus far and to the best of our knowledge, for IT availability risks in smart factory networks, that is, no existing approach that considers the specific characteristics of smart factory networks and consequently fulfills the stated requirements R1–R3. However, the consideration of network structures including dependencies between IT components and the production system’s interdependencies and network structures, as well as the transfer of damage potentials to a monetary valuation represent a necessary step in the course of an appropriate risk assessment within smart factory networks. Such an approach is necessary to support organizations with risk-oriented guidance in deducing reasonable investment strategies with regard to IT security measures. As the modeling of dependency structures under consideration of propagation effects represents an essential requirement in this endeavor, we aim to address this research gap in the following section by developing a first approach based on graph theory and matrix notation. We chose graph theory and matrix notation as these are widely used and easily comprehensible methods to depict network structures and complex dependency relations and allow the consideration of characteristics of smart factory networks. Further, we apply VaR as an accepted and widely used standard risk measure to quantify damage potentials with a confidence level and to provide a monetary valuation that is suitable for management practice.
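To make the role of VaR as a monetary risk measure more tangible, the following Python sketch computes an empirical VaR from a sample of loss scenarios. It is a minimal sketch under stated assumptions: VaR at confidence level α is taken as the smallest observed loss that is not exceeded in at least a share α of the scenarios, and the scenario losses are hypothetical values that are not derived from this paper.

```python
import math


def value_at_risk(losses, confidence):
    """Empirical VaR: smallest loss l such that at least `confidence`
    of the observed losses are less than or equal to l."""
    if not losses:
        raise ValueError("at least one loss scenario is required")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    ordered = sorted(losses)
    # The small tolerance guards against floating-point products such as
    # 9.000000000000002 being rounded up to the next index.
    rank = math.ceil(confidence * len(ordered) - 1e-9)
    return ordered[rank - 1]


# Hypothetical monetary losses of ten equally likely downtime scenarios.
SCENARIO_LOSSES = [0.0, 300.0, 0.0, 20.0, 0.0, 120.0, 0.0, 10.0, 50.0, 0.0]

if __name__ == "__main__":
    print(value_at_risk(SCENARIO_LOSSES, confidence=0.9))
```

In this toy sample, the loss is not expected to exceed the returned value in 90% of the scenarios, which is the kind of statement with a confidence level that makes VaR suitable for management practice.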
2.3 Research approach and applied concepts
To answer the research questions raised in Sect. 1, under consideration of the requirements set forth in Sect. 2.1, we develop a structured approach for an appropriate assessment of IT availability risks in smart factory networks. This approach uses graph theory and matrix notation methods, as they are widely utilized methods for formalized representation and the analysis of complex and interdependent networks. For example, Wagner and Neshat (
2010), Faisal et al. (
2006), and Buldyrev et al. (
2010) use graph theory and matrix notation to analyze risk in supply chains and critical infrastructures regarding vulnerability, risk mitigation, and cascading failures in interdependent networks. Graph theory enables a relatively simple and transparent application of our approach. Simplicity and transparency are important characteristics, since our model represents a first approach that should be easy to use and should have a certain degree of scalability. Besides graph theory, there are other approaches for the formalized representation of networks, such as Petri nets or system dynamics, which may be preferable if other priorities are set, for example, if more detailed analyses or more detailed stochastics, such as stochastic recovery times, are required (e.g., Arns et al.
2002; Wu et al.
2007; Fridgen et al.
2014). However, in our opinion, graph theory seems to be an appropriate method for a first attempt, especially for reasons of transparency and complexity reduction. Further, we apply the risk measure VaR for the quantification of IT availability risks, as it is a widely utilized risk measure for downside risks.
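How matrix notation can capture dependency structures may be sketched as follows. The Python example below is a minimal illustration under stated assumptions and does not anticipate the notation of the model developed later: the node names and the adjacency matrix are hypothetical, and the transitive closure is computed with Warshall's algorithm to obtain all direct and indirect dependencies.

```python
# Hypothetical nodes of a small network (two IT and two production components).
NODES = ["server", "mes", "machine", "assembly"]

# Adjacency matrix: ADJ[i][j] == 1 means that node j directly depends on node i.
ADJ = [
    [0, 1, 0, 0],  # mes depends on server
    [0, 0, 1, 0],  # machine depends on mes
    [0, 0, 0, 1],  # assembly depends on machine
    [0, 0, 0, 0],  # nothing depends on assembly
]


def transitive_closure(adj):
    """Warshall's algorithm: reach[i][j] == 1 if j depends on i
    directly or through a chain of intermediate nodes."""
    n = len(adj)
    reach = [row[:] for row in adj]  # copy so the input stays unchanged
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = 1
    return reach


def dependents_of(node, adj=ADJ):
    """Names of all nodes that directly or indirectly depend on `node`."""
    row = transitive_closure(adj)[NODES.index(node)]
    return [NODES[j] for j, flag in enumerate(row) if flag]


if __name__ == "__main__":
    print(dependents_of("server"))
```

Each row of the closure lists the nodes that a failure of the corresponding component could reach, which is the kind of transparent, easily inspected representation that motivates the choice of graph theory and matrix notation.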
To develop and analyze our model, we use the research paradigm introduced by Meredith et al. (
1989). This approach structures research into a “continuous, repetitive cycle of description, explanation, and testing.” Going through these stages in an iterative process makes it possible to describe and explain an observable economic fact in a structured manner. First, we formally describe cause-and-effect relationships that determine the threat potential of an IT component (e.g., the basic structures and dependencies of smart factory networks). As new findings cannot always be derived from practical observations, we use a formal deductive modeling approach. Afterward, we discuss and explain the derived findings and give practical recommendations. An application in an exemplary real-world scenario indicates the utility of our risk assessment model as a sound basis for decision support regarding IT security investments and serves as a starting point for its empirical validation. However, the testing of the findings shall be subject to future case study research.