Published in: The International Journal of Advanced Manufacturing Technology 7-8/2024

Open Access 20-02-2024 | ORIGINAL ARTICLE

A new redundancy strategy for enabling graceful degradation in resilient robotic flexible assembly cells

Authors: Ziyue Jin, Romeo M. Marian, Javaan S. Chahl


The development of resilience in manufacturing systems has drawn more attention than ever. Using redundant components is one of the key strategies for building and enhancing the resilience of a manufacturing system. However, current redundancy strategies require duplicated machinery employed either in active or in standby status. This in turn causes extra costs in designing and achieving resilience. Achieving an efficient deployment of the redundant component in the face of failures is also challenging. In this paper, we introduce a novel redundancy strategy, called adaptive standby redundancy (ASR), to achieve resilient performance for discrete manufacturing systems while reducing the cost of employing the duplicated components that are typically used in traditional systems. This novel strategy permits achievement of high levels of utilisation of the system and graceful degradation in case of failure, keeping the system functional. The strategy is then validated in a developed robotic flexible assembly cell (RFAC), which is tested and results on its efficacy and performance enhancement are discussed.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

Although the manufacturing industry has served humanity for hundreds of years and employs many well-developed technologies, it has faced enormous challenges in recent times. Risk, uncertainty, and disruption are everywhere within systems and accompanying supply chains [1]. One recent example is the worldwide outbreak of COVID-19 since early 2020, which has affected all industries to varying degrees. The impact of the pandemic on the global manufacturing industry and supply chains has been major and ongoing [2, 3]. Companies, both large and small, had to lay off employees to stem losses from the impact of COVID-19, temporarily placing many employees on government support programs, exacerbating deficits, and ultimately contributing to a systemic crisis that continues to unfold. The chance that a firm can survive severe uncertainties and ongoing disruptions depends heavily on its resilience capacity.
The term resilience has various meanings when considered from different perspectives [4–7]. In general, an engineering system’s resilience refers to its capacity to withstand interruption and to recover from the consequences to resume its normal functional status [3]. Building resilience in a system is a catalyst to increase its robustness and adaptability [8].
Resilience in manufacturing systems refers to a firm’s capacity to recover, from partial damage or heavy degradation, back to its normal function [3, 5]. In other words, it describes the ability of the system to deal with unexpected disruptions from both internal and external sources, as well as complexities and uncertainties within the manufacturing context and inputs from its supply chain [8]. In this paper, the resilience of a manufacturing system is defined as a system’s capacity to withstand both internal and external disruptions and maintain system function, with graceful degradation.
Building resilience in manufacturing systems has become particularly important in the post-COVID-19 era [9]. The pandemic exposed the vulnerability of global supply chains, which in turn calls for supply chain stability to be enhanced by adding flexibility and implementing redundancies [10]. The demand for rapid change in manufacturing systems has also increased. Thus, resilient manufacturing systems are expected to improve adaptability to better serve the fast-evolving business environment. In addition, compared with the traditional manufacturing paradigm, resilient manufacturing systems can further optimise customer satisfaction and cost reduction, and achieve the desired sustainability in the long term.
Building resilience in manufacturing systems is also one of the core aims in the roadmap of the Industry 4 paradigm shift [11]. Many cutting-edge technologies have been developed in line with the Industry 4 vision and can be deployed in manufacturing systems to establish resilience and improve efficiency. For instance, digital twins allow for real-time and remote monitoring to shorten decision-making processes [12], and horizontally integrated Internet of Things (IoT) devices and sensors enable real-time tracking to enhance supply chain visibility [13].

1.1 Current strategies for building resilience in manufacturing context

It is hard to discuss the strategies for building resilience capacity for manufacturing systems in general scenarios. This is because the best strategies can only be chosen based on the nature and specific characteristics of the system [14]. For example, the methods for building modularity and reconfigurability are often used for discrete manufacturing systems to increase resilience performance [15, 16]. However, establishing flexibility and collaboration is much more often employed in process-based industries [17, 18]. Also, family businesses often utilise social connections and closer relationships with customers to quickly deal with changes in the local market, while non-family businesses prefer to establish strong relationship with raw material suppliers to get lower prices and prioritised supply of raw materials [19]. Moreover, building agile and flexible business models is a better choice for small and medium enterprises (SMEs) as compared to large firms; they have much simpler business processes [20]. Quality management and risk management can also contribute to resilience in manufacturing systems to some degree [21, 22]. Figure 1 summarises the main strategies used for building resilience capacity in a manufacturing context.
Making use of redundancy in a system is one of the most common strategies for building resilience in the manufacturing context [23–25]. Regardless of its economic efficiency, the strategy works for most systems in a manufacturing environment as the failed function or processes can be restored immediately by using a duplicated component to replace the faulty element or dysfunctional sub-system [26, 27]. Functional redundancy and capacity redundancy can also enable reconfiguration of systems in the face of failures [28].
The main disruption for discrete manufacturing systems is the failure of machine tools or robots [29]. Thus, graceful degradation is a key strategy for achieving resilient and fault-tolerant manufacturing systems, as it is important to maintain the availability of the systems when interruptions happen [30]. Graceful degradation is a designed status of the system for maintaining its continuity in the face of machine or part failure. It can be achieved by reconfiguring the remaining available resources, manually or automatically, with a tolerable sacrifice in the throughput and lead time of production [31]. A certain degree of loss in production throughput is expected and acceptable in such a situation. In return, production downtime can be minimised, or even completely eliminated in some situations. Otherwise, machine failures can cause systems to run for a short period with degraded performance (known as regular degradation) before breaking down completely.

1.2 Research gaps and research questions

A substantial body of research explores resilience management for manufacturing supply chains [3, 16, 19, 20, 32, 33]. Quite a few studies have also addressed building and optimising resilience capacity in process-based manufacturing systems [34, 35]. However, few academic papers have addressed resilience performance in discrete manufacturing contexts, and fewer still have offered specific solutions. Generally, the discussion has considered the problem from a high-level perspective [15, 23].
In this paper, we propose a novel redundancy strategy, called adaptive standby redundancy (ASR), for building resilience in manufacturing systems, with an illustrative example in a robotic flexible assembly cell (RFAC). The proposed strategy can enhance the resilience performance of the system by enabling graceful degradation and improving the system’s total reliability while keeping the system cost-effective.
Despite the clear advantages, the main challenge in building manufacturing system resilience with redundant elements is achieving cost-effectiveness and efficiency simultaneously. The duplicated elements or sub-systems generate extra costs in building the system [27, 36, 37], which can be worthwhile and acceptable for some crucial systems, such as airplane control systems and nuclear power station control systems. However, for the many less risk-defined systems in manufacturing industry, with thin profit margins, the investment in those duplicated components can be hard to justify [36]. Also, supervision and actuation/re-routing mechanisms are needed to ensure that failures are detected when they occur and that the backup element is put in place in a timely manner, which is challenging and increases the cost of building the system [38, 39]. Simply adding redundant components to improve a manufacturing system’s resilience performance can be seen as a brute-force and profligate approach, which in turn raises the need for more cost-effective and efficient redundancy strategies.
The extra costs of employing dedicated components as the redundancy and the interruption of the production process while actuating the standby components are two main challenges in current discrete manufacturing systems. Such challenges are expected to be solved by integrating cutting-edge technologies with an optimised redundancy strategy. This in turn raises two research questions, which the proposed ASR strategy aims to answer:
RQ 1: Can a new redundancy strategy be developed without employing dedicated redundant units?
RQ 2: Can a new standby redundancy strategy be developed to maintain the system continuity when the primary component fails?
The remainder of this paper is organised as follows. Sect. 2 critically reviews related publications on building resilience capacity in the manufacturing context. Sect. 3 explains the proposed ASR strategy, including the concept and system components. Sect. 4 presents the validation tests conducted to demonstrate the feasibility of the proposed strategy, followed by a discussion of its efficacy in Sect. 5. A brief conclusion closes the paper in Sect. 6.

2 Literature review

In this literature review, two major strategies are reviewed regarding building resilient manufacturing systems (RMS). Implementing redundancy in the manufacturing systems is one of the most broadly used strategies for enhancing system reliability and building resilience capacity for manufacturing systems. Enabling graceful degradation of capabilities is another key strategy to achieve RMS, which can be accomplished by reconfiguring the system when failures occur. The system’s capacity for reconfiguration is also known as functional redundancy of the system.

2.1 Redundancy strategies in resilient manufacturing systems

In a manufacturing context, the types of redundancy strategies in the system are mainly classified into active redundancy and standby redundancy [40, 41]. It is intuitive that employing redundant elements is an effective way to improve the resilience and reliability of manufacturing systems. However, it is challenging to have an efficient strategy in place to actuate the deployment of the redundant elements when needed [42]. Employed redundant components can not only build the system’s capacity of resilience by directly adding reliability [38], but also enhance the system’s resilience performance from the perspective of safety of the system [25].
In an active redundancy strategy, the redundant components run in parallel with the primary component from the beginning of the process to deliver the desired functions or services of the system [38]. Duplicated components are used as the redundancy. When the primary component fails, the remaining redundant component keeps the system functioning. Depending on the criticality of the system, the strategy might differ slightly. For critical systems, full-capacity redundant components are used to ensure the desired production throughput in the face of failures, which adds significant cost when building the system [43]. In non-critical systems, however, it is only required to maintain continuity of the system under interruption. Thus, degradation in throughput may have to be accepted unless remediation strategies are in place [44].
Standby redundancy, on the other hand, is also known as passive redundancy. The strategy is to keep the redundant components idle or completely shut down until the primary component fails [45]. A detection/switching unit is needed to monitor for failures and activate the redundant component when needed. Compared with active redundancy, the main advantage is that the standby redundant component is not necessarily exposed to operational stress or causes of failure, which improves the overall system’s reliability [38]. However, the health condition of the detection/switching units varies, which makes the system’s performance more complicated to assess and manage [46]. Many redundancy allocation studies assume that the failure detection and switching mechanism is in perfect condition, which makes the system reliability assessment and optimisation less accurate [47]. Also, the extra costs of setting up the duplicated components cannot be ignored.
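The effect of the perfect-switching assumption can be illustrated with a textbook reliability model. The sketch below is not taken from the reviewed studies. It assumes two identical units with a constant failure rate `lam` (exponential lifetimes), a cold standby unit that cannot fail while idle, and a detection/switching unit that succeeds with probability `p_switch`. All function and parameter names are illustrative.

```python
import math


def reliability_single(lam: float, t: float) -> float:
    """Reliability at time t of one unit with constant failure rate lam."""
    return math.exp(-lam * t)


def reliability_active(lam: float, t: float) -> float:
    """Two identical units running in parallel (active redundancy)."""
    r = reliability_single(lam, t)
    return 1.0 - (1.0 - r) ** 2


def reliability_cold_standby(lam: float, t: float, p_switch: float = 1.0) -> float:
    """One primary unit plus one cold standby unit.

    p_switch is the assumed probability that failure detection and
    switching succeed when the primary unit fails.
    """
    return math.exp(-lam * t) * (1.0 + p_switch * lam * t)


# Compare the strategies at lam * t = 1 (arbitrary illustrative operating point).
lam, t = 0.001, 1000.0
single = reliability_single(lam, t)
active = reliability_active(lam, t)
standby_perfect = reliability_cold_standby(lam, t, p_switch=1.0)
standby_imperfect = reliability_cold_standby(lam, t, p_switch=0.5)
```

Under these assumptions, perfect switching makes the standby arrangement more reliable than the active one at this operating point, whereas a switch that succeeds only half the time reverses the ordering. This is one way to see why assuming a perfect switching mechanism can make a reliability assessment optimistic.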
Although the standby unit is kept away from direct operational stress, it is also hard to maintain and ascertain its original health condition because of external conditions. The research by Lucas et al. [48] demonstrates that electric motors kept in standby to ensure redundancy are in fact more affected than the working systems and incur higher risks of failure, due to environmental conditions. In this example, the storage must be carefully selected, and the standby motors need to be properly conditioned to avoid insulation and aging problems that are characteristic of standard cold standby conditions. If these strategies are not implemented correctly, they can provide a false sense of security, while, in reality, when the new subsystem (e.g. motor) is brought online, it can fail immediately or in a very short time.
Mixed redundancy is a combined method of using both active and standby redundancy in the system. It is believed that changing redundancy strategy can improve the overall reliability performance of the system [46], which turns out to be the requirement for developing novel redundancy strategies. A few studies found that the strategies of solely employing active or standby redundancy are facing challenges [38, 41]. Thus, mixed redundancy is attracting significant attention of researchers to solve the limitations of solely using active or standby redundancy strategies for both types of manufacturing contexts [49].
Ardakan et al. [45] discussed the application of the mixed redundancy strategy in solving reliability-redundancy allocation problems and found considerable improvements compared with solely using active redundancy or standby redundancy. Then, k-mixed redundancy [50] and G-mixed redundancy [51] were introduced to further enhance the reliability performance in manufacturing systems.
Despite the improved reliability performance, the mixed-redundancy strategies still increase the life cycle costs of the systems as the extra costs of employing duplicated components are essential. Figure 2 demonstrates the system framework for three different redundancy strategies in manufacturing context. In active redundancy strategy, all redundancy units are placed in operation with the primary unit from the beginning to ensure the system’s continuity. In contrast, the redundancy unit will only be activated when the primary unit fails in the standby redundancy strategy. The mixed-redundancy strategy is a combination of active and standby redundancy strategies, which is an optimised solution for complex manufacturing systems.

2.2 Graceful degradation in manufacturing systems

From a different point of view, redundancy strategies can also be tagged as functional redundancy and capacity redundancy [52], which links to a manufacturing system’s reconfigurability performance. Reconfigurable manufacturing systems can not only enhance production flexibility in a cost-effective manner [53, 54], but can also achieve resilience in the system. Making use of functional redundancy and capacity redundancy of machines in manufacturing context can be considered to fulfil different resilience policies by reconfiguring remaining healthy physical assets in the face of failures [23].
Graceful degradation of capability is a designed trade-off strategy to enhance manufacturing systems’ availability and continuity in the face of partial damage to the system [37, 55]. It allows the systems to continue functioning at a degraded capacity level in such scenarios, rather than completely collapse, which is a key feature in the resilient manufacturing systems. Thus, graceful degradation is expected while disruption occurs in the system, unless extra-capacity redundant components are used to recover the system throughput loss from the failures [23, 56]. Strategies were also developed for optimising reconfiguration process and mitigating the degraded performance [57–59].

2.3 Discussion on the state-of-the-art

Using duplicated components to build system redundancy in the manufacturing context can effectively improve overall system reliability and build the resilience of the system. However, to the best knowledge of the authors after an extensive literature search, there is no satisfactory solution that can keep the system resilience capacity while controlling the extra costs of employing duplicated components on standby. No research was found that systematically considers cost-effectiveness while building system redundancy, probably due to the large number of variables and unknowns.
Both active and standby redundancy strategies have inherent limitations in building resilience for manufacturing systems. Thus, a novel redundancy strategy is necessary to solve the limitations and build more effective and efficient RMS.

3 The development of the ASR strategy in RFAC

There are two main types of strategies for increasing resilience and agility in discrete manufacturing systems. One is reconfiguration-based methods, the other one is redundancy-based methods. The proposed ASR strategy in this paper is a redundancy-based method, which is developed based on the concept of traditional standby redundancy strategy.
In the ASR strategy, however, there are no duplicated components to be employed as the redundancy, which reduces the constraints on the budget. All robots in the robotic cell can play the role of being the redundancy for others in the face of failures, which allows a more efficient and rapid switching process.
When compared with active redundancy strategy, one major constraint in the standby redundancy strategy is that the manufacturing system cannot maintain its continuity in the face of component failure. Switching mechanisms need to be employed for activating the standby unit, reconfiguring and re-routing production, which takes time, ranging from a few hours to days, even longer, before the system can resume work. Although the resilience capacity can be achieved, such disrupted utilisation rate is not desired in contemporary manufacturing systems.
Traditional reconfigurable assembly systems face challenges in satisfying the contemporary fast-evolving business environment. Despite the adaptability developed in reconfigurable assembly systems, one of their main limitations is that the reconfiguration process takes a long time. Interruptions of production processes cannot be avoided, which limits the establishment of adaptability and agility in discrete manufacturing systems. A recent research paper proposed a new reconfiguration strategy that integrates the philosophy of functional redundancy into the reconfigurable robotic tool system to shorten the reconfiguration process time in robotic assembly systems [60]. The ASR strategy, in contrast, is able to maintain the system’s continuity in the face of robot failure. This is because all robots in the assembly cell are able to act as the redundancy unit for other robots.
The ASR strategy was therefore developed to resolve the limitations of the state-of-the-art strategies discussed above. By deploying ASR, RMS is expected to achieve an enhanced system utilisation rate. The performance is discussed in Sect. 5.

3.1 The concept of the ASR strategy

Figure 3 shows a high-level system framework for the proposed ASR strategy deployed in a RFAC. As shown in the figure, three functional modules are developed in the system for the deployment. The RFAC consists of several industrial robots carrying out the physical assembly tasks. A smart controlling module is designed for feeding assembly task commands to the robots, which enables a distributed robotic control strategy. An asset manager module is developed for monitoring the operational conditions of the robots in the workcell and keeping the smart control terminal up to date. In this paper, the operational condition of a robot is defined as a binary status, namely operational or non-operational. Reasons for a robot becoming non-operational include planned maintenance, power outage, or de-activation after a collision. Communication among the three modules is established via an industrial communication protocol, such as OPC UA.
The philosophy of the approach is to enable each independent robot in the RFAC to act as a substitute for other robots when needed. In production, the robots perform their own assembly jobs as planned and can alternately act as a redundant component when needed. In other words, each functionally identical robot in the RFAC can take over tasks from any other robot that becomes unavailable. The process of job reassignment between robots can be achieved dynamically and automatically, which in turn maintains the system’s continuity in the face of robotic failure.
Specifically, if one of the robots in the RFAC fails during production, the asset manager can detect the issue immediately and send the information to the smart control terminal. Based on the updated information, the smart controller can re-assign the rest of the assembly tasks to available robots. The unfinished assembly duties, which originally belonged to the failed robot, are re-routed and re-assigned to other robots, based on pre-defined rules. The process can be achieved on-the-fly to maintain the system’s continuity when such a disruption occurs. Conversely, the asset manager can also detect the return of a repaired robot to the cell. With the updated information, the smart controller can resume sending motion commands to the repaired robot as originally scheduled, which can also be achieved with no interruptions to production.
The fulfilment of such an on-the-fly process of re-assignment benefits from a novel flexible control strategy for RFAC, which was developed by the authors, and will be further explained in the next sections.

3.2 Smart control terminal

The concept of smart control terminal (SCT) was developed for enhancing the flexibility of the RFAC in previous research by the authors [61]. The developed SCT can send the motion commands remotely to the robots, in a dynamic and flexible manner for assembly tasks, which paves the way for the achievement of the proposed strategy in this research.
To employ the SCT in a RFAC, a task execution file (TEF) needs to be created in advance for each assembly task to define the process of the task, including the assembly sequence, designated robots, coordinates of the parts and approach orientation of the end-effectors. The TEF, which is readable by the control algorithm in the SCT, replaces the pre-coded program for the robots. Based on the data extracted from the TEF, the control algorithm can then send motion commands step-by-step to the designated robots for conducting the assembly task. Compared with traditional robotic control methods, the developed method permits on-the-fly control and change of the assembly sequence at the level of the assembly task. This is because the algorithm in the SCT can decide to which robot the next motion command is sent.
In this research, the control algorithm in the SCT is further developed to receive real-time information on each robot’s health condition (available/unavailable) from the asset manager. If the number of available robots changes, the control algorithm can instantly respond to the interruption by re-routing and rescheduling the work for the rest of the assembly task.
The number of available robots in the RFAC is reduced when one or more of the robots fail. In contrast, the number increases when repaired robots rejoin the RFAC and are made available for assembly tasks. Such disruptions can be both expected (planned maintenance or re-introduction of repaired/new robots) and unexpected (machinery failures). Task rescheduling for both robot-in and robot-out scenarios does not interrupt production, which in turn improves the adaptability and efficiency of the robotic system. Figure 4 illustrates the high-level process flowchart of the control algorithm.
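The step-by-step dispatching described above can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the authors' implementation. The callables `get_available` and `send` stand in for the asset manager feed and the command channel. The fallback rule (use the lowest-numbered available robot) and the error raised when no robot is available are hypothetical choices, since the text only says that re-assignment follows pre-defined rules.

```python
from typing import Callable, Iterable, List, Set, Tuple

Step = Tuple[int, str]  # (designated robot id, motion command description)


def dispatch(
    steps: Iterable[Step],
    get_available: Callable[[], Set[int]],
    send: Callable[[int, str], None],
) -> List[int]:
    """Send each step to its designated robot, or to a substitute if needed.

    Returns the id of the robot that received each step.
    """
    executed_by: List[int] = []
    for designated, command in steps:
        available = get_available()  # availability is refreshed before every step
        if not available:
            # Assumed behaviour: the source does not cover the no-robot case.
            raise RuntimeError("no robot available to continue the task")
        # Hypothetical rule: prefer the designated robot, otherwise fall back
        # to the lowest-numbered available robot.
        robot = designated if designated in available else min(available)
        send(robot, command)
        executed_by.append(robot)
    return executed_by


# Illustrative run: robot 1 drops out for two steps (robot-out), then returns (robot-in).
status_feed = iter([{1, 2}, {1, 2}, {2}, {2}, {1, 2}, {1, 2}])
steps = [(1, "pick A"), (2, "pick B"), (1, "pick C"),
         (2, "pick D"), (1, "pick E"), (2, "pick F")]
sent = []
result = dispatch(steps, lambda: next(status_feed), lambda r, c: sent.append((r, c)))
```

Because the target robot is chosen per step rather than fixed in a pre-coded program, neither the robot-out nor the robot-in event requires the loop to stop or the step list to be edited.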

3.3 Robotic flexible assembly cell

RFAC is a subset of a robotic assembly system with enhanced flexibility capacity, which is a typical discrete manufacturing system [62]. RFAC consists of several industrial robots conducting assembly tasks without or with limited involvement of human operators. The number of the robots in the workcell can be slightly varied, which depends on the specific requirements of the assembly tasks. In this research work, the RFAC is the target system where the resilience capacity was built.
In the proposed ASR strategy, the robots need to be pre-programmed to receive and execute motion commands sent from the SCT. The script in each robot runs in a loop, updating OPC UA variables, until a new motion command is received. After the received motion command is executed, the script jumps back into the loop to wait for new commands.
The programme in the robots only focuses on receiving and executing the motion commands sent from the SCT, instead of conducting any specific assembly task. Thus, once all robots are configured, there is no interruption when robots change jobs or take over other robots’ tasks. Figure 5 is a flowchart for the scripts that run in the robots.
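The robot-side behaviour can be sketched as a generic receive-and-execute loop. The sketch below is an assumption-laden stand-in: a standard-library `queue.Queue` replaces the OPC UA variables, `publish_status` is a hypothetical callback for the status update, and the `STOP` sentinel exists only so that the demonstration terminates.

```python
import queue
from typing import Callable

STOP = None  # sentinel used only to end this demonstration


def robot_loop(
    inbox: queue.Queue,
    execute: Callable[[str], None],
    publish_status: Callable[[str], None],
    poll_s: float = 0.01,
) -> None:
    """Wait for motion commands, execute each one, then return to waiting."""
    while True:
        try:
            command = inbox.get(timeout=poll_s)
        except queue.Empty:
            publish_status("waiting")  # keep the shared status fresh while idle
            continue
        if command is STOP:
            break
        execute(command)               # robot-specific motion would happen here
        publish_status("ready")        # signal that a new command can be sent


# Illustrative run with two hypothetical motion commands.
inbox = queue.Queue()
for item in ("move_to_pick", "move_to_place", STOP):
    inbox.put(item)
executed = []
statuses = []
robot_loop(inbox, executed.append, statuses.append)
```

The loop contains no task-specific logic, which is what allows a robot to take over another robot's steps without being reprogrammed.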

3.4 Asset manager

The asset manager is designed for continuously monitoring the health condition of the robots in the workcell and keeping the robotic control terminal updated with such information. The number of available robots in the workcell could be changed for various reasons, such as machine failures, planned maintenance, repaired robot return and cell upgrade.
The asset manager can be developed into a standalone workstation that networks with the robots in the RFAC or can be established within the robotic controller. Depending on the specific robots that are employed in the RFAC, the strategy of achieving real-time monitoring can be adapted. Figure 6 demonstrates a flowchart for the algorithm of the asset manager.

4 Proof-of-concept tests

4.1 System configuration of the testbed

A testbed has been developed for validating the proposed strategy. Figure 7 shows the physical robots of the testbed. Two industrial collaborative robots, also known as cobots, are employed to build the RFAC. The orange cobot is an Aubo-i5 (cobot no. 1), and the silver one is a UR5e (cobot no. 2). Tetris-like parts are used as a substitute for the product to be assembled in the tests.
Both cobots are physically networked with an industrial server and a workstation. The connection with the industrial server provides the cobots with digitised and remote accessibility. A workstation is employed as the smart control terminal. The control algorithm is developed in Python with built-in OPC UA server/client modules for the communication between functional modules. A high-level functional module diagram for the developed system is shown in Fig. 8. As demonstrated, the asset managers for both cobots are established within the robotic control systems. Table 1 shows pseudocode for the algorithm in the asset manager.
Table 1
Pseudocode for the algorithm in asset manager
Defining variables
Establishing OPC UA connection
while true:
    Fetching data from robot
    if fetched data = Pre-set criteria:
        robot = available
    else:
        robot = unavailable
    Update robot condition to the control terminal
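A runnable sketch of the loop in Table 1 is given below. It is an illustration only: `fetch_state` and `update_terminal` are hypothetical callables standing in for the OPC UA read and write operations, the state strings `"RUNNING"` and `"EMERGENCY_STOP"` are invented for the example, and the loop is bounded by `cycles` so that it terminates, whereas Table 1 loops indefinitely.

```python
import time
from typing import Callable


def monitor_robot(
    fetch_state: Callable[[], str],
    update_terminal: Callable[[bool], None],
    criteria: str = "RUNNING",
    cycles: int = 3,
    period_s: float = 0.0,
) -> None:
    """Poll one robot and report its binary availability to the control terminal."""
    for _ in range(cycles):              # bounded here; Table 1 uses "while true"
        state = fetch_state()            # fetch data from the robot
        available = state == criteria    # compare with the pre-set criteria
        update_terminal(available)       # update robot condition at the terminal
        time.sleep(period_s)


# Illustrative run: the robot is stopped in the second cycle and recovers in the third.
states = iter(["RUNNING", "EMERGENCY_STOP", "RUNNING"])
reports = []
monitor_robot(lambda: next(states), reports.append)
```

Reporting a plain boolean matches the binary operational/non-operational status used in this paper.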

4.2 Validation tests

4.2.1 Design of the tests

An assembly task was designed for validating the proposed ASR strategy and can be conducted in the testbed that was discussed in Sect. 4.1. The task requires two robots to conduct pick-and-place activities to assemble a “product” which is a specific pattern of puzzle containing 16 pieces of Tetris-like parts; see Fig. 9. All parts need to be picked from the raw material tray and placed to a specific location in the assembly area.
A Task Execution File (TEF) was created as an Excel file to define the assembly sequence for the task and the designated robot for each assembly step, as well as the coordinates and approach orientation for each point (Fig. 10). Each robot is assigned eight parts to assemble, as defined in column A of the TEF.
As shown in Fig. 10, the TEF includes 18 lines of motion commands for the two cobots (rows 3 to 20). The numbers in column A indicate the cobot to which each command is sent. Column B describes the motion command. The numbers in column C are indicators for the motion commands, which are defined in the robotic system. Columns D to O define the coordinates of the parts and the end-effector approach orientation.
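The column layout described above maps naturally onto a small record type. The sketch below assumes each TEF row has already been read into a sequence of 15 cells (columns A to O). The class name, field names and the sample values are hypothetical, and the 12 values from columns D to O are kept as one tuple because the text does not state how they split between coordinates and orientation.

```python
from dataclasses import dataclass
from typing import Any, Sequence, Tuple


@dataclass(frozen=True)
class TefStep:
    robot: int               # column A: cobot that receives the command
    description: str         # column B: description of the motion command
    command_code: int        # column C: indicator of the motion command
    pose: Tuple[float, ...]  # columns D to O: coordinates and approach orientation


def parse_tef_row(row: Sequence[Any]) -> TefStep:
    """Convert one TEF row (columns A to O) into a typed step."""
    if len(row) != 15:
        raise ValueError(f"expected 15 cells (A to O), got {len(row)}")
    return TefStep(
        robot=int(row[0]),
        description=str(row[1]),
        command_code=int(row[2]),
        pose=tuple(float(v) for v in row[3:15]),
    )


# Made-up example row; the values are placeholders, not data from Fig. 10.
sample_row = [1, "pick part", 2] + [0.0] * 12
step = parse_tef_row(sample_row)
```

Validating the row length up front makes a malformed TEF fail before any motion command is sent.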
In the proof-of-concept test, two scenarios were validated against the design principle of the proposed ASR strategy. Scenario 1 was designed to simulate a robot-out case, which can happen when robots fail and are physically and/or functionally removed from the cell. In the test, one robot was made functionally faulty in the middle of the assembly process. Such an interruption was expected not to affect the continuity of the assembly cell. An automated and seamless rescheduling process for the assembly task was required in response to the robot failure. The other cobot was also expected to continue and finish the assembly task by conducting all remaining steps, including those originally designated for the robot that was made functionally faulty. The feasibility and effectiveness of the automated process were validated during the test. Figure 11 demonstrates an overview of the assembly task process.
Conversely, Scenario 2 was designed to simulate a robot-in case, in which a repaired robot returns to the workcell. This is also a common interruption for a discrete manufacturing system. Only one cobot works at the beginning to conduct a new assembly task. Once the functionally faulty cobot has been fixed, it rejoins the workcell and works with the existing robot to share the rest of the assembly task until it is finished. The returned cobot conducts the assembly steps that were originally assigned to it. Each scenario needs to be run a number of times to ensure adequate repeatability of the system.
Benefiting from the principle of the developed SCT, it was expected that the return of the cobot can be either at the beginning of a new assembly task or in the middle of a running process. Also, the return of the robot will not affect the continuity of the assembly cell, which means the return will be achieved in an automated manner and would not interrupt the assembly process.
The main purpose of the tests we designed was to validate the feasibility of the concept and pinpoint the limitations and potential flaws and bugs in the developed system. Thus, only core elements of the system were developed and implemented for the tests, such as industrial robots, IT infrastructure, including physical network connection and control software. Other non-critical elements of the system will be replaced by alternative solutions. For example, materials will be moved and fed-in manually to the material handling devices, as they are external to the cell. Also, auxiliary subsystems and components, like PLC units, scanners, and sensors, will not be employed at this stage. Instead, the expected automated processes will be functionally emulated, manually, and inputs/outputs will be served to and removed from the cell as needed.

4.2.2 Conduct of the tests

At the beginning of the test, the two cobots were configured to work collaboratively on the assembly task. When the assembly task was half-way through (after row 11), the emergency button of cobot no. 1 was pressed to simulate a robot-out scenario in which cobot no. 1 became faulty and could no longer work. When such a failure occurred, the algorithm in the SCT was expected to receive updated information on the status of cobot no. 1 from the asset manager. The remainder of the assembly task was then to be rescheduled to cobot no. 2 alone.
In the test for the robot-out scenario, the emergency button of cobot no. 1 was pressed after row 11 had been executed. Cobot no. 2 then worked alone for the rest of the task. Although rows 13, 15, 17 and 19 were assigned to cobot no. 1, all remaining rows (12 to 20) were sent to cobot no. 2 only. No change to column A of the TEF was needed for the changeover, because cobot no. 2 was the only available robot in the workcell at the time.
In the following test for the robot-in scenario, the same TEF was used to repeat the assembly task. At the beginning of the task, cobot no. 1 was still unavailable, and only cobot no. 2 worked. Thus, regardless of which robot was designated in column A of the TEF, motion commands were sent to cobot no. 2 only, as the only available functional unit. As in the robot-out test, when the assembly process was half-way through (after row 11), the emergency button of cobot no. 1 was released and the script in the robot was restarted, making cobot no. 1 available to work. The asset manager detected the change in the RFAC and updated the robotic condition for the SCT. As a result, the task schedule changed from row 12 of the TEF onwards: because the functionally faulty cobot had been repaired and had returned to work, the remaining motion commands were sent to the cobots originally designated in column A. The return of the cobot was expected to trigger the task reschedule automatically, which maintains the system’s continuity in the face of such a change. The test for scenario 2 was repeated to verify the repeatability of the system.
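The dispatch behaviour described in the two tests can be sketched as a simple rule: send each row to its designated cobot if that cobot is available, otherwise to an available one. The following is a hedged illustration only, not the SCT algorithm itself. The function names `choose_cobot` and `run_task`, the `events` dictionary standing in for asset-manager updates, and the alternating assignment of rows to cobots in column A are assumptions made for the example.

```python
from typing import Dict, List, Set


def choose_cobot(designated: int, available: Set[int]) -> int:
    """Return the designated cobot if it is available, else any available one."""
    if not available:
        raise RuntimeError("no functional cobot in the workcell")
    if designated in available:
        return designated
    return min(available)  # arbitrary but deterministic fallback


def run_task(designated: Dict[int, int], events: Dict[int, Set[int]],
             initial: Set[int]) -> List[int]:
    """Dispatch TEF rows in order.

    events[row] is the set of available cobots reported (here, by a
    stand-in for the asset manager) after that row has been executed.
    """
    available = set(initial)
    sent_to = []
    for row in sorted(designated):
        sent_to.append(choose_cobot(designated[row], available))
        if row in events:
            available = set(events[row])
    return sent_to


# Assumed column A: odd rows designated to cobot 1, even rows to cobot 2.
tef = {row: 1 if row % 2 else 2 for row in range(3, 21)}

# Robot-out: cobot 1 becomes unavailable after row 11.
robot_out = run_task(tef, events={11: {2}}, initial={1, 2})
# Robot-in: only cobot 2 at the start; cobot 1 returns after row 11.
robot_in = run_task(tef, events={11: {1, 2}}, initial={2})

print(robot_out[-9:])  # cobots that received rows 12 to 20 (robot-out)
print(robot_in[-9:])   # cobots that received rows 12 to 20 (robot-in)
```

Because the rule consults only the current availability set, column A never has to be edited during a changeover, which matches the behaviour reported for both scenarios.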

4.2.3 Outcomes of the test

The test for each scenario was conducted 10 times. Each cobot was selected as the “faulty” robot five times in scenario 1 and as the returning robot five times in scenario 2. The expected automation was achieved in every run except one.
The only failed run occurred when cobot no. 1 was selected as the returning robot in scenario 2, which led to a further investigation. The investigation showed that the cause of the failed run was not related to the development of the ASR strategy. It was a mechanical issue caused by a degraded robotic joint of cobot no. 1.
A few more tests were conducted afterwards, and the failure did not occur again. Apart from this mechanical fault, the proof-of-concept tests all succeeded, which supports the validity of the concept.

5 Discussion

This section discusses the enhancement to efficacy and resilience capacity obtained by implementing the proposed ASR strategy, in comparison with traditional standby redundancy strategies. Efficacy and resilience capacity are two key performance indicators in designing and establishing resilient manufacturing systems.

5.1 Efficacy enhancement on the standby unit switching

It is believed that employing redundant elements is an effective way to improve the resilience and reliability of manufacturing systems. However, it is challenging for a standby redundancy strategy to have an efficient mechanism in place to actuate the switching and activation of the redundant element when failures occur in physical assets.
In this study, one key performance criterion was set for measuring the efficacy of the proposed ASR strategy: the time for switching to the standby unit. This time can be measured at the instant of primary unit failure and compared with the time needed in a traditional standby redundancy strategy.
The ASR strategy is designed to achieve a fully automated switching process in the event of failures in the primary unit, in order to maintain the system’s continuity of operation. The switching process takes a couple of seconds to detect the change in the robotic condition and update the SCT. The time needed depends on various factors, such as the complexity of the algorithms in the asset manager and SCT, the actual moment the failure occurs, local network delay and workstation hardware performance.
To measure the efficacy improvement, a traditional standby strategy was also configured in the RFAC for comparison. In this deployment, the switching process was conducted manually in the face of failures in the primary unit. Two types of traditional standby redundancy strategy were designed for comparison. The type 1 strategy set the working cobot to take over from the functionally failed cobot to maintain the system’s continuity, which in turn degraded the productivity of the system to some degree. The type 2 strategy used a fully duplicated cobot to replace the faulty one, which maintained the original system’s throughput. It is worth mentioning that all the robot programming was pre-coded and saved in the robotic system.
As shown in Fig. 12, the time needed to actuate the standby unit in the ASR strategy is significantly shorter than in the two traditional standby redundancy strategies. Compared with the type 1 traditional strategy, the ASR strategy has the same degraded throughput level but a clearly shorter standby unit switching time. The type 2 strategy is able to maintain the original throughput level once the standby unit is activated, but its switching time is 20 times longer than that of the ASR strategy. Neither traditional strategy can automate the standby unit actuation, which is why both take more time than the ASR strategy. From this point of view, the research goal is achieved.
The algorithm in the SCT was developed in Python and runs on a computer with Python version 3.8.10. Table 2 lists the specifications of the computer that was used for running the SCT algorithm.
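Table 2 Specifications of the computer that runs the SCT algorithm
| Item | Specification |
| --- | --- |
| Processor | Intel Core i5-8250U 1.60 GHz |
| Memory | 16 GB |
| Graphics | Intel UHD Graphics 620 |
| Operating system | Ubuntu 20.04.6 LTS |
| GNOME Version | |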

5.2 Cycle time comparison

In the validation tests, the cycle time in different scenarios was measured and compared. Seven runs were conducted to compare the differences in cycle time. Run no. 1 was conducted in the normal situation, in which the two cobots in the RFAC worked on the assembly task by conducting the pre-defined assembly steps in the TEF. Run no. 2 was a robot-out scenario as discussed in Sect. 4. Cobot no. 1 became functionally faulty after row 11 in the TEF was executed, leaving cobot no. 2 to work alone for the second half of run no. 2. The task was then repeated for run no. 3, with cobot no. 2 working alone to complete the whole task. Run no. 4 was set for the return of cobot no. 1, which is a robot-in scenario as discussed in Sect. 4. Run no. 5 was again a robot-out test, this time with cobot no. 2 selected as the functionally faulty robot. Cobot no. 1 then worked alone in run no. 6, followed by run no. 7, which tested the return of cobot no. 2. The arrangement of the seven runs is summarised in Table 3.
Table 3 Arrangement for cycle time comparison runs
| Run | Description for the test setup |
| --- | --- |
| Run no. 1 | Both cobots share the assembly task |
| Run no. 2 | Both cobots share the first half of the task. Cobot no. 2 works alone for the second half of the task |
| Run no. 3 | Cobot no. 2 works alone for the whole task |
| Run no. 4 | Cobot no. 2 works alone for the first half of the task. Both cobots share the second half of the task |
| Run no. 5 | Both cobots share the first half of the task. Cobot no. 1 works alone for the second half of the task |
| Run no. 6 | Cobot no. 1 works alone for the whole task |
| Run no. 7 | Cobot no. 1 works alone for the first half of the task. Both cobots share the second half of the task |
The cycle time for the seven runs was recorded and is presented in Fig. 13. The cycle time of run no. 1 is slightly shorter than that of the other runs. This is because two robots sharing the work can slightly shorten the total cycle time: one robot moves to the pick-up area for the next part in advance while the other robot is placing the previous part. The single-robot runs have the longest cycle times. Run no. 3 has a shorter cycle time than run no. 6 because the two robots were given slightly different configurations in order to produce a stronger contrast in the results. Notably, there was no production downtime in any run.

5.3 Resilience performance enhancement

In general, the resilience performance of engineering systems can be expressed as the sum of the system reliability and the restoration capacity [27]. Assuming that the system’s reliability is constant, the resilience performance of the system can be improved by enhancing the restoration capacity. The time for actuating the standby unit and the degraded throughput level can be seen as the criteria of the restoration capacity of a discrete manufacturing system in which a standby redundancy strategy is employed. Therefore, measuring those two criteria can reflect the improvement in the resilience capacity of the system.
Mathematically, the relationships between the time spent on actuating the standby unit, the system degradation level and the performance improvement can be expressed as
$$\text{Resilience performance indicator} = \frac{1-\mathrm{TLL}}{t}$$
where TLL is the throughput loss level, and t is the time spent on standby unit actuation. From this point of view, the resilience performance improvement can be reflected by comparing the indicators of different redundancy strategies.
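The indicator above can be evaluated with a few lines of Python. This is a minimal sketch for illustration: the function name `resilience_indicator` and all input values are hypothetical and are not measurements from the paper. Only the 20-fold ratio between the two switching times follows the comparison reported in Sect. 5.1, and seconds are assumed as the time unit.

```python
def resilience_indicator(tll: float, t: float) -> float:
    """Resilience performance indicator, (1 - TLL) / t.

    tll: throughput loss level, a fraction between 0 and 1
    t:   time spent on standby unit actuation (assumed here to be in seconds)
    """
    if not 0.0 <= tll <= 1.0:
        raise ValueError("TLL must be between 0 and 1")
    if t <= 0:
        raise ValueError("switching time must be positive")
    return (1.0 - tll) / t


# Hypothetical inputs, chosen only to show the arithmetic.
asr = resilience_indicator(tll=0.5, t=2.0)      # degraded throughput, fast switch
type2 = resilience_indicator(tll=0.0, t=40.0)   # full throughput, 20x slower switch

print(asr, type2)
```

With these assumed inputs the faster switch outweighs the throughput loss, which shows how the indicator trades the degradation level against the actuation time.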
Figure 14 compares the resilience performance indicators (higher is better) using the data collected in Sect. 5.1. The performance comparison has also been extended to applications in which more robots are employed in the RFAC. The ASR strategy has significantly better performance indicators than the traditional standby redundancy strategies, and its performance improvement grows as the number of robots in the RFAC increases.

6 Conclusions

In this paper, a novel redundancy strategy is proposed for building a resilient discrete manufacturing system, and its feasibility and efficacy are discussed. The strategy was validated in an RFAC, which was a two-cobot assembly cell. The system was developed to enable each cobot in the cell to take over the assembly job from its counterpart, which in turn maintains the system’s continuity in the face of cobot failures. The strategy also allows repaired cobots to re-join the workcell. Automated processes can be achieved for both robot-out and robot-in scenarios. In the tests conducted, the proposed ASR strategy improved the system’s resilience performance and efficiency in comparison with traditional standby redundancy strategies.
Regarding future development of the concept, the following work is being planned:
  • The asset manager will be further developed by employing an AI algorithm, which is capable of handling more complicated tasks, such as gripper failure and material handling system failure.
  • The developed ASR strategy will be further extended for complex re-configurable manufacturing contexts.
  • The concepts of predictive maintenance and a bi-directional digital twin will be developed and integrated with the current system. This work will bring the digital and biological realms to the physical one, the developed RFAC, which will make the system a genuine Industry 4.0 facility.


This research work was accomplished with the support of the Australian Government Research Training Program (RTP).


Ethics approval

Not applicable.
Not applicable.
Consent for publication.
Not applicable.

Competing interests

The authors declare no relevant competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Asokan VA, Yarime M, Esteban M (2017) Introducing flexibility to complex, resilient socio-ecological systems: a comparative analysis of economics, flexible manufacturing systems, evolutionary biology, and supply chain management. Sustainability. https://doi.org/10.3390/su9071091
Stavropoulos P, Papacharalampopoulos A, Tzimanis K, Lianos A (2020) Manufacturing resilience during the coronavirus pandemic: on the investigation manufacturing processes agility. Euro J Soc Impact Circ Econ 1(3):28–57
Marian RM, Kargas A, Luong LHS, Abhary K (2003) A framework to planning robotic flexible assembly cells. 32nd International Conference on Computers and Industrial Engineering. CSIRO, Limerick, Ireland, pp 607–615
Springer London
Print ISSN: 0268-3768
Electronic ISSN: 1433-3015
