Skip to main content

Open Access 05.03.2024 | Original Article

A predictive-reactive strategy for flight test task scheduling with aircraft grounding

verfasst von: Bei Tian, Gang Xiao, Yu Shen

Erschienen in: Complex & Intelligent Systems

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In flight test engineering, the flight test duration (FTD) affects the aircraft’s delivery node and directly impacts costs. In the actual flight test process, the environmental status updates frequently, and various uncertain events are often encountered, which affect the flight test progress and project implementation. Therefore, when scheduling flight test tasks, rescheduling should be taken into account. This paper proposes a predictive-reactive strategy based on a deep reinforcement learning approach to solve the flight test task scheduling problem with consideration of aircraft grounding. In the predictive stage, a constructive heuristic algorithm is designed to generate an initial schedule. The rescheduling problem is solved by the appropriate rescheduling method that aims to optimize the FTD deviation, task reallocation, and workload cost simultaneously. The problem is modeled as a Markov decision process, including the well-designed state features, rewards, and actions based on different rescheduling methods. The policy is trained by the proximal policy optimization algorithm. At last, numerical results are provided to demonstrate the effectiveness and superiority of the proposed approach.
Hinweise

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Introduction

The flight test includes various tests that aim to obtain an aircraft’s function or performance data in the real flight environment [1]. The flight test, on the one hand, aims to evaluate the design conformity from the test data, verify whether the design requirements are fulfilled, and provide support for refinement, improvement, and modification. On the other hand, the purpose of the flight test is to confirm that the test aircraft satisfies the airworthiness regulations. Therefore, the flight test is a crucial step in obtaining the type certificate and allowing an aircraft to enter the market [2]. To improve the flight test efficiency, shorten the flight test duration, and reduce the flight test cost, it is necessary to schedule the flight test tasks reasonably and scientifically.
The flight test task scheduling problem (FTTSP) is a combinatorial optimization problem (COP), which refers to searching for an optimal schedule to assign each flight test task to an appropriate aircraft and establish the execution sequences. Meanwhile, multiple constraints should be taken into account, such as the execution logic relations between tasks and the deployment and configuration of the aircraft. In practice, some FTTSPs also involve stochastic and dynamic disruptions. For example, the duration of some tasks may change due to an inaccurate initial assessment, task re-testing due to invalid flight test results, cancelation of flight test tasks due to failure to pass the executability assessment, aircraft grounding caused by aircraft or other safeguarding equipment malfunction, etc. The baseline schedules are often infeasible due to these disruptions. Therefore, rescheduling, which is the process of updating an established schedule in response to disruptions [3], should be considered in problem resolution.
In the literature, there are few papers and surveys related to FTTSP. Recently, some Chinese scholars have published their relevant research on static problems. For example, Liu et al. [4] developed an improved genetic algorithm to allocate the flight test tasks on multiple test aircraft. Xu et al. [5] developed the flight test task allocation and sequencing problem model and applied a genetic algorithm (GA) to optimize the flight test period. Although there are few works of literature about the FTTSP, some typical COPs that are similar to our problem have been well investigated in the literature, such as the traveling salesman problem [6], the job-shop scheduling problem [7], the vehicle routing problem [8], and task scheduling in cloud computing [9]. Specifically, as for solving dynamic scheduling problems, the predictive-reactive scheduling strategy, which considers both the initial schedules and the rescheduling process to obtain a schedule with high quality and stability, is widely used in the literature [10]. In predictive-reactive scheduling, the rescheduling method, which decides the way of the rescheduling process to repair the initial schedule when dynamic events occur [11], has an important impact on rescheduling performance. While the existing rescheduling methods, such as right-shift rescheduling (RSR) [12], match-up rescheduling (MUR) [13], and inverse rescheduling (IR) [14], are mainly used as a framework for rescheduling problems, the rescheduling methods specifically designed for the problem of flight test task scheduling have not been studied. For flight test engineering, the flight test duration (FTD) accounts for a large proportion of the overall aircraft development progress and directly affects the aircraft’s delivery node. To shorten the FTD, there are mainly two ways: 1) the tasks can be started as early as possible through proper aircraft arrangement; 2) the flight test intensity determines the flight hours per day, and the test duration of the task can be compressed by increasing the flight test intensity. Therefore, based on the above analysis as well as the considerations of the system’s stability, in this study, we propose three types of rescheduling methods for flight test task scheduling: RSR, aircraft changing rescheduling (ACR), and IR.
Another challenge for the application of the rescheduling methods is that each rescheduling method has its advantages and disadvantages, and almost no method can guarantee a good solution quality under different rescheduling moments. Recent advances in machine learning technology, particularly in reinforcement learning (RL), offer new prospects for developing an adaptive rescheduling method that can deal with disruptions in the real-time environment [15]. The rescheduling of flight test tasks, i.e., choosing the appropriate rescheduling method at each rescheduling point to arrange the remaining flight test tasks, is similar to RL’s “action selection” features and can be modeled as a Markov decision process (MDP). Due to a series of advantages, such as fast speed and generalization ability compared to traditional combinatorial optimization algorithms, RL-based algorithms have attracted large attention. Especially, studies on the combination of deep neural networks and RL, i.e., deep RL (DRL) methods to solve COPs, have emerged in recent years [1618]. However, there are almost no previous studies to apply the DRL-based methods for adaptively choosing the rescheduling methods in dynamic scheduling problems.
With the motivations above, this paper proposes a predictive-reactive strategy based on a DRL approach for solving the FTTSP with consideration of aircraft grounding. The proposed approach can achieve adaptive task scheduling for the two-stage schedule generation for flight test tasks in different environmental states. The major contributions of this study are summarized as follows:
  • A two-stage task scheduling problem for flight tests with consideration of aircraft grounding is proposed and studied. Before the disruption, an initial schedule is generated to minimize FTD, and each time a disruption occurs, the rescheduling process is triggered. The rescheduling process considers optimizing the FTD deviation, task reallocation, and workload cost simultaneously.
  • A predictive-reactive strategy based on the DRL approach for solving the addressed problem is proposed. An initial schedule is generated by a constructive heuristic algorithm, and a DRL approach achieves adaptively selecting the optimal rescheduling methods at the different flight test rescheduling moments.
  • The rescheduling process is modeled as an MDP model. The state features based on information about the schedules of different stages are extracted to comprehensively represent the status of the environment. Three types of rescheduling methods are designed for flight test task scheduling. The proximal policy optimization (PPO) algorithm is employed to train the agent.
The rest of this paper is organized as follows: The section “Literature review” introduces the related work. In the section “Problem statement and formulation”, the description and the model formulation of the studied problem are given. The section “Proposed method” presents the details of the proposed approach. The section “Numerical experiments” provides the experimental results and analysis. The conclusions and future work are drawn in the last section.

Literature review

Dynamic scheduling can be mainly divided into three categories, which are reactive scheduling, proactive scheduling, and predictive-reactive scheduling [19]. In reactive scheduling, the schedule is not predetermined. Generally, the schedule is constructed over time as the system changes according to the scheduling rules. For example, Mihoubi et al. [20] proposed a scheduling rules-based surrogate-assisted simulation–optimization approach for solving the realistic flexible job-shop scheduling problem. Chakrabortty et al. [21] proposed an event-based reactive approach for the resource-constrained project scheduling problem with unreliable resources. Li et al. [22] designed a rescheduling method based on the Monte Carlo tree search algorithm to solve the dynamic flexible job-shop scheduling problem. Proactive scheduling constructs a predictive schedule that predictably meets performance requirements in a dynamic environment in advance. In the literature, Tan et al. [23] proposed a proactive strategy for berth allocation and quay crane assignment problems. Ma et al. [24] investigated the proactive resource-constrained project scheduling problem with resource transfer times in an uncertain environment, aiming to generate robust baseline schedules that will be as stable as possible during project execution. Dai et al. [25] proposed a proactive-reactive hybrid strategy to deal with the berth allocation problem with uncertain arrival times. Predictive-reactive scheduling consists of two steps. The first step generates an initial schedule. The second step, rescheduling, adjusts the initial schedule in response to disruptions. Compared with the other dynamic scheduling strategies, predictive-reactive scheduling is able to achieve a high-quality schedule with consideration of both performance and stability. In the literature, Su et al. [26] proposed a robust predictive-reactive approach for task allocations in complex product design processes. Li et al. [27] used predictive-reactive rescheduling for a multi-crane-scheduling problem. Nasiri et al. [28] developed a periodic predictive-reactive rescheduling system for a cross-dock rescheduling problem. Manzini et al. [29] proposed a predictive-reactive approach for sequencing assembly operations in an automated assembly line.
The rescheduling method determines how to generate or repair schedules when disruptions occur. The commonly used rescheduling methods include RSR, MUR, and IR, etc. The RSR directly delays the start time of the remaining tasks while maintaining the original arrangement and sequence of the tasks. For example, Kim et al. [30] generated a new schedule by right-shifting the original schedule to address the rescheduling problem of unrelated parallel machines with job-dependent setup times. An et al. [31] applied the RSR to a flexible job-shop rescheduling problem. The MUR method calculates a time window or rescheduling horizon and reschedules the tasks within that horizon. The tasks within the rescheduling horizon form a new scheduling problem. For example, Wang et al. [32] applied four match-up strategies to determine the rescheduling horizon for dynamic job-shop scheduling. Franzoi et al. [33] proposed a moving-horizon rescheduling framework for a typical chemical manufacturing process. In the IR, the parameter value of the scheduling problem, e.g., the execution time, is controllable. When given a non-optimal sequence, this sequence can be adjusted optimally by changing the values of the relevant parameters. For example, Mou et al. [34] proposed the energy-efficient distributed permutation flow-shop inverse scheduling problem to minimize adjustment and energy consumption simultaneously. Wang et al. [35] proposed the model and algorithm of the welding shop inverse scheduling problem for dynamic events according to welding shop characteristics.
The task scheduling problem for flight tests is a class of COP. The traditional solving algorithms for such problems involve exact algorithms, heuristics, and meta-heuristics [3638]. While the exact method can give the optimal solution for structured COPs, it is inefficient when solving large-scale or dynamic problems [39, 40]. The heuristic is simple and easy to implement, but it is easy to fall into the local optimum and is hard to obtain a satisfactory solution under diverse scenarios [41]. The meta-heuristic has been successfully applied to various optimization problems [4244], whereas in a dynamic environment, due to its poor generalization, it needs to be re-executed every time the system changes. As mentioned before, the DRL provides new aspects to solve dynamic scheduling problems. In this study, we aim to adaptively select the rescheduling methods at different scheduling moments. In the literature, there are also some studies on the adaptive learning of different scheduling rules using the DRL approach and finally applying the learned rules to scheduling or planning. For example, Wang et al. [45] applied the Q-learning algorithm to the single-machine dispatching rule selection problem. Han et al. [46] proposed a DRL framework to solve the job-shop scheduling problem by learning multiple heuristic rules by combining convolutional neural networks. Luo et al. [47] developed a hierarchical multi-agent DRL-based real-time scheduling method to solve the dynamic partial-no-wait multi-objective flexible job-shop scheduling problem.
Table 1
Related notations used in this paper
Notation
Description
Notation
Description
n
Total number of tasks
\(N_u\)
The set of unaffected tasks
m
Total number of aircraft
\(N_r\)
The set of remaining tasks
v
Total number of task group
\(n_r\)
The size of the remaining task set \(N_r\)
ij
Index of tasks, \(i,j=1,2,\ldots ,n\)
\({FTD}_0\)
Flight test duration in initial schedule
k
Index of aircraft, \(k=1,2,\ldots ,m\)
FTD
Flight test duration in current schedule
h
Index of positions, \(h=1,2,\ldots ,n\)
\(T_{avg}\)
Average duration of tasks
\({AC}^i\)
Compatible aircraft set of \(T_i\)
\(T_{sum}\)
Total duration of tasks
\(PO_k\)
Positions set of \(AC_k\)
\(U_{k0}\)
Aircraft utilization rate for \({AC}_k\) in initial schedule
\({TP}_i\)
Pre-requisite tasks set of \(T_i\)
\(U_{k}\)
Aircraft utilization rate for \({AC}_k\) in current schedule
\(tp_i\)
The number of pre-requisite tasks of \(T_i\)
\(sm_{kh}\)
Start time of position h of \(AC_k\)
\(t_{i0}\)
Duration of \(T_i\) under the nominal flight test intensity
\(s_{i0(k)}\)
Start time of \(T_i\) in initial schedule (on \({AC}_k\))
\(t_{i}\)
Actual duration of \(T_i\)
\(c_{i0(k)}\)
Completion time of \(T_i\) in initial schedule (on \({AC}_k\))
\(a_T\)
Deployment interval between the two aircraft
\(s_{ip(k)}\)
Start time of \(T_i\) in previous schedule (on \({AC}_k\))
M
A big positive number
\(c_{ip(k)}\)
Completion time of \(T_i\) in previous schedule (on \({AC}_k\))
\(a_k\)
Deployment time of \({AC}_k\)
\(s_{i(k)}\)
Start time of \(T_i\) in current schedule (on \({AC}_k\))
\({in}_N\)
Nominal flight test intensity
\(c_{i(k)}\)
Completion time of \(T_i\) in current schedule (on \({AC}_k\))
\({in}_i\)
Actual flight test intensity of \(T_i\)
\(c_k^{end}\)
Completion time of the last task on \({AC}_k\)
\(t_g\)
Start time of the aircraft grounding
\(c_i^{TP}\)
Maximum completion time of the pre-requisite tasks for \(T_i\)
G
Grounding aircraft
\(AM_{ikh}\)
Binary variable that is equal to 1 when \(T_i\) is arranged to position h of \(AC_k\), and 0 otherwise
Td
Directly affected task
\({Rl}_i\)
Binary variable that is equal to 1 when the aircraft assignment for \(T_i\) changes from the initial schedule, and 0 otherwise
\(ind_i\)
The deviation between the actual intensity and the nominal intensity of \(T_i\)
\({Tc}_i\)
Binary variable that is equal to 1 when the actual flight test intensity of \(T_i\) differs from \({in}_N\), and 0 otherwise
\(TR_{ij}\)
Binary parameter that is equal to 1 if task \(T_j\) can only be processed after the completion of task \(T_i\), and 0 otherwise
\(X_{ik}\)
Binary variable that is equal to 1 when task \(T_i\) is arranged on aircraft \(AC_k\) in the rescheduling, and 0 otherwise
\(N_0\)
The set of required tasks
  
Another important aspect of dynamic scheduling problems is how to measure the performance of a schedule. As for the rescheduling criteria, there are two main evaluation metrics for rescheduling problems in the literature: efficiency and stability [48]. Efficiency usually measures the performance of a scheduling system, which is generally the same as the objectives of the initial schedule, and stability evaluates the impact of disruptions by characterizing the deviation of the revised schedule from the initial schedule [49]. For example, Fattahi et al. [50] considered making a balance between efficiency measured by makespan and stability measured by the starting time deviation of the schedules for dynamic scheduling in a flexible job-shop problem. Katragjini et al. [51] proposed an objective function consisting of efficiency measured by the makespan and instability measured by the number of tasks whose starting times have been altered in the new schedule. Buddala et al. [52] used an efficiency indicator measured by makespan and stability measured by the deviation of the operations’ completion time. Tighazoui et al. [53] proposed an offline phase to minimize the total weighted waiting times (TWWT) of jobs in a classical identical parallel machine scheduling problem and an online phase to minimize the TWWT and the total weighted completion time deviation. Additionally, some other cost criteria are also involved in the literature, such as energy consumption [54], compression cost [26], workload [27], etc.

Problem statement and formulation

This section first describes the characteristics and objectives of the FTTSP. Next, based on the problem description, the mathematical model of the addressed problem is given. Before the detailed description, the related notations used throughout the paper are introduced in Table 1.

Problem description

The FTTSP considers scheduling hundreds of flight test tasks on several test aircraft, and multiple constraints should be taken into account. First, the logical relationship of tasks should be satisfied, which means that each task can be processed only after the completion of its pre-requisite tasks. Otherwise, the test results will be invalid. It should be noted that the flight test tasks need to be executed following the given logical relationship but do not need to be executed order by order, which differs from both the “operation” and the “job” in the flexible job-shop problem, which has been extensively studied in the literature. The duration of a task is influenced by the flight test intensity, which indicates the flight hours for each aircraft in a day. Generally, low-intensity flight tests will lead to a long FTD, while high-intensity flight tests may increase the workload of pilots. At the aircraft level, due to the different configurations of the aircraft, each task can only be tested on compatible aircraft. Meanwhile, due to the manufacturing capability, the test aircraft is not put into operation at the same time. Therefore, the arrival times of the test aircraft should also be considered, which is defined as the deployment time. Additionally, during the test execution process, the aircraft or other safeguarding equipment’s malfunction may require the aircraft to be grounded for maintenance.
The FTTSP considered in this paper can be described as follows: There are n flight test tasks \(N_0=\{T_1, T_2,..., T_n\} \) to be processed on m aircraft \(AC=\{{AC}_1,{AC}_2,...,{AC}_m\}\). Each task \(T_i\) can be processed only after the completion of its pre-requisite tasks \(T_j\subseteq {TP}_i (i\ne j) \) on any available aircraft \({AC}_k\) selected from a compatible aircraft set \({AC}^i\). The aircraft is deployed orderly with a certain interval \(a_T\). Thus, the arrival time \(a_k\) of aircraft \({AC}_k\) is \(\left( k-1\right) *a_T\). The nominal flight test intensity \({in}_N\) indicates the nominal flight hours of each aircraft in a day. The test duration of task \(T_i\) under the nominal flight test intensity \({in}_N\) is denoted as \(t_{i0}\) (days).
The problem addressed in this paper is composed of two stages: Before the disruption, an initial schedule that aims to minimize \(FTD_0=\max (c_{i0})\) needed to be established in the first stage. In the second stage, during the test execution process, each time a disruption occurs, the rescheduling process is triggered.
We use G to denote the grounding aircraft at time \(t_g\). Td represents the tasks that are directly affected. \(N_u\) denotes the unaffected task set. \(N_r\) is the set of the remaining tasks that need to be rescheduled. At the arrival of each interruption, the task sets are updated as follows:
Step 1: Identify the unaffected task set \(N_u\): For the grounding aircraft, tasks with a completion time prior to \(t_g\) will not be affected. The tasks on other aircraft with a start time before \(t_g\) are also unaffected. The unaffected task set \(N_u\) is defined in Eq. (1)
$$\begin{aligned} N_u{} & {} =\{T_i|c_{ipk}\le t_g,k\in {G}\}\cup \{T_i|s_{ipk}\nonumber \\{} & {} <t_g,k\in \left\{ 1,2,\ldots ,m\right\} -{G}\}. \end{aligned}$$
(1)
Step 2: Find the directly affected task Td: The directly affected task Td is the first task that is interrupted by the aircraft grounding, satisfying
$$\begin{aligned} Td=\{T_i[1]|s_{ipk}\le t_g,c_{ipk}>t_g,k\in {G}\}. \end{aligned}$$
(2)
Step 3: Identify the remaining task set \(N_r\): The remaining task set contains the tasks to be rescheduled, which is the complementary set of \(N_u\) and Td, calculated as follows:
$$\begin{aligned} N_r=N_0-N_u-Td. \end{aligned}$$
(3)
Table 2 gives an example of a simplified FTTSP. In this example, the tasks \(N_0=\{T_1,T_2,\ldots ,T_{12}\}\) need to be processed on three test aircraft \(AC=\{{AC}_1,{AC}_2,{AC}_3\}\). Each task has a pre-requisite task set and a compatible aircraft set. Figure 1 presents a Gantt chart of a feasible schedule with a disruption occurrence. Initially, \(T_1\), \(T_4\), \(T_7\), and \(T_{11}\) are arranged on \(AC_1\), \(T_3\), \(T_2\), \(T_5\), and \(T_9\) are arranged on \(AC_2\), and the rest of the tasks will be executed on \(AC_3\). At time \(t_g\), \(AC_1\) encounters a malfunction and needs to be grounded to perform the maintenance. Except for the directly affected task \(T_7\), the unaffected tasks \(N_u=\{T_1, T_2,T_3, T_4,T_6, T_8\}\), which are initiated before \(t_g\) in the previous schedule, keep the same place in the timetable. And the remaining tasks \(N_r=\{T_5,T_9,T_{10},T_{11},T_{12}\}\) need to be rescheduled. Here, we assume that the task directly interrupted by the aircraft grounding (\(T_7\) in this example) would be resumed on the same aircraft (\(AC_1\) in this example) after the maintenance.
Table 2
An example of the FTTSP
Task {pre-requisite tasks}
Compatible aircraft
Duration (days)
\(T_1\{\}\)
\({AC}_1,{AC}_2 \)
2
\(T_2\{T_1\}\)
\({AC}_2\)
2
\(T_3\{T_1\}\)
\({{AC}_1,AC}_2\)
2
\(T_4\{T_3\}\)
\({AC}_1\)
1
\(T_5\{T_4\}\)
\({AC}_2\)
2
\(T_6\{T_3\}\)
\({{AC}_1,AC}_3\)
3
\(T_7\{T_4\}\)
\({AC}_1\)
3
\(T_8\{\}\)
\({AC}_3\)
1
\(T_9\{T_{5,}T_6\}\)
\({{AC}_2,AC}_3\)
1
\(T_{10}\{T_7\}\)
\({{AC}_1,AC}_3\)
1
\(T_{11}\{T_9\}\)
\({{AC}_1,AC}_2\)
1
\(T_{12}\{T_9,T_{10}\}\)
\({{AC}_2,AC}_3\)
1
For the rescheduling stage, we first consider the FTD deviation between the initial schedule and the current schedule. In addition, the number of tasks reallocated and workload costs generated with the schedule changes are also considered, which are elaborated as follows:
1.
Objective of FTD deviation In the rescheduling phase, the objective of the initial schedule should be taken into account first. The FTD deviation between the current schedule and the initial schedule is calculated as
$$\begin{aligned} \textrm{FTD}_\textrm{dev}=\textrm{FTD}-\textrm{FTD}_0. \end{aligned}$$
(4)
 
2.
Objective of the number of tasks reallocated After the initial schedule has been generated, the aircraft arrangements for each task are determined. The test instrumentation, personnel, etc. required for the corresponding tasks are then prepared. Each time a test aircraft encounters a malfunction, the tasks already scheduled on the grounding aircraft can be transferred to other compatible aircraft, which allows the task to be executed as early as possible. In this case, the current aircraft arrangement for the tasks may differ from the initial arrangement, and the corresponding preparation needs to be adjusted accordingly. Specifically, let \(Rl_i\) be the binary variable that is equal to 1 when the aircraft assignment for \(T_i\) changes from the initial schedule and 0 otherwise. The objective of the number of tasks reallocated (NTR) can be defined as
$$\begin{aligned} \textrm{NTR}=\sum _{i=1}^{n}{Rl}_i. \end{aligned}$$
(5)
 
3.
Objective of workload cost As mentioned before, increasing the flight test intensity can compress the duration of the task and thus reduce the FTD. However, the increase in the flight test intensity will lead to an increase in the workload, which may get the flight test crew tired and raise the possibility of failure. As to quantifying the workload cost, a duration change (DC) factor is adopted, as shown in Eq. (6). Where \(ind_i\) indicates the difference between the actual intensity and the nominal intensity, which is calculated as \(ind_i=in_i-in_N\). When the actual flight intensity of \(T_i\) is set as \(in_N\), \(ind_i\) is equal to 0, i.e., it does not cause an increase in workload cost
$$\begin{aligned} \textrm{DC}=\sum _{i=1}^{n}\frac{ind_i}{{in}_N}*t_{i0}. \end{aligned}$$
(6)
 

Model formulation

As there is no established mathematical model in the literature that can be applied to address the proposed problem, this study establishes the mathematical models for the two-stage FTTSP based on a position-based modeling idea, which is inspired by [55].
Additionally, the following assumptions are made for the addressed problem:
1.
Only one task can be processed at a time by each aircraft.
 
2.
The task execution interrupted by the aircraft grounding will be immediately resumed on the same aircraft after the maintenance.
 
3.
The pre-requisite task set for each task is established.
 
4.
The compatible aircraft set for each task is determined.
 
5.
The interval between the aircraft deployments is known.
 
6.
The nominal test duration of each task is known.
 
Based on the above notations, descriptions, and assumptions, the mathematical model for the first stage of FTTSP is formulated as follows:
$$\begin{aligned}{} & {} \min FTD_0 \end{aligned}$$
(7)
            s.t.
$$\begin{aligned}{} & {} \sum _{k\in AC^i} \sum _{h\in PO_k} AM_{ikh}=1 \quad \forall i\in N_0 \end{aligned}$$
(8)
$$\begin{aligned}{} & {} c_i \ge s_i+t_{i} \quad \forall i\in N_0 \end{aligned}$$
(9)
$$\begin{aligned}{} & {} sm_{k(1)} \ge a_k \quad \forall k \in AC \end{aligned}$$
(10)
$$\begin{aligned}{} & {} c_i \le s_j+(1-TR_{ij} )*M \quad \forall i, j\in N_0, i\ne j \end{aligned}$$
(11)
$$\begin{aligned}{} & {} sm_{k(h+1)} \ge sm_{kh} + \sum _{i\in N_0} AM_{ikh}*t_{i} \nonumber \\{} & {} \qquad \qquad \qquad \forall k \in AC^i, h \in PO_k \end{aligned}$$
(12)
$$\begin{aligned}{} & {} sm_{kh} \ge s_i-(1-AM_{ikh} )*M \quad \forall i \in N_0,\nonumber \\{} & {} \qquad \qquad k \in AC^i, h \in PO_k \end{aligned}$$
(13)
$$\begin{aligned}{} & {} sm_{kh} \le s_i+(1-AM_{ikh})*M \quad \forall i\in N_0, \nonumber \\{} & {} \qquad \qquad k \in AC^i, h \in PO_k \end{aligned}$$
(14)
$$\begin{aligned}{} & {} \sum _{i\in N_0}AM_{ikh} \ge \sum _{i\in N_0}AM_{ik(h+1)} \nonumber \\{} & {} \qquad \qquad \qquad \forall k \in AC^i, h \in PO_k \end{aligned}$$
(15)
$$\begin{aligned}{} & {} \sum _{i\in N_0}AM_{ikh} \le 1 \quad \forall k \in AC^i, h \in PO_k \end{aligned}$$
(16)
$$\begin{aligned}{} & {} s_i\ge 0 \quad \forall i\in N_0 \end{aligned}$$
(17)
$$\begin{aligned}{} & {} AM_{ikh} \in \{0,1\} \quad \forall i\in N_0, k \in AC^i, h \in PO_k. \end{aligned}$$
(18)
The objective function (7) minimizes the \(FTD_0\). Constraint (8) ensures that each task can only be assigned to one position on a compatible aircraft. Constraint (9) indicates the constraint of the task’s start time and completion time. Constraint (10) guarantees the tasks can only be arranged after the arrival time of the aircraft. Constraint (11) forms the precedence relationships among different tasks. Constraints (12)–(14) denote the relationship between the variables \(s_{i}\), \(sm_{kh}\), and \(AM_{ikh}\). Constraint (15) guarantees that positions on each aircraft must be sequentially assigned to the tasks. Constraint (16) indicates that each position on each aircraft can only be arranged once. Constraint (17) represents that the task start time is non-negative. Constraint (18) defines the type of decision variable \(AM_{ikh}\).
For the second stage, the constraints (8) until (18) are also applied. The five additional constraints (19)–(23) are added, shown as follows:
$$\begin{aligned}{} & {} t_i\ge (in_N/in_i)*t_{i0} \quad \forall i\in N_r \end{aligned}$$
(19)
$$\begin{aligned}{} & {} ind_i\ge in_i-in_N \quad \forall i\in N_r \end{aligned}$$
(20)
$$\begin{aligned}{} & {} ind_i \ge 0 \quad \forall i\in N_r \end{aligned}$$
(21)
$$\begin{aligned}{} & {} Rl_i\ge \sum _{h\in PO_k} AM_{ikh} -X_{ik} \quad \forall i\in N_r, k\in AC^i \end{aligned}$$
(22)
$$\begin{aligned}{} & {} Rl_i\in \{0,1\} \quad \forall i\in N_r. \end{aligned}$$
(23)
Constraint (19) calculates the value of the actual task duration \(t_i\). Constraints (20) and (21) express the value of variable \(ind_i\). Constraint (22) calculates the value of variable \(Rl_i\). Constraint (23) defines the types of the variable \(Rl_i\).

Proposed method

Based on the analysis and formulation above, it can be concluded that the two-stage FTTSP is a type of complex COP. Therefore, it is difficult to realize the dynamic flight test task scheduling or rescheduling in various environments by the traditional combinatorial optimization methods. In this study, a predictive-reactive strategy based on a DRL approach is proposed to solve the addressed problem in an efficient and adaptive way. In this section, the background of RL is provided first. Then, the DRL architecture for predictive-reactive flight test task scheduling is illustrated. Next, the initial schedule generation and rescheduling process details are presented. Finally, the training algorithm for the addressed problem is given.

Background of RL

In general, an RL problem can be formed as an MDP, consisting of the following five components [56]: state space S, action space A, transition function T, reward function R, and discount factor \(\gamma \in [0,1]\). RL employs an agent to explore, interact with, and learn from the environment. At each time t, the agent observes the current state \(s_t\in S\). Following the policy \(\pi (a_t|s_t)\), the agent selects an action \(a_t\in A\). Then, the agent enters a new state \(s_{t+1}\) with transition probability \(P\left( s_{t+1}| s_t,a_t\right) \) and receives an immediate reward \(R\left( s_t,a_t,s_{t+1}\right) \).
Most learning algorithms for MDPs obtain optimal policies by learning a value function. The value function represents an estimate of “how good or bad” the agent is in a certain state. This estimate is represented by the expected return. Given a policy \(\pi \), the expected return at state s can be expressed as the value function \(V^\pi \left( s\right) :\ S\rightarrow {\mathbb {R}}\) [57], that is
$$\begin{aligned} V^\pi \left( s\right) ={\mathbb {E}}\left[ \sum _{k=0}^{\infty }{\gamma ^k*r_{t+k}|s_t=s,\pi }\right] . \end{aligned}$$
(24)
For any policy \(\pi \) and any state s, the expression in Eq. (24) can be recursively defined as the Bellman equation [58] as follows:
$$\begin{aligned} \begin{aligned} V^\pi \left( s\right)&={\mathbb {E}}\left[ r_t+\gamma r_{t+1}+\gamma ^2r_{t+2}+\ldots | s_t=s,\pi \right] \\&={\mathbb {E}}\left[ r_t+\gamma V^\pi \left( s_{t+1}\right) | s_t=s,\pi \right] \\&=\sum _{s^\prime \in S}{T\left( s,\pi (s),s^\prime \right) \left( R\left( s,a,s^\prime \right) +\gamma V^\pi \left( s^\prime \right) \right) }. \end{aligned} \end{aligned}$$
(25)
The objective is to find a policy \(\pi (s, a)\) to optimize the expected return \(V^\pi \left( s\right) \). An optimal policy \(\pi ^*\), which achieves the maximum expected return \(V^*\left( s\right) \), is represented by Eq. (26)
$$\begin{aligned} \pi ^*\left( s\right) =\arg \max \limits _ a{\sum _{s^\prime \in S}{T\left( s,a,s^\prime \right) \left( R\left( s,a,s^\prime \right) +\gamma V^*\left( s^\prime \right) \right) }}.\nonumber \\ \end{aligned}$$
(26)

DRL architecture for the predictive-reactive flight test task scheduling

The proposed scheduling approach consists of two stages. The first stage provides an initial schedule of the required tasks set by a constructive heuristic algorithm. Then, the rescheduling problem is modeled as an MDP. The training algorithm is based on the actor-critic architecture; in which both the actor network and the critic network are represented by nonlinear neural network approximators. The actor network \(\pi (s_t|\theta )\) determines the action selection policy according to the state of the environment, and the critic network \(V^\pi (s_t|\phi )\) represents the value function, which gives the expectation of the corresponding long-term reward. In the training process, the constructive heuristic algorithm provides input to the MDP, i.e., the MDP considers the results of the heuristic to obtain the final solution. As there are no rescheduling methods that can achieve the best performance under all scenarios, the RL agent interacts with the scheduling environment to learn the optimal policy. Finally, the optimal policy can be used to make decisions based on different environmental states, achieving adaptive task rescheduling. The architecture of the DRL-based predictive-reactive approach for the FTTSP is shown in Fig. 2.
Figure 3 describes the flowchart of the proposed approach that is used to solve the problem. At time \(t_0\), the initial schedule is generated with the constructive heuristic algorithm. Then, the problem is transformed into a formulated MDP model. When a rescheduling is triggered, at each rescheduling time t, the agent selects an action \(a_t\) to choose a rescheduling method according to the current environment state \(s_t\). The remaining tasks are then arranged according to the corresponding method. Next, the agent receives a reward \(r_t\) to evaluate the effect of the action and enter the next state \(s_{t+1}\). The process is repeated until all the tasks are completed.

Initial schedule generation

In this paper, the initial schedule is generated according to a constructive heuristic algorithm that takes into account the available time and the number of follow-up tasks for each task (HAANF). This heuristic aims to reduce the vacancy of the aircraft as much as possible, which helps to improve the utilization rate of the test aircraft. The following steps are proposed to generate the initial schedule:
Step 1: Update the ready task queue (RTQ). To ensure the effectiveness of the flight test, the tasks should be evaluated before execution to verify that the executable constraints are satisfied. The tasks that pass the assessment are added to RTQ. The criteria for assessment of flight test tasks before execution at the scheduling time t are shown in Eq. (27)
$$\begin{aligned} \textrm{RTQ}\left( t\right) =\left\{ T_i|T_i\in N_0;t\ge c_i^{TP};t\ge \min \limits _{k\in {AC}^i}{a_k}\right\} . \end{aligned}$$
(27)
Step 2: Determine the task \(T_s\) to be scheduled. In this paper, the task in RTQ with the minimum earliest start time (EST) is chosen to be assigned. The calculation for EST is shown in Eq. (28). If there is more than one alternative with the same EST, select the one with the maximum follow-up task number (FTN) among them. And the random one will be selected if two alternative tasks have the same number of follow-ups
$$\begin{aligned} \textrm{EST}(T_i)=\max \left\{ \min \limits _{k\in {AC}^i}a_k,\min \limits _{k\in {AC}^i}c_k^{end},c_i^{TP}\right\} . \end{aligned}$$
(28)
Step 3: Select the aircraft \(AC_s\) to arrange the task. The aircraft with the earliest idle time (EIT) will be selected to arrange the selected task. If there are multiple aircraft available with the same EIT, choose a random one among them.
Step 4: Remove \(T_s\) from \(N_0\) and update the schedule information on \(AC_s\).
Step 5: Repeat steps 1–4 until all the tasks have been scheduled.
The pseudo-code of HAANF is shown in Algorithm 1. As can be seen from the pseudo-code, the time complexity of HAANF is \(O(n^2)\) and the space complexity is O(n). Therefore, the HAANF can be easily integrated with MDP due to its low complexity, which is also an important advantage of heuristic algorithms.

Rescheduling process

For the second stage, with the monitoring of flight test execution information, the rescheduling is triggered every time an aircraft grounding occurs. The rescheduling step, which could be seen as a decision-making sequence, iterates until all the tasks are completed. As stated before, this rescheduling process can be modeled as an MDP model. In this section, the details of the action space, state features, reward definition, and training algorithm are provided.

Rescheduling methods

In this paper, three types of rescheduling methods are designed to form the action space according to the flight test task scheduling characteristics. In the proposed DRL method, at each rescheduling time t, the agent takes an action to adaptively select a rescheduling method in a stable and cost-efficient manner, and then, the schedule is repaired corresponding to the different flight test environment status. The rescheduling methods proposed in this work include (1) RSR, (2) ACR, and (3) IR. The details are as follows:
(1)
Right-shift rescheduling. The RSR repairs the previous schedule after a dynamic event by delaying the start time of remaining tasks while maintaining the sequence order of the initial schedule on each aircraft. The start time of the task \(s_{ik}\) and completion time \(c_{ik}\) of the remaining task \(T_i\in N_r\), which are scheduled after the disturbing time in the previous schedule, are recalculated according to Eq. (29) and Eq. (30)
$$\begin{aligned} s_{ik}=\max \left\{ a_k,c_k^{end},c_i^{TP}\right\} \end{aligned}$$
(29)
$$\begin{aligned} c_{ik}=s_{ik}+t_i. \end{aligned}$$
(30)
 
(2)
Aircraft changing rescheduling. In the ACR, the tasks affected by dynamic events can be moved to another compatible aircraft for testing. The following procedure presents the rescheduling process of the remaining tasks by the ACR. Step 1: Calculate the EST of each remaining task \(T_i\in N_r\) by Eq. (28) and order them from the earliest to the latest. The tasks with the same EST are ordered the same as the initial schedule. Step 2: Select the task with the highest priority of EST. If the EST of the selected task in the current schedule is greater than the start time of the initial schedule, the selected task will be arranged on the first idle aircraft. Otherwise, the task will be retained on the initially assigned aircraft. Additionally, if multiple aircraft have the same first idle time, the initially assigned aircraft among them is preferred; otherwise, choose the random one among them. Step 3: Repeat steps 1–2 to update the start time and completion time of the remaining tasks.
 
(3)
Inverse rescheduling. The IR is an emerging technology that is applied by modifying the parameters of a pre-specified task sequence. In this study, the flight test intensity is used as the varying parameter, allowing the task duration to be varied within a given range to fix the schedule. As mentioned before, a task’s test duration (in days) is estimated by the flight test intensity, which indicates the flight hours for each aircraft in a day. The relationship between the task duration and the given flight test intensity is expressed in Eq. (31). Before each task is scheduled, the duration under the new flight test intensity is calculated. If the task duration can be compressed by increasing the flight intensity, i.e., \(t_i<t_{i0}\), the task is scheduled according to the new flight intensity. Otherwise, the task will be scheduled at the nominal flight intensity. Except for the task duration change, the remaining tasks are arranged in the same way as RSR
$$\begin{aligned} t_i=ceil\left( \frac{{in}_N}{{in}_i}*t_{i0}\right) , \end{aligned}$$
(31)
where \(ceil(\bullet )\) represents a rounding operation that rounds the element toward positive infinity. In this study, the increasing flight test intensity \({in}_i\) is set as \(1.2*{in}_N\).
 

State features

The state feature definition in DRL determines how the agent perceives the environment, which significantly impacts the agent’s actions and performance. The definition of state features should represent comprehensive environmental information. Since the problem size is not fixed, in this paper, we employ abstracted information to express the state space. The state features are designed based on the information from the initial schedule and the current schedule, which are listed in Table 3.
The state vector of the information in Table 3 can be formulated as \(V_c=[m,\textrm{RFTD},\textrm{UTP},\textrm{UTDP},U_\textrm{ave},U_\textrm{std},UD_\textrm{ave},UD_\textrm{std},\textrm{RNTR},\textrm{RDC}]\). In addition, considering the information from the previous schedule is also important, the states from the previous iteration \(V_p\) are also taken into account at each rescheduling time. The final state is a concatenation of the two vectors: \(S=[V_c, V_p]\).
Table 3
State features of the problem
Index
Feature
Formulation
1
Number of aircraft
m
2
Relative FTD variation (RFTD)
\(\frac{FTD-{FTD}_0}{{FTD}_0}\)
3
Uncompleted task proportion (UTP)
\(\frac{n-n_r}{n}\)
4
Uncompleted task duration proportion (UTDP)
\(\frac{\sum _{T_i\in N_r} t_{i0}}{T_\textrm{sum}}\)
5
Average utilization rate of test aircraft (\(U_\textrm{ave}\))
\(\frac{\sum _{k=1}^{m}U_k}{m}\)
6
Standard deviation of utilization rate (\(U_\textrm{std}\))
\(\frac{\sqrt{\sum _{k=1}^{m}\left( U_k-U_\textrm{ave}\right) ^2}}{m}\)
7
Average utilization rate deviation (\(UD_\textrm{ave}\))
\(\frac{\sum _{k=1}^{m}{{(U}_k-U_{k0})}}{m}\)
8
Standard deviation of utilization rate deviation (\(UD_\textrm{std}\))
\(\frac{\sqrt{\sum _{k=1}^{m}\left( U_k-U_{k0}-UD_\textrm{ave}\right) ^2}}{m}\)
9
Relative number of task reallocation after rescheduling (RNTR)
\(\frac{1}{n}\sum _{i=1}^{n}{Rl}_i\)
10
Relative duration changes factor after rescheduling (RDC)
\(\sum _{i=1}^{n}\frac{ind_i*t_{i0}}{{in}_N*T_\textrm{sum}}\)

Performance measure

In DRL, the agent will receive a reward immediately for each action taken to assess the action’s immediate effect, and the accumulated reward will show the action’s long-term impact. Therefore, the definition of the reward function should be closely related to the rescheduling objectives. In the rescheduling, we consider optimizing the FTD deviation, task reallocation, and workload cost simultaneously. Therefore, the reward function contains three parts.
The first measure \(r_1\) considers the relative FTD variation, which is calculated as
$$\begin{aligned} r_1=-\textrm{RFTD}. \end{aligned}$$
(32)
The second measure \(r_2\) is relevant to the number of tasks with the changes of the aircraft after rescheduling. And we scale the \(r_2\) value at the same quantify level with \(r_1\), which is shown in Eq. (33)
$$\begin{aligned} r_2=-\textrm{RNTR}/(\textrm{FTD}_0*m). \end{aligned}$$
(33)
The third evaluation \(r_3\) considers the workload cost, associated with the flight intensity and task duration, as calculated by Eq. (34)
$$\begin{aligned} r_3=-\textrm{RDC}/(\textrm{FTD}_0*T_\textrm{avg}). \end{aligned}$$
(34)
Finally, the overall cumulative reward can be considered as a weighted sum of \(r_1\), \(r_2\), and \(r_3\), which can be formulated as
$$\begin{aligned} R=w_1*r_1+w_2*r_2+w_3*r_3, \end{aligned}$$
(35)
where \(w_1\), \(w_2\), and \(w_3\) represent the weight factor for each measure, respectively.

Training algorithm

Currently, DRL methods can be classified into two categories: value-based methods and policy-based methods [59]. Compared to value-based methods, policy-based methods have better performance on convergence and have the ability to learn stochastic strategies. PPO is a recent state-of-the-art policy gradient algorithm, which is based on the trust region policy optimization (TRPO) [60] algorithm, but it is easier to implement and has better generalization ability than TRPO. The PPO alternates between sampling data through interaction with the environment and optimizing a “surrogate” objective function using stochastic gradient ascent. We use \(\theta _{old}\) and \(\theta \) to denote the policy parameters before and after the parameter update, respectively. The objective of PPO can be defined as [61].
$$\begin{aligned} L^{CLIP}\left( \theta \right)= & {} {\mathbb {E}}_t[\min (r_t\left( \theta \right) {\hat{A}}_t,\ clip(r_t\left( \theta \right) , \nonumber \\{} & {} \quad 1-\epsilon ,1+\epsilon ){\hat{A}}_t)], \end{aligned}$$
(36)
where \(r_t\left( \theta \right) \) is the changing ratio of the policy formulated as \(r_t\left( \theta \right) =\pi _\theta (a_t|s_t)/\pi _{\theta _{old}}(a_t|s_t)\). \(\epsilon \in (0,1)\) is the clip parameter. The generalized advantage estimator (GAE) [62] is applied to approximate the advantage function \({\hat{A}}_t\).
The overall PPO training algorithm for the addressed problem is given in Algorithm 2.

Numerical experiments

In this section, the details of instance generations and parameter settings are provided first. Then, the algorithm’s performance in terms of generality is evaluated. Next, a sensitivity analysis of the proposed method under different scenarios is conducted. Furthermore, to confirm the effectiveness and superiority of the proposed method, we compare its performance with single rescheduling methods and different well-known scheduling algorithms.

Instances generation

To test the feasibility and effectiveness of the proposed algorithm, the scheduling scenario instances for training and testing are generated by simulation experiments. In the flight test task scheduling scenarios, the number of tasks n and test aircraft m vary uniformly in the interval [50, 300] and [3, 6], respectively. The task group v ranges uniformly from 2 to 8. For each flight test task \(T_i\), a set of compatible aircraft is randomly selected from \(\{1,2,\ldots ,m\}\). For the pre-requisite task set, the \(t_{pi}\) pre-requisite tasks are randomly selected from \(\{T_{v,1},\ T_{v,2},\ldots , T_{v,\ i-1}\}\) within the same task group, where \(t_{pi}\) varies uniformly in the interval of [0, 3]. The task duration under the nominal flight test tensity of each task follows a uniform distribution with the interval [1,15] days.
At the test aircraft level, the interval \(a_T\) between aircraft deployments varies uniformly between 15 and 60 days. The time interval between aircraft groundings and the time to repair follows the exponential distribution. The mean time between groundings (MTBG) is selected from the set of \(\{30, 60, 90\}\) days, and the mean time to repair (MTTR) is set to 10 days. The parameter settings mentioned above are summarized in Appendix 7.

Parameter settings

To balance the impacts of the different objectives with the reward function in Eq. (35), the weight coefficients \(w_1\), \(w_2\), and \(w_3\) are set as 0.4, 0.3, and 0.3, respectively. Such weight settings indicate that the importance of these three objectives is approximately the same, with a slightly higher weight for the FTD deviation and the same weight for the other two objectives.
For the training network, the actor network contains an input layer with 19 nodes, 2 Relu hidden layers with 256 nodes for each layer, and a Softmax output layer with 3 nodes. The critic network includes an input layer with 19 nodes, 2 Relu hidden layers with 256 nodes per layer, and a fully connected output layer with 1 node. During the PPO training, the agent extracts the mini-batch data from the buffer to adjust the parameter \(\theta \). The detailed parameter settings of the training algorithm are listed in Table 4.
Table 4
Parameter settings of the training algorithm
Parameter
Value
Input layer size
19
Number of hidden layers
2
Number of units of each hidden layer
256
Learning rate of actor network
1e-4
Learning rate of critic network
1e-3
Mini-batch size
32
Number of training epochs per update
5
Clip factor
0.2
Discount factor
0.95

Algorithm performance

The algorithm is trained for 10k episodes, where each episode is conducted in a randomly generated scenario for the robustness and generality of the algorithm according to “Instances generation ”. After each training episode, the trained policies on the validation instance are tested to verify the generality of the algorithm. The validation instance contains 150 tasks, 5 aircraft, and 6 task groups. The MTBG and MTTR are set at 90 days and 10 days, respectively. Figure 4 illustrates the values of reward per training episode of the validation instance. The convergence of the reward curve shows the ability of the proposed algorithm to learn the appropriate action under various states.
Table 5
Characteristics of the experimental instances
Instance
n
m
v
1
50
3
2
2
50
4
4
3
100
3
2
4
100
4
4
5
150
4
4
6
150
5
6
7
200
4
4
8
200
5
6
9
250
5
6
10
250
6
8
11
300
5
6
12
300
6
8

Sensitivity analysis

In this section, we use a variety of experimental instances with wide-range parameter settings of n, m, and v to test the proposed method’s performance under different scenarios, as shown in Table 5. The environment settings for each instance follow the generation rules in “Instances generation”. The algorithm is run 30 times independently for each instance.
Figure 5 shows the average values obtained for FTD, NTR, and DC for each instance with the different settings of MTBG. It can be seen that the performance of the proposed algorithm is influenced by different environmental settings. In detail, it can be found that (1) the FTD, NTR, and DC curves show an increasing trend. That is, as the instance size increases, the FTD, task reallocation, and workload costs will increase, as well. Specifically, in the case where m and v are fixed, e.g., the instance sets [2, 4, 5, 7] and [6, 8, 9, 11], the FTD, NTR, and DC all tend to increase as the number of tasks n increases. (2) When the task number n is fixed, the increase in m and v can reduce FTD by an average of 13.94%, 14.31%, and 12.91% when MTBG = 30, 60, and 90, respectively. Additionally, an increase in m and v can reduce the workload costs by, on average, 23.02%, 28.53%, and 45.73% when MTBG = 30, 60, and 90, respectively. Since the increase in m and v provides more task flexibility and, consequently, more possibilities to change the aircraft arrangements, the decrease in NTR is not significant. (3) In most of the test instances, both FTD and NTR fall as the frequency of groundings decreases, i.e., the MTBG value increases. As the value of DC is directly correlated with task duration, which varies greatly depending on the task, the variation of DC did not show a significant positive correlation with the frequency of failures. The above results demonstrate that the proposed approach can achieve adaptive task scheduling in various scenarios.

Comparisons with the single rescheduling methods

To verify the effectiveness and superiority of the proposed approach, comparison experiments are conducted with the proposed DRL approach, RSR, ACR, and IR to perform the rescheduling under the same set of scenario instances. As for each method, the initial schedule is generated in the same way as described in “Initial schedule generation”. Each comparative method is run 30 times independently for each instance.
The relative deviation (RE) is used as the performance indicator, which is calculated as \(RE=(R-R_{\min })/R_{\min }\). Where R is the reward as shown in Eq. (35) and \(R_{\min }\) represents the minimum value of R among all the solutions. Tables 6, 7, 8 present the comparison results with the different settings of MTBG. The first two columns of the table indicate the instance label and the instance size of ‘task number\(\times \)aircraft number\(\times \)task group number’. The last four columns give the average values of RE obtained by each method. The best values obtained are highlighted in bold. It can be seen that the proposed DRL approach can attain the best solutions in most of the test instances. These results show the superiority of the proposed approach, which has better performance than a single rescheduling method under various scenarios.
To show the algorithm’s performance for the FTD, we consider the gap value of the FTD, which is calculated as \(\textrm{GAP}_\textrm{FTD}=100*(\textrm{FTD}-\textrm{FTD}_0)/\textrm{FTD}_0\). Figure 6 shows the average GAP values of FTD. It can be seen that the proposed DRL shows better performance for the FTD optimization on most of the test instances. Additionally, the RSR maintains the original aircraft arrangement and task sequence and directly delays the start of the remaining tasks, thus obtaining the maximum value of \(\textrm{GAP}_\textrm{FTD}\). Compared to RSR, the proposed approach obtained gains of 11.39%, 10.50%, and 8.04% for the GAP value of FTD when MTBG = 30, 60, and 90, respectively, and an average gain of 9.98%. It can also be found that as the frequency of groundings increases, i.e., MTBG decreases, the gain in GAP value of FTD also increases.
Since only the ACR requires reallocation for tasks, the comparison of the task proportion of reallocation RNTR between the proposed DRL approach and the ACR is given in Fig. 7a. We can see that the proposed approach always attains the lowest level of task reallocation. As the problem scale increases, the proportion of task reallocation also increases. Similarly, Fig. 7b shows the comparison results on the task duration proportion of the intensity change, calculated as \((\sum _{i=1}^{n}{t_{i0}*{Tc}_i)/T_\textrm{sum}}\), between the proposed approach and the IR. The proposed DRL approach also achieves the lowest workload cost of flight test intensity change. These results confirm that the proposed DRL approach can effectively select the appropriate rescheduling method at different rescheduling points.

Comparisons with different scheduling algorithms

To further confirm the superiority of the proposed algorithm, the proposed approach is compared with several heuristic rules and a representative meta-heuristic algorithm GA, which are described below.
(1) Heuristic rules
As the environment changes, the heuristic rules need to sort the tasks and then arrange a task with the highest priority from RTQ to a compatible aircraft until all the tasks are completed. The FCFS selects the task with the first available time. The SPT selects the task with the shortest duration. The ECT selects the task with the earliest completion time. The MFT selects the task with the most follow-up tasks. At each step, the task selected by the above rules is allocated to the first idle aircraft. Each heuristic is performed 30 times independently on each test instance, both for the initial scheduling and the rescheduling process.
Table 6
Experiment results of the comparative methods when MTBG=30 days
Instance
Size
Proposed DRL
RSR
ACR
IR
1
\(50\times 3\times 2\)
0.2377
0.8198
0.3588
0.9339
2
\(50\times 4\times 4\)
0.1556
1.1223
0.2276
0.7481
3
\(100\times 3\times 2\)
0.1204
0.2050
0.1675
0.0740
4
\(100\times 4\times 4\)
0.3597
1.5377
0.3975
0.4213
5
\(150\times 4\times 4\)
0.1598
0.8785
0.3491
0.4957
6
\(150\times 5\times 6\)
0.1964
1.0800
0.2352
0.8748
7
\(200\times 4\times 4\)
0.1026
0.4484
0.1268
0.3094
8
\(200\times 5\times 6\)
0.1946
0.3632
0.2103
0.2323
9
\(250\times 5\times 6\)
0.1391
0.2223
0.3349
0.2484
10
\(250\times 6\times 8\)
0.1079
0.6984
0.1291
0.5632
11
\(300\times 5\times 6\)
0.1109
0.5396
0.1903
0.4078
12
\(300\times 6\times 8\)
0.1374
0.2299
0.1568
0.3823
Table 7
Experiment results of the comparative methods when MTBG=60 days
Instance
Size
Proposed DRL
RSR
ACR
IR
1
\(50\times 3\times 2\)
0.2745
0.9417
0.7841
0.3181
2
\(50\times 4\times 4\)
0.3799
0.8686
0.7263
0.6696
3
\(100\times 3\times 2\)
0.3559
1.1950
0.6287
0.8416
4
\(100\times 4\times 4 \)
0.2217
0.7911
0.3307
0.2933
5
\(150\times 4\times 4 \)
0.3214
1.0783
0.3403
0.5216
6
\(150\times 5\times 6 \)
0.5860
3.4010
1.2119
2.6289
7
\(200\times 4\times 4 \)
0.1905
0.4460
0.3046
0.3372
8
\(200\times 5\times 6 \)
0.1319
0.5821
0.1523
0.3477
9
\(250\times 5\times 6 \)
0.1923
1.1178
0.3188
0.5448
10
\(250\times 6\times 8 \)
0.1187
0.5063
0.1582
0.5479
11
\(300\times 5\times 6 \)
0.1698
0.5276
0.4821
0.4400
12
\(300\times 6\times 8 \)
0.1457
0.2549
0.2185
0.1952
Table 8
Experiment results of the comparative methods when MTBG=90 days
Instance
Size
Proposed DRL
RSR
ACR
IR
1
\(50\times 3\times 2\)
0.3635
1.3079
0.6723
0.4754
2
\(50\times 4\times 4\)
0.0736
0.0000
0.1346
0.1219
3
\(100\times 3\times 2\)
0.7923
2.3508
2.0615
1.1277
4
\(100\times 4\times 4 \)
0.4114
0.9111
0.8499
0.5147
5
\(150\times 4\times 4 \)
0.3900
1.4123
2.2583
0.9299
6
\(150\times 5\times 6 \)
0.1901
1.4944
0.3345
1.0041
7
\(200\times 4\times 4 \)
0.2251
0.4761
0.1945
0.2459
8
\(200\times 5\times 6 \)
0.3139
1.2595
0.4695
0.7133
9
\(250\times 5\times 6 \)
0.1990
0.6581
0.3058
0.3343
10
\(250\times 6\times 8 \)
0.3177
0.8574
0.2437
0.3560
11
\(300\times 5\times 6 \)
0.1522
0.3253
0.2143
0.1527
12
\(300\times 6\times 8 \)
0.1641
0.7004
0.2562
0.3913
(2) GA
The GA is initially run to create an initial schedule, and then, when an interruption occurs, the algorithm is re-executed to generate a new schedule. Each chromosome contains two parts: a task sequencing vector (TSV) and an aircraft assignment vector (AAV). For TSV initialization, the order of the task sequence is generated by the random permutation of the integers from 1 to n without repeating elements. An aircraft is randomly selected from the compatible set of aircraft for each task to form the initial AAV. The FTD is used as the fitness function. The 30 best results for FTD in the final iteration are used for comparison. The main settings of the GA are listed in Table 9.
Table 9
GA parameter settings
Parameter
Setting
Population size
100
Crossover for TSV
Position-based crossover [63]
Crossover for AAV
Multi-point crossover
Crossover probability
0.8
Mutation for TSV
Two-point swap mutation
Mutation for AAV
One-point mutation
Mutation probability
0.2
Selection
Binary tournament [64]
Elitism preservation
Yes
Maximum iteration
100
Since these comparative algorithms do not involve changes in flight intensity, here we only investigate FTD and task reallocation results. Figure 8 shows the experiment results for the average GAP values of FTD and the task proportion of reallocation. It can be seen that the proposed approach can obtain the relative optimal solution for FTD and the lowest level of task reallocation in most of the test instances.
Besides, the GA achieves the smallest FTD in several small-scale instances. It is shown that the meta-heuristic algorithm performs better for FTD optimization on some small-size instances compared to other heuristic rules. However, the meta-heuristic algorithm takes more computation time due to its population iterations. It is also worth noting that the GA and FCFS attain lower FTD levels with a higher level of task reallocation. While the ECT and SPT achieve the opposite result. Each of them demonstrates the difficulty of balancing multiple objectives. In conclusion, compared to these representative heuristic and meta-heuristic algorithms, the proposed approach performs the best in this study.

Conclusion and future work

In this paper, the flight test task scheduling problem with consideration of aircraft grounding is studied. A predictive-reactive strategy based on a DRL approach is proposed to address the problem. First, a constructive heuristic is employed to generate an initial schedule to optimize FTD. Then, the rescheduling process is modeled as an MDP model. Specifically, three types of rescheduling methods are designed. The 19 state features from different scheduling stages are extracted to provide a comprehensive reflection of the environment. Three objectives, including the FTD deviation, task reallocation, and workload cost, are considered. The PPO-based algorithm is applied to learn the optimal policy. Finally, the test instances of the wide-range size are generated to conduct the numerical experiments to demonstrate the effectiveness and superiority of the proposed approach. From the experimental results, the conclusions are drawn as follows: 1) The proposed approach can achieve adaptively selecting the appropriate rescheduling methods to arrange the remaining flight test tasks on appropriate aircraft according to the different environmental states. 2) The proposed DRL approach shows the superiority of single rescheduling methods under various scenarios. Compared to the method that directly delays the tasks, the proposed approach yields a nearly 10% gain in FTD. 3) With the comparison of the five well-known heuristic and meta-heuristic scheduling algorithms, the proposed approach has advantages both in quality and efficiency.
Table 10
Overview of the parameter settings for flight test task scheduling scenarios
Parameter
Setting
Task number (n)
randi [50, 300]
Aircraft number (m)
randi [3, 6]
Task group number (v)
randi [2, 8]
Nominal task duration (\(t_{i0}\))
randi [1, 15]
Pre-requisite task number (\(tp_i\))
randi [0, 3]
Pre-requisite task set (\(TP_i\))
At random from the set \(\{T_{v,1},\ T_{v,2},\ldots ,T_{v,\ i-1}\}\)
Interval between aircraft deployments (\(a_T\))
randi [15, 60]
Compatible aircraft set (\(AC^i\))
At random from the set \(\{1,2,\ldots ,m\}\)
The mean time between groundings (MTBG)
At random from the set \(\{30, 60, 90\}\)
Time interval between aircraft groundings
Exponential distribution with the MTBG
The mean time to repair (MTTR)
10
Time to repair
Exponential distribution with the MTTR
This study helps to provide a two-stage schedule generation to support the whole flight test process in an efficient, stable, and low-cost way. Moreover, this study can be generalized to various dynamic scheduling problems. In future work, more heuristics for initial schedule generation as well as for rescheduling will be investigated to provide more diverse solutions. In addition, more dynamic disruptions, such as task cancelations and re-testing, will also be studied.

Declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://​creativecommons.​org/​licenses/​by/​4.​0/​.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Anhänge

Appendix A: The parameter settings of the flight test task scheduling scenarios

The overview of the parameters used to generate the flight test task scheduling scenarios in this study is presented in Table 10. In this table, the term ”randi” refers to the uniform distribution of integers.
Literatur
10.
Zurück zum Zitat Wang Z, Zhang J, Si J (2020) Dynamic job shop scheduling problem with new job arrivals: a survey. In: Deng Z (eds) Proceedings of 2019 Chinese Intelligent Automation Conference. Springer, Singapore, pp 664–671 Wang Z, Zhang J, Si J (2020) Dynamic job shop scheduling problem with new job arrivals: a survey. In: Deng Z (eds) Proceedings of 2019 Chinese Intelligent Automation Conference. Springer, Singapore, pp 664–671
60.
Zurück zum Zitat Schulman J, Levine S, Abbeel P, Jordan M, Moritz P (2015) Trust region policy optimization. In: Bach F, Blei D (eds) Proceedings of the 32nd International Conference on Machine Learning, vol 37, Proceedings of Machine Learning Research, Lille, France: PMLR, pp 1889–1897 Schulman J, Levine S, Abbeel P, Jordan M, Moritz P (2015) Trust region policy optimization. In: Bach F, Blei D (eds) Proceedings of the 32nd International Conference on Machine Learning, vol 37, Proceedings of Machine Learning Research, Lille, France: PMLR, pp 1889–1897
Metadaten
Titel
A predictive-reactive strategy for flight test task scheduling with aircraft grounding
verfasst von
Bei Tian
Gang Xiao
Yu Shen
Publikationsdatum
05.03.2024
Verlag
Springer International Publishing
Erschienen in
Complex & Intelligent Systems
Print ISSN: 2199-4536
Elektronische ISSN: 2198-6053
DOI
https://doi.org/10.1007/s40747-024-01365-8