
A multi-objective optimization approach for the virtual coupling train set driving strategy

  • Open Access
  • 10.01.2025

Abstract

The article describes the current mode of operation of virtual coupling train sets (VCTS), in which each train unit operates independently. A multi-objective optimization approach is introduced to improve the driving strategy of the leader train and the tracking control of the follower train. The study uses a deep reinforcement learning algorithm (ICM-PER-D3QN) to optimize the speed curve of the leader train and a particle swarm optimization-based model predictive control algorithm (PSO-MPC) for precise tracking control. The results show significant improvements in stopping accuracy, punctuality, energy efficiency, and passenger comfort compared with conventional methods. Experimental results demonstrate the effectiveness of the proposed framework and its potential to improve the safety and efficiency of VCTS operation.

1 Introduction

Currently, a virtual coupling train set (VCTS) typically operates through the independent, step-by-step planning and control of each train unit. The leader train develops a reference speed curve based on the operational plan and regulates its own operation accordingly. Meanwhile, the following train receives real-time status updates from the leader train and establishes its control strategy, considering the safety distance between the two trains traveling in the same direction [1]. Existing research on virtual coupling operation control primarily focuses on achieving collision avoidance [2], shortening tracking intervals [3], improving precision, and decreasing control times [4]. The goal is to uphold the stability and accuracy of coordinated control among coupled trains while enhancing transportation efficiency [5]. Other research, such as that by Wang et al. [6], examines the impacts of model uncertainty and external disturbances on VCTS operation control. Yang et al. [7] have curtailed operational costs by adjusting the operation mode of the VCTS on high-traffic routes. Although Luo et al. [8] were the first to implement dynamic coupling control in a practical environment, they have not yet addressed the optimization of external interference and leader train speed curves. Overall, few studies examine how to improve the VCTS operation control process as a whole. At present, the core of modern VCTS operation control lies in constructing speed curves and controlling how they are tracked [9]. To obtain the best results from the VCTS operation control process, this paper improves the VCTS operation control framework and splits it into two modules: one for tracking control of the follower train and one for multi-objective optimization of the leader train's driving strategy.
On the basis of the improved control framework for the operation of VCTS, both operating conditions and external disturbances on the railway line are taken into account in order to achieve multi-objective optimization of the driving strategy for the VCTS.
Train speed curve optimization problems are classified into three categories according to the solution method [10]. The first category consists of optimization approaches based on Pontryagin’s maximum principle, which optimize speed curves by determining the train’s optimal switching points in different operating phases [11]. Tan et al. [12] and Ying et al. [13] have further investigated the efficiencies of motor traction and regenerative braking, as well as the effects of speed restriction situations and steep grades, respectively. The second category comprises heuristic algorithms such as the hybrid simulated annealing algorithm [14], the hybrid genetic algorithm [15], the taboo search algorithm [16], the ant colony algorithm [17], and the particle swarm optimization algorithm [18]. These approaches optimize models toward predefined objectives. The third category comprises data-driven methods. Yin et al. [19] and Ning et al. [20] use deep reinforcement learning algorithms to incorporate expert knowledge into the rules that select train driving regimes, which limits the train speed and improves the optimization effect. Similarly, Lin et al. [21] achieved multi-objective optimization of the speed curve by optimizing the control sequence. Meng et al. [22] developed a train operation simulation platform using a five-dimensional digital twin model and conducted a comparative analysis with a three-dimensional model to validate the model’s advantages and its consistency with the simulated object. This paper uses a deep reinforcement learning algorithm to achieve multi-objective optimization of train speed curves. Compared to Pontryagin’s maximum principle, reinforcement learning algorithms excel at dealing with complex, high-dimensional state and action spaces [23]. Unlike heuristic methods, the deep reinforcement learning algorithm effectively handles complex constraints [24], offering high adaptability and robust solutions for decision-making.
Additionally, their ability to adjust strategies based on feedback enhances flexibility, making them ideal for a variety of complex environments [25].
The function of the follower train tracking control module is to achieve precise control of the leader train’s speed curve, ultimately enabling multi-objective optimization of the VCTS speed curve. Implementations can be broadly divided into three categories. The first is geometric control, most notably the proportional–integral–derivative (PID) algorithm, which is widely used for its simple structure, stable performance, and fast response time [26]. Furthermore, using a neural network to fine-tune the traditional PID controller can effectively improve the accuracy and effectiveness of train tracking in complex systems [27]. The second is optimization-based control. With a known system model, model predictive control (MPC) yields solid tracking results, and its enhancements surpass traditional methods in terms of control effect [28, 29]. Bersani et al. [30] utilize other approaches such as the linear quadratic regulator (LQR) method, which adjusts the train driving strategy and effectively reduces control times. The third is intelligent control. Chen et al. [31] developed an adaptive iterative learning controller that uses command-filtered backstepping to control train tracking, demonstrating its usefulness for fuzzy and repetitive systems. Additionally, Huang et al. [32] combined artificial potential fields with consensus algorithms, ensuring stability in train tracking control by adjusting the potential function. Overall, the MPC algorithm outperforms the PID algorithm in handling complicated nonlinear systems and in tracking accuracy, stability, and robustness [33]. Since the reference tracking curve is known, MPC can solve the optimization problem at each state transition step and find the best control input. This is more precise than adaptive iterative learning, which does not involve process optimization, resulting in higher tracking accuracy [34].
Overall, this paper presents an improved method for VCTS operation control framework, addressing the lack of driving strategy optimization and the poor tracking control accuracy inherent in traditional systems. This method involves constructing the tracking control process using the leader–follower model and optimizing the leader train’s speed curve with an improved deep reinforcement learning algorithm. Subsequently, employing the method of knowledge transfer, the speed curve of the leader train serves as the reference for the follower train. An improved MPC algorithm is then developed to further improve tracking control accuracy, achieving multi-objective optimization of VCTS’s driving strategies.
We delineate the main contributions of this paper from a macro view and a micro view:
(1)
As shown in Fig. 1, from a macroscopic point of view, this paper improves the VCTS operation control framework, primarily through the multi-objective optimization of the leader train driving strategy and enhanced tracking control accuracy of the follower train. Using the knowledge transfer method, the optimized speed curve of the leader train serves as the tracking objective for the follower train, accounting for temporary speed limits on the railway line and communication delays between the trains. This approach effectively realizes multi-objective optimization of driving strategies for the VCTS.
 
(2)
From the first micro-view, we compare the performance of the intrinsic curiosity module prioritized experience replay dueling double deep Q-network (ICM-PER-D3QN) algorithm with that of the deep Q-network (DQN) algorithm. The experimental results demonstrate that the ICM-PER-D3QN algorithm addresses the multi-objective optimization problem of the leader train’s driving strategy more effectively than the DQN algorithm.
 
(3)
From the second micro-view, we compare the control effects of the particle swarm optimization (PSO)-based MPC algorithm, referred to as PSO-MPC, along with the MPC, PID, and LQR algorithms, on the tracking process of virtual coupling trains. The experimental results indicate that the PSO-MPC algorithm significantly enhances the tracking accuracy of the virtual coupling trains and ensures their safe operation.
 
Fig. 1
VCTS operation control framework

2 Equations describing the process of high-speed train operation

The single mass point model has few parameters and high computational efficiency. Since the traction or braking force of each carriage usually cannot be directly controlled in existing ATO systems, designing the train operation control model on the basis of the single mass point model reduces model complexity and improves computational efficiency [35].
$$\left\{ {\begin{gathered} {\dot{x}_{i} \left( t \right) = v_{i} \left( t \right)} \hfill \\ {\dot{v}_{i} \left( t \right) = \frac{{u_{i} \left( t \right) - f_{i} \left( {v_{i} \left( t \right)} \right) - \xi_{i} \left( {x_{i} \left( t \right)} \right)}}{{M_{i} }}} \hfill \\ {f_{i} \left( v \right) = c_{0} + c_{1} v_{i} \left( t \right) + c_{2} v_{i}^{2} \left( t \right)} \hfill \\ {\xi_{i} = M_{i} g\left( {\varphi \left( {x_{i} \left( t \right)} \right) + \frac{2000}{{R\left( {x_{i} \left( t \right)} \right)}}} \right)} \hfill \\ \end{gathered} } \right. ,$$
(1)
where \(i = 0,\;1, \ldots ,\;I\) represents the train index number; \(t\) is the train operation time; \(x_{i} \left( t \right)\) is the current position of the train \(i\); \(v_{i} \left( t \right)\) is the current speed; \(u_{i} \left( t \right)\) is the traction or braking force output from the train \(i\); \(f_{i} \left( {v_{i} \left( t \right)} \right)\) represents the basic operating resistance; \(c_{0}\), \(c_{1}\), and \(c_{2}\) values are selected based on experience and train type [36]; \(\xi_{i}\) signifies the extra resistance; \(\varphi \left( {x_{i} \left( t \right)} \right)\) is the gradient, in parts per thousand, at the current position of the train \(i\); \(R\left( {x_{i} \left( t \right)} \right)\) is the radius of the curve at the current location of the train \(i\); \(M_{i}\) represents the mass of the train \(i\); \(g\) represents the acceleration of gravity. Modeling the train running resistance in this way allows the study and computation of train energy loss; it effectively reduces the modeling effort and the amount of computation while preserving the accuracy of the model simulation [37].
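As an illustrative sketch (not the authors' code; the parameter values used below are hypothetical), Eq. (1) can be evaluated for a single train as follows:

```python
def acceleration(v, u, M, c0, c1, c2, phi, R, g=9.81):
    """Literal transcription of Eq. (1) for one train:
    dv/dt = (u - f(v) - xi(x)) / M, with the basic running resistance
    f(v) = c0 + c1*v + c2*v^2 and the extra resistance
    xi = M*g*(phi + 2000/R) from grade and curvature."""
    f = c0 + c1 * v + c2 * v ** 2
    xi = M * g * (phi + 2000.0 / R)
    return (u - f - xi) / M
```

When the applied force exactly balances the total resistance, the acceleration is zero, which is a quick sanity check on the sign conventions.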

3 Leader train driving strategy optimization model

3.1 PER-D3QN algorithm based on intrinsic curiosity drive

The ICM-PER-D3QN algorithm comprises four main components: an intrinsic curiosity module (ICM) [38], a prioritized experience replay (PER) [39], a dueling network (DN) [40], and double Q-learning (DQL) [41]. The ICM consists of two models: The inverse model collects the feature vectors \(\varPhi \left( {s_{t} } \right)\), \(\varPhi \left( {s_{t + 1} } \right)\) associated with the current moment states \(s_{t}\), the next moment states \(s_{t + 1}\), and the current moment action \(a_{t}\) through the feature extraction layer and conducts action prediction. The closer the expected action value is to the actual action value, the better. The forward model utilizes the inputs \(\varPhi \left( {s_{t} } \right)\) and \(a_{t}\) to forecast the characteristic \(\hat{\varPhi }\left( {s_{t + 1} } \right)\) of the subsequent state. The internal reward provided by the ICM increases proportionally with the magnitude of the discrepancy between the anticipated and real values. To address the issue of sparse rewards, the reward function is modified by combining the incentive \(r_{t}^{{{\text{in}}}}\) obtained via environmental exploration with the reward \(r_{t}^{{\text{e}}}\), resulting in a new reward \(r_{t}^{{{\text{in}}}} + r_{t}^{{\text{e}}}\). This incentivizes the intelligent agent to improve its exploration capabilities. The intrinsic reward \(r_{t}^{{{\text{in}}}}\) is calculated from
$$r_{t}^{{{\text{in}}}} = \kappa \cdot ||f(s_{t} ,a_{t} ) - s_{t + 1} ||^{2},$$
(2)
where \(\kappa\) is a coefficient regulating the intensity of intrinsic rewards, and \(f(s_{t} ,a_{t} )\) denotes the state transfer prediction model.
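As a minimal numerical sketch of Eq. (2) (the value of \(\kappa\) and the state vectors are illustrative, not from the paper), the intrinsic reward can be computed as:

```python
import numpy as np

def intrinsic_reward(predicted_next_state, actual_next_state, kappa=0.1):
    """Eq. (2): r_in = kappa * ||f(s_t, a_t) - s_{t+1}||^2, where
    predicted_next_state is the forward model's output f(s_t, a_t)."""
    diff = np.asarray(predicted_next_state, float) - np.asarray(actual_next_state, float)
    return kappa * float(diff @ diff)
```

The reward vanishes when the forward model predicts the next state perfectly and grows with the squared prediction error, which is what drives exploration toward poorly modeled states.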
The DN employs a two-path structure. One path estimates the value function \(V(s;\;\theta ,\alpha )\), while the other estimates the action advantage function \(A(s,a;\;\theta ,\beta )\). Combining these two paths yields the value function \(Q(s,a;\;\theta ,\alpha ,\beta )\) corresponding to the current state and action. The intelligent agent consistently updates the value function to maximize rewards and determine the optimal course of action during each training session. In DQL, two networks \(Q\) and \(Q^{\prime}\) are used, with the selected action and the evaluated action valued by separate networks; this helps to avoid overestimation. The network updating formula for the ICM-PER-D3QN algorithm is detailed in Eq. (3).
$$\begin{aligned} Q\,\left({s,a;\;\theta ,\alpha ,\beta } \right) = & V(s;\;\theta ,\alpha ) + \left( {A(s,a;\;\theta ,\beta ) - \frac{1}{{\left| A \right|}}\sum\limits_{{a^{*} \in \left| A \right|}} {A(s,a^{*} ;\;\theta ,\beta )} } \right)\!, \\ & \delta _{t} = r_{t}^{{\text{e}}} + \gamma Q^{\prime } (s_{{t + 1}} ,{\text{argmax}}_{a} Q(s_{{t + 1}} ,a);\;\theta ,\alpha ,\beta ) - Q(s_{t} ,a_{t} ;\;\theta ,\alpha ,\beta )\,, \\ & Q(s_{t} ,a_{t} ;\;\theta ,\alpha ,\beta )\; \leftarrow Q(s_{t} ,a_{t} ;\;\theta ,\alpha ,\beta ) + l(\delta _{t} + r_{t}^{{{\text{in}}}} ) ,\\ \end{aligned}$$
(3)
In Eq. (3), the first part shows how the DN improves the estimation of the value function \(Q\); this network can be applied to every value update. The term \(\frac{1}{\left| A \right|}\sum\limits_{{a^{*} \in \left| A \right|}} {A(s,a^{*} )}\) computes the mean advantage across all actions in state \(s\); subtracting this average from each action's advantage zeroes out the mean advantage per state. Here \(\left| A \right|\) represents the total number of actions in the action space, and \(a^{*}\) ranges over all possible actions in the current state \(s\); \(\theta\) is the network parameter shared by the state function, value function, and advantage function in the dueling network; \(\alpha\) is the parameter of the state value function, while \(\beta\) is the parameter of the advantage function. The second part exemplifies the advantage of DQL in the computation of the temporal difference (TD) error \(\delta\), where \(\gamma\) is the discount factor and \(Q^{\prime}\big(s_{t + 1} ,{\text{argmax}}_{a} Q(s_{t + 1} ,a);\;\theta ,\alpha ,\beta \big)\) is the value obtained by taking the optimal action in state \(s_{t + 1}\). The third part shows the Q-value update obtained by combining the DN and DQL improvements, where \(l\) is the learning rate. Figure 2 shows the structure of the ICM-PER-D3QN algorithm, where \({\text{Max}}_{a^{\prime}} Q\left( {s_{t + 1} ,a_{t + 1} ;\;\theta_{t + 1} ,\alpha_{t + 1} ,\beta_{t + 1} } \right)\) denotes the highest \(Q\) value associated with the next state and action, and \(\arg \max_{a} Q\left( {s,a;\;\theta ,\alpha ,\beta } \right)\) denotes the action with the greatest value in the current state.
Fig. 2
The structure of ICM-PER-D3QN algorithm
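The two aggregation steps of Eq. (3) can be sketched as follows (a toy NumPy illustration under stated assumptions, not the paper's implementation; the vectors in the usage note are hypothetical):

```python
import numpy as np

def dueling_q(v_s, advantages):
    """First part of Eq. (3): Q(s, a) = V(s) + (A(s, a) - mean over a* of A(s, a*)),
    so that the mean advantage per state is zeroed out."""
    a = np.asarray(advantages, dtype=float)
    return v_s + (a - a.mean())

def double_q_td_target(r_e, gamma, q_online_next, q_target_next):
    """Second part of Eq. (3), target side: the online network Q selects
    the next action (argmax) while the target network Q' evaluates it."""
    a_star = int(np.argmax(q_online_next))
    return r_e + gamma * float(np.asarray(q_target_next)[a_star])
```

For example, `dueling_q(1.0, [1.0, 2.0, 3.0])` yields `[0.0, 1.0, 2.0]`: the mean advantage (2.0) is removed before adding the state value.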

3.2 State space and action space

The train operation process encompasses a significant quantity of state information. To precisely depict and streamline the computation of the train operation process, this paper defines the train position, speed, operation time, and acceleration information as the state space. Equation (4) shows the state vector of the train.
$$\left\{ \begin{gathered} n \in [0,N] \hfill \\ x_{n} \in \left[ {0,X_{{{\text{end}}}} } \right] \hfill \\ v_{n} \in \left[ {0,V_{\max } } \right] \hfill \\ b_{n} \in \left[ {B_{\min } ,B_{\max } } \right] \hfill \\ t_{n} \in \left[ {0,T_{{{\text{end}}}} } \right] \hfill \\ ds = \frac{{X_{{{\text{end}}}} }}{N} \hfill \\ {\varvec{S}}_{n} = \left( {x_{n} ,v_{n} ,b_{n} ,t_{n} } \right) \hfill \\ \end{gathered} \right. ,$$
(4)
where the variables \(x_{n}\), \(v_{n}\), \(b_{n}\), and \(t_{n}\) represent the train position, speed, acceleration, and operation time in state \(n\), respectively; \(X_{{{\text{end}}}}\) represents the train’s planned stopping position; \(V_{\max }\) represents the maximum speed limit of the railway line; \(B_{\min }\) and \(B_{\max }\) are the maximum deceleration and acceleration, respectively; \(T_{{{\text{end}}}}\) is the planned arrival time of the train at the terminal station. Finally, since this paper adopts the discrete space method to handle the train operation process, the displacement change during operation is regarded as a state transfer process: the distance \(X_{{{\text{end}}}}\) traveled by the train is divided into \(N\) segments, forming \(N + 1\) state nodes. Thus, the length of each segment \(ds\) is fixed, and the state of the train is consistent within this length.
Control commands are output according to the train driving strategy whenever the train enters a new state, and they remain unchanged within a state. The initial state is designated \({\varvec{S}}_{0} = \left( {0,0,b_{0} ,0} \right)\) and the end state \({\varvec{S}}_\text{end} = \left( {X_{{{\text{end}}}} ,0,0,T_{{{\text{end}}}} } \right)\). To describe the train operation process more accurately, this paper computes the operation process at the simulation step size before the train's current operation state is transformed into the next state transition point. To enhance the solution accuracy of the algorithm, each state step \(ds\) is subdivided into several smaller unit steps within the same state, each defined as a simulation step \(dx\). The computational procedure at each simulation step is shown in Eq. (5), where \(t_{1}\) and \(v_{1} \left( {t_{1} } \right)\) are the current operation time and speed of the train, while \(t_{0}\) and \(v_{0} \left( {t_{0} } \right)\) are the operation time and speed in the previous step, respectively. However, the actual length of the railway line is usually not an integer multiple of the state transition step length. Therefore, the transition step length for the end state must be determined from the actual conditions of the railway line, and the final state may exceed the planned range of states.
$$\left\{ \begin{gathered} b_{0} = \left( {\frac{{u_{i} \left( {t_{0} } \right) - f_{i} \left( {v_{0} \left( {t_{0} } \right)} \right) - \xi_{i} \left( {x_{i} \left( {t_{0} } \right)} \right)}}{{M_{i} }}} \right) \hfill \\ v_{1} \left( {t_{1} } \right) = \sqrt {2 \cdot b_{0} \cdot dx + v_{0}^{2} \left( {t_{0} } \right)} \hfill \\ t_{1} = t_{0} + \frac{2 \cdot dx}{{v_{0} \left( {t_{0} } \right) + v_{1} \left( {t_{1} } \right)}} \hfill \\ \end{gathered} \right. .$$
(5)
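The per-step update of Eq. (5) can be sketched as follows (illustrative only; it assumes \(v_0 + v_1 > 0\) so the time update is well defined):

```python
import math

def simulation_step(v0, t0, b0, dx):
    """Eq. (5): advance the train by one simulation step dx under the
    constant acceleration b0, returning the new speed v1 and time t1."""
    v1 = math.sqrt(max(2.0 * b0 * dx + v0 * v0, 0.0))  # kinematics: v1^2 = v0^2 + 2*b0*dx
    t1 = t0 + 2.0 * dx / (v0 + v1)                     # dx traversed at the average speed
    return v1, t1
```

With zero acceleration the speed is unchanged and the elapsed time is simply dx divided by the constant speed.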
In this design, the possible actions outputted by the train in state n include traction, coasting, and braking, where the traction and braking actions include five gears each and the coasting action is only one gear. The output action levels and their corresponding traction and braking forces are shown in Eq. (6):
$$\begin{gathered} a_{n} = \left\{ {\begin{array}{*{20}l} {a_{n}^{0} = - \eta_{0} ,\;\eta_{0} \in \left\{ {0.2,0.4,0.6,0.8,1} \right\}} \hfill \\ {a_{n}^{1} = \eta_{1} ,\;\;\;\;\eta_{1} = 0} \hfill \\ {a_{n}^{2} = \eta_{2} ,\;\;\;\;\eta_{2} \in \left\{ {0.2,0.4,0.6,0.8,1} \right\}} \hfill \\ \end{array} } \right., \hfill \\ u_{n} = \left\{ {\begin{array}{*{20}l} {a_{n} \cdot F_{{\text{t}}}^{{{\text{max}}}} ,\;\;a_{n} > 0} \hfill \\ {0,\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;a_{n} = 0} \hfill \\ {a_{n} \cdot F_{{\text{b}}}^{{{\text{max}}}} ,\;\;a_{n} < 0} \hfill \\ \end{array} } \right., \hfill \\ \end{gathered}$$
(6)
where the variable \(a_{n}\) represents the action in state \(n\). The variable \(a_{n} > 0\) represents the traction condition, \(a_{n} < 0\) the braking condition, and \(a_{n} = 0\) the coasting condition. \(\eta\) represents the gear corresponding to the action. The variable \(u_{n}\) represents the traction or braking force of the train in state \(n\), while \(F_{{\text{t}}}^{{{\text{max}}}}\) and \(F_{{\text{b}}}^{{{\text{max}}}}\) denote the maximum traction force and maximum common braking force of the train, respectively.
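The gear-to-force mapping of Eq. (6) can be sketched as follows (the force values in the test are hypothetical, not taken from the paper):

```python
def control_force(a_n, F_t_max, F_b_max):
    """Eq. (6): map the action gear a_n to the applied force u_n.
    a_n > 0: traction (fraction of F_t_max); a_n < 0: braking
    (fraction of F_b_max, yielding a negative force); a_n = 0: coasting."""
    if a_n > 0:
        return a_n * F_t_max
    if a_n < 0:
        return a_n * F_b_max
    return 0.0
```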

3.3 Reward function

This paper primarily focuses on four key metrics for train operation: energy loss, exactness of train stop, punctuality, and passenger comfort. These metrics serve as the optimization objectives for the leader train driving strategy optimization model. The reward function is defined by
$$G= \omega_{1} R_{{{\text{stop}}}} + \omega_{2} R_{{{\text{time}}}} + \omega_{3} R_{{{\text{comfort}}}} + \omega_{4} R_{{{\text{energy}}}}.$$
(7)
(1)
Reward for the exactness of the train stop
A stopping error of 0.5 m is deemed acceptable for a high-speed train, whereas a stopping error within 0.3 m is regarded as highly accurate. Hence, this paper grants the stopping-exactness reward only when the train’s stopping error is limited to 0.3 m, as shown in Eq. (8):
$$R_{{{\text{stop}}}} = \left\{ \begin{gathered} \min \left( {\frac{{D_{{{\text{stop}}}} }}{{\left| {x_{{{\text{end}}}} - X_{{{\text{end}}}} } \right|}},R_{{{\text{reward}}}} } \right),\;\left| {x_{{{\text{end}}}} - X_{{{\text{end}}}} } \right| \le 0.3 \hfill \\ R_{{{\text{penalty}}}} ,\;\left| {x_{{{\text{end}}}} - X_{{{\text{end}}}} } \right| > 0.3 \hfill \\ \end{gathered} \right. ,$$
(8)
where \(D_{{{\text{stop}}}}\) represents an arbitrary constant that must be determined based on the experimental scenario and \(x_{{{\text{end}}}}\) represents the actual position at which the train comes to a stop. A positive reward \(R_{{{\text{reward}}}}\) is granted when the train’s stopping error falls within the desired range. The magnitude of the reward increases as the stopping error diminishes, and conversely, a penalty \(R_{{{\text{penalty}}}}\) is imposed when the stopping error exceeds the desired range.
 
(2)
Reward for the punctuality
Delays in the operation of high-speed trains can quickly propagate across the railway system and, in severe instances, greatly disrupt timetable coordination and passenger journeys. Hence, this paper defines the intended operation time of the train based on its operational schedule. A train is deemed punctual if the discrepancy between its actual travel time and the scheduled time is within 5 s; any train delayed by more than 1 min is classified as delayed [42]. The reward for punctuality is determined according to
$$R_{{{\text{time}}}} = \left\{ \begin{gathered} \min \left( {\frac{{D_{{{\text{time}}}} }}{{\left| {t_{{{\text{end}}}} - T_{{{\text{end}}}} } \right|}},R_{{{\text{reward}}}} } \right),\;\left| {t_{{{\text{end}}}} - T_{{{\text{end}}}} } \right| \le 5 \hfill \\ R_{{{\text{penalty}}}} ,\;\left| {t_{{{\text{end}}}} - T_{{{\text{end}}}} } \right| > 5 \hfill \\ \end{gathered} \right. ,$$
(9)
where \(D_{{{\text{time}}}}\) represents an arbitrary constant that must be determined based on the specific experimental scenario and \(t_{{{\text{end}}}}\) the actual stopping time of the train. A positive reward \(R_{{{\text{reward}}}}\) is assigned when the error in the train’s stopping time falls within the desired range. The magnitude of the reward increases as the error in the stopping time decreases. Conversely, a penalty \(R_{{{\text{penalty}}}}\) is imposed when the error exceeds the desired range.
 
(3)
Reward for the comfort
Excessive changes in the acceleration of high-speed trains can cause discomfort to passengers and, in severe instances, compromise passenger safety. Hence, in this article, the reward function will be formulated based on the variation in acceleration between the operational modes of the high-speed train, as shown in Eq. (10):
$$R_{{{\text{comfort}}}} = - \int\limits_{0}^{{T_{{{\text{end}}}} }} {\left| {\frac{{{\text{d}}b}}{{{\text{d}}t}}} \right|} {\text{d}}t = - \sum\limits_{n = 1}^{N + 1} {\left| {b_{n} - b_{n - 1} } \right|}.$$
(10)
Equation (10) states that among the \(N + 1\) states during train operation, the reward is higher when the acceleration difference between neighboring states is smaller. In this paper, the rate of change of acceleration during train operation is calculated and observed to assess the variation in passenger comfort \(V_{{{\text{comfort}}}}\), as shown in Eq. (11):
$$V_{{{\text{comfort}}}} = \left| {\frac{{\sum\limits_{n = 1}^{N + 1} {\left( {b_{n} - b_{n - 1} } \right)} }}{{N \cdot {\text{d}}t}}} \right|.$$
(11)
 
(4)
Reward for the energy loss
The traction energy loss of a train power unit constitutes a significant component of the overall train energy loss, amounting to approximately 60% of the total energy loss of the entire train system [43]. Hence, in this research, the reward function is formulated with the aim of minimizing the energy loss during the traction process, as shown in Eq. (12):
$$R_{{{\text{energy}}}} = - \int\limits_{0}^{{X_{{{\text{end}}}} }} {\eta_{0} \cdot F_{{\text{t}}}^{{{\text{max}}}} {\text{d}}x} .$$
(12)
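The piecewise reward shape shared by Eqs. (8) and (9), together with the comfort term of Eq. (10), can be sketched as follows (a toy illustration; the tolerance, constant \(D\), and reward/penalty values in the test are hypothetical):

```python
def threshold_reward(error, tol, D, r_reward, r_penalty):
    """Shape of Eqs. (8)-(9): inside the tolerance tol the reward grows
    as the error shrinks, capped at r_reward; outside, a penalty applies."""
    if abs(error) > tol:
        return r_penalty
    if error == 0:
        return r_reward  # zero error hits the cap min(D/0, r_reward) -> r_reward
    return min(D / abs(error), r_reward)

def comfort_reward(accels):
    """Eq. (10): negative total variation of acceleration across states."""
    return -sum(abs(b1 - b0) for b0, b1 in zip(accels, accels[1:]))
```

The same shape serves both the stopping-exactness reward (tolerance 0.3 m) and the punctuality reward (tolerance 5 s), differing only in the constants.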
 

4 Reference system

This paper presents the construction of a reference system using the minimum time distribution (MTD) algorithm. The system generates the reference speed of the train in the current state \(n\) to ensure safe operation, improve punctuality performance, and enhance passenger comfort. The train moving distance is separated into multiple segments, with each section having a recorded speed limit point denoted as \(\left( {x_{k}^{\lim } ,v_{k}^{\lim } } \right)\) and \(k = 1,\;2, \ldots ,\;K\). Figure 3 shows the planning curve of the reference system.
Fig. 3
Reference system planning curve
The time required to reach the next speed limit point \(\left( {x_{k}^{\lim } ,v_{k}^{\lim } } \right)\), given the current condition \((x_{n} ,v_{n} ,t_{n}, b_{n} )\) of the train, is represented as \(t_{{\text{r}}}^{\min }\). \(T_{{\text{r}}}^{\min }\) represents the minimal operation time to reach the end point, which is determined based on the present position of the train. Equation (13) shows the remaining time of the train operation in its present condition:
$$T_{n} = T_{{{\text{end}}}} - t_{n} - T_{{\text{r}}}^{\min }.$$
(13)
Equation (14) shows the minimum and maximum reference speeds in the present condition:
$$\left\{ {\begin{gathered} {v_{n}^{\min } = \frac{{x_{k}^{\lim } - x_{n} }}{{t_{{\text{r}}}^{\min } + T_{n} }}} \\ {v_{n}^{\max } = \frac{{x_{k}^{\lim } - x_{n} }}{{t_{{\text{r}}}^{\min } }}} \\ \end{gathered} } \right. .$$
(14)
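Equations (13) and (14) can be sketched together as follows (illustrative values; in the paper, \(t_{\text{r}}^{\min}\) and \(T_{\text{r}}^{\min}\) come from the minimum-time computation of the reference system):

```python
def reference_speeds(x_n, t_n, x_lim, T_end, t_r_min, T_r_min):
    """Eqs. (13)-(14): the remaining slack time T_n and the minimum and
    maximum reference speeds toward the next speed-limit point x_lim."""
    T_n = T_end - t_n - T_r_min           # Eq. (13): remaining slack time
    v_min = (x_lim - x_n) / (t_r_min + T_n)
    v_max = (x_lim - x_n) / t_r_min
    return v_min, v_max
```

The maximum reference speed assumes the train spends only the minimum time reaching the next limit point; the minimum reference speed also consumes all the remaining slack.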
This paper outlines the method of generating a speed curve for a leader train, which is divided into three stages: the traction process, the intermediate operation process, and the braking process. During the traction process, the reference system limits the selection of the train action to only the traction action gears, without any random selection. During the intermediate operation phase, a more advanced hybrid control approach [44] is employed to modify the train operating process, as shown in Fig. 4. When the train operational speed curve matches the reference system’s empirically generated braking curve, the braking procedure starts. This approach enhances the exactness of the train stop. In Fig. 4, \(v_{\lim }\) represents the temporary speed limit value.
Fig. 4
Improved hybrid control strategy

5 Follower train tracking control model

5.1 State space model

The linearization method is a frequently employed technique in coupling systems. It involves the follower train acquiring the reference curve of the leader train (indexed as \(i = 0\)) and completing the tracking process. Then, a Taylor expansion of Eq. (1) around the equilibrium point \(v_{0} \left( t \right) = v_{1} \left( t \right) = \ldots = v_{I} \left( t \right)\) yields
$$\left\{ {\begin{gathered} {\dot{x}_{i} \left( t \right) = v_{i} \left( t \right)} \hfill \\ {\dot{v}_{i} \left( t \right) = \frac{{u_{i} \left( t \right) - \xi_{i} \left( {x_{i} \left( t \right)} \right) - p_{i} \left( t \right)v_{i} \left( t \right) - q_{i} \left( t \right)}}{{M_{i} }}} \hfill \\ {p_{i} \left( t \right) = c_{1} + 2c_{2} v_{0} \left( t \right)} \hfill \\ {q_{i} \left( t \right) = c_{0} - c_{2} v_{0}^{2} \left( t \right)} \hfill \\ \end{gathered} } \right. .$$
(15)
The state space equation of unit coupling train \(i = 1,\;2, \cdots ,\;I\) can be defined as follows: \({\varvec{x}}_{i} \left( t \right) = \left[ {\Delta x_{i} ,\Delta v_{i} } \right]^{{\text{T}}}\), in which \(\Delta x_i = x_{i - 1} \left( t \right) - x_{i} \left( t \right) - L - x_{{\text{d}}} - x_{\min }\) represents the displacement error, \(L\) the train length, \(x_{{\text{d}}}\) the intended tracking distance, and \(x_{\min }\) the safe tracking margin; \(\Delta v_i = v_{i - 1} \left( t \right) - v_{i} \left( t \right)\) represents the speed error and \(\Delta u_{i} \left( t \right) = u_{i - 1} \left( t \right) - u_{i} \left( t \right)\) represents the control variable error. Finally, substituting \({\varvec{x}}_{i} \left( t \right)\) into Eq. (15) yields the state space equation of unit coupling train \(i\), Eq. (16).
$${\dot{\varvec{x}}}_{i} \left( t \right) = {\varvec{A}}\left( t \right){\varvec{x}}_{i} \left( t \right) + {\varvec{B}}\Delta u_{i} \left( t \right).$$
(16)
In order to simplify the calculation related to the coupling train \(i\), the parameters \(p_{i} \left( t \right)\) and \(M_{i}\) in Eq. (15) are abbreviated to \(p\left( t \right)\) and \(M\), and set \(\varvec{A}\left( t \right) = \left[ {\begin{array}{*{20}c} 0 & 1 \\ 0 & { - \frac{p\left( t \right)}{M}} \\ \end{array} } \right]\) and \(\varvec{B} = \left[ {\begin{array}{*{20}c} 0 \\ \frac{1}{M} \\ \end{array} } \right]\).
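The matrices of Eq. (16) can be written down directly (a NumPy sketch; the values of \(p\) and \(M\) in the test are hypothetical):

```python
import numpy as np

def error_dynamics(p, M):
    """Eq. (16): x_dot = A(t) x + B * du, with the state
    x = [displacement error, speed error]^T."""
    A = np.array([[0.0, 1.0],
                  [0.0, -p / M]])
    B = np.array([[0.0],
                  [1.0 / M]])
    return A, B
```

Only the speed-error row is driven by the control input, while the displacement error integrates the speed error, matching the leader-follower error definition above.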

5.2 Constraints setting

The control commands emitted by the train must adhere to the limitations of its maximum traction and braking capability, as shown in Eq. (17):
$$F_{{\text{b}}}^{\max } \le u_{i} \left( t \right) \le F_{{\text{a}}}^{{{\text{max}}}}.$$
(17)
While in operation, the train must not exceed the speed limit, and its speed must remain non-negative, as shown in Eq. (18).
$$0 \le v_{i} \left( t \right) \le v_{\lim } \left( {x_{i} \left( t \right)} \right),$$
(18)
where \(v_{\lim } \left( {x_{i} \left( t \right)} \right)\) represents the value of the speed limit at \(x_{i} \left( t \right)\).
The minimum safe tracking distance between coupling trains is constrained to be
$$x_{i - 1} \left( t \right) - x_{i} \left( t \right) - L \ge x_{{{\text{safe}}}},$$
(19)
where \(x_{{{\text{safe}}}}\) denotes the minimum safe tracking interval between coupling train \(i\) and its preceding train, which is determined by the relative braking distance principle, as shown in Eq. (20).
$$x_{\text{safe}} = x_{\min } + \max \left[ {\left( {\frac{{v_{i - 1}^{2} \left( t \right)}}{{2F_{{\text{b}}}^{{{\text{max}}}} }} - \frac{{v_{i}^{2} \left( t \right)}}{{2F_{{\text{b}}}^{{{\text{max}}}} }}} \right) \cdot M,0} \right],$$
(20)
where \(\frac{{v_{i}^{2} \left( t \right)}}{{2F_{{\text{b}}}^{{{\text{max}}}} }} \cdot M\) is the emergency braking distance of train \(i\). When \(v_{i - 1}^{2} \left( t \right) \ge v_{i}^{2} \left( t \right)\), the relative braking distance determines the minimum interval required for tracking adjacent coupling trains; otherwise, the safe tracking margin determines the minimal safe interval necessary for tracking adjacent coupling trains.
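Equation (20) can be sketched directly as a function; the speeds below are illustrative inputs, while the mass, braking force, and safety margin are the Table 1 values.

```python
def safe_distance(v_prev, v_curr, mass, f_b_max, x_min):
    """Minimum safe tracking interval per Eq. (20): the safety margin plus
    the relative braking distance, floored at zero."""
    rel = (v_prev**2 - v_curr**2) / (2.0 * f_b_max) * mass
    return x_min + max(rel, 0.0)

# Table 1 values: M = 536 t, F_b^max = 536 kN, x_min = 100 m (SI units)
M, Fb, x_min = 536e3, 536e3, 100.0
d1 = safe_distance(v_prev=50.0, v_curr=40.0, mass=M, f_b_max=Fb, x_min=x_min)
# relative braking term positive -> margin plus 450 m
d2 = safe_distance(v_prev=40.0, v_curr=50.0, mass=M, f_b_max=Fb, x_min=x_min)
# relative braking term negative -> x_min alone
```

Because mass and maximum braking force here are numerically equal (536 t vs. 536 kN), the relative term reduces to \((v_{i-1}^2 - v_i^2)/2\) in this particular example.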

5.3 Design of the distributed controller

This paper employs distributed model predictive control (DMPC) to achieve accurate tracking control among the coupling trains. Due to the requirement for independent control of coupling trains and the ability to communicate state information through train-to-train communication, DMPC is well suited for tracking control among coupling trains.
Equation (21) predicts, from the state vector \({\varvec{x}}_{i} \left( t \right)\) at the current moment \(t\), the state vectors and output vectors at subsequent moments.
$$\left\{ {\begin{array}{*{20}l} {{\varvec{x}}_{i} \left( {t + T_{{\text{p}}} \left| t \right.} \right) = {\varvec{Ax}}_{i} \left( {t + T_{{\text{p}}} - 1\left| t \right.} \right) + {\varvec{B}}\Delta u_{i} \left( {t + T_{{\text{p}}} - 1\left| t \right.} \right)} \hfill \\ \begin{gathered} {\varvec{y}}_{i} \left( {t + T_{{\text{p}}} - 1\left| t \right.} \right) = {\varvec{Ox}}_{i} \left( {t + T_{{\text{p}}} - 1\left| t \right.} \right) \hfill \\ {\varvec{O}} = \left[ {\begin{array}{*{20}c} 1 & 0 \\ 0 & 1 \\ \end{array} } \right] \hfill \\ \end{gathered} \hfill \\ \end{array} } \right. ,$$
(21)
where \({\varvec{x}}_{i} \left( {t + T_{{\text{p}}} - 1\left| t \right.} \right)\) denotes the state predicted at moment \(t\) for moment \(t + T_{{\text{p}}} - 1\), and similarly, \({\varvec{y}}_{i} \left( {t + T_{{\text{p}}} - 1\left| t \right.} \right)\) denotes the output predicted at moment \(t\) for moment \(t + T_{{\text{p}}} - 1\), while \({\varvec{O}}\) is the output matrix. Based on this, the system state vector \({\varvec{X}}_{i} \left( t \right)\) within the prediction horizon \(T_{{\text{p}}}\), the control input \({\varvec{U}}_{i} \left( t \right)\) within the control horizon \(T_{{\text{c}}}\) (with \(T_{{\text{c}}}\) usually smaller than \(T_{{\text{p}}}\)), and the system output vector \({\varvec{Y}}_{i} \left( t \right)\) at moment \(t\) are obtained, as shown in Eq. (22).
$$\left\{ {\begin{array}{*{20}l} {{\varvec{X}}_{{i}} \left( t \right) = \left[ {{\varvec{x}}_{i} \left( {t + 1\left| t \right.} \right)^{{\text{T}}} , \cdots ,\;{\varvec{x}}_{i} \left( {t + T_{{\text{p}}} \left| t \right.} \right)^{{\text{T}}} } \right]^{{\text{T}}} } \hfill \\ \begin{gathered} {\varvec{U}}_{{i}} \left( t \right) = \left[ {\Delta \varvec{u}_{i} \left( {t\left| t \right.} \right)^{{\text{T}}} , \cdots ,\;\Delta \varvec{u}_{i} \left( {t + T_{{\text{c}}} - 1\left| t \right.} \right)^{{\text{T}}} } \right]^{{\text{T}}} \hfill \\ {\varvec{Y}}_{i} \left( t \right) = \left[ {{\varvec{y}}_{i} \left( {t + 1\left| t \right.} \right)^{{\text{T}}} , \cdots ,\;{\varvec{y}}_{i} \left( {t + T_{{\text{p}}} \left| t \right.} \right)^{{\text{T}}} } \right]^{{\text{T}}} \hfill \\ \end{gathered} \hfill \\ \end{array} } \right. .$$
(22)
Equation (21) is rewritten in matrix form by applying the state vector \({\varvec{X}}_{i} \left( t \right)\), the control vector \({\varvec{U}}_{i} \left( t \right)\), and the output \({\varvec{Y}}_{i} \left( t \right)\) at the current moment \(t\) to the system, as shown in Eq. (23).
$$\left\{ {\begin{array}{*{20}l} {\boldsymbol{Y}_{i} \left( t \right) = \varvec{\varPhi X}_{i} \left( t \right) + \varvec{\psi U}_{i} \left( t \right)} \hfill \\ {\varvec{\varPhi } = \left[ {\boldsymbol{A},\boldsymbol{A}^{2} , \cdots ,\boldsymbol{A}^{{T_{{\text{p}}} }} } \right]^{\text{T}} } \hfill \\ {\varvec{\psi } = \left[ {\begin{array}{*{20}l} \boldsymbol{B} & 0 & \cdots & 0 \\ {\boldsymbol{AB}} & \boldsymbol{B} & \cdots & 0 \\ \vdots & \vdots & {} & \vdots \\ {\boldsymbol{A}^{{T_{{\text{p}}} - 1}} \boldsymbol{B}} & {\boldsymbol{A}^{{T_{{\text{p}}} - 2}} \boldsymbol{B}} & \cdots & \boldsymbol{B} \\ \end{array} } \right]} \hfill \\ \end{array} } \right. ,$$
(23)
\({\varvec{Y}}_{{\text{r}}} \left( t \right) = \left[ {{\varvec{y}}_{{\text{r}}} \left( {t + 1} \right)^{{\text{T}}} ,{\varvec{y}}_{{\text{r}}} \left( {t + 2} \right)^{{\text{T}}} , \ldots ,{\varvec{y}}_{{\text{r}}} \left( {t + T_{{\text{p}}} } \right)^{{\text{T}}} } \right]^{{\text{T}}}\) is a representation of the reference trajectory within the prediction horizon \(T_{{\text{p}}}\) and \({\varvec{y}}_{{\text{r}}}\) denotes the reference state. When accounting for the communication delay, the reference trajectory becomes
$${\varvec{Y}}_{{\text{r}}} \left( t \right) = \left[ {{\varvec{y}}_{{\text{r}}} \left( {t + 1 - t_{{\text{d}}} } \right)^{{\text{T}}} ,\;{\varvec{y}}_{{\text{r}}} \left( {t + 2 - t_{{\text{d}}} } \right)^{{\text{T}}} , \ldots ,\;{\varvec{y}}_{{\text{r}}} \left( {t + T_{{\text{p}}} - t_{{\text{d}}} } \right)^{{\text{T}}} } \right]^{{\text{T}}} ,\quad t + T_{{\text{p}}} - t_{{\text{d}}} \ge 0,$$
where \(t_{{\text{d}}}\) is a representation of the communication delay. Combining the results above, we derive the cost function as shown in Eq. (24):
$$\begin{gathered} {\varvec{J}}_{{{\text{MPC}}}} = \left[ {{\varvec{Y}}_{i} \left( t \right) - {\varvec{Y}}_{{\text{r}}} \left( t \right)} \right]^{{\text{T}}} {\varvec{Q}}\left[ {{\varvec{Y}}_{i} \left( t \right) - {\varvec{Y}}_{{\text{r}}} \left( t \right)} \right] + {\varvec{U}}_{i} \left( t \right)^{{\text{T}}} {\varvec{RU}}_{i} \left( t \right) \\ = \left[ {{\varvec{\varPhi X}}_{i} \left( t \right) + {\varvec{\psi U}}_{i} \left( t \right) - {\varvec{Y}}_{{\text{r}}} \left( t \right)} \right]^{{\text{T}}} {\varvec{Q}}\left[ {{\varvec{\varPhi X}}_{i} \left( t \right) + {\varvec{\psi U}}_{i} \left( t \right) - {\varvec{Y}}_{{\text{r}}} \left( t \right)} \right] + {\varvec{U}}_{i} \left( t \right)^{{\text{T}}} {\varvec{RU}}_{i} \left( t \right) \\ = \left[ {{\varvec{\varPhi X}}_{i} \left( t \right) - {\varvec{Y}}_{{\text{r}}} \left( t \right)} \right]^{{\text{T}}} {\varvec{Q}}\left[ {{\varvec{\varPhi X}}_{i} \left( t \right) - {\varvec{Y}}_{{\text{r}}} \left( t \right)} \right] + 2{\varvec{U}}_{i} \left( t \right)^{{\text{T}}} {{\varvec{\psi}}}^{{\text{T}}} {\varvec{Q}}\left[ {{\varvec{\varPhi X}}_{i} \left( t \right) - {\varvec{Y}}_{{\text{r}}} \left( t \right)} \right] + \\ {\varvec{U}}_{i} \left( t \right)^{{\text{T}}} \left( {{{\varvec{\psi}}}^{{\text{T}}} {\varvec{Q\psi }}} \right){\varvec{U}}_{i} \left( t \right) + {\varvec{U}}_{i} \left( t \right)^{{\text{T}}} {\varvec{RU}}_{i} \left( t \right), \\ \end{gathered}$$
(24)
where the weight matrix \({\varvec{Q}}\) adjusts the penalty on the deviation of the system output from the reference, while the non-negative weight matrix \({\varvec{R}}\) adjusts the penalty on the control increments.
In order to address the computational complexity and instability introduced by the nonlinear constraints, this paper linearizes the constraints discussed in Sect. 5.2 into the constraint set shown in Eq. (25).
$$\left. \begin{gathered} F_{{\text{b}}}^{\text{max}} \le u_{i} \left( t \right) \le F_{{\text{a}}}^{\text{max}} \hfill \\ v_{i} \left( t \right) \ge 0 \hfill \\ v_{\lim } \left( {x_{i} \left( t \right)} \right) - v_{i} \left( t \right) \ge 0 \hfill \\ x_{i - 1} \left( t \right) - x_{i} \left( t \right) - x_{\min } - L \ge 0 \hfill \\ \end{gathered} \right\}.$$
(25)
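The constraint set of Eq. (25) amounts to a pointwise feasibility check on the control input, speed, and spacing. A sketch follows; note that Eq. (25) writes \(F_{\text{b}}^{\max} \le u_i(t)\), and since Table 1 lists \(F_{\text{b}}^{\max}\) as a positive magnitude, this sketch assumes the braking bound is passed as a negative force value.

```python
def satisfies_constraints(u, v, x_curr, x_prev, v_lim,
                          f_a_max, f_b_max, x_min, train_len):
    """Feasibility check for the linearized constraint set of Eq. (25).
    f_b_max is assumed to be passed as a negative lower bound on u."""
    return (f_b_max <= u <= f_a_max              # traction/braking limits
            and 0.0 <= v <= v_lim                # non-negative, under limit
            and x_prev - x_curr - x_min - train_len >= 0.0)  # spacing

# Feasible case: 100 m of slack beyond margin and train length remains.
ok = satisfies_constraints(u=200e3, v=50.0, x_curr=1000.0, x_prev=1400.0,
                           v_lim=55.6, f_a_max=300e3, f_b_max=-536e3,
                           x_min=100.0, train_len=200.0)
# Infeasible case: the spacing term becomes negative.
too_close = satisfies_constraints(u=200e3, v=50.0, x_curr=1000.0,
                                  x_prev=1250.0, v_lim=55.6, f_a_max=300e3,
                                  f_b_max=-536e3, x_min=100.0, train_len=200.0)
```

In a full implementation these inequalities would enter the quadratic program as linear constraints rather than being checked after the fact.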
The optimal estimation of the objective is obtained using Eq. (24), as follows:
$$\begin{aligned} \qquad{\hat{\varvec{J}}}_{\text{MPC}} &= 2{\varvec{U}}_{i} \left( t \right)^{{\text{T}}} {{\varvec{\psi}}}^{\text{T}} {\varvec{Q}}\left[ {{\varvec{\varPhi X}}_{i} \left( t \right) - {\varvec{Y}}_{\text{r}} \left( t \right)} \right] + \\ &{\varvec{U}}_{i} \left( t \right)^{{\text{T}}} \left( {{\varvec{\psi}}^{\text{T}} {\varvec{Q\psi }}} \right){\varvec{U}}_{i} \left( t \right) + {\varvec{U}}_{i} \left( t \right)^{{\text{T}}} {\varvec{RU}}_{i} \left( t \right) \\ &= 2{\varvec{U}}_{i} \left( t \right)^{{\text{T}}} {\varvec{H}} + {\varvec{U}}_{i} \left( t \right)^{{\text{T}}} {\varvec{EU}}_{i} \left( t \right), \\ \end{aligned}$$
(26)
where \({\varvec{H}} = {{\varvec{\psi}}}^{{\text{T}}} {\varvec{Q}}\left[ {{\varvec{\varPhi X}}_{i} \left( t \right) - {\varvec{Y}}_{{\text{r}}} \left( t \right)} \right]\) and \({\varvec{E}} = {{\varvec{\psi}}}^{{\text{T}}} {\varvec{Q\psi }} + {\varvec{R}}.\)
The optimal control sequence \({\varvec{U}}_{i}^{*} \left( t \right) = \left[ {u_{i}^{*} \left( {t\left| t \right.} \right),u_{i}^{*} \left( {t + 1\left| t \right.} \right), \ldots ,u_{i}^{*} \left( {t + T_{{\text{c}}} - 1\left| t \right.} \right)} \right]^{{\text{T}}}\) over the control horizon at the current moment is obtained by minimizing Eq. (26), where the first element \(u_{i}^{*} \left( {t\left| t \right.} \right)\) of the sequence is applied as the optimal control variable at the current moment. Throughout the train operation, the system repeats these steps at each moment, thereby achieving optimal control over the entire operational process.
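The receding-horizon step above can be sketched numerically. The block below is a simplified illustration, not the paper's controller: it assumes a forward-Euler discretization, \(T_{\text{c}} = T_{\text{p}}\), illustrative weights, and the unconstrained minimizer of Eq. (26) (i.e., it ignores the inequality constraints of Eq. (25), for which a QP solver would be needed).

```python
import numpy as np

# Discretized error model (assumed placeholder values for M and p(t)).
M, p_t, dt = 536e3, 10.0, 0.01
Ad = np.eye(2) + dt * np.array([[0.0, 1.0], [0.0, -p_t / M]])
Bd = dt * np.array([[0.0], [1.0 / M]])
Tp = 5   # prediction horizon (= control horizon in this sketch)

# Stack Phi = [A, A^2, ..., A^Tp]^T and block-lower-triangular psi (Eq. 23).
Phi = np.vstack([np.linalg.matrix_power(Ad, k) for k in range(1, Tp + 1)])
psi = np.zeros((2 * Tp, Tp))
for r in range(Tp):
    for c in range(r + 1):
        psi[2*r:2*r+2, c:c+1] = np.linalg.matrix_power(Ad, r - c) @ Bd

Q = np.eye(2 * Tp)             # state-error weight
R = 1e-9 * np.eye(Tp)          # control-increment weight (illustrative)
x0 = np.array([[5.0], [0.5]])  # current tracking error
Yr = np.zeros((2 * Tp, 1))     # reference: drive the error to zero

H = psi.T @ Q @ (Phi @ x0 - Yr)
E = psi.T @ Q @ psi + R
U_star = np.linalg.solve(E, -H)  # unconstrained minimizer of 2U^T H + U^T E U
u0 = float(U_star[0, 0])         # only the first element is applied
```

Setting the gradient \(2{\varvec{H}} + 2{\varvec{E}}{\varvec{U}}\) to zero gives \({\varvec{U}}^{*} = -{\varvec{E}}^{-1}{\varvec{H}}\), which is what `np.linalg.solve` computes here.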

5.4 Parameter adaptive design

The advantages of the MPC algorithm, especially its ability to handle uncertainty and nonlinear safety constraints, make it well suited for virtual coupling trains, particularly for collision avoidance and shortening following distances. However, in VCTS applications, the following distance between trains must be strictly controlled to ensure safety, and traditional MPC may fail to predict and manage it effectively because future uncertainties are insufficiently considered, jeopardizing the safety of the entire tracking process. Current applications of improved MPC algorithms in VCTS tracking control incorporate techniques such as robust optimization and adaptive parameter adjustment [45–49]. To further improve the prediction capability of the MPC algorithm and enhance its control effect, we developed a PSO-based control strategy to boost the tracking performance of the MPC system with respect to the reference velocity curve. This strategy dynamically adjusts the control and prediction horizons, optimizing responses and enabling precise tracking of the reference trajectory, which significantly enhances the stability and accuracy of the control system. The MPC method with PSO-based dynamic adjustment is shown in Fig. 5.
Fig. 5
MPC method based on PSO dynamic adjustment
We begin by defining an objective function whose main purpose is to minimize the average error between the actual and reference velocities, as well as between the actual and reference displacements, over the control period. This is accomplished by adjusting the control and prediction horizons. The optimization objective of the particle swarm algorithm is shown in Eq. (27):
$$J_{{{\text{PSO}}}} = \frac{1}{{t_{{{\text{step}}}} }}\sum\nolimits_{t}^{{t + t_{{{\text{step}}}} }} {\left[ {\varpi_{1} \left( {v\left( t \right) - v_{{\text{r}}} \left( t \right)} \right)^{2} + \varpi_{2} \left( {x\left( t \right) - x_{{\text{r}}} \left( t \right)} \right)^{2} } \right]},$$
(27)
where \(t_{{{\text{step}}}}\) denotes the size of the optimization time window used by the PSO algorithm; \(\varpi_{1}\) and \(\varpi_{2}\) are the weighting factors for the velocity error and displacement error, respectively; \(v\left( t \right)\) and \(v_{{\text{r}}} \left( t \right)\) denote the actual and reference velocities at time \(t\), respectively; and \(x\left( t \right)\) and \(x_{{\text{r}}} \left( t \right)\) denote the actual and reference displacements at time \(t\).
To determine the optimal control parameters, we applied the PSO algorithm. This algorithm begins by initializing a swarm of particles, each representing a set of control parameters. The particles adjust their positions in the parameter space based on the performance of the objective function while also considering the optimal historical positions of both individual particles and the swarm. The algorithm efficiently explores the parameter space by iteratively updating the positions and velocities of the particles, gradually optimizing the parameters to minimize the objective function. After identifying the optimal parameters, we implement them in the MPC system and adjust the control logic to enhance trajectory tracking performance. This adjustment significantly improves the system response to and control over the dynamic reference trajectory. The velocity and position update formulas of the PSO algorithm are detailed in Eqs. (28) and (29), respectively:
$$v_{jd}^{(t + 1)} = w \cdot v_{jd}^{(t)} + z_{1} \cdot r_{1} \cdot (p_{jd} - x_{jd}^{(t)} ) + z_{2} \cdot r_{2} \cdot (p_{\text{g}d} - x_{jd}^{(t)} ),$$
(28)
$$x_{{jd}}^{{(t + 1)}} = x_{{jd}}^{{(t)}} + v_{{jd}}^{{(t + 1)}},$$
(29)
where \(v_{jd}^{(t)}\) is the velocity of particle \(j\) in dimension \(d\) at time \(t;\) \(x_{jd}^{(t)}\) is the position of particle \(j\) in dimension \(d\) at time \(t;\) \(p_{jd}\) is the individual historical best position of particle \(j\) in dimension \(d;\) \(p_{{{\text{g}}d}}\) is the group historical best position in dimension \(d;\) \(w\) is the inertia weight, which controls the effect of the previous velocity on the current velocity; \(z_{1}\) and \(z_{2}\) are the learning factors, often referred to as the cognitive and social parameters, respectively, which determine the intensity of particle movement toward the individual best experience and the group best experience; and \(r_{1}\) and \(r_{2}\) are random numbers in \(\left[ {0,1} \right]\).
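The update rules of Eqs. (28) and (29) can be sketched as follows. This is a generic PSO loop, not the paper's implementation: the inertia weight, learning factors, and iteration counts below are illustrative choices, and a simple weighted quadratic stands in for the \(J_{\text{PSO}}\) objective of Eq. (27).

```python
import numpy as np

rng = np.random.default_rng(0)

def pso(objective, bounds, n_particles=20, n_iter=50, w=0.7, z1=1.5, z2=1.5):
    """Minimal PSO sketch implementing Eqs. (28)-(29); hyperparameters
    are illustrative, not the paper's tuned values."""
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    dim = lo.size
    x = rng.uniform(lo, hi, size=(n_particles, dim))   # positions
    v = np.zeros((n_particles, dim))                   # velocities
    p_best = x.copy()                                  # individual bests
    p_cost = np.array([objective(xi) for xi in x])
    g_best = p_best[p_cost.argmin()].copy()            # swarm best
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Eq. (28): inertia + cognitive + social terms
        v = w * v + z1 * r1 * (p_best - x) + z2 * r2 * (g_best - x)
        # Eq. (29): position update, kept inside the search bounds
        x = np.clip(x + v, lo, hi)
        cost = np.array([objective(xi) for xi in x])
        improved = cost < p_cost
        p_best[improved], p_cost[improved] = x[improved], cost[improved]
        g_best = p_best[p_cost.argmin()].copy()
    return g_best, p_cost.min()

# Toy stand-in for J_PSO of Eq. (27): weighted squared tracking errors,
# minimized at (3.0, 1.0).
best, best_cost = pso(lambda z: (z[0] - 3.0)**2 + 10.0 * (z[1] - 1.0)**2,
                      bounds=([0.0, 0.0], [10.0, 10.0]))
```

In the paper's setting, each particle would instead encode candidate prediction and control horizons, and the objective would run the MPC tracking loop over the time window of Eq. (27).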

6 Experimental setting and results

The experimental environment selected for this research is a single direction of the double-track railway operation from Jinan West Station to Jinan Station on the Qingdao–Jinan passenger railway. The train operates at a speed of 200 km/h on this railway line. The railway line is 20,000 m long. The shortest operation time is 7 min, while the average is 16 min [50]. The experiment involves a CRH3C-type train set with traction and braking forces that depend on the current speed [51]. Table 1 offers a comprehensive description of the train parameters.
Table 1
Train parameters
Parameter                            Symbol                           Value
Train crew mass (t)                  \(M\)                            536
Train length (m)                     \(L\)                            200
Operation time (s)                   \(T_{\text{end}}\)               840, 900, 960, 1020
Maximum speed limit (km·h−1)         \(V_{\max}\)                     200
Maximum tractive force (kN)          \(F_{{\text{a}}}^{{{\text{max}}}}\)   300
Maximum common braking force (kN)    \(F_{{\text{b}}}^{{{\text{max}}}}\)   536
Safety tracking margin (m)           \(x_{\min}\)                     100
Desired tracking distance (m)        \(x_{\text{d}}\)                 100
Control cycle (ms)                   \(\tau_{\text{c}}\)              10
Communication delay (ms)             \(\tau\)                         –
The experimental scenario comprises two components: the first involves devising the speed curve for the leader train using the ICM-PER-D3QN algorithm, while the second entails controlling the coupling train to closely follow the leader train using MPC. This paper presents the construction of the operation and control framework for VCTS using Python 3.8 on a computer with an Intel® Core™ i7-8750H CPU @ 2.20 GHz and 16 GB RAM. The effectiveness of the framework is demonstrated through simulation experiments. Table 2 provides details of the experimental parameters.
Table 2
Experimental parameters
Parameter                                  Symbol                          Value
Maximum number of iterations               \(N_{\max}\)                    5000
Capacity of the experience replay pool     \(C_{\max}\)                    10,000
Learning rate                              \(l\)                           0.001
Discount factor                            \(\gamma\)                      0.9
Initial greed                              \(\varepsilon_{\text{init}}\)   1
Ultimate greed                             \(\varepsilon_{\text{end}}\)    0.01
State transfer step (m)                    \(d_{\text{s}}\)                500
Simulation step size (m)                   \(d_{x}\)                       0.5
Batch size                                 \(b_{\text{s}}\)                32
Frequency of target network update         \(f\)                           100
Sampling time (s)                          \(T_{\text{s}}\)                0.01
Prediction horizon                         \(T_{\text{p}}\)                1–20
Control horizon                            \(T_{\text{c}}\)                1–19

6.1 Experiments on the effectiveness of optimizing leader train driving strategies

This section first compares the solution quality of the ICM-PER-D3QN and DQN algorithms for the multi-objective optimization of the leader train speed curve, defining the time granularity in minutes and conducting experiments on the same railway line. Using the average operation time of trains as a reference, four sets of comparative experiments modified the train operation times to verify the effectiveness of the ICM-PER-D3QN and DQN algorithms in optimizing train operation strategies. Concurrently, the PSO-MPC, MPC, PID, and LQR methods were assessed in the context of virtual coupling train tracking control. Based on the above process, this paper compares the operation of trains on the railway line under operation times of 840, 900, 960, and 1020 s, respectively. Finally, we selected the best methods for optimizing the leader train operation strategy and realizing the tracking control of virtual coupling trains, enabling the improved operation framework presented in this paper to meet the VCTS requirements for exactness of train stops, punctuality, energy efficiency, and comfort.
Figure 6 shows the training process of the deep reinforcement learning algorithms for different train operation times. Both the ICM-PER-D3QN algorithm used in this paper and the traditional DQN algorithm converge to good solutions, but ICM-PER-D3QN outperforms DQN in all four experimental scenarios, with performance improvements of 91.5%, 52.4%, 44.9%, and 39.1%, respectively, and an average improvement of 57% overall. Table 3 presents the leader train operation metrics optimized by the DQN and ICM-PER-D3QN algorithms.
Fig. 6
The training process of the deep reinforcement learning algorithms under different operating times of the leader train: a 840 s; b 900 s; c 960 s; d 1020 s
Table 3
The experimental results of multi-objective optimization of leader trains with different operating times
Evaluation indicators    840 s            900 s            960 s              1020 s
Stopping error (m)       0.09 / 0.28      0.01 / 0.01      0.01 / 0.21        0.11 / 0.07
Punctuality error (s)    −1.39 / 4.14     0.2 / −3.58      −3.22 / −2.89      1.43 / 4.07
Energy loss (kW·h−1)     77.76 / 92.6     85.21 / 85.28    116.81 / 117.24    65.99 / 69.05
Jerk rate (m·s−3)        0.004 / 0.004    0.004 / 0.004    0.004 / 0.004      0.008 / 0.006
Each cell lists ICM-PER-D3QN / DQN.
The experimental data in Table 3 show that the ICM-PER-D3QN algorithm significantly outperformed the DQN algorithm in terms of train stop exactness, punctuality, energy efficiency, and passenger comfort across different train operation times. Stopping and punctuality errors are better the closer they are to 0, while energy loss and jerk rate are better the smaller they are; the optimal value for each operation time follows from comparing the two algorithms accordingly. Specifically, when the train operated for 840 s, the ICM-PER-D3QN algorithm reduced the stopping error by 67.9%, the punctuality error by 66.4%, and the energy loss by 16.1%. When operating for 900 s, the stopping error decreased by a remarkable 96.4%, the punctuality error by 94.4%, and the energy loss by 0.1%. When the train operated for 960 s, the stopping error was reduced by 95.2%, although the punctuality error increased marginally by 11.4%, and the energy loss decreased by 0.37%. When the train operated for 1020 s, the stopping error was reduced by 57.1%, the punctuality error by 64.9%, and the energy loss by 4.4%. On average, the stopping error was reduced by 35.7%, the punctuality error by 75.2%, and the energy loss by 6.9%. The jerk rate remained largely unchanged under both algorithms. In summary, the experimental results show that the ICM-PER-D3QN algorithm performs better than the DQN algorithm in optimizing the leader train's driving strategy, and it was therefore chosen as the preferred method for optimizing the leader train's driving strategy in the subsequent experiments.
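The percentage reductions quoted above are plain relative differences of the Table 3 values; the 840 s stopping and punctuality figures, for instance, can be checked as follows.

```python
def rel_reduction(baseline, improved):
    """Percentage reduction of |improved| relative to |baseline|."""
    return 100.0 * (abs(baseline) - abs(improved)) / abs(baseline)

# Table 3, 840 s column: DQN vs ICM-PER-D3QN
stop_gain = rel_reduction(0.28, 0.09)    # stopping error, ~67.9%
punct_gain = rel_reduction(4.14, 1.39)   # punctuality error, ~66.4%
```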
Figure 7 shows the speed curves of the virtual coupling trains at four different operating times. These include the speed curves derived from the multi-objective optimization of the driving strategy of the leader train and those from the precise tracking of the follower train. This paper applies interpolation to process the speed curve of the leader train, resulting in a multi-objective optimized speed curve that then serves as the reference curve for the following train through knowledge transfer.
Fig. 7
The speed curves of virtual coupling trains at four different operation times: a 840 s; b 900 s; c 960 s; d 1020 s

6.2 Experiments on the effectiveness of follower train tracking control

In this section, we evaluate the effectiveness of the improved VCTS operation control framework and the feasibility of optimizing the VCTS driving strategy for multiple objectives. This is done by comparing the tracking control effects of the PSO-MPC, traditional MPC, PID, and LQR algorithms under the same framework. Specifically, the PSO-MPC algorithm sets both the number of PSO iterations and particles to 20, corresponding to the three operating stages of the train and optimizing the prediction and control horizon of the MPC algorithm. Additionally, to ensure the exactness of the train stop during coupling, it is necessary for the PSO algorithm’s target weight \(\varpi_{2}\) to be greater than \(\varpi_{1}\). Therefore, we define \(\varpi_{2} = 10\), \(\varpi_{1} = 1\). Building on that basis, we show the speed tracking difference and distance tracking difference of the virtual coupling train under the four control algorithms for operation times of 840, 900, 960, and 1020 s, respectively, where \(\Delta v\) denotes the speed difference between the follower train and the train in the same direction; \(\Delta x\) denotes the safety guarding distance difference between the follower train and the train in the same direction. Taking the tracking control of follower train 1 to the leader train as an example: for follower train 1, the leader train and the preceding train are the same train. Thus, we get \(\Delta v_{1} = v_{0} - v_{1}\) and \(\Delta x_{1} = x_{0} - x_{1} - L - x_{{{\text{safe}}}}\).
As shown in Figs. 8–11, across the four different running times the PID algorithm exhibits poor stability throughout the tracking process, making it difficult to meet the demands of multi-objective optimization. The LQR algorithm maintains good stability in most cases; however, significant fluctuations occur in the final stage of tracking, which seriously degrade the tracking performance. The MPC algorithm exhibits large fluctuations throughout the process and poorer tracking effects, yet it recovers quickly in the final stage to meet the accuracy requirements. Although the virtual coupling train tracking control results under the MPC algorithm can thus satisfy the multi-objective optimization process, the large tracking error during operation poses a significant safety hazard. To address this issue, this paper utilizes the PSO-MPC algorithm to optimize the tracking control process. Experimental results show that this improvement effectively mitigates fluctuations during tracking, thereby enhancing the safety of virtual coupling train tracking control. Panels b and d of Figs. 8–11 illustrate the tracking effects intuitively. When the train operation time is 840 s, under the control of the PSO-MPC algorithm, the average difference between the tracking error and the safety protection distance is 0.36 m between follower train 1 and the leader train, and 0.32 m between follower train 2 and follower train 1. Under the control of the MPC algorithm, these two values are 1.31 m and 1.18 m, so the PSO-MPC algorithm improves the tracking performance by 72.5% and 72.9%, respectively.
Similarly, when the train operation time is 900 s, these two averages are 0.44 m and 0.3 m under the PSO-MPC algorithm and 0.23 m and 1.47 m under the MPC algorithm, whereby the tracking performance of the PSO-MPC algorithm increases by 47.7% and decreases by 11.6%, respectively. When the train operation time is 960 s, these two averages are 0.35 m and 0.19 m under the PSO-MPC algorithm and 0.86 m and 1.26 m under the MPC algorithm, improving the tracking performance of the PSO-MPC algorithm by 59.3% and 84.9%, respectively. When the train operation time is 1020 s, these two averages are 0.32 m and 0.43 m under the PSO-MPC algorithm and 0.26 m and 1.31 m under the MPC algorithm, whereby the tracking performance of the PSO-MPC algorithm increases by 18.8% and decreases by 67.2%, respectively. Overall, compared with the MPC algorithm, the PSO-MPC algorithm used in this paper improves the average tracking performance of virtual coupling trains by 37.7%. Finally, Tables 4–7 show the experimental results of the multi-objective optimization of the VCTS.
Fig. 8
The experimental results of tracking control of virtual coupling trains with trains in the same direction under the control of four algorithms with an operation time of 840 s: a speed difference between follower train and leader train; b difference between the tracking distance of the follower train and the safety guarding distance of the leader train; c speed difference between follower train and preceding train; d difference between the tracking distance of the follower train and the safety guarding distance of the preceding train
Fig. 9
The experimental results of tracking control of virtual coupling trains with trains in the same direction under the control of four algorithms with an operation time of 900 s: a speed difference between follower train and leader train; b difference between the tracking distance of the follower train and the safety guarding distance of the leader train; c speed difference between follower train and preceding train; d difference between the tracking distance of the follower train and the safety guarding distance of the preceding train
Fig. 10
The experimental results of tracking control of virtual coupling trains with trains in the same direction under the control of four algorithms with an operation time of 960 s: a speed difference between follower train and leader train; b difference between the tracking distance of the follower train and the safety guarding distance of the leader train, c speed difference between follower train and preceding train; d difference between the tracking distance of the follower train and the safety guarding distance of the preceding train
Fig. 11
The experimental results of tracking control of virtual coupling trains with trains in the same direction under the control of four algorithms with an operation time of 1020 s: a speed difference between follower train and leader train; b difference between the tracking distance of the follower train and the safety guarding distance of the leader train; c speed difference between follower train and preceding train; d difference between the tracking distance of the follower train and the safety guarding distance of the preceding train
Table 4
The experimental results of multi-objective optimization of VCTS with an operation time of 840 s
Evaluation indicators    Leader    Follower 1 (PSO-MPC / MPC / PID / LQR)    Follower 2 (PSO-MPC / MPC / PID / LQR)
Stopping error (m)       0.09      −0.04 / −0.08 / 0.22 / 0.92               0.11 / 0.15 / 0.3 / 1.74
Punctuality error (s)    −1.39     −1.44 / −1.39 / 1.4 / −1.39               1.47 / 1.45 / −1.41 / −1.41
Energy loss (kW·h−1)     77.76     93.14 / 109.8 / 135.24 / 114.69           110.62 / 113.54 / 145.04 / 109.84
Jerk rate (m·s−3)        0.004     0.036 / 0.027 / 0.084 / 0.006             0.1 / 0.06 / 0.11 / 0.009
Table 5
The experimental results of multi-objective optimization of VCTS with an operation time of 900 s
Evaluation indicators    Leader    Follower 1 (PSO-MPC / MPC / PID / LQR)    Follower 2 (PSO-MPC / MPC / PID / LQR)
Stopping error (m)       0.01      0.01 / 0.04 / 0.09 / 0.84                 0.1 / 0.13 / 0.13 / 1.67
Punctuality error (s)    0.2       3.62 / 3.6 / −3.59 / −3.59                3.64 / −3.67 / −3.6 / −3.59
Energy loss (kW·h−1)     85.21     103.19 / 109.42 / 110.41 / 102.72         113.95 / 118.43 / 119.83 / 104.66
Jerk rate (m·s−3)        0.004     0.046 / 0.037 / 0.16 / 0.006              0.11 / 0.11 / 0.17 / 0.009
Table 6
The experimental results of multi-objective optimization of VCTS with an operation time of 960 s
Evaluation indicators    Leader    Follower 1 (PSO-MPC / MPC / PID / LQR)    Follower 2 (PSO-MPC / MPC / PID / LQR)
Stopping error (m)       0.01      0 / −0.02 / 0.1 / 0.43                    0.02 / 0.04 / 1.1 / 0.86
Punctuality error (s)    3.22      −3.27 / −3.26 / −3.5 / −3.22              −3.31 / −3.28 / −3.71 / −3.23
Energy loss (kW·h−1)     116.81    116.33 / 120.6 / 119.82 / 112.93          134.18 / 130 / 136.21 / 115.3
Jerk rate (m·s−3)        0.045     0.077 / 0.063 / 0.099 / 0.006             0.16 / 0.15 / 0.22 / 0.01
Table 7
The experimental results of multi-objective optimization of VCTS with an operation time of 1020 s
Evaluation indicators    Leader    Follower 1 (PSO-MPC / MPC / PID / LQR)    Follower 2 (PSO-MPC / MPC / PID / LQR)
Stopping error (m)       0.11      0.01 / 0.16 / 0.17 / 0.74                 0.12 / 0.29 / 0.23 / 1.37
Punctuality error (s)    1.43      1.4 / 1.41 / 1.42 / 1.42                  1.39 / 1.35 / 1.41 / 1.42
Energy loss (kW·h−1)     65.99     111.38 / 110.79 / 117.14 / 102.89         135.13 / 155.48 / 176.83 / 108.94
Jerk rate (m·s−3)        0.008     0.12 / 0.16 / 0.16 / 0.013                0.21 / 0.25 / 0.27 / 0.02
In this section, we compare the tracking control effects of the PSO-MPC, traditional MPC, PID, and LQR algorithms in VCTS, focusing on train stop exactness, punctuality, energy loss, and jerk rate. This comparison is presented in Tables 4, 5, 6, 7, from which the optimal value for each of the two follower trains under the four control algorithms can be identified for each of the four evaluation metrics.
Intuitively, the VCTS, consisting of three trains and controlled by these algorithms, meets the requirements for punctuality and jerk rate. However, the LQR algorithm, while reducing energy loss and improving punctuality and passenger comfort, incurs a large stopping error that compromises VCTS safety. Similarly, the PID algorithm ensures train stop exactness, punctuality, and passenger comfort but results in greater energy loss compared to those controlled by the PSO-MPC or traditional MPC algorithms. Therefore, the PID and LQR algorithms do not meet the operational control accuracy requirements defined in this paper, whereas the PSO-MPC and traditional MPC algorithms do. Consequently, this paper will further explore the tracking control effects of the PSO-MPC and traditional MPC algorithms.
The prediction horizon \(T_{{\text{p}}}\) and control horizon \(T_{{\text{c}}}\) of PSO-MPC are adaptively adjusted to match the three operation phases. When the operation time is 840 s, \(T_{{\text{p}}} = \left[ {10,2,14} \right]\) for virtual coupling train 1 and \(T_{{\text{p}}} = \left[ {10,20,2} \right]\) for train 2, while \(T_{{\text{c}}} = \left[ {2,1,1} \right]\) for train 1 and \(T_{{\text{c}}} = \left[ {2,2,1} \right]\) for train 2. With these settings, PSO-MPC outperforms the traditional MPC algorithm: the exactness of train stop for virtual coupling trains 1 and 2 improves by 50% and 26.67%, respectively, while energy loss is reduced by 15.17% and 2.57%. However, under the PSO-MPC algorithm, the follower trains perform slightly worse in punctuality and comfort: the punctuality errors for trains 1 and 2 increase by 3.6% and 1.38%, and the jerk rates increase by 33.33% and 66.67%, respectively. Similarly, when the operation time is 900 s, with \(T_{{\text{p}}} = \left[ {20,2,9} \right]\) for train 1, \(T_{{\text{p}}} = \left[ {20,2,2} \right]\) for train 2, and \(T_{{\text{c}}} = \left[ {1,1,1} \right]\) for both trains, the exactness of train stop improves by 75% for train 1 and by 23.08% for train 2. The punctuality error of train 1 increases slightly by 0.56%, while that of train 2 decreases by 0.82%; energy loss decreases by 5.69% and 3.78%, respectively; and the jerk rate of train 1 increases by only 24.32%. When the operation time is 960 s, with \(T_{{\text{p}}} = \left[ {20,2,20} \right]\) for train 1, \(T_{{\text{p}}} = \left[ {20,2,2} \right]\) for train 2, \(T_{{\text{c}}} = \left[ {1,1,2} \right]\) for train 1, and \(T_{{\text{c}}} = \left[ {2,1,1} \right]\) for train 2, the exactness of train stop for trains 1 and 2 improves by 100% and 50%, respectively. The punctuality errors increase by 0.31% and 0.91%; energy loss decreases by 3.54% for train 1 and increases by 3.12% for train 2; and the jerk rates increase by 22.22% and 6.67%. Finally, when the operation time is 1020 s, with \(T_{{\text{p}}} = \left[ {20,2,8} \right]\) for train 1, \(T_{{\text{p}}} = \left[ {20,2,2} \right]\) for train 2, and \(T_{{\text{c}}} = \left[ {1,1,1} \right]\) for both, the exactness of train stop for trains 1 and 2 improves by 87.5% and 48.28%, respectively. At the punctuality level, the error decreases by 2.13% for train 1 and increases by 5.19% for train 2. Energy loss increases by 0.53% for train 1 and decreases by 13.09% for train 2. The jerk rates for trains 1 and 2 increase by 6.25% and decrease by 24%, respectively.
Overall, under the improved VCTS operation control framework proposed in this paper, both the PSO-MPC algorithm and the traditional MPC algorithm can meet the requirements for the exactness of train stop, punctuality, energy savings, and jerk rate during VCTS operation control. Compared with MPC, PSO-MPC guarantees tracking accuracy during virtual coupling train operation control in most operation scenarios, which improves the safety of the tracking process. Combining the experimental results of virtual coupling trains 1 and 2 across the four operation times, the exactness of the train stop improves by 57.57% on average and energy loss decreases by 5.02% on average. However, the gain in tracking accuracy also degrades the other evaluation indexes: the punctuality error increases by 1.13% on average and the jerk rate by 22.93% on average. Finally, considering the requirements for safe VCTS operation together with the indicators of the exactness of train stop, punctuality, energy loss, and jerk rate, this paper selects the PSO-MPC algorithm as the virtual coupling train tracking control algorithm.
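The adaptive selection of \(T_{{\text{p}}}\) and \(T_{{\text{c}}}\) described above can be sketched as a particle swarm search over the two horizons. The surrogate cost below is a hypothetical stand-in for the closed-loop MPC rollout cost the paper actually evaluates; only the PSO mechanics are meant to be illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def tracking_cost(tp, tc):
    # Hypothetical surrogate for a closed-loop MPC rollout cost: in practice
    # each (Tp, Tc) pair would be scored by simulating the follower train.
    return (tp - 12) ** 2 + 4 * (tc - 1) ** 2

def pso_horizons(n_particles=20, iters=40, tp_max=20, tc_max=5):
    """Search integer (Tp, Tc) in [1, tp_max] x [1, tc_max] with a basic PSO."""
    pos = rng.uniform([1, 1], [tp_max, tc_max], size=(n_particles, 2))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_f = np.array([tracking_cost(*np.rint(p)) for p in pos])
    gbest = pbest[np.argmin(pbest_f)].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, 1))
        # Inertia + cognitive + social terms with standard coefficients.
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, [1, 1], [tp_max, tc_max])
        f = np.array([tracking_cost(*np.rint(p)) for p in pos])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = pos[improved], f[improved]
        gbest = pbest[np.argmin(pbest_f)].copy()
    return np.rint(gbest).astype(int)
```

In the paper's setting this search is repeated per operation phase, yielding horizon vectors such as \(T_{{\text{p}}} = \left[ {20,2,8} \right]\); the sketch tunes a single phase.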

6.3 Effect of communication delay on experimental results

Virtual coupling technology leverages real-time train-to-train communication to enhance the line carrying capacity and transportation efficiency of high-speed trains. The theoretical delay of the in-vehicle network is less than 0.6 ms, but during actual operation the train is affected by multiple factors, causing communication delays of up to 500 ms [52, 53]. Therefore, this paper introduces the communication delay, which, in conjunction with PSO-MPC, is used to assess the impact of communication delays on the operation control of the VCTS. As a result of the communication delay, the follower train cannot promptly access the state information of the leader train to compute its movement authority. To simulate the delay caused by communication between coupled trains, a time-delay operation is applied during the prediction process of the PSO-MPC when the follower train acquires the current state of the leader train. This paper examines the impact of various communication delays on the multi-objective optimization of coupling trains. The experimental results are presented in Fig. 12.
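The time-delay operation amounts to buffering the leader's state messages for a number of control periods before the follower may read them. A minimal sketch (the class name and the 100 ms control period are assumptions):

```python
from collections import deque

class DelayedChannel:
    """Deliver leader-train state samples to the follower after a fixed delay."""

    def __init__(self, delay_ms, step_ms=100):
        # Number of control periods a message spends "in flight"; any nonzero
        # delay costs at least one period.
        self.lag = max(1, round(delay_ms / step_ms)) if delay_ms > 0 else 0
        self.buffer = deque()
        self.last = None

    def send(self, state):
        """Push the leader's current state; return the state visible to the follower."""
        self.buffer.append(state)
        # A sample becomes visible only after `lag` periods have elapsed.
        if len(self.buffer) > self.lag:
            self.last = self.buffer.popleft()
        return self.last  # None until the first delayed sample arrives
```

During the PSO-MPC prediction step, the follower would then build its movement authority from `send(...)`'s return value instead of the leader's true current state.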
Fig. 12
The effect of communication delay on multi-objective optimization of virtual coupling trains under different operating times: a effect on stopping error; b effect on punctuality error; c effect on energy loss; d effect on jerk rate
As can be seen from Fig. 12, when the communication delay does not exceed 500 ms, the exactness of train stops, energy savings, and comfort still meet the requirements; in particular, the energy loss and jerk rate do not change significantly under the influence of the communication delay. However, as the communication delay increases, the exactness of train stops and punctuality deteriorate, and trains may even be delayed. In addition, under the influence of communication delay, the tracking performance of successive follower trains gradually deteriorates, and as the number of follower trains grows, later follower trains may struggle to meet the requirements for the exactness of train stops, energy savings, and passenger comfort.

6.4 Effect of temporary speed limit on experimental results

The temporary speed limit scenarios set up in this section are shown in Table 8.
Table 8
Temporary speed limit scenarios

| Serial number | Speed limit section (m) | Speed limit value (m·s⁻¹) |
|---|---|---|
| 1 | 0–3,000 | 41.67 |
| 2 | 3,000–17,000 | Unlimited |
| 3 | 17,000–19,500 | 44.44 |
| 4 | 19,500–20,000 | 33.33 |
In this section, to highlight more clearly the impact of temporary speed limits on train operating conditions, the operating time is set to 600 s. Under the constraints of speed limit scenario 1, the train applies traction until approximately 500 m, switches to the coasting state, and re-enters the traction state after leaving the speed limit section. The train reaches its maximum speed at about 6,000 m and then proceeds with the intermediate operation process. During this phase, the train driving strategy primarily uses a modified hybrid control technique, alternating between traction and coasting while adhering to the reference system's limitations. To comply with speed limit scenario 4, the train initiates braking at about 18,000 m, follows the braking curve, and remains in this state as required by the operating conditions until it stops at the station. Figure 13 shows the training process of the leader train under the temporary speed limit scenario, and Fig. 14 shows the speed curves of the coupled trains, where the slope and speed limit values correspond to the leader train's speed curve in the speed limit experimental scenario.
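The speed limit scenarios of Table 8 can be encoded as a simple position-indexed lookup that any candidate driving strategy must respect. A small sketch (the assumed line maximum of 97.22 m/s and the function names are illustrative, not from the paper):

```python
# Temporary speed limits from Table 8 as (start_m, end_m, limit_m_per_s);
# the unlimited section 3,000-17,000 m is simply omitted.
LIMITS = [(0, 3000, 41.67), (17000, 19500, 44.44), (19500, 20000, 33.33)]

def allowed_speed(position_m, line_max=97.22):
    """Ceiling speed at a given position; line_max is an assumed line maximum."""
    for start, end, limit in LIMITS:
        if start <= position_m < end:
            return min(limit, line_max)
    return line_max

def violates(profile):
    """Return positions where a [(position, speed), ...] profile exceeds the ceiling."""
    return [p for p, v in profile if v > allowed_speed(p)]
```

A speed curve generated by the leader-train optimizer would be screened with `violates(...)` before being handed to the follower trains as a reference.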
Fig. 13
The training process of leader train under temporary speed limit
Fig. 14
The speed curve of virtual coupling trains under temporary speed limit
In this section, three follower control experiments are conducted in a temporary speed limit scenario, taking the effect of random communication delays into account. Experiment numbering is defined as follows: the leader train, which is affected by the temporary speed limit but not by communication delays, is numbered 1; the follower trains, which are subject to communication delays under the temporary speed limit scenario, are numbered sequentially according to these definitions and their order in the experiment. The combined effects of communication delay and temporary speed limit on the improved VCTS operation control framework are simulated by randomly setting the communication delay \(\tau \in \left[ {0,500} \right]\) ms in the temporary speed limit scenario. The communication delays corresponding to follower train 1 in the three training sessions are 42, 196, and 6 ms, and those corresponding to follower train 2 are 121, 490, and 110 ms, respectively. The experimental results of tracking control of virtual coupling trains following trains in the same direction under the influence of temporary speed limits and communication delays are shown in Fig. 15. The legend identifies the follower trains under the different communication delays. \(\Delta v\) denotes the speed difference between the follower train and the train in the same direction; \(\Delta x\) denotes the difference between the follower train's tracking distance and the safety guarding distance with respect to the train in the same direction.
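The random delay sampling and the \(\Delta v\)/\(\Delta x\) indicators can be reproduced in a few lines. A sketch (function names and the fixed seed are assumptions; the safety guarding distance is treated as a constant for illustration, whereas in the paper it varies with speed):

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_delays(n_sessions, low=0, high=500):
    """Draw one integer communication delay (ms) per training session."""
    return rng.integers(low, high + 1, size=n_sessions)

def tracking_gaps(follower_v, follower_x, ref_v, ref_x, guard_dist):
    """Delta-v and Delta-x between a follower and the train ahead of it.

    Delta-x > 0 means the actual spacing exceeds the safety guarding distance.
    """
    dv = np.asarray(ref_v) - np.asarray(follower_v)
    dx = (np.asarray(ref_x) - np.asarray(follower_x)) - guard_dist
    return dv, dx
```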
Fig. 15
The experimental results of tracking control of virtual coupling trains with trains in the same direction under the influence of temporary speed limit and communication delay: a speed difference between follower train and leader train; b speed difference between follower train and preceding train; c difference between the tracking distance of the follower’s train and the safe guarding distance of the leader’s train; d difference between the tracking distance of the follower’s train and the safe guarding distance of the preceding train
Overall, the improved VCTS operation control framework proposed in this paper is suitable for solving the multi-objective optimization problem of driving strategies for VCTS under temporary speed limit scenarios, and it proves feasible in most experimental scenarios. However, as the number of virtual coupling trains increases, the tracking control effect weakens. In addition, communication delays slightly degrade the exactness of train stops and cause delays to coupling trains. As shown in Fig. 15, the communication delay degrades the tracking performance of the virtual coupling trains, including the speed and displacement differences, with training session 2 showing this most clearly owing to its higher communication delay. Specifically, the experimental results in Fig. 15c and d show that the communication delay extends the tracking interval of the follower trains. This extended interval ensures the safe operation of the VCTS, but at the cost of reduced transport efficiency for the railway line. It may also cause late trains, as occurred for follower train 2 in training session 2 under the influence of the communication delay. Therefore, further mitigating the effect of communication delay on the VCTS operation control framework is one focus of future research.
Figure 16 shows the specific results of the multi-objective optimization of VCTS under temporary speed limits and communication delay conditions. Under the same experimental parameters, the virtual coupling trains controlled by the PSO-MPC algorithm meet the requirements for exactness of train stop, punctuality, and passenger comfort. However, excessively high communication delays can cause train delays, as shown in Fig. 16b. Additionally, increasing communication delays reduce the responsiveness and accuracy of the control system in most cases, affecting the safety of virtual coupling train tracking control, as shown in Fig. 16a. Finally, as shown in Fig. 16c and d, the communication delay has little effect on the energy loss and jerk rate in most cases.
Fig. 16
The experimental results of multi-objective optimization of virtual coupling trains under temporary speed limit and communication delay conditions: a stopping error; b punctuality error; c energy loss; d jerk rate

7 Conclusion

In this paper, an improved VCTS operation control framework is proposed. In this framework, the optimized leader train speed curve is used, through knowledge transfer, as the reference for the follower train, aiming to achieve multi-objective optimization of the driving strategy for VCTS. In the leader train driving strategy optimization module, the ICM-PER-D3QN algorithm adopted in this framework improves the exactness of the train stop for the leader train by 35.7% and punctuality performance by 75.2%, and reduces energy loss by 6.86% compared to the DQN algorithm. In the follower train tracking control module, the PID and LQR algorithms cannot satisfy the requirements of the improved framework owing to significantly larger stopping errors and energy losses. Therefore, train tracking performance is mainly compared under the PSO-MPC algorithm adopted in this framework and the traditional MPC algorithm. Compared with the traditional MPC algorithm, the PSO-MPC algorithm improves the exactness of the train stop for the follower trains by 57.7% and reduces energy loss by 5.02%. Overall, the improved VCTS operation control framework described in this paper effectively ensures the safety, punctuality, and energy efficiency of VCTS operations.
However, the experimental scheme proposed in this paper focuses solely on the effects of communication delay and does not comprehensively assess other external disturbances. To make the experiments more comprehensive, the experimental scenarios will need to be enriched in future studies. Additionally, as the number of virtual coupling trains increases, the control accuracy of this framework decreases and higher inter-train communication rates are required, which reduces the exactness of train stops and diminishes punctuality performance. Consequently, future improvements to the control method will focus on enhancing accuracy while mitigating the effects of communication delays.
Finally, while this paper enhances the traditional VCTS operation control framework, operational control remains divided into two stages: speed curve generation and tracking control. Although the driving strategy optimization process has been effectively integrated into the operational control of VCTS, the complexity of this process remains a significant challenge. Therefore, future research will focus on simplifying the framework and reducing its complexity while continuing to ensure the optimization of driving strategies for VCTS.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 52162050.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Title: A multi-objective optimization approach for the virtual coupling train set driving strategy
Authors: Junting Lin, Maolin Li, Xiaohui Qiu
Publication date: 10.01.2025
Publisher: Springer Nature Singapore
Published in: Railway Engineering Science, Issue 2/2025
Print ISSN: 2662-4745
Electronic ISSN: 2662-4753
DOI: https://doi.org/10.1007/s40534-024-00349-1
References

1. Liu H, Lang Y, Zhang L et al (2023) A cooperative control based reference curve generating method for virtually coupled train sets. Sci Technol Rev 41(1):62–72
2. Cao Y, Wen J, Ma L (2021) Tracking and collision avoidance of virtual coupling train control system. Future Gener Comput Syst 120:76–90
3. Luo X, Tang T, Lin B et al (2023) A robust model predictive control approach for reducing following distance between virtually coupled unit trains. J Chin Railway Soc 45(8):68–76 (in Chinese)
4. Lin J, Ni M (2023) Trigger model predictive control based on extended state observers for virtual coupling. J Transp Syst Eng Inf Technol 23(4):134–146
5. Su S, Liu W, Zhu Q et al (2022) A cooperative collision-avoidance control methodology for virtual coupling trains. Accid Anal Prev 173:106703
6. Xi W, Hu M, Wang H et al (2023) Formation control for virtual coupling trains with parametric uncertainty and unknown disturbances. IEEE Trans Circuits Syst II 70(9):3429–3433
7. Yang A, Sun J, Wang B et al (2022) Optimization of virtual-coupling-orientated train operation plan based on full-length and short-turn routing. J Beijing Jiaotong Univ 46(4):9–14 (in Chinese)
8. Luo X, Tang T, Li K et al (2024) Computation-efficient distributed MPC for dynamic coupling of virtually coupled train set. Control Eng Pract 145:105846
9. Cao Y, Wang Z, Liu F et al (2019) Bio-inspired speed curve optimization and sliding mode tracking control for subway trains. IEEE Trans Veh Technol 68(7):6331–6342
10. Su S, Zhu Q, Liu J et al (2023) A data-driven iterative learning approach for optimizing the train control strategy. IEEE Trans Ind Inform 19(7):7885–7893
11. Anh TTTA, Quyến N (2020) Optimal speed profile determination with fixed trip time in the electric train operation of the Cat Linh-Ha Dong metro line based on Pontryagin's maximum principle. Eng Technol Appl Sci 10(6):6488–6493
12. Tan Z, Lu S, Bao K et al (2018) Adaptive partial train speed trajectory optimization. Energies 11(12):3302
13. Ying P, Zeng X, Song H et al (2021) Energy-efficient train operation with steep track and speed limits: a novel Pontryagin's maximum principle-based approach for adjoint variable discontinuity cases. IET Intell Transp Syst 15(9):1183–1202
14. Wei S, Yan X, Cai B et al (2015) Multiobjective optimization for train speed trajectory in CTCS high-speed railway with hybrid evolutionary algorithm. IEEE Trans Intell Transp Syst 16(4):2215–2225
15. Mo P, Yang L, Gao Z (2019) Energy-efficient train operation strategy with speed profiles selection for an urban metro line. Transp Res Rec 2673(4):348–360
16. Liu S, Cao F, Xun J et al (2015) Energy-efficient operation of single train based on the control strategy of ATO. In: 2015 IEEE 18th International Conference on Intelligent Transportation Systems (ITSC), Gran Canaria, pp 2580–2586
17. Fernández PM, Font Torres JB, Sanchís IV et al (2023) Multi-objective ant colony optimization to obtain efficient metro speed profiles. Proc Inst Mech Eng F J Rail Rapid Transit 237(2):232–242
18. Zhang Y, Zuo T, Zhu M et al (2021) Research on multi-train energy saving optimization based on cooperative multi-objective particle swarm optimization algorithm. Int J Energy Res 45(2):2644–2667
19. Yin J, Chen D, Li L (2014) Intelligent train operation algorithms for subway by expert system and reinforcement learning. IEEE Trans Intell Transp Syst 15(6):2561–2571
20. Ning L, Zhou M, Hou Z et al (2021) Deep deterministic policy gradient for high-speed train trajectory optimization. IEEE Trans Intell Transp Syst 23(8):11562–11574
21. Lin X, Liang Z, Shen L et al (2023) Reinforcement learning method for the multi-objective speed trajectory optimization of a freight train. Control Eng Pract 138:105605
22. Meng Z, Tang T, Wei G et al (2020) Digital twin based comfort scenario modeling of ATO controlled train. J Phys Conf Ser 1654(1):012071
23. Mehta P, Meyn S (2009) Q-learning and Pontryagin's minimum principle. In: Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with the 2009 28th Chinese Control Conference (CCC), Shanghai, pp 3598–3605
24. Liu Y, Halev A, Liu X (2021) Policy learning with constraints in model-free reinforcement learning: a survey. In: The 30th International Joint Conference on Artificial Intelligence (IJCAI), Montreal, pp 4508–4515
25. Hing MM, Harten AV, Schuur PC et al (2007) Reinforcement learning versus heuristics for order acceptance on a single resource. J Heuristics 13:167–187
26. Liu M, Zhao F, Yin J et al (2021) Reinforcement-tracking: an effective trajectory tracking and navigation method for autonomous urban driving. IEEE Trans Intell Transp Syst 23(7):6991–7007
27. Fu P, Gao S, Dong H et al (2018) Speed tracking error and rate driven event-triggered PID control design method for automatic train operation system. In: 2018 Chinese Automation Congress (CAC), Xi'an, pp 2889–2894
28. Pu Q, Zhu X, Zhang R et al (2020) Speed profile tracking by an adaptive controller for subway train based on neural network and PID algorithm. IEEE Trans Veh Technol 69(10):10656–10667
29. Wang L, Wang X, Sheng Z et al (2020) Model predictive controller based on online obtaining of softness factor and fusion velocity for automatic train operation. Sensors 20(6):1719
30. Bersani C, Cardano M, Lavaggi S et al (2023) Stochastic linear quadratic optimal control of speed and position of multiple trains on a single-track line. IEEE Trans Intell Transp Syst 24(9):9110–9120
31. Chen Y, Huang D, Li Y et al (2020) A novel iterative learning approach for tracking control of high-speed trains subject to unknown time-varying delay. IEEE Trans Autom Sci Eng 19(1):113–121
32. Huang Z, Wang P, Zhou F et al (2022) Cooperative tracking control of the multiple-high-speed trains system using a tunable artificial potential function. J Adv Transp 2022:3639586
33. Cai Q, Luo X, Gao C et al (2021) A machine learning-based model predictive control method for pumped storage systems. Front Energy Res 9:757507
34. Bujarbaruah M, Zhang X, Rosolia U et al (2018) Adaptive MPC for iterative tasks. In: 2018 IEEE Conference on Decision and Control (CDC), Miami, pp 6322–6327
35. Liu X, Xun J, Gao S et al (2022) Robust self-triggered model predictive control for accurate stopping of high-speed trains. Acta Autom Sin 48(1):171–181
36. Yin L (2022) High speed train modeling and nonlinear speed tracking control based on disturbance observer. Dissertation, Jilin University (in Chinese)
37. Meng X (2021) Energy-efficient train operation control of automatic driving based on Q learning and deep Q learning. Dissertation, Beijing Jiaotong University (in Chinese)
38. Pathak D, Agrawal P, Efros AA et al (2017) Curiosity-driven exploration by self-supervised prediction. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, pp 2778–2787
39. Schaul T, Quan J, Antonoglou I et al (2015) Prioritized experience replay. In: 4th International Conference on Learning Representations (ICLR), San Juan, May 2–4
40. Wang Z, Schaul T, Hessel M et al (2016) Dueling network architectures for deep reinforcement learning. In: Proceedings of the 33rd International Conference on Machine Learning (ICML), New York, pp 1995–2003
41. Van Hasselt H, Guez A, Silver D (2016) Deep reinforcement learning with double Q-learning. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-16), Phoenix, 30(1):2094–2100
42. Yang Y, Huang P, Peng Q et al (2019) Statistical delay distribution analysis on high-speed railway trains. J Mod Transp 27(3):188–197
43. González-Gil A, Palacin R, Batty P (2013) Sustainable urban rail systems: strategies and technologies for optimal management of regenerative braking energy. Energy Convers Manag 75:374–388
44. Zhang J, Zhu A (2022) Optimization method of automatic train operation speed curve based on genetic algorithm and particle swarm optimization. J Comput Appl 42(2):599–605
45. Luo X, Tang T, Yin J et al (2023) A robust MPC approach with controller tuning for close following operation of virtually coupled train set. Transp Res Part C Emerg Technol 151:104116
46. Wang JN, Teng F, Li J et al (2021) Intelligent vehicle lane change trajectory control algorithm based on weight coefficient adaptive adjustment. Adv Mech Eng 13(3):16878140211003392
47. Sasfi A, Zeilinger MN, Köhler J (2023) Robust adaptive MPC using control contraction metrics. Automatica 155:111169
48. Liu H, Yang L, Yang H (2022) Cooperative optimal control of the following operation of high-speed trains. IEEE Trans Intell Transp Syst 23(10):17744–17755
49. Vaquero-Serrano MA, Felez J (2023) A decentralized robust control approach for virtually coupled train sets. Comput Aided Civ Infrastruct Eng 38(14):1896–1915
50. Ma Y (2022) Research on the integration of high-speed train operation adjustment and energy-saving control under the condition of road network. Dissertation, Lanzhou Jiaotong University (in Chinese)
51. Long SH (2021) Models and algorithms for the integrated optimization of train rescheduling and train control for high-speed railway. Dissertation, Beijing Jiaotong University (in Chinese)
52. Parise R, Dittus H, Winter J et al (2019) Reasoning functional requirements for virtually coupled train sets: communication. IEEE Commun Mag 57(9):12–17
53. Guo Y, Pei X, Luo X et al (2023) A particle swarm optimization-based online optimization approach for virtual coupling trains with communication delay. IEEE Intell Transp Syst 15(6):49–63