Introduction
(1) A mathematical model is constructed for the network link load balancing problem, and optimization objectives are developed for the SDN routing algorithm.
(2) A routing optimization scheme, AVRO, is designed based on a metaheuristic algorithm. AVRO adds a link betweenness measurement to population initialization to enhance the algorithm's topological awareness, and it adds an optimization stage based on development and exploration to strengthen performance and improve the model's convergence stability.
(3) The AVRO model is compared with traditional routing algorithms, deep learning-based routing algorithms, and classic reinforcement learning-based routing algorithms on three different network topologies and under two different traffic intensities in a simulated environment, demonstrating the superior load-balancing performance of the proposed algorithm.
Related work
Literature | Schemes | Aim | Algorithm | Limitations |
---|---|---|---|---|
Shin [8] | Distributed Intelligent Routing | Utilize topology information effectively to improve routing decisions | GNN-based algorithm | Prone to causing loops, requires powerful model computing capabilities, and may need modifications to routing protocols |
Rischke [14] | Reinforcement Learning Routing | Optimize network load balancing, latency, and packet loss | RSIR algorithm using Q-learning | Limited perception capabilities and potential performance issues |
Wang [16] | Deep Reinforcement Learning | Optimize throughput, latency, and packet loss for mouse and elephant flows in data center networks | DQN-based routing policy using deep neural networks and reinforcement learning | Computational complexity limits hardware deployment |
Bernárdez [18] | Multiagent Reinforcement Learning | Minimize network congestion | Combination of MARL and GNN | Computational complexity may limit hardware deployment |
Chen [19] | Ensemble Learning DRL | Maximize the utilization of optical transport networks | DRL intelligent routing algorithm based on ensemble learning and information propagation neural networks | Computational complexity may limit hardware deployment |
Rusek [24] | Deep Learning-Assisted Routing | Establish relationships between network status, topology, traffic matrices, and routing path models | GNN and LSTM models combined with heuristic algorithms | Process of replacing traditional routing algorithms with deep learning-based ones is challenging due to non-convex loss functions, gradient issues, and practical deployment constraints |
Farshin [26] | Knowledge-Based Metaheuristics | Utilize knowledge from SDN controllers for VNF placement and routing | Enhanced ant colony system algorithm with knowledge integration | Neglects network volatility and complexity, instability in algorithm training and convergence |
Samarji [27] | Fault tolerance metaheuristic | Maximize network connectivity, maximize the load balance among controllers, minimize the worst-case latency, and maximize the network lifetime | Genetic algorithm and greedy randomized adaptive search procedure (GRASP) algorithm | The impact of a faulty controller's load distribution on network performance is not analyzed |
Samarji [28] | Energy soaring-based routing | Select network cluster heads to solve the controller placement problem | Energy soaring routing algorithm inspired by the albatross bird | Does not factor in the network's instability and intricacies |
Raouf [29] | Ant Colony Optimization | Handle dynamic network fluctuations and reduce congestion, latency, and packet loss | ACOSDN algorithm using Ant Colony Optimization | Sluggish convergence and potential convergence to local optima |
Isyaku [30] | Route Path Selection Optimization | Elevate data throughput and packet delivery rates with link quality estimation and constraint parameters | Route path selection optimization approach based on link quality estimation and switch awareness | Does not adopt a global optimization perspective, which may limit network performance |
Mathematical modelling for network optimization
Design of routing optimization algorithm based on AVRO
AVRO algorithm
(1) The first stage: identifying the finest vulture among the population

In the AVOA algorithm, the initial population is formed by randomly generating solutions within the search range, as shown in Eq. (12), where \({r}_{1}\) is a random variable that takes values between 0 and 1.

$$P\left(i\right)={r}_{1}\times \left(ub-lb\right)+lb$$ (12)

Instead of randomly generating the population, Eq. (13) is used in this paper to initialize the vulture population. Here, the parameter \({\delta }_{1}\) can take any value to adjust the range of \(b\), the edge betweenness, i.e., the proportion of shortest paths in the network that pass through a given edge.

$$P\left(i\right)={r}_{1}\times \left(ub-lb\right)+lb+{\delta }_{1}\times b$$ (13)

The central idea of the OSPF algorithm in computer networks is to use the shortest paths within the network. The higher the edge betweenness centrality, the more shortest paths pass through that edge, leading to higher utilization and potential congestion. Increasing the initial weight of links with high betweenness centrality therefore reduces the number of shortest paths passing through such edges, preventing traffic from congregating on only a few edges. Similar to the idea of curriculum learning [31], during population initialization the AVRO algorithm is explicitly told which information to explore, guiding it toward better solution spaces and improving its performance.

After the initial population is formed, the fitness of all solutions is calculated. The best solution \({Best}_{1}\) and the second-best solution \({Best}_{2}\) in each iteration are the candidates for the leader vulture \(R(i)\), as shown in Eq. (14), where the selection probabilities \({p}_{1}\) and \({p}_{2}\) are governed by the preset parameters \({L}_{1}\) and \({L}_{2}\).

$$R\left(i\right)=\left\{\begin{array}{l}{Best}_{1},\; {p}_{1}={L}_{1}\\ {Best}_{2},\; {p}_{2}={L}_{2}\end{array}\right.$$ (14)
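As a concrete illustration, the following is a minimal sketch of how the betweenness-aware initialization of Eq. (13) could be realized with networkx and numpy. The function name, its arguments, and the toy ring topology are illustrative assumptions, not the paper's implementation.

```python
import networkx as nx
import numpy as np

def init_population(graph, pop_size, lb, ub, delta1, rng=None):
    """Sketch of Eq. (13): bias each vulture's initial link weights by
    edge betweenness b, so heavily shared shortest-path edges start heavier."""
    rng = np.random.default_rng() if rng is None else rng
    edges = list(graph.edges())
    # b: proportion of shortest paths passing through each edge.
    bet = nx.edge_betweenness_centrality(graph, normalized=True)
    b = np.array([bet[e] for e in edges])
    r1 = rng.random((pop_size, len(edges)))      # r1 ~ U(0, 1)
    return r1 * (ub - lb) + lb + delta1 * b      # Eq. (13)

# Toy usage: 10 vultures over a 6-node ring, one weight per link.
P = init_population(nx.cycle_graph(6), pop_size=10, lb=1.0, ub=2.0, delta1=2.0)
```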
(2) The second stage: identification of the satiety rate of the vultures

Vultures must feed to gain the energy necessary for survival. Satiated vultures can venture further in search of food, while famished ones, lacking the requisite energy, are forced to seek food close to high-energy vultures and become more aggressive in the process. Equation (17) models this process, using F to represent the satiety of the vultures and to balance development against exploration. In Eq. (17), t is defined by Eq. (16); this kind of behavioral simulation has been used before [21]. When |F| is greater than 1, the vultures search for food in different areas, and AVRO enters the exploration stage. When |F| is less than 1, AVRO enters the development stage, and the vultures search for food near their current location. During training, development and exploration alternate, with the early stages emphasizing exploration and the later stages emphasizing development to promote convergence.

$$t=h\times \left({\sin}^{\gamma }\left(\frac{\pi }{2}\times \frac{i}{T}\right)+\cos\left(\frac{\pi }{2}\times \frac{i}{T}\right)-1\right)$$ (16)

$$F=\left(2\times {r}_{2}+1\right)\times z\times \left(1-\frac{i}{T}\right)+t$$ (17)

In Eqs. (16) and (17), i denotes the current iteration and T the total number of iterations. \({r}_{2}\) is a random number between 0 and 1. z is a random number between -1 and 1, where values below 0 indicate hunger in the vulture and values above 0 signify satiation, with satiation declining over time. h is a random number between -2 and 2.
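A minimal sketch of Eqs. (16) and (17) follows, assuming h and z are redrawn on each call as the text describes; the function name and the default γ = 2 (taken from the parameter table below) are assumptions.

```python
import numpy as np

def satiety(i, T, gamma=2.0, rng=None):
    """Sketch of Eqs. (16)-(17): the satiety rate F that switches AVRO
    between exploration (|F| >= 1) and development (|F| < 1)."""
    rng = np.random.default_rng() if rng is None else rng
    h = rng.uniform(-2, 2)   # random in [-2, 2]
    z = rng.uniform(-1, 1)   # z < 0: hungry vulture, z > 0: satiated
    # Eq. (16): t is 0 at i = 0 and shapes the decay of F near i = T.
    t = h * (np.sin(np.pi / 2 * i / T) ** gamma
             + np.cos(np.pi / 2 * i / T) - 1)
    r2 = rng.random()
    return (2 * r2 + 1) * z * (1 - i / T) + t    # Eq. (17)

F = satiety(i=10, T=100)   # |F| >= 1: explore; |F| < 1: develop
```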
(3) The third stage: exploration

When |F| ≥ 1, the vulture enters the exploration phase and randomly searches the environment. As shown in Eq. (18), P(i + 1) is the position of the vulture in iteration i + 1. If parameter \({G}_{1}\) is greater than or equal to the generated random number \({r}_{{G}_{1}}\), Eq. (19) is executed; otherwise, Eq. (21) is used.

$$P\left(i+1\right)=\left\{\begin{array}{l}Eq.\left(19\right),\; if\; {G}_{1}\ge {r}_{{G}_{1}}\\ Eq.\left(21\right),\; if\; {G}_{1}<{r}_{{G}_{1}}\end{array}\right.$$ (18)

$$P\left(i+1\right)=R\left(i\right)-D\left(i\right)\times F$$ (19)

$$D\left(i\right)=\left|X\times R\left(i\right)-P\left(i\right)\right|$$ (20)

According to Eq. (19), the vulture forages around the leader vulture R(i), with D(i), defined by Eq. (20), representing the vulture's exploration distance. X is the distance randomly moved by the vulture, adding randomness to the exploration phase; it is obtained as X = 2 × r, where r is a random number between 0 and 1.

$$P\left(i+1\right)=R\left(i\right)-F+{r}_{3}\times \left(\left(ub-lb\right)\times {r}_{4}+lb\right)$$ (21)

In Eq. (21), ub and lb are the upper and lower bounds of the algorithm's search interval, which contains the positions of the population. \({r}_{3}\) and \({r}_{4}\) are random numbers between 0 and 1. The use of \({r}_{4}\) adds randomness to the distribution of solutions within the search interval, increasing the diversity of the algorithm's exploration.
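A minimal sketch of the exploration step, Eqs. (18) through (21): P_i and R_i are the current and leader positions as weight vectors; the function name is an illustrative assumption.

```python
import numpy as np

def explore(P_i, R_i, F, lb, ub, G1, rng=None):
    """Sketch of Eqs. (18)-(21): one exploration move for one vulture."""
    rng = np.random.default_rng() if rng is None else rng
    if G1 >= rng.random():                        # branch test of Eq. (18)
        X = 2 * rng.random(P_i.shape)             # X = 2 * r
        D = np.abs(X * R_i - P_i)                 # Eq. (20)
        return R_i - D * F                        # Eq. (19)
    r3, r4 = rng.random(), rng.random(P_i.shape)
    return R_i - F + r3 * ((ub - lb) * r4 + lb)   # Eq. (21)
```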
(4) The fourth stage: development

If |F| is less than 1, AVRO enters the development phase, which aims to improve convergence efficiency. The development phase itself has two stages, each using two different strategies selected by parameters \({G}_{2}\) and \({G}_{3}\), both predefined in the range [0, 1].

Development (stage one): When |F| falls within [0.5, 1), AVRO enters the first stage of its development phase. A random number \({r}_{{G}_{2}}\) between 0 and 1 is generated. If parameter \({G}_{2}\) is greater than or equal to \({r}_{{G}_{2}}\), a food competition process is executed; otherwise, a rotating flight process is performed, as outlined in Eq. (22).

$$P\left(i+1\right)=\left\{\begin{array}{l}Eq.\left(23\right),\; if\; {G}_{2}\ge {r}_{{G}_{2}}\\ Eq.\left(27\right),\; if\; {G}_{2}<{r}_{{G}_{2}}\end{array}\right.$$ (22)

Food competition: When |F| ≥ 0.5, the vultures have relatively abundant energy. Vultures with sufficient energy are reluctant to share food with others, while weaker vultures gather around healthier ones to search for food. When many vultures converge on a single food source, conflict may arise within the population. Equations (23) and (24) model this process. The random variable \({r}_{5}\), which takes values between 0 and 1, increases the randomness of the process. Equation (24) gives the distance d(t) between a vulture and the leader vulture.

$$P\left(i+1\right)=D\left(i\right)\times \left(F+{r}_{5}\right)-d\left(t\right)$$ (23)

$$d\left(t\right)=R\left(i\right)-P\left(i\right)$$ (24)

Rotating flight: The vulture's rotating flight behavior is simulated using a spiral model. The vulture's position is updated using Eq. (27), with \({S}_{1}\) and \({S}_{2}\) obtained from Eqs. (25) and (26). The random variables \({r}_{6}\) and \({r}_{7}\) take values between 0 and 1.

$${S}_{1}=R\left(i\right)\times \left(\frac{{r}_{6}\times P\left(i\right)}{2\pi }\right)\times \cos\left(P\left(i\right)\right)$$ (25)

$${S}_{2}=R\left(i\right)\times \left(\frac{{r}_{7}\times P\left(i\right)}{2\pi }\right)\times \sin\left(P\left(i\right)\right)$$ (26)

$$P\left(i+1\right)=R\left(i\right)-\left({S}_{1}+{S}_{2}\right)$$ (27)

Development (stage two): If |F| < 0.5, AVRO enters the second stage of its development phase, during which intense fighting breaks out among the converging vultures. First, a random number \({r}_{{G}_{3}}\) between 0 and 1 is generated. If parameter \({G}_{3}\) is greater than or equal to \({r}_{{G}_{3}}\), the vultures converge towards the leader vulture; otherwise, an aggressive food competition process is carried out, as outlined in Eq. (28).

$$P\left(i+1\right)=\left\{\begin{array}{l}Eq.\left(31\right),\; if\; {G}_{3}\ge {r}_{{G}_{3}}\\ Eq.\left(32\right),\; if\; {G}_{3}<{r}_{{G}_{3}}\end{array}\right.$$ (28)

Vulture convergence: Vultures converge towards the leader vulture, with the convergence position calculated from Eqs. (29) and (30). Here, \({Best}_{1}\left(i\right)\) is the optimal solution in the i-th iteration and \({Best}_{2}\left(i\right)\) the second-best solution. All vultures are then gathered using Eq. (31), initiating competition for food.

$${A}_{1}={Best}_{1}\left(i\right)-\frac{{Best}_{1}\left(i\right)\times P\left(i\right)}{{Best}_{1}\left(i\right)-P{\left(i\right)}^{2}}\times F$$ (29)

$${A}_{2}={Best}_{2}\left(i\right)-\frac{{Best}_{2}\left(i\right)\times P\left(i\right)}{{Best}_{2}\left(i\right)-P{\left(i\right)}^{2}}\times F$$ (30)

$$P\left(i+1\right)=\frac{{A}_{1}+{A}_{2}}{2}$$ (31)

Food competition: When |F| < 0.5, the leader vulture becomes hungry and weak, lacking the energy to compete with other vultures for food, while stronger vultures become aggressive in their search. Equation (32) models this behavior. LF(x) simulates the vulture's flight using a Levy Flight (LF) model [33], as shown in Eq. (33), where \(\sigma\) is part of the LF model's definition, x is the problem dimension, and β is a fixed parameter set to 1.5. The random variables u and v are drawn from normal distributions, as shown in Eq. (34).

$$P\left(i+1\right)=R\left(i\right)-\left|d\left(t\right)\right|\times F\times LF\left(x\right)$$ (32)

$$LF\left(x\right)=0.01\times \frac{u\times \sigma }{{\left|v\right|}^{1/\beta }},\quad \sigma ={\left\{\frac{\Gamma \left(1+\beta \right)\times \sin\left(\pi \beta /2\right)}{\Gamma \left(\left(1+\beta \right)/2\right)\times \beta \times {2}^{\left(\beta -1\right)/2}}\right\}}^{1/\beta }$$ (33)

$$u\sim N\left(0,{\sigma }^{2}\right),\; v\sim N\left(0,1\right)$$ (34)
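The sketch below follows the printed formulas for the whole development phase, Eqs. (22) through (34); it reads the Levy flight denominator as |v| from Eq. (34), and all function and variable names are illustrative assumptions.

```python
import numpy as np
from scipy.special import gamma as Gamma

def levy_flight(dim, beta=1.5, rng=None):
    """Sketch of Eqs. (33)-(34): Levy flight step for aggressive competition."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = (Gamma(1 + beta) * np.sin(np.pi * beta / 2)
             / (Gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, dim)               # u ~ N(0, sigma^2)
    v = rng.normal(0.0, 1.0, dim)                 # v ~ N(0, 1)
    return 0.01 * u / np.abs(v) ** (1 / beta)     # Eq. (33)

def develop(P_i, best1, best2, R_i, F, G2, G3, rng=None):
    """Sketch of Eqs. (22)-(32): one development move for one vulture."""
    rng = np.random.default_rng() if rng is None else rng
    d = R_i - P_i                                 # Eq. (24)
    if abs(F) >= 0.5:                             # stage one
        if G2 >= rng.random():                    # food competition
            X = 2 * rng.random(P_i.shape)
            D = np.abs(X * R_i - P_i)             # Eq. (20)
            return D * (F + rng.random()) - d     # Eq. (23)
        r6, r7 = rng.random(), rng.random()       # rotating flight
        S1 = R_i * (r6 * P_i / (2 * np.pi)) * np.cos(P_i)   # Eq. (25)
        S2 = R_i * (r7 * P_i / (2 * np.pi)) * np.sin(P_i)   # Eq. (26)
        return R_i - (S1 + S2)                    # Eq. (27)
    if G3 >= rng.random():                        # stage two: convergence
        A1 = best1 - best1 * P_i / (best1 - P_i ** 2) * F   # Eq. (29)
        A2 = best2 - best2 * P_i / (best2 - P_i ** 2) * F   # Eq. (30)
        return (A1 + A2) / 2                      # Eq. (31)
    return R_i - np.abs(d) * F * levy_flight(P_i.size, rng=rng)  # Eq. (32)
```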
(5) The fifth stage: optimization

In the original algorithm, when |F| is greater than 1 the vultures search for food in different areas and AVRO enters the exploration phase; when |F| is less than 1, AVRO enters the development stage, and the vultures search for food near the optimal solution. F is computed by Eq. (17), so the convergence process and the final convergence performance depend on the maximum number of iterations: as the maximum number of iterations grows, F changes more slowly during training, and this slower reduction rate delays convergence. This design pursues the maximum fitness over the training process but does not directly yield stable convergence during the training phase; the optimization stage therefore adjusts the range of F through the parameter \({\delta }_{2}\) in Eq. (36).
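A small numeric illustration of this issue, assuming t ≈ 0 early in training (exact at i = 0 by Eq. (16)): the envelope of |F| from Eq. (17) is at most 3 × (1 - i/T), so a larger iteration budget T slows its per-iteration reduction and delays the switch to development.

```python
# With r2 in [0, 1] and z in [-1, 1], |(2*r2 + 1) * z| <= 3, so at a fixed
# iteration i the envelope of |F| grows with the iteration budget T.
for T in (100, 500, 1000):
    i = 50
    print(f"T={T:4d}: max |F| at iteration {i} is {3 * (1 - i / T):.2f}")
```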
Symbol | Definition | Formula |
---|---|---|
\({\alpha }_{1}\), \({\alpha }_{2}\), \({\alpha }_{3}\) | Adjustable parameters with no intrinsic significance | (11) |
\({r}_{n}\) | Random real numbers following a uniform distribution on [0, 1]; n is a positive integer from 1 to 8 | - |
i | The i-th iteration | - |
T | Maximum number of iterations | - |
\(b\) | The proportion of shortest paths passing through a given edge in the network | (13) |
\({\delta }_{1}\) | Adjustable parameter that sets the range of \(b\) | (13) |
R(i) | Leader vulture (one of the two vultures with the highest fitness) in iteration i | (14) |
\({Best}_{1}\) | The best solution | (14) |
\({Best}_{2}\) | The second-best solution | (14) |
F | The satiety rate of the vultures | (17) |
P(i) | The vector position of the vulture in the i-th iteration | - |
D(i) | Exploration distance of the vultures in the i-th iteration | (20) |
d(t) | The distance between a vulture and the leader vulture | (24) |
ub | The upper bound of the algorithm's search interval | - |
lb | The lower bound of the algorithm's search interval | - |
\({\delta }_{2}\) | Adjustable parameter that sets the range of \(F\) in the optimization stage | (36) |
The workflow of the routing optimization algorithm based on AVRO
Experimental evaluation
Experimental environment settings
Parameter settings
Parameter | Value |
---|---|
POP | 10 |
\({\delta }_{1}\) | 2 |
\({\delta }_{2}\) | 0.0001 |
γ | 2 |
G1 | 0.4 |
G2 | 0.3 |
G3 | 0.4 |
ub | 2 |
lb | 1 |
\({\alpha }_{1}\) | 5 |
\({\alpha }_{2}\) | 0.1 |
\({\alpha }_{3}\) | 19 |
Performance evaluation
- DDPG is a classical deep reinforcement learning algorithm whose state space includes link traffic, link utilization, link ingress/egress demands, and the agent's action in the previous time slot. The action space is the set of weights of all links in the network, and the reward is consistent with the fitness design of this study. DDPG outputs link weights based on the state and then derives the routing strategy from those weights, in the same way as AVRO.
- GTO, the Gorilla Troops Optimizer, is a metaheuristic algorithm that mathematically models the collective social behavior of gorilla troops. It shares the optimization objective of this paper and likewise outputs link weights.
- RSIR is designed on top of the Q-learning algorithm from reinforcement learning. It is a hop-by-hop routing algorithm whose state space is the set of network nodes and whose action space is the set of optional next hops. Its reward is related to the remaining link bandwidth, packet loss rate, and delay, weighted by three corresponding hyperparameters; in the comparison with our model, the reward focuses on the remaining link bandwidth. When traffic reaches a node, RSIR determines the next node to which the traffic is routed, thereby allocating network traffic. A sketch of this hop-by-hop update follows below.
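A minimal sketch of RSIR-style hop-by-hop Q-learning under the assumptions above: a Q-table over (node, next-hop) pairs and a bandwidth-focused reward. The topology, learning rate, and all names here are illustrative, not RSIR's actual configuration.

```python
# Q-table over (node, next_hop) pairs on a toy 4-node topology.
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
Q = {(s, a): 0.0 for s in neighbors for a in neighbors[s]}
alpha, gamma_rl = 0.1, 0.9   # learning rate and discount (assumed values)

def update(s, a, reward):
    # Standard Q-learning update; the next state is the chosen next hop.
    best_next = max(Q[(a, a2)] for a2 in neighbors[a])
    Q[(s, a)] += alpha * (reward + gamma_rl * best_next - Q[(s, a)])

update(0, 1, reward=0.8)  # e.g. reward derived from remaining link bandwidth
```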
- OSPF is a classic routing algorithm that computes the shortest path between node pairs using Dijkstra's algorithm, routing traffic directly along the path with the fewest hops. Note that, among these algorithms, OSPF is not implemented on SDN controllers, while the others are implemented within the SDN architecture.
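For reference, a minimal sketch of the OSPF baseline's path computation: Dijkstra's algorithm with unit (hop-count) weights via networkx; the toy graph is an illustrative assumption.

```python
import networkx as nx

# Edges without weight attributes default to weight 1, i.e. hop counts.
G = nx.Graph()
G.add_edges_from([("A", "B"), ("B", "C"), ("A", "D"), ("D", "C"), ("C", "E")])
print(nx.dijkstra_path(G, "A", "E"))  # a fewest-hop path, e.g. ['A', 'B', 'C', 'E']
```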
Fitness evaluation
Algorithm | GBN-1 | GBN-2 | GEANT-1 | GEANT-2 | NSFNet-1 | NSFNet-2 |
---|---|---|---|---|---|---|
AVRO(ours) | 19.10 | 17.32 | 16.38 | 12.61 | 19.57 | 19.29 |
AVOA [21] | 16.74 | 17.09 | 15.63 | 12.28 | 17.79 | 19.06 |
DDPG [34] | 17.77 | 14.72 | 12.80 | 11.36 | 16.40 | 16.16 |
GTO [35] | 17.58 | 14.45 | 9.91 | 10.18 | 17.59 | 15.72 |
RSIR [6] | 16.26 | 14.61 | 10.45 | 12.19 | 16.50 | 10.86 |
OSPF [5] | 11.93 | 7.57 | 6.46 | 9.56 | 16.27 | 14.11 |

Algorithm | GBN-1 | GBN-2 | GEANT-1 | GEANT-2 | NSFNet-1 | NSFNet-2 |
---|---|---|---|---|---|---|
AVRO(ours) | 39.2% | 43.6% | 45.2% | 58.1% | 38.0% | 39.3% |
AVOA [21] | 45.2% | 43.5% | 47.7% | 59.4% | 42.9% | 39.2% |
DDPG [34] | 41.5% | 50.5% | 56.7% | 62.4% | 47.0% | 45.8% |
GTO [35] | 42.6% | 51.9% | 68.2% | 66.8% | 43.3% | 48.4% |
RSIR [6] | 48.0% | 51.7% | 66.8% | 60.9% | 46.5% | 66.9% |
OSPF [5] | 60.6% | 78.2% | 83.1% | 69.4% | 47.3% | 53.8% |