1 Introduction
1.1 A Promising Solution via Reinforcement Learning
1.2 Contributions
2 Background and Related Work
3 Methodology
3.1 Twin Delayed Deep Deterministic Policy Gradient
- a set of states \(S\)
- a set of actions \(A\)
- transition dynamics \(T = P(s_{t+1} \mid s_t, a_t)\)
- an immediate reward function \(R(s_t, a_t)\)
- a discount factor \(\gamma \in [0,1]\).
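Together these elements form a Markov decision process: the agent seeks a policy \(\pi\) maximizing the expected discounted return, and TD3 trains its twin critics toward the clipped double-Q target. The equations below are the standard textbook TD3 formulation (with target networks \(\theta'_1, \theta'_2, \phi'\) and smoothing noise \(\epsilon\)), given here for reference rather than reproduced from this work:

\[
J(\pi) = \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\Big],
\qquad
y = R(s_t, a_t) + \gamma \min_{i=1,2} Q_{\theta'_i}\big(s_{t+1}, \pi_{\phi'}(s_{t+1}) + \epsilon\big).
\]

Taking the minimum over the two target critics counteracts the overestimation bias of a single Q-function.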
3.2 Network Data Collection via OpenFlow Messages
3.2.1 Packet Processing
3.2.2 Flow Rule Installation
3.2.3 Traffic Monitoring
3.2.4 Flow Rule Removal
3.3 Feature Extraction
Parameter | OpenFlow message handler |
---|---|
Table-miss packet_in inter-arrival time | packet_in_handler |
Flow duration | flow_removed_handler |
Previous value | – |
Miss rate | packet_in_handler |
Inactive rate | flow_stats_reply_handler |
Hit rate | packet_in_handler |
Use rate | flow_stats_reply_handler |
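As an illustration of how features of this kind could be derived from the counters available in the handlers above, the sketch below computes rate-style features from raw counts. All function and variable names, and the exact formulas, are assumptions for illustration, not the paper's implementation:

```python
# Illustrative sketch: deriving Section 3.3-style features from counters a
# controller could accumulate in its packet_in / flow_stats_reply handlers.
# Names and formulas are assumptions, not the paper's implementation.

def compute_features(packet_in_times, matched_count, missed_count,
                     active_flows, inactive_flows):
    """Return a dict of features resembling the parameters in the table."""
    # Table-miss packet_in inter-arrival time: mean gap between
    # consecutive packet_in events caused by table misses.
    gaps = [t2 - t1 for t1, t2 in zip(packet_in_times, packet_in_times[1:])]
    inter_arrival = sum(gaps) / len(gaps) if gaps else 0.0

    # Hit/miss rates: share of packets that matched an installed rule
    # versus those that triggered a table-miss packet_in.
    total_packets = matched_count + missed_count
    hit_rate = matched_count / total_packets if total_packets else 0.0
    miss_rate = missed_count / total_packets if total_packets else 0.0

    # Use/inactive rates: share of table entries seeing traffic, as could
    # be estimated from flow_stats_reply byte/packet counters.
    total_flows = active_flows + inactive_flows
    use_rate = active_flows / total_flows if total_flows else 0.0
    inactive_rate = inactive_flows / total_flows if total_flows else 0.0

    return {
        "inter_arrival": inter_arrival,
        "hit_rate": hit_rate,
        "miss_rate": miss_rate,
        "use_rate": use_rate,
        "inactive_rate": inactive_rate,
    }
```

Keeping each feature as a rate in \([0,1]\) (rather than a raw count) is convenient for RL, since the state vector stays bounded regardless of traffic volume.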
3.3.1 State Space
3.3.2 Action Space
3.3.3 Reward Function
3.4 Traffic
3.5 Framework
3.5.1 Cache
3.5.2 Deep RL
3.6 Training
4 Implementation
5 Evaluation
Idle timeout (s) | Avg. active flows (normalized) | Avg. match rate (normalized) | packet_in messages |
---|---|---|---|
DDT | 0.79 | 0.80 | 7826 |
1 | 1.00 | 0.72 | 8616 |
2 | 0.94 | 0.73 | 8546 |
3 | 0.93 | 0.74 | 8380 |
4 | 0.92 | 0.77 | 7905 |
5 | 0.90 | 0.79 | 7889 |
6 | 0.76 | 0.80 | 7798 |
7 | 0.71 | 0.83 | 7707 |
8 | 0.65 | 0.86 | 7610 |
9 | 0.61 | 0.90 | 7470 |
10 | 0.53 | 1.00 | 7441 |
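The table exposes a trade-off: short static timeouts keep more flows active but lower the match rate, while long timeouts do the reverse. A minimal sketch over the table's own values confirms that no static timeout beats DDT on both normalized metrics at once:

```python
# (avg. active, avg. match) per static idle-timeout setting,
# copied from the evaluation table above.
static = {
    1: (1.00, 0.72), 2: (0.94, 0.73), 3: (0.93, 0.74), 4: (0.92, 0.77),
    5: (0.90, 0.79), 6: (0.76, 0.80), 7: (0.71, 0.83), 8: (0.65, 0.86),
    9: (0.61, 0.90), 10: (0.53, 1.00),
}
ddt_active, ddt_match = 0.79, 0.80  # DDT's row in the table

# Static timeouts that exceed DDT on both metrics simultaneously.
dominating = [t for t, (active, match) in static.items()
              if active > ddt_active and match > ddt_match]
print(dominating)  # → []
```

An empty list means DDT's dynamically chosen timeouts are not dominated by any single static setting in this experiment.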
5.1 Analysis of the Obtained Results
5.2 Comparative Analysis with State-of-the-Art Solutions
Solution | Network planes | Algorithm | Implementation | Metrics | Results commentary |
---|---|---|---|---|---|
HQTimer [24] | Uses the reference SDN architecture; no modification to the switch specification | Q-learning | Simulation using synthesized datasets | Table hit rate; number of overflow occurrences | No rule-dependency problem; higher table-hit rate and fewer overflow occurrences than a static idle timeout |
Adaptive data forwarding [47] | Both control and data planes are involved; requires OpenFlow or switch-design modification | A statistical control mechanism | Simulation using MATLAB | Table hit rate | Improved flow-table hit rate and capacity utilization |
DRL-FTO [23] | Uses only the control plane | Deep reinforcement learning (Deep Q-Network, DQN) | Proof-of-concept emulation; synthesized traffic using Iperf | Number of exchanged controller-switch messages | Minimized controller-switch communication overhead |
DRL-Idle [48] | Statistical analysis on both control and data planes | A heuristic based on deep reinforcement learning | Simulation using synthesized datasets | Flow-based service time (FST); total number of flow installations | Improved QoS through flow management |
DeepPlace [49] | Uses both the control and data planes | Deep reinforcement learning | Statistical model evaluated by emulation; synthesized traffic using Hping3 | Number of match fields used in a flow rule; QoS violation ratio | Assured traffic QoS; decreased flow-table overflow in the data plane; best suited to software-defined IoT networks |
DDT (this work) | Uses only the control plane; no modification to the southbound protocol or the switch specification | Twin delayed deep deterministic policy gradient (TD3) | Proof-of-concept emulation; synthesized traffic using MGEN | Flow match rate; number of active flows in the table at a time | Higher average active transmissions; improved flow-entry match ratio; minimized controller-switch communication overhead |