1 Introduction
- A comprehensive view of ML techniques in networking. We review high-impact, well-received literature published in peer-reviewed venues over the past two decades. The works selected and discussed in this survey comprehensively cover the advances made in ML for networking. The key criteria used in the selection are a combination of the year of publication, citation count, and merit. For example, consider two papers A and B published in the same year with citation counts x and y, respectively. If x is significantly larger than y, A would be selected for discussion. However, if upon evaluating B it is evident that it presents original ideas, critical insights, or lessons learnt, then it is also selected for discussion on its merit, despite the lower citation count.
- A purposeful discussion of the feasibility of ML techniques for networking. We explore ML techniques in networking, including their benefits and limitations. It is important to note that our coverage of networking aspects is not limited to a specific network technology (e.g., cellular networks, wireless sensor networks (WSN), mobile ad hoc networks (MANET), cognitive radio networks (CRN)). This gives readers a broad view of the possible solutions to networking problems across network technologies.
- Identification of key challenges and future research opportunities. The presented discussion of ML-based techniques in networking uncovers fundamental research challenges that confront networking and inhibit ultimate cognition in network operation and management. A discussion of these opportunities will motivate future work and push the boundaries of networking.
AdaBoost | Adaptive Boosting |
AIWPSO | Adaptive Inertia Weight Particle Swarm Optimization |
BN | Bayesian Network |
BNN | Bayesian Neural Network |
BP | BackPropagation |
CALA | Continuous Action-set Learning Automata |
CART | Classification and Regression Tree |
CMAC | Cerebellar Model Articulation Controller |
DBN | Deep Belief Network |
DBSCAN | Density-based Spatial Clustering of Applications with Noise |
DE | Differential Evolution |
DL | Deep Learning |
DNN | Deep Neural Network |
DQN | Deep Q-Network |
DT | Decision Tree |
EM | Expectation Maximization |
EMD | Entropy Minimization Discretization |
FALA | Finite Action-set Learning Automata |
FCM | Fuzzy C-Means |
FNN | Feedforward Neural Network |
GD | Gradient Descent |
HCA | Hierarchical Clustering Analysis |
HMM | Hidden Markov Model |
HNN | Hopfield Neural Network |
ID3 | Iterative Dichotomiser 3 |
k-NN | k-Nearest Neighbor |
KDE | Kernel Density Estimation |
LDA | Linear Discriminant Analysis |
LSTM | Long Short-term Memory |
LVQ | Learning Vector Quantization |
MART | Multiple Additive Regression Tree |
MaxEnt | Maximum Entropy |
MDP | Markov Decision Process |
ML | Machine Learning |
MLP | Multi-layer Perceptron |
NB | Naïve Bayes |
NBKE | Naïve Bayes with Kernel Estimation |
NN | Neural Network |
OLS | Ordinary Least Squares |
PCA | Principal Component Analysis |
PNN | Probabilistic Neural Network |
POMDP | Partially Observable Markov Decision Process |
RandNN | Random Neural Network |
RBF | Radial Basis Function |
RBFNN | Radial Basis Function Neural Network |
RBM | Restricted Boltzmann Machine |
REPTree | Reduced Error Pruning Tree |
RF | Random Forest |
RIPPER | Repeated Incremental Pruning to Produce Error Reduction |
RL | Reinforcement Learning |
RNN | Recurrent Neural Network |
SARSA | State-Action-Reward-State-Action |
SGBoost | Stochastic Gradient Boosting |
SHLLE | Supervised Hessian Locally Linear Embedding |
SLP | Single-Layer Perceptron |
SOM | Self-Organizing Map |
STL | Self-Taught Learning |
SVM | Support Vector Machine |
SVR | Support Vector Regression |
TD | Temporal Difference |
THAID | THeta Automatic Interaction Detection |
TLFN | Time-Lagged Feedforward Neural Network |
WMA | Weighted Majority Algorithm |
XGBoost | eXtreme Gradient Boosting |
2 Machine learning for networking—a primer
2.1 Learning paradigms
2.2 Data collection
2.3 Feature engineering
2.4 Establishing ground truth
2.5 Performance metrics and model validation
Metric | Description |
---|---|
Mean Absolute Error (MAE) | Average of the absolute error between the actual and predicted values. Facilitates error interpretability. |
Mean Squared Error (MSE) | Average of the squares of the error between the actual and predicted values. Heavily penalizes large errors. |
Mean Absolute Percentage Error (MAPE) | Average of the absolute error between the actual and predicted values, expressed as a percentage of the actual values. Not reliable for zero values or low-scale data. |
Root MSE (RMSE) | Square root of MSE. Represents the standard deviation of the error between the actual and predicted values. |
Normalized RMSE (NRMSE) | RMSE normalized by the range or mean of the actual values. Facilitates comparing different models independently of their working scale. |
Cross-entropy | Metric based on the logistic function that measures the error between the actual and predicted values. |
Accuracy | Proportion of correct predictions among the total number of predictions. Not reliable for class-imbalanced data. |
True Positive Rate (TPR) or recall | Proportion of actual positives that are correctly predicted. Represents the sensitivity or detection rate (DR) of a model. |
False Positive Rate (FPR) | Proportion of actual negatives predicted as positives. Represents the significance level of a model. |
True Negative Rate (TNR) | Proportion of actual negatives that are correctly predicted. Represents the specificity of a model. |
False Negative Rate (FNR) | Proportion of actual positives predicted as negatives. Inversely proportional to the statistical power of a model. |
Receiver Operating Characteristic (ROC) | Curve that plots TPR versus FPR at different parameter settings. Facilitates analyzing the cost-benefit trade-off of candidate models. |
Area Under the ROC Curve (AUC) | Probability of confidence in a model to accurately predict positive outcomes for actual positive instances. |
Precision | Proportion of positive predictions that are correctly predicted. |
F-measure | Harmonic mean of precision and recall. Facilitates analyzing the trade-off between these metrics. |
Coefficient of Variation (CV) | Intra-cluster similarity to measure the accuracy of unsupervised classification models based on clusters. |
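To make the metrics above concrete, the following minimal Python sketch computes several of them from scratch; the input vectors are illustrative, not drawn from any surveyed dataset.

```python
import math

def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE, and range-normalized RMSE (NRMSE)."""
    errors = [yt - yp for yt, yp in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    rmse = math.sqrt(mse)
    nrmse = rmse / (max(y_true) - min(y_true))
    return mae, mse, rmse, nrmse

def classification_metrics(y_true, y_pred):
    """TPR (recall), FPR, precision, and F-measure for binary 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    tpr = tp / (tp + fn)               # sensitivity / detection rate
    fpr = fp / (fp + tn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tpr / (precision + tpr)
    return tpr, fpr, precision, f_measure
```

For instance, with y_true=[1,1,0,0] and y_pred=[1,0,1,0], TPR, FPR, precision, and F-measure all evaluate to 0.5, which illustrates why a single metric rarely suffices on its own.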
2.6 Evolution of machine learning techniques
3 Traffic prediction
Ref. | ML Technique (approach, training) | Application | Dataset (availability) | Features | Output | Evaluation: Settings | Evaluation: Results
---|---|---|---|---|---|---|---
NBP [141] | Supervised: · MLP-NN (offline) | End-to-end path bandwidth availability prediction (TSF) | NSF TeraGrid dataset (N/A) | Max, Min, Avg load observed in past 10 s ∼ 30 s | Available bandwidth on an end-to-end path in future epoch | Number of features =3 MLP-NN: ·(N/A) | MSE =8% |
Cortez et al. [104] | Supervised: · NNE trained with Rp (offline) | Link load and traffic volume prediction in ISP networks (TSF) | SNMP traffic data from two ISP networks: · traffic on a transatlantic link · aggregated traffic in the ISP backbone (N/A) | Traffic volume observed in past few minutes ∼ several days | Expected traffic volume | Number of features =6∼9 5 NNs NNE: · all SLPs for dataset1 · 1 hidden layer MLPs with 6∼8 neurons for dataset2 | 1h lookahead: · MAPE =1.43%∼5.23% 1h ∼ 24h lookahead: · MAPE =6.34%∼23.48% |
Bermolen et al. [52] | Supervised: · SVR (offline) | Link load prediction in ISP networks (TSF) | Internet traffic collected at the POP of an ISP network (N/A) | Link load observed at τ time scale | Expected link load | Number of features =d samples with d=1..30 Number of support vectors: · varies with d (e.g. ∼ 320 for d=10) | RMSE < 2 for τ=1ms and d=5·≈ AR ·10% less than MA |
Chabaa et al. [86] | Supervised: MLP-NN with different training algorithms (GD, CG, SS, LM, Rp) (offline) | Network traffic prediction (TSF) | 1000 points dataset (N/A) | Past measurements | Expected traffic volume | Number of features (N/A) MLP-NN: · 1 hidden layer | LM: · RMSE =0.0019 RPE =0.0230% Rp: · RMSE =0.0031 RPE =0.0371% |
Zhu et al. [500] | Supervised: MLP-NN with PSO-ABC (offline) | Network traffic prediction (TSF) | 2-week hourly traffic measurements (N/A) | N past days hourly traffic volume | Expected next-day hourly traffic volume | Number of features =5 MLP-NN (5, 11, 1) PSO-ABC: ·30 particles of length=66 | MSE =0.006 on normalized data 50% less than BP |
Li et al. [274] | Supervised: MLP-NN (offline) | Traffic volume prediction on an inter-DC link (Regression) | 6-week inter-DC traffic dataset from Baidu · SNMP counters data collected every 30 s · Top-5 applications traffic data collected every 5 min (N/A) | Level-N wavelet transform used to extract time and frequency features from total and elephant traffic volumes time series | k×30-s ahead expected traffic volume | Number of wavelets: ·N=10 Number of features =k×120 for N=10 1 hidden layer MLP-NN | RRMSE =4%∼10% for k=1∼40 |
Chen et al. [94] | Supervised: · KBR · LSTM-RNN (offline) | Inferring future traffic volume based on flow statistics (regression) | Network traffic volume and flow count collected every 5 min over a 24-week period (public) | Flow count | Expected traffic volume | Number of features: · 1 feature (past sample) LSTM-RNN: ·(N/A) | RNN · MSE > 0.3 on normalized data · 0.05 higher than KBR · twice as much as RNN fed with traffic volume time series |
Poupart et al. [365] | Supervised: · GPR · oBMM · MLP-NN (offline) | Early flow-size prediction and elephant flow detection (classification) | 3 university and academic networks datasets with over three million flows each (public) | · source IP · destination IP · source port · destination port · protocol · server vs. client · size of 3 first packets | Flow size class; elephant vs. non-elephant | Number of features: · 7 features MLP-NN: · (106,60,40,1) | GPR: · TPR > 80% · TNR > 80% oBMM: · TPR and TNR ≈100% on one dataset · TPR < 50% on other datasets MLP-NN: · TPR > 80% · lowest TNR < 80% |
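Most of the TSF-based predictors in the table above share one problem formulation: a sliding window of d past traffic measurements is mapped to the next value by a regression model (MLP-NN, SVR, NNE, etc.). The sketch below illustrates that formulation with ordinary least squares standing in for the surveyed models; the synthetic sine-plus-noise "traffic" series, the window size d=24, and all other values are illustrative assumptions, not from any surveyed dataset.

```python
import numpy as np

def make_windows(series, d):
    """Sliding-window TSF formulation: map d past samples to the next one."""
    X = np.array([series[i:i + d] for i in range(len(series) - d)])
    y = np.array(series[d:])
    return X, y

# Synthetic diurnal-looking "traffic" series (illustrative only).
rng = np.random.default_rng(0)
t = np.arange(200)
series = 10 + 5 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.1, t.size)

d = 24                                          # window of past samples
X, y = make_windows(series, d)
X1 = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
w, *_ = np.linalg.lstsq(X1, y, rcond=None)      # OLS in place of MLP-NN/SVR

mse = float(np.mean((X1 @ w - y) ** 2))         # in-sample fit error
```

Swapping the OLS fit for an MLP or SVR leaves the windowing and evaluation unchanged, which is why the surveyed works differ mainly in the regressor and the window/feature design.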
3.1 Traffic prediction as a pure TSF problem
3.2 Traffic prediction as a non-TSF problem
3.3 Summary
4 Traffic classification
Ref. | ML Technique | Dataset | Features | Classes | Evaluation: Settings | Evaluation: Results
---|---|---|---|---|---|---
Haffner et al. [176] ⋆ | Supervised NB, AdaBoost, MaxEnt | Proprietary | Discrete byte encoding for first n bytes of unidirectional flow | FTP, SMTP, POP3, IMAP, HTTPS, HTTP, SSH | n=64−256 bytes | Overall error rate <0.51%, precision > 99%, recall > 94% |
Ma et al. [286] ⋆ | Unsupervised HCA | Proprietary: U. Cambridge, UCSD | Discrete byte encoding for first n bytes of unidirectional flow | FTP, SMTP, HTTP, HTTPS, DNS, NTP, NetBIOS, SrvLoc | n=64 bytes, distance metric: PD = 250, MP = 150, CSG = 12% | Error rate: PD ≤ 4.15%, MP ≤ 9.97%, CSG ≤ 6.19% |
Finamore et al. [146] ⋆ | Supervised SVM | Statistical characterization of first N bytes of each packet a window of size C, divided into G groups of b consecutive bits | eMule, BitTorrent, RTP, RTCP, DNS, P2P-TV (PPLive, Joost, SopCast, TVAnts), Skype, Background | C=80,N=12,G=24,b=4 | Average TP = 99.6%, FP < 1% | |
Schatzmann et al. [404] † | Supervised SVM | Proprietary: ISP network | Service proximity, activity profiles, session duration, periodicity | Mail, Non-Mail | N/A | Average accuracy = 93.2%, precision = 79.2%
Bermolen et al. [53] † | Supervised SVM | Proprietary: campus network, ISP network | Packet count exchanged between peers in duration △T | PPLive, TVAnts, SopCast, Joost | △T=5 s, SVM distance metric R=0.5 | Worst-case TPR ≈95%, FPR < 0.1%
Ref. | ML Technique | Dataset | Features | Classes | Evaluation: Settings | Evaluation: Results
---|---|---|---|---|---|---
Roughan et al. [390] | Supervised k-NN | Proprietary: univ. networks, streaming service | Packet-level and flow-level features | Telnet, FTP-data, Kazaa, RealMedia Streaming, DNS, HTTPS | k=3, number of QoS classes = 3, 4, 7 | Error rate: 5.1% (4), 2.5% (3), 9.4% (7); (#): number of QoS classes |
Moore and Zuev [321] | Supervised NBKE | Proprietary: campus network | Baseline and derivative packet-level features | BULK, WWW, MAIL, SERVICES, DB, P2P, ATTACK, MULTIMEDIA | N/A | Accuracy up to 95%, TPR up to 99% |
Jiang et al. [218] | Supervised NBKE | Proprietary: campus network | Baseline and derivative flow-level features | WWW, email, bulk, attack, P2P, multimedia, service, database, interaction, games | N/A | Average accuracy ≈ 91% |
Park et al. [347] | Supervised REPTree, REPTree-Bagging | NLANR [457] | Packet-level, flow-level and connection-level features | WWW, Telnet, Messenger, FTP, P2P, Multimedia, SMTP, POP, IMAP, DNS, Services | Burst packet threshold = 0.007s | Accuracy ≥ 90% (features ≥ 7) |
Zhang et al. [496] | Supervised BoF-NB | WIDE [474], proprietary: ISP network | Packet-level and flow-level features from unidirectional flows | BT, DNS, FTP, HTTP, IMAP, MSN, POP3, SMTP, SSH, SSL, XMPP | Aggregation rule = sum, BoF size | Accuracy 87-94%, F-measure = 80% |
Zhang et al. [497] | Supervised RF, Unsupervised k-Means (BoF-based, RTC) | Packet-level and flow-level features from unidirectional flows | FTP, HTTP, IMAP, POP3, RAZOR, SSH, SSL, UNKNOWN / ZERO-DAY (BT, DNS, SMTP) | N/A | RTC up to 15% and 10% better in flow and byte accuracy, respectively, than the second best; F-measure = 0.91 (before update), 0.94 (after update) | |
Auld et al. [26] | Supervised BNN | Proprietary | Packet-level and flow-level features | ATTACK, BULK, DB, MAIL, P2P, SERVICE, WWW | Number of features = 246, hidden layers = 0-1, 0-30 nodes in the hidden layer, output = 10 | Accuracy > 99%, 95% with temporally distant training and testing datasets |
Sun et al. [431] | Supervised PNN | Proprietary: campus networks | Packet-level and flow-level features | P2P, WEB, OTHERS | Number of features = 22 | Accuracy = 87.99%; P2P: TPR = 91.25%, FPR = 1.36%; WEB: TPR = 98.74%, FPR = 27.7% |
Este et al. [140] | Supervised SVM | Packet payload size | HTTP, SMTP, POP3, HTTPS, IMAPS, BitTorrent, FTP, MSN, eDonkey, SSL, SMB, Kazaa, Gnutella, NNTP, DNS, LDAP, SSH | Number of support vectors cf., [140] | TP > 90% for most classes | |
Jing et al. [223] | Supervised FT-SVM | A subset of 12 from 248 features [321] | BULK, INTERACTIVE, WWW, MAIL, SERVICES, P2P, ATTACK, GAME, MULTIMEDIA, OTHER | SVM parameters automatically chosen | Accuracy up to 96%, error ratio ↓ 2.35 times, avg. computation cost ↓ 7.65 times | |
Wang et al. [464] | Supervised multi-class SVM, unbalanced binary SVM | Proprietary: univ. network | Flow-level and connection-level features | BitTorrent, eDonkey, Kazaa, pplive | N/A | Accuracy 75-99% |
Ref. | ML Technique | Dataset | Features | Classes | Evaluation: Settings | Evaluation: Results
---|---|---|---|---|---|---
Liu et al. [283] | Unsupervised k-Means | Proprietary: campus network | Packet-level and flow-level features | WWW, MAIL, P2P, FTP (CONTROL, PASV, DATA), ATTACK, DATABASE, SERVICES, INTERACTIVE, MULTIMEDIA, GAMES | k=80 | Average accuracy ≈ 90%, minimum recall = 70% |
Zander et al. [492] | Unsupervised AutoClass | NLANR [457] | Packet-level and flow-level features | AOL Messenger, Napster, Half-Life, FTP, Telnet, SMTP, DNS, HTTP | Intra-class homogeneity (H) | Mean accuracy = 86.5% |
Erman et al. [136] | Unsupervised AutoClass | Univ. Auckland [457] | Packet-level and flow-level features | HTTP, SMTP, DNS, SOCKS, IRC, FTP(control, data), POP3, LIMEWIRE, FTP | N/A | Accuracy = 91.2% |
Erman et al. [135] | Unsupervised DBSCAN | Univ. Auckland [457], proprietary: Univ. Calgary | Packet-level and flow-level features | HTTP, P2P, SMTP, IMAP, POP3, MSSQL, OTHER | eps = 0.03, minPts = 3, number of clusters = 190 | Overall accuracy = 75.6%, average precision > 95% (7/9 classes) |
Erman et al. [138] | Unsupervised k-Means | Proprietary: univ. network | Packet-level and flow-level features from unidirectional flows | Web, EMAIL, DB, P2P, OTHER, CHAT, FTP, STREAMING | k= 400 | Server-to-client: Avg. flow accuracy = 95%, Avg. byte accuracy = 79%; Web: precision = 97%, recall = 97%; P2P: precision = 82%, recall = 77% |
Ref. | ML Technique | Dataset | Features | Classes | Evaluation: Settings | Evaluation: Results
---|---|---|---|---|---|---
Bernaille et al. [55] ∗ | Unsupervised k-Means | Proprietary: univ. network | Packet size and direction of first P packets in a flow | eDonkey, FTP, HTTP, Kazaa, NNTP, POP3, SMTP, SSH, HTTPS, POP3S | P=5, k=50 | Accuracy > 80% |
Supervised J48 DT, k-NN, Random Tree, RIPPER, MLP, NB | Proprietary: Univ. Napoli campus network | Payload size stats and inter-packet time stats of first N packets, bidirectional flow duration and size, transport protocol | BitTorrent, SMTP, Skype2Skype, POP, HTTP, SOULSEEK, NBNS, QQ, DNS, SSL, RTP, EDONKEY | N=1...10 | Overall accuracy = 98.4% with BKS (J48, Random Tree, RIPPER, PL) combiner, N=10 | |
Nguyen et al. [337] † | Supervised NB, C4.5 DT | Proprietary: home network, univ. network, game server | Inter-packet arrival time statistics, inter-packet length variation statistics, IP packet length statistics of N consecutive packets | Enemy Territory (online game), VoIP, Other | N=25 | C4.5 DT: Enemy Territory - recall∗ = 99.3%, prec.∗ = 97%; VoIP - recall∗ = 95.7%, precision∗ = 99.2% NB: Enemy Territory - recall∗ = 98.9%, prec.∗ = 87%, VoIP - recall∗ = 99.6%, precision∗ = 95.4% (∗ median) |
Erman et al. [137] ⋆ | Semi-supervised k-Means | Proprietary: Univ. Calgary | Number of packets, average packet size, total bytes, total header bytes, total payload bytes (caller to callee and vice versa) | P2P, HTTP, CHAT, EMAIL, FTP, STREAMING, OTHER | k = 400, 13 layers, packet milestones (number of packets) in layers are separated exponentially (8, 16, 32, …) | Flow accuracy > 94%, byte accuracy 70-90% |
Li et al. [270] ⋆ | Supervised C4.5 DT, C4.5 DT with AdaBoost, NBKE | Proprietary | A subset of 12 from 248 features [321] of first N packets | WEB, MAIL, BULK, Attack, P2P, DB, Service, Interactive | N=5 | C4.5 DT: Accuracy >99%; Attack is an exception with moderate-high recall |
Jin et al. [222] ⋆ | Supervised AdaBoost | Proprietary: ISP network, labeled as in [176] | Lowsrcport, highsrcport, duration, mean packet size, mean packet rate, toscount, tcpflags, dstinnet, lowdstport, highdstport, packet, byte, tos, numtosbytes, srcinnet | Business, chat, DNS, FileSharing, FTP, Games, Mail, Multimedia, NetNews, SecurityThreat, VoIP, Web | Number of binary classifiers (k): TCP = 12, UDP = 8 | Error rate: TCP = 3%, UDP = 0.4% |
Bonfiglio et al. [69] ‡ | Supervised NB, Pearson’s χ2 test | Proprietary: univ. network, ISP network | Message size, average inter-packet gap | Skype | NB decision threshold Bmin=−5, χ2(Thr)=150 | NB ∧χ2: UDP – E2E - FP = 0.01%, FN = 29.98% E2O - FP = 0.0%, FN = 9.82% (univ. dataset); E2E - FP = 0.01%, FN = 24.62% E2O - FP = 0.11%, FN = 2.40% (ISP dataset) TCP – negligible FP |
Alshammari et al. [17] ‡ | Supervised AdaBoost, SVM, NB, RIPPER, C4.5 DT | Packet size, packet inter-arrival time, number of packets, number of bytes, flow duration, protocol (forward and backward direction) | SSH, Skype | N/A | C4.5 DT: SSH – DR = 95.9%, FPR = 2.8% (Dalhousie), DR = 97.2%, FPR = 0.8% (AMP), DR = 82.9%, FPR = 0.5% (MAWI) Skype – DR = 98.4%, FPR = 7.8% (Dalhousie) | |
Shbair et al. [409] ‡ | Supervised C4.5 DT, RF | Synthetic trace | Statistical features from encrypted payload and [253] (client to server and vice versa) | Service Provider (number of services): Uni-lorraine.fr (15), Google.com (29), akamaihd.net (6), Googlevideo.com (1), Twitter.com (3), Youtube.com (1), Facebook.com (4), Yahoo.com (19), Cloudfront.com (1) | N/A | RF (service provider): precision = 92.6%, recall = 92.8%, F-measure = 92.6% RF (service): accuracy 95-100% for the majority of service providers with > 100 connections per HTTPS service |
Ref. | ML Technique | Dataset | Features | Classes | Evaluation: Settings | Evaluation: Results
---|---|---|---|---|---|---
He et al. [182] ⋆ | Supervised k-NN, Linear-SVM, Radial-SVM, DT, RF, Extended Tree, AdaBoost, Gradient-AdaBoost, NB, MLP | KDD [42] | Protocol, network service, source bytes, destination bytes, login status, error rate, connection counts, connection percentages (different services among the same host, different hosts among the same service) | Attack types from [450] | Dynamic selection of classifier and features to collect | Accuracy = 95.6% |
Amaral et al. [19] † | Supervised RF, SGBoost, XGBoost | Proprietary: enterprise network | Packet size (1 to N packets), packet timestamp (1 to N packets), inter-arrival time (N packets), source/destination MAC, source/destination IP, source/destination port, flow duration, packet count, byte count | BitTorrent, Dropbox, Facebook, Web Browsing (HTTP), LinkedIn, Skype, Vimeo, YouTube | N=5 | RF: Accuracy 73.6-96.0% SGBoost: Accuracy 71.2-93.6% XGBoost: Accuracy 73.6-95.2% |
Wang et al. [462] † | Semi-supervised Laplacian-SVM | Proprietary: univ. network | Entropy of packet length, average packet length (source to destination and vice versa), source port, destination port, packets to respond from source to destination, minimum length of packets from destination to source, packet inactivity degree from source to destination, median of packet length from source to destination for the first N packets | Voice/video conference, streaming, bulk data transfer, interactive | N=20, Laplacian-SVM parameters λ=0.00001−0.0001, σ=0.21−0.23 | Accuracy > 90% |
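Several flow feature-based classifiers in the tables above (e.g., the NB and NBKE variants) model per-class distributions of flow statistics and pick the class with the highest posterior. The sketch below implements a minimal Gaussian Naïve Bayes on synthetic two-feature "flows"; the feature values and class structure are illustrative assumptions, not taken from any surveyed dataset.

```python
import numpy as np

class GaussianNB:
    """Minimal Gaussian Naive Bayes: per-class feature means and variances."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        self.logprior = np.log(np.array([np.mean(y == c) for c in self.classes]))
        return self

    def predict(self, X):
        # log N(x | mu, var) summed over features assumed independent
        ll = -0.5 * (np.log(2 * np.pi * self.var[:, None, :])
                     + (X[None] - self.mu[:, None, :]) ** 2
                     / self.var[:, None, :]).sum(-1)
        return self.classes[np.argmax(ll + self.logprior[:, None], axis=0)]

# Synthetic flows: [mean packet size (bytes), flow duration (s)] (illustrative).
rng = np.random.default_rng(1)
bulk = rng.normal([1400, 30], [100, 5], (100, 2))   # bulk-transfer-like flows
chat = rng.normal([120, 300], [30, 60], (100, 2))   # interactive-like flows
X = np.vstack([bulk, chat])
y = np.array([0] * 100 + [1] * 100)

clf = GaussianNB().fit(X, y)
acc = float(np.mean(clf.predict(X) == y))
```

The surveyed NBKE works replace the Gaussian assumption with kernel density estimates per feature, which matters when flow statistics are multi-modal.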
4.1 Payload-based traffic classification
4.2 Host behavior-based traffic classification
4.3 Flow Feature-based traffic classification
4.3.1 Supervised complete flow feature-based traffic classification
4.3.2 Unsupervised complete flow feature-based traffic classification
4.3.3 Early and sub-flow-based traffic classification
4.3.4 Encrypted traffic classification
4.3.5 NFV and SDN for traffic classification
4.4 Summary
5 Traffic routing
Ref. | Technique (selection) | Application (network) | Dataset | Features | Action set | Evaluation: Settings | Evaluation: Improvement
---|---|---|---|---|---|---|---
AdaR [461] | Partially decentralized LSPI (ε-greedy) | Unicast routing (WSN) | Simulations ·400 sensors ·20 data sources ·1 sink | State: \(\mathcal {N}_{i}\) Reward: function of · node load · residual energy · hop cost to sink · link reliability | Next-hop nodes to destination | ·S=#nodes·A=#neighbors | Compared to Q-learning: · Faster convergence (by 40 episodes) · Less sensitive to initial parameters |
FROMS [151] | Q-learning (variant of ε-greedy) | Multicast routing (WSN) | Omnet++ Mobility Framework with 50 random topologies ·50 nodes ·5 sources ·45 sinks | State: (\(\mathcal {N}^{k}_{i}\), \(D_{k}\)) Reward: function of hop cost | \(\{a_{1} \cdots a_{m}\}\), \(a_{k} = (\mathcal {N}^{k}_{j}, D_{k})\), \(\mathcal {N}^{k}_{j} =\) next hop along the path to sink \(D_{k}\) | ·S=#nodes ·A=#neighbors | Compared to directed diffusion: · up to 5× higher delivery rate ·≈20% lower overhead |
Q-PR [24] | Variant of Q-learning (ε-greedy) | Localization-aware routing to achieve a trade-off between packet delivery rate, ETX, and network lifetime (WSN) | Simulations ·50 different topologies ·100 nodes | State: \(\mathcal {N}_{i}\) Reward: function of · distance(\(\mathcal {N}_{i}\),\(\mathcal {N}_{j}\)) · distance(\(\mathcal {N}_{j}\),d) · energy at \(\mathcal {N}_{j} \cdot \) ETX \(\cdot \mathcal {N}_{j}\)’s neighbors for any neighbor \(\mathcal {N}_{j}\) and destination | Next-hop nodes to destination | ·S=#nodes·A=#neighbors | Delivery rate: ·25% more than GPSR Network lifetime ·3× more than GPSR ·4× more than EFE |
Ref. | Technique (selection) | Application (network) | Dataset | Features | Action set | Evaluation: Settings | Evaluation: Improvement
---|---|---|---|---|---|---|---
Xia et al. [482] | DRQ-learning (greedy) | Spectrum-aware routing (CRN) | OMNET++ simulations · stationary multi-hop CRN · 10 nodes · 2 PUs | State: \(\mathcal {N}_{i}\) Reward: # available channels between current node and next-hop node | Next-hop nodes to destination | ·S=#nodes·A=#neighbors | Compared to Q-routing: ·50% faster at lower activity level Compared to Q-routing and SP-routing: · lower converged end-to-end delay |
QELAR [197] | Model-based Q-learning (greedy) | Distributed energy-efficient routing (underwater WSN) | Simulations (ns-2) ·250 sensors in a 500^3 m^3 space ·100m transmission range · fixed source/sink ·1m/s maximum speed for intermediate nodes | State: \(\mathcal {N}_{i}\) Reward: function of the residual energy of the node receiving the packet and the energy distribution among its neighbor nodes | Next-hop nodes to destination ∪ packet withdrawal | ·S=#nodes ·A=1+#neighbors | Compared to Q-learning: · Faster convergence (40 episodes less) · Less sensitive to initial parameters |
Lin et al. [277] | n−step TD (greedy) | Delay-sensitive application routing (multi-hop wireless ad hoc networks) | Simulations: 2 users transmitting video sequences to the same destination node ·3∼4-hop wireless network | State: current channel states and queue sizes at the nodes in each hop Reward: goodput at destination | Next-hop nodes to destination | \(\cdot S=n_{q}^{N}\times n_{c}^{H}\) \(\cdot A=(N_{h}^{2})^{H-1}\times N_{h}\), where \(N\) = #nodes, \(N_{h}\) = #nodes at hop \(h\), \(H\) = #hops, \(n_{q}\) = #queue states, \(n_{c}\) = #channel states | Complexity ≈2×10^8 for the 3-hop network With 95% less information exchanges: ·∼10% higher PSNR · slightly slower convergence (+1∼2 s) |
d-AdaptOR [59] | Q-learning with adaptive learning rate (ε−greedy) | Opportunistic routing (multi-hop wireless ad hoc networks) | Simulations on QualNet with 36 randomly placed wireless nodes in a 150m×150m area | State: \(\mathcal {N}_{i}\) Reward: · fixed negative transmission cost if receiver is not the destination · fixed positive reward if receiver is the destination · 0 if packet is withdrawn | Next-hop nodes to destination ∪ packet withdrawal | ·S=#nodes ·A=1+#neighbors | After convergence (≈300sec) · ETX comparable to a topology-aware routing algorithm ·>30% improvement over greedy-SR, greedy ExOR and SRCR with a single flow · Improvement decreases with # flows |
QAR [276] | Centralized SARSA (ε-greedy) | QoS-aware adaptive routing(SDN) | Sprint GIP network trace-driven simulations [418] · 25 switches, 53 links | State: \(\mathcal {N}_{i}\) Reward: function of delay, loss, throughput | Next-hop nodes to destination | ·S=#nodes ·A=#neighbors | Compared to Q-learning with QoS-awareness: · Faster convergence time (20 episodes less) |
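The RL-based routing schemes above largely share one formulation: the state is the current node, the actions are its next-hop neighbors, and the reward encodes hop or link cost. A minimal tabular Q-learning sketch of that formulation on a toy four-node topology (topology, rewards, and hyperparameters are illustrative assumptions):

```python
import random

# State = current node, actions = neighbors, reward = -1 per hop (toy values).
neighbors = {'A': ['B', 'C'], 'B': ['A', 'D'], 'C': ['A', 'D'], 'D': []}
SINK = 'D'
Q = {(n, nb): 0.0 for n in neighbors for nb in neighbors[n]}
alpha, gamma, eps = 0.5, 0.9, 0.1
random.seed(0)

def choose(node):
    """Epsilon-greedy next-hop selection."""
    if random.random() < eps:
        return random.choice(neighbors[node])
    return max(neighbors[node], key=lambda nb: Q[(node, nb)])

for _ in range(500):                     # training episodes from source 'A'
    node = 'A'
    while node != SINK:
        nxt = choose(node)
        r = 0.0 if nxt == SINK else -1.0          # delivery: 0, else hop cost
        future = max((Q[(nxt, nb)] for nb in neighbors[nxt]), default=0.0)
        Q[(node, nxt)] += alpha * (r + gamma * future - Q[(node, nxt)])
        node = nxt

route, node = ['A'], 'A'                 # greedy route after learning
while node != SINK and len(route) < 10:  # cap guards an unconverged table
    node = max(neighbors[node], key=lambda nb: Q[(node, nb)])
    route.append(node)
```

The surveyed schemes differ mainly in the reward design (energy, ETX, delay, delivery) and in whether the Q-table is kept per node (decentralized) or at a controller (centralized, as in QAR).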
5.1 Routing as a decentralized operation function
5.2 Routing as a partially decentralized operation function
5.3 Routing as a centralized control function
5.4 Summary
6 Congestion control
6.1 Packet loss classification
Ref. | ML Technique | Network | Dataset | Features | Classification | Evaluation: Settings | Evaluation: Results
---|---|---|---|---|---|---|---
Liu et al. [282] | Unsupervised: · EM for HMM | Hybrid wired and wireless | Synthetic data: · ns-2 simulation · 4-linear topology Data distribution: · Training = 10k | · Loss pair RTT | · Congestion loss · Wireless loss | · 4-state HMM · Gaussian variables · Viterbi inference | HMM accuracy: ·44−98% |
Barman and Matta [38] | Unsupervised: · EM for HMM | Hybrid wired and wireless | Synthetic data: · ns-2 simulation · Topology: - 4-linear - Dumbbell | · Loss pair delay · Loss probabilities: - Congestion - Wireless (nw) (nw: network support) | · Congestion loss · Wireless loss | · 2-state HMM · Gaussian variables · Bayesian inference · Discretized values: - 10 symbols | HMM accuracy: ·92−98% |
Supervised: · Boosting DT · DT · RF · Bagging DT · Extra-trees · MLP-NN ·k-NN | Hybrid wired and wireless | Synthetic data: · Simulation in: - ns-2 - BRITE ·> 1k random topologies Data distribution: · Training = 25k · Testing = 10k | 40 features applying avg, stdev, min, and max on parameters: · One-way delay · IAT And on packets: · 3 following loss · 1 before loss · 1/2 before RTT [130] finds that adding the number of losses is insignificant | · Congestion loss · Wireless loss | Ensemble DT: · 25 trees NN: · 40 input neurons · 2 hidden layers with 30 neurons · 1 output neuron · LMA learning k-NN: ·k=7 | AUC (%): · 98.40 · 94.24 · 98.23 · 97.96 · 98.13 · 97.61 · 95.41 |
Fonseca and Crovella [150] | Supervised: · Bayesian | Wired | Real data: · PMA project · BU Web server | · Loss pair RTT | · Congestion loss · Reordering | · Gaussian variables · 0 to 3 historic samples | In PMA: · TPR = 80% · FPR = 40% In BU: · TPR = 90% · FPR = 20% |
Jayaraj et al. [214] | Unsupervised: · EM for HMM · EM-clustering | Optical | Synthetic data: · ns-2 simulation · NSFNET topology Data distribution: · Training = 25k · Testing = 15k | · Number of bursts between failures | · Congestion loss · Contention loss | HMM: · 8 states · Gaussian variables · Viterbi inference · 26 EM iterations Clustering: · 8 clusters · 24 EM iterations | CV: ·0.16−0.42 ·0.15−0.28 HMM accuracy: ·86−96% |
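Several of the loss classifiers above train an HMM over loss-pair delay and decode the hidden loss cause with the Viterbi algorithm. The sketch below shows only the decoding step, with hand-set (illustrative) Gaussian emission and transition parameters in place of EM-fitted ones; state 0 stands for congestion loss (inflated RTT) and state 1 for wireless loss.

```python
import math

# Hand-set 2-state HMM (illustrative; real systems fit these with EM).
pi = [0.5, 0.5]                          # initial state probabilities
A = [[0.9, 0.1], [0.2, 0.8]]             # state transition probabilities
mu, sigma = [120.0, 60.0], [15.0, 10.0]  # Gaussian loss-pair RTT emissions (ms)

def log_gauss(x, m, s):
    return -0.5 * math.log(2 * math.pi * s * s) - (x - m) ** 2 / (2 * s * s)

def viterbi(obs):
    """Most likely loss-cause sequence for observed loss-pair RTTs."""
    V = [[math.log(pi[s]) + log_gauss(obs[0], mu[s], sigma[s]) for s in (0, 1)]]
    back = []
    for x in obs[1:]:
        row, ptr = [], []
        for s in (0, 1):
            best = max((0, 1), key=lambda p: V[-1][p] + math.log(A[p][s]))
            ptr.append(best)
            row.append(V[-1][best] + math.log(A[best][s])
                       + log_gauss(x, mu[s], sigma[s]))
        V.append(row)
        back.append(ptr)
    state = max((0, 1), key=lambda s: V[-1][s])
    path = [state]
    for ptr in reversed(back):           # backtrack the best path
        state = ptr[state]
        path.append(state)
    return path[::-1]

labels = viterbi([125, 118, 62, 58, 61])  # two inflated, then three nominal RTTs
```

The transition matrix is what distinguishes this from per-sample thresholding: it encodes that loss causes persist, so isolated ambiguous samples are smoothed toward their neighbors.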
6.2 Queue management
Ref. | ML Technique | Multiple Bottleneck | Synthetic data from ns-2 simulation | Features | Output (action-set for RL) | Evaluation: Settings | Evaluation: Results
---|---|---|---|---|---|---|---
PAQM [160] | Supervised: · OLS | ✓ | Topology: · 6-linear · Arbitrary dumbbell Time =50s | · Traffic volume (bytes) | TSF: · Traffic volume | · NLMS algorithm based on LMMSE | Accuracy: ·90−92.3% |
APACE [212] | Supervised: · OLS | ✓ | Topology: · Dumbbell (1-sink) · 6-linear Time =40s | · Queue length | TSF: · Queue length | · NLMS algorithm based on LMMSE | Accuracy: · 92% |
α_SNFAQM [498] | Supervised: · MLP-NN | – | Topology: · Dumbbell (1-sink) Time =300s | · Traffic volume · Predicted traffic volume | TSF: · Traffic volume | · 2 input neurons · 2 hidden layers with 3 neurons · 1 output neuron | Accuracy: ·90−93% |
NN-RED [179] | Supervised: · SLP-NN | – | Topology: · Dumbbell Time =900s | · Queue length | TSF: · Queue length | · 1+N input neurons (N past values) · 0 hidden layers · 1 output neuron · Delta-rule learning | N/A |
DEEP BLUE [298] | Reinforcement: · Q-learning - ε-greedy | – | Topology: · Dumbbell Time =50s (OPNET simulator instead of ns-2) | States: · Queue length · Packet drop prob. Reward: · Throughput · Queuing delay | Decision making: · Increment of the packet drop probability (finite: 6 actions) | ·N/A states · 6 actions ·ε-greedy action selection | Optimal packet drop probability: · Outperforms BLUE [144] |
Neuron PID [428] | Reinforcement: · PIDNN | ✓ | Topology: · Dumbbell Time =100s | · Queue length error | Decision making: · Increment of the packet drop probability (continuous) | · 3 input neurons · 0 hidden layers · 1 output neuron · Hebbian learning · 1 PID component | QLAcc error: · 7.15 QLJit: · 20.18 |
AN-AQM [427] | Reinforcement: · PIDNN | ✓ | Topology: · Dumbbell · 6-linear Time =100s | · Queue length error · Sending rate error | Decision making: · Increment of the packet drop probability (continuous) | · 6 input neurons · 0 hidden layers · 1 output neuron · Hebbian learning · 2 PID components | QLAcc error: · 6.44 QLJit: · 22.61 |
FAPIDNN [485] | Reinforcement: · PIDNN | ✓ | Topology: · Dumbbell Time =60s | · Queue length error | Decision making: · Increment of the packet drop probability (continuous) | · 3 input neurons · 0 hidden layers · 1 output neuron · 1 PID component · 1 fuzzy component | QLAcc error: · 3.73 QLJit: · 31.8 |
NRL [499] | Reinforcement: · SLP-NN | ✓ | Topology: · Dumbbell Time =100s | · Queue length error · Sending rate error | Decision making: · Increment of the packet drop probability (continuous) | · 2 input neurons · 0 hidden layers · 1 output neuron · RL learning | QLAcc error: · 38.73 QLJit: · 128.84 |
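NN-RED above predicts future queue length with a single-layer perceptron over past queue samples, trained by the delta rule. A minimal sketch of that idea on a synthetic queue-length trace (the trace, window size N, learning rate, and train/test split are all illustrative assumptions):

```python
import numpy as np

# Single-layer (delta-rule) predictor of the next queue length from N past
# samples, in the spirit of NN-RED; trace and parameters are illustrative.
rng = np.random.default_rng(2)
q = 50 + 20 * np.sin(np.arange(300) / 10) + rng.normal(0, 1, 300)

N, lr = 4, 1e-5
w = np.zeros(N + 1)                          # N past values + bias weight
for epoch in range(20):                      # a few online passes to converge
    for t in range(N, 250):
        x = np.append(q[t - N:t], 1.0)
        err = q[t] - w @ x                   # one-step prediction error
        w += lr * err * x                    # delta-rule update

# One-step-ahead prediction error on the held-out tail of the trace
errs = [q[t] - w @ np.append(q[t - N:t], 1.0) for t in range(250, 300)]
rmse = float(np.sqrt(np.mean(np.square(errs))))
```

An AQM scheme would then drop packets proactively when the predicted queue length, rather than the instantaneous one, crosses its threshold.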
6.3 Congestion window update
Ref. | RL Technique (action selection) | Network | Synthetic Dataset | Features | Action-set | Evaluation: Settings | Evaluation: Results
---|---|---|---|---|---|---|---
TCP-FALA [380] | FALA | WANET | GloMoSim simulation: · Topology: - Random - Dumbbell | States and reward: · IAT of ACKs (distinguish ACKs and DUPACKs) | Finite: · 5 actions (stochastic) | · 1 input feature · 5 states · 5 actions | To TCP-NewReno: · Packet loss =66% · Goodput =29% · Fairness =20% To TCP-FeW ‡: · Packet loss =−5% · Goodput =−10% · Fairness =12% |
– | CALA | WANET | Simulation: · ns-2 and GloMoSim · Topology: - Chain - Random node - Grid Experimental: · Linux-based · Chain topology | States and reward: · IAT of ACKs | Continuous: · Normal action probability distribution (stochastic) | · 1 input feature · 2 states · ∞ actions | To TCP-FeW: · Packet loss =37% · Goodput =13% · Fairness =23% To TCP-FALA: · Packet loss =28% · Goodput =36% · Fairness =14% |
TCP-GVegas [219] | Q-learning | WANET | ns-2 simulation: · Topology: - Chain - Random | States: · CWND · RTTz · Throughput Reward: · Throughput | Continuous: · Range based on RTT, throughput, and a span factor (ε-greedy) | · 3 input features · 3 states ·N/A actions | To TCP-Vegas: · Throughput =60% · Delay =54% |
FK-TCPLearning [271] | FKQL | IoT | ns-3 simulation: · Dumbbell topology: - Single source/sink - Double source/sink | States: · IAT of ACKs · IAT of packets sent · RTT · SSThresh Reward: · Throughput · RTT | Finite: · 5 actions (ε-greedy) | · 5 input features ·10k states · 5 actions · FK approx: - 100 prototypes | To TCP-NewReno: · Throughput =34% · Delay =12% To TCPLearning based on pure Q-learning: · Throughput =−1.5%· Delay =−10% |
UL-TCP [30] | CALA | Wireless: · Single-hop: - Satellite - Cellular - WLAN · Multi-hop: - WANET | ns-2 simulation: · Single-hop dumbbell · Multi-hop topology: - Chain - Random - Grid | States and reward: · RTT · Throughput · RTO · CWND | Continuous: · Normal action probability distribution (stochastic) | · 3 input features · 2 states · ∞ actions | For single-hop, to ATL: · Packet loss =51% · Goodput =−14% · Fairness =53% For multi-hop, similar to Learning-TCP |
Remy [477] | Own (offline training) | · Wired · Cellular | ns-2 simulation: · Wired topology: - Dumbbell - Datacenter · Cellular topology | States: · IAT of ACKs · IAT of packets sent · RTT Reward: · Throughput · Delay | Continuous with 3 dimensions: · CWND multiple · CWND increment · Time between successive sends (ε-greedy) | · 4 input features · (16k)³ states · 100³ actions · 16 network configurations | To TCP-Cubic: · Throughput =21% · Delay =60% To TCP-Cubic/SFQ-CD: · Throughput =10% · Delay =38% |
PCC [122] | Own | · Wired · Satellite | Experimental: · GENI · Emulab · PlanetLab | States: · Sending rate Reward: · Throughput · Delay · Loss rate | Finite: · 2 actions of the increment for updating sending rate (not CWND) (gradient ascent) | · 3 input features · 4 states · 2 actions | To TCP-Cubic: · Throughput =21%· Delay =60% |
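UL-TCP and the unreferenced CALA row above adjust the congestion window with a continuous action-set learning automaton: each update is sampled from a normal distribution whose mean and variance drift toward better-rewarded actions. The sketch below shows that update rule on a toy reward; the reward function, learning rate, and optimum are invented stand-ins for ACK feedback.

```python
import random

random.seed(0)
mu, sigma = 0.0, 1.0          # action distribution N(mu, sigma)
LR, SIGMA_MIN = 0.05, 0.1     # learning rate, variance floor

def reward(action):
    """Toy stand-in for ACK feedback: best CWND increment is +2.0."""
    return -((action - 2.0) ** 2)

for _ in range(5000):
    s = max(sigma, SIGMA_MIN)
    x = random.gauss(mu, s)                 # sample an action
    delta = reward(x) - reward(mu)          # did x beat the current mean action?
    diff = x - mu
    # CALA-style updates: shift the mean toward better actions,
    # shrink the variance as the automaton becomes confident
    mu += LR * delta * diff / s
    sigma = max(SIGMA_MIN, sigma + LR * delta * (diff ** 2 / s ** 2 - 1))
```

After training, `mu` settles near the best increment, which is how these protocols converge on a CWND adjustment without discretizing the action space.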
6.4 Congestion inference
Ref. | ML Technique | Network (location) | Dataset | Features | Output | Evaluation: Settings | Evaluation: Resultsᵃᵇ
---|---|---|---|---|---|---|---
El Khayat et al. [238] | Supervised: · MLP-NN · MART · Bagging DT · Extra-trees (offline) | Wired (end-system) | Synthetic data: · ns-2 simulation · > 1k random topologies Data distribution: · Training = 18k · Testing = 7.6k | · Packet size · RTT: avg, min, max, stdev · Session loss rate · Initial timeout · Packets ACK at once · Session duration · TLR | Prediction: · Throughput | Ensemble DT: · 25 trees NN: N/A | MSE (10⁻³)ᶜ: · 0.245 · 0.423 · 0.501 · 0.525 |
Mirza et al. [316] | Supervised: · SVR(offline) | Multi-path wired(end-system) | Synthetic data: · Laboratory testbed - Dumbbell multi- path topology · RON testbed | · Queuing delay · Packet loss · Throughput | Prediction: · Throughput | · 2 input features · RBF kernel | Rate of predictions with RPE ≤ 10%: · Lab: 51% · RON: 87% |
Quer et al. [371] | Supervised: · BN (offline) | WLAN (access point) | Synthetic data: · ns-3 simulation · Star topology Data distribution: · Training =40k · Testing =10k | · MAC-TX · MAC-RTX · MAC contention window · CWND · CWND status · RTT · Throughput | Prediction: · Throughput | DAG: · 7 vertices · 6 edges | Using MAC-TX: · NRMSE =0.37 Using all features: · NRMSE =0.27 |
Mezzavilla et al. [309] | Supervised: · BN(offline) | WANET(end-system) | Synthetic data: · ns-3 simulation · Topology: - (not mentioned) | · MAC-TX · MAC-RTX · Slots before TX · Queue TX packets · Missing entries in IP table | Classification: · Static · Mobile | DAG: · 6 vertices · 5 edges | Using MAC-TX and MAC-RTX: · Precision =0.88 · Recall =0.91 |
Fixed-Share Experts [22] | Supervised: · WMA (online) | · WANET · Wired · Hybrid wired and wireless (end-system) | Synthetic data: · QualNet simulation · Topology: - Random WANET - Dumbbell wired Real data: · File transfer · Wired and WLAN | · RTT | Prediction: · RTT | · 1 input feature · 100 experts · Simple experts | MAE (ticks): · Synthetic data (ticks of 500ms): =0.53 · Real data (ticks of 4ms): =2.95 |
SENSE [128] | Supervised: · WMA(online) | Hybrid wired and wireless (end-system) | Real data: · Dataset from [22] | · RTT | Prediction: · RTT | · 1 input feature · 100 experts · EWMA experts | MAE (ticks of 4ms): =1.55 |
ACCPndn [230] | Supervised: · TLFN - PSO - GA(online) | NDN(controller node) | Synthetic data: · ns-2 simulation · Topology: - DFN - SWITCH Data distribution: · Training =70% · Validation =15% · Testing =15% | · PIT entries rate | Prediction: · PIT entries rate | ·R input neurons · 2 hidden layers with R neurons ·R output neurons R: number of contributing routers | MSE: · PSO-GA =2.23 · GA-PSO =3.25 · PSO =4.05 · GA =5.65 · BP =7.27 |
Smart-DTN-CC [412] | Reinforcement: · Q-learning - Boltzmann - WoLF(online) | DTN (node) | Synthetic data: · ONE simulation: · Random topology | States: · Input rate · Output rate · Buffer space Reward: · State transition | Decision-making: · Action to control the congestion(finite action-set: 12 actions) | · 3 input features · 4 states · 12 actions | Improvement to CCC: · Delivery ratio =53% · Delay =95% |
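The fixed-share experts scheme of [22], refined by SENSE [128], predicts RTT by weighting a pool of simple experts and redistributing a small share of weight each round so the predictor can track abrupt path changes. A compact sketch with constant-prediction experts follows; the expert pool, loss sensitivity, and share rate are illustrative.

```python
import math

EXPERTS = [float(v) for v in range(10, 210, 10)]   # candidate RTT guesses (ms)
ETA, SHARE = 0.01, 0.05                            # loss sensitivity, share rate
w = [1.0 / len(EXPERTS)] * len(EXPERTS)

def predict():
    # weighted average of the experts' guesses
    return sum(wi * e for wi, e in zip(w, EXPERTS)) / sum(w)

def update(observed_rtt):
    global w
    # penalize each expert exponentially in its squared error
    w = [wi * math.exp(-ETA * (e - observed_rtt) ** 2)
         for wi, e in zip(w, EXPERTS)]
    total = sum(w)
    # fixed-share step: redistribute a fraction of the weight uniformly,
    # so experts written off earlier can recover after a regime change
    w = [(1 - SHARE) * wi + SHARE * total / len(EXPERTS) for wi in w]

for rtt in [50.0] * 200 + [120.0] * 200:           # abrupt path change
    update(rtt)
```

After the shift, `predict()` tracks the new 120 ms regime. SENSE replaces the constant experts with EWMA experts, as the table notes.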
6.5 Summary
7 Resource management
7.1 Admission control
Ref. | ML Technique | Network | Dataset | Features | Output | Evaluation: Settings | Evaluation: Results
---|---|---|---|---|---|---|---
– | Supervised: NN | ATM | Simulation | · Link capacity · Observed call generation rate | · Call loss rate | 2-10-1ᵃ | Improved call loss rate |
Cheng and Chang [95] | Supervised: MLP-NN | ATM | Simulation | · Congestion-status · Cell-loss probability · Peak bitrate · Average bitrate · Mean peak-rate duration | Acceptance or rejection | 30-30-1a | 20% system utilization improvement over [189] |
Piamrat et al. [359] | Supervised: · RandNN | Wireless | Videos (distorted) generated by streaming application | Codec, bandwidth, loss, delay, and jitter | · MOS | N/A | N/A |
Baldo et al. [36] | Supervised: · MLP-NN | Wireless LAN | ns-3 simulator and testbed | Link load and frame loss | Service quality | 9-10-1a | 98.5% (offline) 92% (online) |
Liu et al. [281] | Supervised: · MLP-NN | Cellular (CDMA) | Simulation of cellular networks | · Network environment · User behavior · Call class · Action | GoS | 5-10-1a | Performs better than the static algorithms |
Bojovic et al. [66] | Supervised: · MLP-NN | Cellular (LTE) | ns-3 network simulator | · Application throughput · Average packet error rate · Average size of packet data unit | QoS fulfillment ratio | N/A | Accuracy: 86% |
Vassis et al. [452] | Supervised: · MLP · Probabilistic RBFNN · LVQ-NN · HNN ·SVM network | Ad hoc networks | Pamvotis WLAN simulator | · Network throughput · Packet generation rate | Average packet delays | N/A | Correctness: ·77% - 88% (Probabilistic RBFNN) Others do not converge |
Ahn et al. [8] | Unsupervised: · HNN | Wireless network | Simulation | · Usable QoS levels | QoS assignment matrix for each connection | N × M, where N and M are the number of connections and the number of QoS levels | Minimized connection blocking and dropping probabilities |
Blenk et al. [63] | Supervised: · RNN | VN | Simulation | · Different graph features | Acceptance or rejection of VN | 18 different Recurrent NNs | 89% - 98% |
Bojovic et al. [67] | Supervised: · NN ·BN | Cellular (LTE) network | ns-3 simulator | · Channel quality indicator | R-factor | Two layers with Number of nodes in the hidden layer: 10 and 20 | Accuracy: 98% (BN) |
Quer et al. [372] | Supervised: · BN | Wireless LAN | ns-3 simulator | · Link Layer conditions | Voice call quality | Nodes: 9, Links: 14 | Accuracy: 95% |
Mignanti et al. [311] | RL: · Q-learning | NGN | OMNET simulator | States · Environment state based on number of active connections of each traffic class | Action · Accept or reject (ε-greedy) | Not provided | 10%-30% better than a greedy approach |
Wang et al. [458] | RL: · Q-learning | LTE femtocell networks | Simulation | States · Queue length of handoff and new calls | Action · Maintain, degrade, or upgrade proportion levels | RRl ×3, where l is QoS proportion levels | Reduction in blocking probability |
Tong et al. [446] | RL: · Q-learning | Multimedia networks | Simulation | States · The number of ongoing calls of each class · Call arrival or termination event · QoS and capacity constraints | Action · Accept or reject or no action | K × 2, where K is number of constraints | Improvement in rejection rates |
Marbach et al. [295] | RL: · TD(0) | Integrated service networks | Simulation | States · The number of active calls of each class · Routing path of each active call | Action · Accept with a route or reject | States: 1.4 ×10²⁵⁶ | 2.2% improvement in rewards |
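The RL rows above cast admission control as accept/reject decisions rewarded by carried traffic and penalized by overload. A toy tabular Q-learning version is sketched below; the capacity, rewards, and traffic model are invented, and the point is only that the learned policy rejects exactly when the system is full.

```python
import random

random.seed(1)
CAPACITY = 10
ALPHA, GAMMA, EPS = 0.2, 0.9, 0.1
Q = [[0.0, 0.0] for _ in range(CAPACITY + 1)]   # Q[active_calls][reject/accept]

active = 0
for _ in range(20000):                          # one call arrival per step
    s = active
    a = (random.randrange(2) if random.random() < EPS
         else int(Q[s][1] >= Q[s][0]))          # ε-greedy accept/reject
    if a == 1 and active < CAPACITY:
        active += 1
        r = 1.0                                 # revenue for the carried call
    elif a == 1:
        r = -5.0                                # accepted beyond capacity
    else:
        r = 0.0                                 # rejection: no revenue, no penalty
    if active and random.random() < 0.3:
        active -= 1                             # call terminations
    Q[s][a] += ALPHA * (r + GAMMA * max(Q[active]) - Q[s][a])

policy = [int(q[1] >= q[0]) for q in Q]          # 1 = accept in that state
```

The cited schemes enrich the state with call classes, QoS constraints, or routing paths, but the update rule is the same.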
7.2 Resource allocation
Ref. | ML Technique | Network | Dataset | Features | Output | Evaluation: Settings | Evaluation: Results
---|---|---|---|---|---|---|---
Baldo et al. [35] | Supervised: · MLP-NN | Wireless networks | Simulation data generated using ns-Miracle simulator | · Signal to noise ratio · Received frames · Erroneous frames · Idle time | · Throughput · Delay · Reliability | 2 layers with 6 neurons in the hidden layer | Very good accuracy |
Bojovic [65] | Supervised: · MLP-NN | Wireless LAN | Synthetic data generated using testbed | · Signal to noise ratio · Probability of failure · Business ratio · Average beacon delay · Number of detected stations | · Throughput of an access point | 2 layers with varying number of nodes in the hidden layer, maximum number of epochs, and learning rate | NRMSE = 8% |
Adeel et al. [6] | RNN with GD, AIWPSO, and DE | Cellular network | Synthetically generated using a SEAMCAT LTE simulator | · Signal to interference noise ratio · Inter-cell interference · Modulation/coding schemes · Transmit power | Throughput | 5-8-1ᵃ | Mean square error · AIWPSO: 8.5 ×10⁻⁴ · GD: 1.03 ×10⁻³ · DE: 9.3 ×10⁻⁴ |
Testolin et al. [443] | Supervised: · Linear classifier Unsupervised: · RNN | Wireless networks | 38 video clips taken from CIF | · Video frame size | · Quality level of each video in terms of the average SSIM index | 32 visible units with a varying number of hidden units | RMSE < 3% |
Mijumbi et al. [312] | RL · Q-learning (ε-greedy and softmax) | VNs | Simulation on ns-3 and real Internet traffic traces | States · Percentages of allocated and unused resources in substrate nodes and links | Actions · Increase or decrease the percentages of allocated resource | 29 states, 9 actions | Improved the acceptance ratio |
Mijumbi et al. [313] | Supervised: · FNN | VNF chains | VoIP traffic traces | · Dependency of resource requirements of each VNFC on its neighbor VNFCs · Historical local VNFC resource utilization | · Resource requirements of each VNFC | 2 NNs for each VNFC | Accuracy ∼ 90% |
Shi et al. [410] | Supervised: · MDP · BN | VNF chains | Simulation data generated using WorkflowSim | · Historical resource usage | · Future resource reliability | Running time for MDP: O(tv+1), where t and v stand for the number of NFV component tasks and the number of VMs, respectively | Better than other greedy methods in terms of cost |
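The supervised entries above all learn a regression from measured features to a resource or performance metric. As a minimal stand-in for those NN regressors, the sketch below trains a single linear neuron by stochastic gradient descent on synthetic data; the feature names, generating model, and constants are invented for illustration.

```python
import random

random.seed(0)
# invented features: [SNR (dB), load (0..1)]; target: toy throughput (Mbps)
data = []
for _ in range(200):
    snr, load = random.uniform(5, 30), random.random()
    thr = 0.8 * snr - 10.0 * load + random.gauss(0, 0.5)
    data.append(([snr, load], thr))

w, b, lr = [0.0, 0.0], 0.0, 0.001
for _ in range(2000):                            # epochs of per-sample SGD
    for x, y in data:
        pred = sum(wi * xi for wi, xi in zip(w, x)) + b
        err = pred - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
        b -= lr * err

print([round(w[0], 1), round(w[1], 1)])          # approaches [0.8, -10.0]
```

The MLP-based schemes in the table add hidden layers and nonlinearities, but the training loop (forward pass, error, gradient step) is the same.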
7.3 Summary
8 Fault management
8.1 Predicting fault
8.2 Detecting fault
8.3 Localizing the root cause of fault
8.4 Automated mitigation
8.5 Summary
Ref. | ML Technique | Network (location) | Dataset (training) | Features | Output | Evaluation: Settings | Evaluation: Results
---|---|---|---|---|---|---|---
Hood et al. [193] | Supervised: · BN | Campus network | Data collected from router | Management information base (MIB) variables for following network functions · Interface group · IP group · UDP group | Predict network health | 500 samples for each of 14 MIB variables of the 3 network functions | Predict approximately 8 min before fault occurrence |
Kogeda et al. [248] | Supervised: · BN | Cellular network | Simulation with fault injection | ·Power ·Multiplexer ·Cell ·Transmission | Faulty or not | 4 nodes each with 3 states | Confidence level of 99.8% |
Snow et al. [414] | Supervised: · NN (MLP) | Wireless network | Generated using discrete time event simulation | ·Mean time to failure ·Mean time to restore ·Time Profile ·Run Time | Dependability of a network ·Survivability ·Availability ·Failed components ·Reportable outages | 14 inputs, 10 and 5 nodes in the first and second hidden layer, respectively | Closely approximates reportable outages |
Wang et al. [466] | Supervised: · DT (J4.8) · Rule learners (JRip) · SVM · BN · Ensemble | Wireless sensor network | Generated using sensor network testbed | ·Received signal strength indication ·Send and forward buffer sizes ·Channel load assessment ·Forward and backward | Link quality estimation | 10-fold cross validation was used with 5000 samples | Accuracy · 82% for J4.8 ·80% for JRip |
Lu et al. [285] | Manifold learning: ·SHLLE | Distributed systems | Generated from a testbed of a distributed environment with a file transfer application | System performance ·interface group ·IP group ·TCP group ·UDP group | Prediction of network, CPU, and memory failures | Not provided | · Precision: 0.452 ·Recall: 0.456 · False positive rate: 0.152 |
Pellegrini et al. [355] | Different ML methods: ·Linear Regression · M5P · REP-Tree · LASSO · SVM · Least-Square SVM | Multi-tier e-commerce web application | Generated from a testbed of a virtual architecture | Different system performance | Remaining Time to Failure (RTTF) | Not provided | Soft mean absolute error · Linear regression: 137.600 · M5P: 79.182 · REP-Tree: 69.832 · LASSO as a Predictor: 405.187 · SVM: 132.668 · Least-Square SVM: 132.675 |
Wang et al. [469] | Supervised: · Double-exponential smoothing (DES) and SVM | Optical network | Real data collected from an optical network of a telecommunications operator | Indicators In Board Data: ·Input Optical Power · Laser Bias Current · Laser Temperature Offset · Output Optical Power · Environmental Temperature ·Unusable Time | Predicting equipment failure | 10-fold cross-validation was used to test model accuracy | DES with SVM · Prediction accuracy: 95% |
Kumar et al. [255] | Unsupervised: · DNN with Autoencoders | Cellular Network | Fault data from one of the national mobile operators of USA for a month | Historical data of fault occurrence and their inter-arrival times | Prediction of inter-arrival time of faults | 10 neurons in the hidden layer | DNN with autoencoders · NRMSE: 0.122092 ·RMSE: 0.504425 |
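Wang et al. [469] feed a double-exponential-smoothing (DES) forecast of board indicators into an SVM. The DES stage on its own is small enough to show in full (Holt's linear form); the smoothing constants and the indicator series below are illustrative.

```python
def des_forecast(series, alpha=0.5, beta=0.3, horizon=1):
    """Holt's double exponential smoothing: track level and trend,
    then extrapolate `horizon` steps ahead."""
    level, trend = series[0], series[1] - series[0]
    for x in series[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend

# steadily drifting "laser bias current" readings (invented)
readings = [10.0, 10.5, 11.0, 11.5, 12.0, 12.5]
print(round(des_forecast(readings), 1))  # → 13.0
```

In the cited pipeline the forecast values, not the raw readings, become the SVM's input features, which is what lets the classifier flag failures before they occur.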
Ref. | ML Technique | Network (location) | Dataset (training) | Features | Output | Evaluation: Settings | Evaluation: Results
---|---|---|---|---|---|---|---
Rao [382] | Statistical learning | Cellular network | Data collected from real cellular networks | Mobile user call load profile | Detect faults at ·Base station level ·Sector level ·Carrier level ·Channel level | Not provided | Bounded probability of false alarm |
Baras et al. [37] | A combination of NN (radial basis functions) | Cellular network (X.25 protocol) | Simulation with OPNET | For each fault scenario ·Blocking of packets ·Queue sizes ·Packet throughput ·Utilization on links connecting subnetworks ·Packet end-to-end delays | Detect one of the fault scenarios ·Reduced switch capacity ·Increased packet generation rate of a certain application ·Disabled switch ·Disabled links | Varying number of hidden nodes between 175 and 230 | Different rates of errors |
Adda et al. [5] | Supervised: ·k-Means ·FCM ·EM | IP network of a school campus | Obtained from a network with heavy and light traffic scenarios | 12 variables of interface (IF) category collected through SNMP | Fault classes: · Normal traffic ·Link failure traffic ·Server crash ·Broadcast storm ·Protocol error | Not provided | Precision for heavy scenario in router dataset ·k-Means = 40 · FCM = 85 · EM = 40 |
Moustapha and Selmic [324] | Supervised: · RNN | Wireless sensor network | Collected from a simulated sensor network | ·Previous outputs of sensor nodes · Current and previous output samples of neighboring sensor nodes | Approximation of the output of the sensor node | 8-10-1a | Constant error smaller than state-of-the-art |
Hajji [178] | Unsupervised change detection method | Local area networks | Collected from a real network using remote monitoring agents | Baseline random variable | An alarm as soon as an anomaly occurs | Time to detect : · 50 s to 17 min | Accuracy: 100% · Low alarm rate: 0.12 alarms per hour |
Hashmi et al. [181] | Supervised: ·k-Means · FCM · SOM | Broadband service provider network | 1 million NFL data points from 5 service regions | ·Fault occurrence date ·Time of the day ·Geographical region ·Fault cause ·Resolution time | Identify the spatio-temporal patterns linked with high fault resolution times | SOM on a 15x15 network grid for 154 epochs | Sum of squared errors: ·k-Means = 2156788 · FCM = 2822823 · SOM = 1136 |
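Adda et al. [5] and Hashmi et al. [181] cluster traffic and fault records with k-Means among other methods. A bare-bones one-dimensional k-Means over invented "resolution time" values shows the mechanics; k, the seed, and the data are illustrative.

```python
import random

def kmeans(points, k, iters=20):
    random.seed(3)                                   # deterministic toy init
    centers = random.sample(points, k)
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[i].append(p)
        # update step: move each center to its cluster mean
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

times = [1, 2, 2, 3, 20, 21, 22, 100, 105]   # short / medium / long outages
print(kmeans(times, 3))
```

In the cited works the points are multi-dimensional fault records, and the resulting clusters (e.g. outages with unusually long resolution times) are what the operator inspects.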
Ref. | ML Technique | Network (location) | Dataset (training) | Features | Output | Evaluation: Settings | Evaluation: Results
---|---|---|---|---|---|---|---
Chen et al. [91] | DT (C4.5) | Network systems | Snapshots of logs from eBay | A complete request trace ·Request type ·Request name ·Pool ·Host ·Version ·Status of each request | Different faulty elements | 10 one hour snapshots with 14 faults in total | ·Precision: 92% ·Recall: 93% |
Ruiz et al. [393] | BN | Optical network | Synthetically generated time series | Quality of Transmission (QoT) parameters ·Received power ·Pre-forward error correction bit error rate (pre-FEC BER) | Detect one of the two fault scenarios ·Tight filtering ·Inter-channel interference | 5,000 and 500 time series for training and testing, respectively | Accuracy: 99.2% |
Khanafer et al. [237] | BN and EMD | Cellular network | Synthetically generated from a simulated and a real UMTS network | ·Causes of faults ·Symptoms, i.e., alarms and KPIs | Identify the cause of the fault | 77 and 42 faulty cells for training and testing, respectively | Accuracy: 88.1% |
Kiciman and Fox [241] | Supervised: · DT (ID3) | Three-tier enterprise applications | Generated using small testbed platform | Paths classified as normal or anomalous | Hardware and software components that are correlated with the failures | Three different DTs were evaluated | Correctly detect 89% to 96% of major failures |
Johnsson et al. [225] | Unsupervised: discrete state-space particle filtering | IP network | Discrete event simulator | ·Active network measurements ·Probabilistic inference ·Change detection | Probability mass function indicating the location of the faulty components | Operations per filter: O(|G|), where |G| is the number of edges in a graph G | Found the location of faults and performance degradations in real time |
Barreto et al. [40] | Unsupervised: · Winner-Take-All (WTA) · Frequency-Sensitive Competitive Learning (FSCL) · SOM · Neural-Gas algorithm (NGA) | Cellular network | Simulation study | State vectors representing the normal functioning of a network | State vector causing the abnormality | 400 vectors were used for training and 100 vectors were used for testing | False alarm: · WTA: 12.43 · FSCL: 10.20 · SOM: 8.75 · NGA: 9.50 |
9 QoS and QoE management
Ref. | ML Technique | Application (approach) | Dataset (availability) | Features | Output | Evaluation: Settings | Evaluation: Resultsᵃᵇ
---|---|---|---|---|---|---|---
– | ANFIS | Impact of network and application-level QoS on MPEG4 video streaming over wireless mobile networks (NR regression) | Simulations with Evalvid and ns-2: · MPEG4 video source · 3 video types · variable network conditions · mobile video streaming client · PSNR-generated MOS | Video type Application-related: frame rate, send bitrate Network-level: link bandwidth, packet error rate | MOS | Number of features =5 5-layer ANFIS-NN: fuzzy layer, product layer, normalized layer, defuzzy layer, total output layer | For slight/gentle/rapid motion video type: · RMSE =0.15/0.18/0.56 · R² =0.7/0.8/0.75 (on normalized data) Outperformed by a simple regression model [235] |
Machado et al. [287] | MLP-NN | Impact of QoS and video features over QoE (FR/NR regression) | Simulations on Evalvid integrated with ns-2: · 3 video types (slight, gentle, rapid motion) · 565 data points · MOS, PSNR, SSIM and VQM generated by Evalvid and the VQMT tool | Delay, jitter, total/I/P/B frame loss · not clear if type of video is considered | A model is created for each output · MOS · PSNR · SSIM · VQM | Number of features =6∼7 (-,10,1) MOS-MLP (-,10,1) PSNR-MLP (-,12,24,1) SSIM-MLP (-,10,1) VQM-MLP | MOS-MLP · MSE ≈0.01 PSNR-MLP · MSE ≈0.14 SSIM-MLP · MSE ≈0.01 VQM-MLP · MSE ≈0.3 (on normalized data) |
Mushtaq et al. [328] | DT, RF, NB, SVM, k-NN, and NN | Impact of QoS, video features and viewer features over QoE (NR classification) | Collected from streaming videos over QoS-controlled emulated network, and MOS collected from a panel of viewers | network-level: · delay, jitter, packet loss, etc. application-related: · resolution type of video: · motion complexity viewer-related: · gender, interest, etc. | MOS | Number of features =9k-NN (k=4) Other settings (N/A) | RF · MAE =0.136· TP =74.8% DT · MAE =0.126· TP =74% NB · MAE ≈0.23· TP ≈57% SVM · MAE =0.26· TP ≈61% 4-NN · MAE ≈0.2· TP =49% NN · MAE ≈0.18· TP ≈65%(on normalized data) |
MLQoE [89] | SVR, MLP-NN, DT, and GNB | modular user-centric correlation of QoE and network QoS metrics for VoIP services (NR regression) | 3 datasets of VoIP sessions under different network conditions generated with OMNET++: during handover (dataset 1), in a network with heavy UDP traffic (dataset 2), in a network with heavy TCP traffic (dataset 3) QoE assessed with user-generated MOS and program-generated PESQ and E-model QoE | network-related: · delay, jitter, packet loss, etc. | MOS | Number of features =10 MLP-NN (10,2∼5,1) Gaussian, linear, and polynomial kernel SVR | SVR · MAE 1=0.66· MAE 2=0.65· MAE 3=0.47 MLP-NN · MAE 1=0.75· MAE 2=0.68· MAE 3=0.53 DT · MAE 1=0.73· MAE 2=0.55· MAE 3=0.5 GNB · MAE 1=0.69· MAE 2=0.68· MAE 3=0.53(on normalized data) |
Demirbilek et al. [114] | RF, BG, and DNN | Correlation of QoE and network and application QoS metrics for video streaming services (NR regression) | INRS dataset, including user-generated MOS on audiovisual sequences encoded and transmitted with varying video and network parameters (public [112]) | network-related: delay, jitter, packet loss, etc. application-related: video frame rates, quantization parameters, filters, etc. | MOS | Number of features: · RF1, BG1 =34 · RF2, BG2 =5 · DNN21, DNN22 =5 RF, BG tree size =200 Number of hidden layers: · DNN21 =1 · DNN22 =20 | RF1 · RMSE =0.340 · PCC =0.930 RF2 · RMSE =0.340 · PCC =0.930 BG1 · RMSE =0.345 · PCC =0.928 BG2 · RMSE =0.355 · PCC =0.925 DNN21 · RMSE =0.403 · PCC =0.909 DNN22 · RMSE =0.437 · PCC =0.894 (on normalized data) |
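Among the learners compared by Mushtaq et al. [328] above, k-NN (k=4, as in the table) is the simplest to sketch: predict the MOS of a new network condition as the average MOS of its nearest labelled QoS points. The (loss %, jitter ms) feature pairs and MOS labels below are invented.

```python
def knn_mos(query, samples, k=4):
    """k-NN regression: average the MOS of the k nearest QoS points."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(samples, key=lambda s: dist(query, s[0]))[:k]
    return sum(mos for _, mos in nearest) / k

# invented (loss %, jitter ms) -> MOS training points
samples = [
    ((0.0, 5.0), 4.8), ((0.1, 10.0), 4.5), ((0.5, 20.0), 3.9),
    ((1.0, 30.0), 3.2), ((2.0, 50.0), 2.4), ((5.0, 80.0), 1.5),
    ((0.2, 8.0), 4.4), ((3.0, 60.0), 2.0),
]
print(round(knn_mos((0.15, 9.0), samples), 2))  # → 4.4
```

A real deployment would normalize the features first, since jitter in milliseconds otherwise dominates the Euclidean distance.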
Ref. | ML Technique (training) | Application (approach) | Dataset (availability) | Features | Output | Evaluation: Settings | Evaluation: Resultsᵃᵇ
---|---|---|---|---|---|---|---
CS2P [432] | Supervised: HMM (offline) | Throughput prediction for midstream bitrate adaptation in HAS clients to improve the QoE for video streaming (regression) | iQIYI dataset consisting of 20 million sessions covering ·3 million unique clients IPs and ·18 server IPs ·87 ISPs | Throughput samples | Throughput 1∼10 periods ahead | HMM model per cluster of similar sessions: · Number of states =6 · Number of samples =100 SVM, GBR single model for all sessions: · Number of features=6 | MAE =7%(on normalized data) · up to 50% more accurate than SVR, GBR and HMM with no clustering ·3.2% improvement on overall QoE ·10.9% improved bitrate over MPC |
Claeys et al. [102] | Reinforcement learning: Q-Learning (online) | Video quality adaptation in a HAS client to maximize QoE under varying network conditions (rule extraction) | ns-3 simulation based on TCP streaming sessions in Norway’s Telenor 3G/HSDPA mobile wireless network dataset (public [384]) | State: · client buffer filling level · client throughput level Reward: QoE as function of · targeted quality level · span between current and targeted video quality level · rebuffering level | Finite action set of N=7 possible video quality levels (softmax selection) | · S=(N+1)\(\frac {B_{max}}{T_{seg} +1}\) · A=N | Improvement compared to Microsoft MSS: ·9.12% higher estimated MOS ·16.65% lower standard deviation |
9.1 QoE/QoS correlation with supervised ML
9.2 QoE prediction under QoS impairments
9.3 QoS/QoE prediction for HAS and DASH
9.4 Summary
10 Network security
-
Encryption of network traffic, especially the payload, to protect the integrity and confidentiality of the data in the packets traversing the network.
-
Authorization using credentials, to restrict access to authorized personnel only.
-
Access control, for instance, using security policies to grant different access rights and privileges to different users based on their roles and authorities.
-
Anti-viruses, to protect end-systems against malware, e.g. Trojan horses, ransomware, etc.
-
Firewalls, hardware or software-based, to allow or block network traffic based on a pre-defined set of rules.
Ref. | ML Technique | Dataset | Features | Evaluation: Settings | Evaluation: Results
---|---|---|---|---|---
Cannady [84] | Supervised NN (offline) | – | TCP, IP, and ICMP header fields and payload | -1 Layer MLP: 9, a, 2 -Sigmoid function -Number of nodes in hidden layers determined by trial & error | DR: 89%-91% Training + Testing runtime: 26.13 hrs |
Pfahringer [358] | Supervised Ensemble of C5 DTs (offline) | KDD Cup [257] | all 41 features | -Two-processor (2x300Mhz) -512M memory, 9 GB disc Solaris OS 5.6 -10-folds cross-validation | DR Normal: 99.5% DR Probe: 83.3% DR DoS: 97.1% DR U2R: 13.2% DR R2L: 8.4% Training: 24 h |
Pan et al. [344] | Supervised NN and C4.5 DT (offline) | KDD Cup [257] | all 41 features | -29,313 training data records -111,858 testing data records -1 Layer MLP: 70-14-6 -NN trained until MSE = 0.001 or # Epochs = 1500 -Selected attacks for U2L and R2L -After-the-event analysis | DR Normal : 99.5% DR DoS: 97.3% DR Probe (Satan): 95.3% DR Probe (Portsweep): 94.9% DR U2R: 72.7% DR R2L: 100% ADR: 93.28% FP: 0.2% |
Moradi et al. [322] | Supervised NN (offline) | KDD Cup [257] | 35 features | -12,159 training data records -900 validation data records -6,996 testing data records -Attacks: SYN Flood and Satan -2 Layers MLP: 35 35 35 3 -1 Layer MLP: 35 45 35 -ESVM Method | 2 Layers MLP DR: 80% 2 Layers MLP Training time > 25 hrs 2 Layers MLP w/ ESVM DR: 90% 2 Layers MLP w/ ESVM Training time < 5 hrs 1 Layers MLP w/ ESVM DR: 87% |
Chebrolu et al. [90] | Supervised BN and CART (offline) | KDD Cup [257] | Feature Selection using Markov Blanket and Gini rule | -5,092 training data records -6,890 testing data records - AMD Athlon 1.67 GHz processor with 992 MB of RAM | DR Normal: 100% DR Probe: 100% DR DoS: 100% DR U2R: 84% DR R2L: 99.47% Training BN time: 11.03 ∼ 25.19 sec Testing BN time: 5.01 ∼ 12.13 sec Training CART time : 0.59 ∼ 1.15 sec Testing CART time: 0.02 ∼ 0.13 sec |
Amor et al. [20] | Supervised NB (offline) | KDD Cup [257] | all 41 features | -494,019 training data records -311,029 testing data records -Pentium III 700 Mhz processor | DR Normal: 97.68% PCC DoS: 96.65% PCC R2L: 8.66% PCC U2R: 11.84% PCC Probing: 88.33% |
Stein et al. [421] | Supervised C4.5 DT (offline) | KDD Cup [257] | GA-based feature selection | -489,843 training data records -311,029 testing data records -10-fold cross validation -GA ran for 100 generations | Error rate DoS: 2.22% Error rate Probe: 1.67% Error rate R2L: 19.9% Error rate U2R: 0.1% |
Paddabachigari et al. [354] | Supervised Ensemble of SVM, DT, and SVM-DT Offline | KDD Cup [257] | all 41 features | 5,092 training data records 6,890 testing data records AMD Athlon, 1.67 GHz processor with 992 MB of RAM -Polynomial kernel | DR Normal: 99.7% DR Probe:100% DR DoS: 99.92% DR U2R: 68% DR R2L: 97.16% Training time: 1 ∼ 19 sec Testing time: 0.03 ∼ 2.11 sec |
Sangkatsanee et al. [402] | Supervised C4.5 DT (online) | – | TCP, UDP, and ICMP header fields | -55,000 training data records -102,959 testing data records -12 features -2.83 GHz Intel Pentium Core2 Quad 9550 processor with 4 GB RAM and 100 Mbps LAN -Platform used: Weka V.3.6.0 | DR Normal: 99.43% DR DoS: 99.17% DR Probe: 98.73% Detection speed: 2 ∼ 3 sec |
Miller et al. [314] | Supervised Ensemble MPML (Offline) | NSL-KDD [438] | all 41 features | -125,973 training records -22,544 testing records -3 NBs trained w/ 12, 9, 9 features -Platform used Weka [288] | TP: 84.137% FP: 15.863% |
Li et al. [272] | Supervised TCM K-NN (Offline) | KDD Cup [257] | all 41 features 8 features selected using Chi-square | -Intel Pentium 4, 1.73 GHz, 1 GB RAM, Windows XP Professional - Platform Weka [288] -49,402 training records -12,350 testing records -K = 50 | 41 features: TP 99.7% 41 features: FP 0% 8 features: TP 99.6% 8 features: FP 0.1% |
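Amor et al. [20] apply naive Bayes to KDD Cup records. A Gaussian naive Bayes classifier over two invented flow features (duration, bytes/s) separating "normal" from "dos" traffic shows the mechanics; the training data and class boundary are illustrative, not drawn from KDD.

```python
import math

# invented (duration s, bytes/s) training records per class
train = {
    "normal": [(1.0, 100.0), (2.0, 120.0), (1.5, 90.0), (3.0, 110.0)],
    "dos":    [(0.1, 9000.0), (0.2, 11000.0), (0.1, 10500.0), (0.3, 9500.0)],
}

def stats(rows):
    """Per-feature mean and variance (with a small floor for stability)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    vars_ = [sum((x - m) ** 2 for x in col) / n + 1e-6
             for col, m in zip(zip(*rows), means)]
    return means, vars_

model = {label: stats(rows) for label, rows in train.items()}

def classify(x):
    def log_lik(label):
        means, vars_ = model[label]
        # naive assumption: features independent given the class
        return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                   for xi, m, v in zip(x, means, vars_))
    return max(model, key=log_lik)

print(classify((0.15, 9800.0)))  # → dos
print(classify((2.5, 105.0)))    # → normal
```

On the real 41-feature KDD records the table reports, the same per-class, per-feature statistics are all that training requires, which is why NB is among the cheapest detectors compared.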
Ref. | ML Technique | Dataset | Features | Evaluation: Settings | Evaluation: Results
---|---|---|---|---|---
Kayacik et al. [232] | Unsupervised Hierarchical SOM (Offline) | KDD Cup [257] | 6 TCP features | – | DR Test-set 1: 89% FP Test-set 1: 4.6% DR Test-set 2: 99.7% FP Test-set 2: 1.7% |
Kim et al. [242] | Supervised SVM (Offline) | KDD Cup [257] | selected using GA | DR w/ Neural Kernel: 99% DR w/ Radial Kernel:87% DR w/ Inverse Multi-Quadratic Kernel: 77% | |
Jiang et al. [220] | Unsupervised Improved NN (Offline) | – | all 41 features | -40,459 training records -429,742 testing records -Cluster Radius Thresh r=[0.2-0.27] | DR DoS: 99.10% ∼ 99.15% DR Probe: 64.72% ∼ 80.27% DR U2R: 25.49% ∼ 60.78% DR R2L: 6.34% ∼ 8.67% DR new attacks: 32.44% ∼ 42.12% FP: 0.05% ∼ 1.30% |
Zhang et al. [495] | Unsupervised Random Forests (Offline) | KDD Cup [257] | 40 features labeled by service type | -4 datasets used with % of attack connections: 1%, 2%, 5%, 10% -Platform used: Weka [288] | 1% attacks: FP: 1% DR: 95% 10% attacks: FP: 1% DR: 80% |
Ahmed et al. [7] | Supervised Kernel Function (Online) | From Abilene backbone network | number of packets, number of individual IP flows | -2 timeseries binned at 5 min intervals -Timeseries dimensions = FxT -F = 121 flows, T = 2016 timesteps | T#1 DR: 21/34-30/34 FP:0-19 T#2 DR:28/44-39/44 FP:5-16 |
Shon et al. [411] | Unsupervised Soft-margin SVM and OCSVM (Offline) | KDD Cup [257] Data collected from Dalhousie U. | selected using GA | – | KDD w/ 9 attack types DR: 74.4% Dalhousie Dataset DR: 99.99% KDD w/ 9 attack types FN: 31.3% Dalhousie Dataset FP: 0.01% |
Giacinto et al. [165] | Unsupervised Multiple Classifiers (Offline) | KDD Cup [257] | 29 features for HTTP 34 features for FTP 16 features for ICMP 31 features for Mail 37 features for Misc 29 features for Private&Other | -494,020 training records -311,029 testing records -1.5% of data records is attacks | v-SVC DR: 67.31%-94.25% v-SVC FP: 0.91%-9.62% |
Hu et al. [198] | Supervised Decision stumps with AdaBoost (Offline) | KDD Cup [257] | all 41 features | -494,021 training records -311,029 testing records -Pentium IV with 2.6-GHz CPU and 256-MB RAM -Platform used Matlab 7 | DR: 90.04%-90.88% FP: 0.31%-1.79% Mean Training time: 73 sec |
Muniyandi et al. [327] | Unsupervised K-Means, C4.5 DT (Offline) | KDD Cup [257] | all 41 features | -15,000 training records -2,500 testing records -Intel Pentium Core 2 Duo CPU 2.20GHz, 2.19GHz, 0.99GB of RAM w/ Microsoft Windows XP (SP2) -Platform: Weka 3.5 [288] | DR: 99.6% FP: 0.1% Precision: 95.6% Accuracy: 95.8% F-measure: 94.0% |
Panda et al. [345] | Unsupervised RF, ND, END (Offline) | NSL-KDD [438] | all 41 features | -25,192 training instances -IBM PC of 2.66GHz CPU with 40GB HDD and 512 MB RAM -10-fold cross validation | TP: 99.5% FP: 0.1% F-measure: 99.7% Precision: 99.9% Recall: 99.9% Time to build model: 18.13 sec |
Boero et al. [64] | Supervised RBF-SVM (Offline) | | 7 SDN OpenFlow features | -RBF complexity param.: 20 -RBF kernel param.: 2 | Normal-TP: 86% Normal-FP: 1.6% Malware-TP: 98.4% Malware-FP: 13.8% |
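Several entries in the table above boost weak learners into a strong detector; Hu et al. [198], for instance, combine one-feature decision stumps with AdaBoost. The sketch below shows the core of that technique in plain NumPy, assuming numeric flow features and labels in {-1, +1}; the toy traffic data and the feature names in the comments are invented for illustration, not taken from the paper.

```python
import numpy as np

def train_adaboost_stumps(X, y, n_rounds=10):
    """AdaBoost over one-feature decision stumps.

    X: (n_samples, n_features) float array; y: labels in {-1, +1}.
    Returns a list of (feature, threshold, polarity, alpha) stumps.
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)                        # per-sample weights
    ensemble = []
    for _ in range(n_rounds):
        best = None
        for j in range(d):                         # exhaustive stump search
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = np.where(pol * (X[:, j] - thr) > 0, 1, -1)
                    err = w[pred != y].sum()       # weighted error of this stump
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, pred)
        err, j, thr, pol, pred = best
        err = max(err, 1e-10)                      # guard against log(0)
        alpha = 0.5 * np.log((1.0 - err) / err)    # vote weight of this stump
        w *= np.exp(-alpha * y * pred)             # up-weight misclassified samples
        w /= w.sum()
        ensemble.append((j, thr, pol, alpha))
    return ensemble

def predict(ensemble, X):
    """Sign of the weighted stump votes."""
    score = np.zeros(len(X))
    for j, thr, pol, alpha in ensemble:
        score += alpha * np.where(pol * (X[:, j] - thr) > 0, 1, -1)
    return np.sign(score)

# Toy illustration (invented data): column 0 = packets/sec, column 1 = SYN rate.
X = np.array([[1.0, 0.0], [2.0, 1.0], [8.0, 1.0], [9.0, 0.0]])
y = np.array([-1, -1, 1, 1])                       # -1 = normal, +1 = attack
model = train_adaboost_stumps(X, y, n_rounds=3)
```

The exhaustive stump search makes each round O(n·d) per candidate threshold, which is why decision stumps remain attractive as weak learners for high-volume traffic data.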
Ref. | ML Technique | Dataset | Features | Evaluation Settings | Results
---|---|---|---|---|---
Zanero et al. [493] | Unsupervised Two-tier SOM-based architecture (Offline) | | Packet headers and payload | -2,000 training packets -2,000 testing packets -10x10 SOM trained for 10,000 epochs -Platform used: SOM toolbox [12] | Improves DR by 75% over a single-tier SOM |
Wang et al. [459] | Unsupervised Centroid model (Offline) | KDD Cup [257] & CUCS | Payload of TCP traffic | -2 weeks training data -3 weeks testing data -Inside network TCP data only -Incremental learning | DR w/ payload of a packet: 58.8% DR w/ first 100 bytes of a packet: 56.7% DR w/ last 100 bytes of a packet: 47.4% DR w/ all payloads of a connection: 56.7% DR w/ first 1000 bytes of a connection: 52.6% Training time: 4.6-26.2 sec Testing time: 1.6-16.1 sec |
Perdisci et al. [356] | Supervised Ensemble of one-class SVMs (Offline) | KDD Cup [257] & GATECH | Payload | -50% of dataset for training -50% of dataset for testing -11 OCSVMs trained with 2ν-grams; ν=1...10 -5-fold cross validation on KDD Cup -7-fold cross validation on GATECH -2 GHz Dual Core AMD Opteron processor and 8GB RAM | Generic DR w/ FP 10^-5: 60% Shell-code DR w/ FP 10^-5: 90% CLET DR w/ FP 10^-5: 90% Detection time KDD Cup: 10.92 ms Detection time GATECH: 17.11 ms |
Gornitz et al. [171] | Supervised SVDD (Online) | Normal: from Fraunhofer Inst. Attack: Metasploit | payload | -2,500 training network events -1,250 testing network events -Active Learning -Fraction of Labeled data: 1.5% | DR: 96% FP: 0.0015% |
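Payload-based detectors such as the centroid model of Wang et al. [459] build a byte-frequency profile of normal payloads and score new payloads by their distance from it. Below is a rough, stdlib-only sketch of that idea using a simplified Mahalanobis-style distance; the smoothing constant and the sample payloads are illustrative assumptions, not the authors' exact model.

```python
from collections import Counter
import math

def byte_freq(payload):
    """256-dim vector of relative byte frequencies in a payload."""
    counts = Counter(payload)
    n = len(payload) or 1
    return [counts.get(b, 0) / n for b in range(256)]

def train_centroid(payloads):
    """Per-byte mean and standard deviation over normal payloads."""
    vecs = [byte_freq(p) for p in payloads]
    n = len(vecs)
    mean = [sum(v[i] for v in vecs) / n for i in range(256)]
    std = [math.sqrt(sum((v[i] - mean[i]) ** 2 for v in vecs) / n)
           for i in range(256)]
    return mean, std

def anomaly_score(payload, mean, std, smooth=0.001):
    """Simplified Mahalanobis-style distance to the normal centroid;
    `smooth` avoids division by zero for bytes never seen in training."""
    v = byte_freq(payload)
    return sum(abs(v[i] - mean[i]) / (std[i] + smooth) for i in range(256))

# Illustration (invented payloads): HTTP-like traffic forms the normal profile;
# a NOP-sled-like byte string scores far higher than a fresh HTTP request.
mean, std = train_centroid([b"GET /index.html HTTP/1.1",
                            b"GET /home.html HTTP/1.1",
                            b"GET /img/a.png HTTP/1.0"])
```

Because the model only keeps 256 means and deviations per traffic class, it supports the incremental learning noted in the table's settings column.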
Ref. | ML Technique | Dataset | Features | Evaluation Settings | Results
---|---|---|---|---|---
Cannady et al. [85] | RL CMAC-NN (Online) | Prototype application | Patterns of Ping Flood and UDP Packet Storm attacks | -3-layer NN -Prototype developed w/ C & Matlab | Learning Error: 2.199×10^-7%-1.94×10^-7% New Attack Error: 2.199×10^-14%-8.53×10^-14% Recollection Error: 0.038%-3.28×10^-5% Error after Refinement: 1.24% |
Servin et al. [407] | RL Q-Learning (Online) | Generated using NS-2 | Congestion, delay, and flow-based features | -Number of agents: 7 -DDoS attacks only -Boltzmann rule for exploration/exploitation (E2) | FP: 0-10% Accuracy: ~70%-~99% Recall: ~30%-~99% |
Li et al. [273] | DL DBN w/ Auto-Encoder (Offline) | KDD Cup [257] | all 41 features | -494,021 training records -311,029 testing records -Intel Core Duo CPU 2.10 GHz and 2GB RAM -Platform used: Matlab v.7.11 -3 Layers Encoder: 41,300,150,75,* | TPR: 92.20%-96.79% FPR: 1.58%-15.79% Accuracy: 88.95%-92.10% Training time: [1.147-2.625] sec |
Alom et al. [14] | DL DBN (Offline) | NSL-KDD [438] | 39 features | -25,000 training & testing records | DR w/ 40% data for training: 97.45% Training time w/ 40% data for training: 0.32 sec |
Tang et al. [436] | DL DNN (Offline) | NSL-KDD [438] | 6 basic features | -125,975 training records -22,554 testing records -3-Layers DNN: 6,12,6,3,2 -Batch Size: 10 -# Epochs: 100 -Best Learning Rate: 0.001 | Accuracy: 72.05%-75.75% Precision: 79%-83% Recall: 72%-76% F-measure: 72%-75% |
Kim et al. [245] | DL LSTM-RNN (Offline) | KDD Cup [257] | all 41 features | -1,930 training data records -10 test datasets of 5,000 records -Intel Core i7 3.60 GHz, 8GB RAM, OS Ubuntu 14.04 -# Nodes in Input Layer: 41 -# Nodes in Output Layer: 5 -Batch Size: 50 -# Epochs: 500 -Best Learning Rate: 0.01 | DR: 98.88% FP: 10.04% Accuracy: 96.93% |
Javaid et al. [213] | DL Self-taught Learning (Offline) | NSL-KDD [438] | all 41 features | -125,973 training records -22,544 testing records -10-fold cross validation | 2-class TP: 88.39% 2-class Precision: 85.44% 2-class Recall: 95.95% 2-class F-measure: 90.4% |
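Among the reinforcement-learning entries above, Servin et al. [407] rely on tabular Q-learning. The sketch below shows the generic Q-learning update on an invented two-state pass/alert environment; the states, rewards, and hyperparameters are toy assumptions for illustration, not the authors' distributed multi-agent setup.

```python
import random

def q_learning(step, n_states, n_actions, episodes=2000,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.

    `step(state, action)` must return (next_state, reward, done).
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:                       # explore
                a = rng.randrange(n_actions)
            else:                                            # exploit
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            target = r + gamma * max(Q[s2]) * (not done)     # one-step TD target
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

def toy_env(state, action):
    """Invented two-state environment: 0 = normal traffic, 1 = DDoS under way.
    Actions: 0 = pass traffic, 1 = raise an alert."""
    if state == 0:   # passing normal traffic is correct; alerting is a false positive
        return 1, (1.0 if action == 0 else -1.0), False
    return 1, (1.0 if action == 1 else -1.0), True  # alerting on the attack is correct

Q = q_learning(toy_env, n_states=2, n_actions=2)
# The trained agent passes normal traffic (Q[0][0] > Q[0][1])
# and raises an alert during the attack (Q[1][1] > Q[1][0]).
```

The appeal for intrusion response is that the agent needs only a scalar reward signal rather than labeled attack traces, at the cost of the exploration-induced false positives visible in the table's FP range.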
Ref. | ML Technique | Dataset | Features | Evaluation Settings | Results
---|---|---|---|---|---
Mukkamala et al. [325] | Supervised RBF-SVM (Online) | KDD Cup [257] | all 41 features | -7,312 training records -6,980 testing records -Platform used: SVMLight [224] | Accuracy: 99.5% Training time: 17.77 sec Testing time: 1.63 sec |
Zhang et al. [494] | Hybrid Hierarchical-RBF (Online) | KDD Cup [257] | all 41 features | -32,000 training records -32,000 testing records | SHIDS Normal DR: 99.5% SHIDS Normal FP: 1.2% SHIDS Attack DR: [98.2%-99.3%] SHIDS Attack FP: [0%-5.4%] PHIDS level 1 DR: 99.8% PHIDS level 1 FP: 1.2% PHIDS level 2 DR: [98.8%-99.7%] PHIDS level 2 FP: [0%-4%] PHIDS level 3 DR: 86.9% PHIDS level 3 FP: 0% Training time: 5 min |
Depren et al. [116] | Hybrid SOM w/ J48 (Offline) | KDD Cup [257] | 6 basic features for SOM all 41 features for J48 | -10-fold cross validation -Two-phase SOM training -Phase 1 learning rate: 0.6 -Phase 2 learning rate: 0.05 -Confidence value for J48 pruning: 25% | DR: 99.9% Missed Rate: 0.1% FP: 1.25% |
10.1 Misuse-based intrusion detection
10.2 Anomaly-based intrusion detection
10.2.1 Flow feature-based anomaly detection
-
Create a profile of normal packets using Self-organized Feature Map.
-
Packet filtering scheme, using p0f [491], based on passive TCP/IP fingerprinting to reject incorrectly formed TCP/IP packets.
-
GA to perform feature selection.
-
Temporal correlation of packets during packet processing.
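The first component above, profiling normal packets with a Self-Organizing Feature Map, can be sketched in NumPy as follows; the grid size, decay schedules, and the use of quantization error as the anomaly score are illustrative assumptions rather than the cited system's exact configuration.

```python
import numpy as np

def train_som(data, grid=(5, 5), epochs=100, lr0=0.5, sigma0=2.0, seed=0):
    """Fit a 2-D Self-Organizing Map to normal-traffic feature vectors."""
    rng = np.random.default_rng(seed)
    h, w = grid
    units = rng.random((h * w, data.shape[1]))            # unit weight vectors
    coords = np.array([(i, j) for i in range(h) for j in range(w)], dtype=float)
    for t in range(epochs):
        lr = lr0 * (1.0 - t / epochs)                     # decaying learning rate
        sigma = sigma0 * (1.0 - t / epochs) + 0.5         # shrinking neighbourhood
        for x in data[rng.permutation(len(data))]:
            bmu = np.argmin(((units - x) ** 2).sum(axis=1))   # best-matching unit
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)    # grid distance to BMU
            units += lr * np.exp(-d2 / (2.0 * sigma ** 2))[:, None] * (x - units)
    return units

def quantization_error(units, x):
    """Distance from x to its best-matching unit; large values suggest an anomaly."""
    return float(np.sqrt(((units - x) ** 2).sum(axis=1).min()))

# Illustration on invented data: normal feature vectors cluster near (0.2, 0.3);
# a far-away vector then yields a much larger quantization error.
rng = np.random.default_rng(1)
normal = rng.normal([0.2, 0.3], 0.02, size=(60, 2))
som = train_som(normal, grid=(4, 4), epochs=50)
```

In deployment, a threshold on the quantization error (chosen from a validation set of normal traffic) separates profile-conforming packets from candidates for the later filtering and correlation stages.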