1 Introduction
Literature study | Type of soil | Number of datasets | Computational approach | R2/R |
---|---|---|---|---|
National Cooperative Highway Research Program [9] | Non-plastic coarse-grained soils | 7 | SLR | 0.84 |
National Cooperative Highway Research Program [9] | Plastic fine-grained soils | 11 | SLR | 0.67 |
Kin [10] | Fine-grained soil | 57 | MLR | NA |
Taskiran [11] | Fine-grained soil | 151 | ANN | 0.91 |
Taskiran [11] | Fine-grained soil | 151 | GEP | 0.92 |
Yildirim and Gunaydin [12] | Granular soil | 124 | SLR | 0.86 |
Yildirim and Gunaydin [12] | Granular soil | 124 | MLR | 0.88 |
Yildirim and Gunaydin [12] | Granular soil | 124 | ANN | 0.93 |
Alawi and Rajab [25] | Granular soil | 19 | MLR | 0.95 |
Varghese, Babu [26] | Fine-grained soil | 112 | MLR | 0.83 |
Varghese, Babu [26] | Fine-grained soil | 112 | ANN | 0.85 |
Erzin and Turkoz [13] | Sandy soil | 61 | MLR | 0.81 |
Erzin and Turkoz [13] | Sandy soil | 61 | ANN | 0.98 |
Tenpe and Patel [16] | Mixed soil samples | 389 | GEP | 0.82 |
Tenpe and Patel [16] | Mixed soil samples | 389 | ANN | 0.89 |
Katte, Mfoyet [27] | Subgrade soil | 33 | MLR | 0.84 |
Kurnaz and Kaya [22] | Mixed soil samples | 158 | GMDH | 0.97 |
Taha, Gabr [15] | Granular soil | 218 | ANN | 0.97 |
Alam, Mondal [28] | Fine-grained soil | 20 | GEP | 0.94 |
Alam, Mondal [28] | Fine-grained soil | 20 | ANN | 0.96 |
Tenpe and Patel [17] | Mixed soil samples | 389 | GEP | 0.78 |
Tenpe and Patel [17] | Mixed soil samples | 389 | SVM | 0.80 |
Bardhan, Gokceoglu [19] | Mixed soil samples | 312 | MARS | 0.90 |
Bardhan, Gokceoglu [19] | Mixed soil samples | 312 | GP | 0.88 |
Bardhan, Samui [18] | Mixed soil samples | 312 | ELM-MPSO | 0.91 |
Bardhan, Samui [18] | Mixed soil samples | 312 | ELM-TPSO | 0.90 |
Bardhan, Samui [18] | Mixed soil samples | 312 | SVM | 0.87 |
1.1 Research Significance and Contributions
2 Machine Learning Algorithms and Statistical Assessment Indices
2.1 Applied ML Algorithms
2.1.1 Kernel Ridge Regression (KRR)
2.1.2 K-Nearest Neighbor (K-NN)
2.1.3 Gaussian Process Regression (GPR)
2.1.4 Hyperparameters Tuning Using a Grid Search
KRR hyperparameter | Selected value |
---|---|
Alpha | 3 |
Kernel type | Polynomial |
Coefficient of kernel | 0.2 |
Kernel degree | 3 |
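
The KRR settings listed above can be illustrated with a minimal standard-library sketch. It assumes that "Coefficient of kernel" is the additive constant of the polynomial kernel (called `coef0` in scikit-learn naming) and that the kernel scale `gamma` equals 1; neither is stated in the table. The helper names `poly_kernel`, `solve`, `krr_fit` and `krr_predict` are hypothetical.

```python
# Kernel ridge regression sketch with a polynomial kernel (standard library only).
# Assumptions: k(x, z) = (gamma * <x, z> + coef0) ** degree, gamma = 1.0 (not in the table).
ALPHA, COEF0, DEGREE, GAMMA = 3.0, 0.2, 3, 1.0


def poly_kernel(x, z, gamma=GAMMA, coef0=COEF0, degree=DEGREE):
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + coef0) ** degree


def solve(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    coef = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (M[i][n] - s) / M[i][i]
    return coef


def krr_fit(X, y, alpha=ALPHA):
    """Dual coefficients c = (K + alpha * I)^-1 y."""
    n = len(X)
    K = [[poly_kernel(X[i], X[j]) + (alpha if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(K, y)


def krr_predict(X_train, coef, x):
    return sum(c * poly_kernel(xt, x) for c, xt in zip(coef, X_train))
```

With a very small `alpha` the fit interpolates the training targets; the tabulated `alpha = 3` trades that exactness for a smoother, more regularised prediction.
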
K-NN hyperparameter | Selected value |
---|---|
n_neighbors | 5 |
Weights | Uniform |
Algorithm | Auto |
Leaf size | 30 |
P | 2 |
Metric | Manhattan |
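
As a rough illustration of the K-NN settings above, the following sketch predicts a target as the unweighted mean of the five nearest training samples under the Manhattan distance. It follows the "Metric" and "Weights" rows only; the "P", "Algorithm" and "Leaf size" rows affect the Minkowski exponent and the neighbour search structure and are not modelled here. The function names and toy data are hypothetical.

```python
def manhattan(x, z):
    """Manhattan (L1) distance between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(x, z))


def knn_predict(X_train, y_train, x, n_neighbors=5):
    """Uniform-weight K-NN regression: mean target of the nearest neighbours."""
    order = sorted(range(len(X_train)), key=lambda i: manhattan(X_train[i], x))
    nearest = order[:n_neighbors]
    return sum(y_train[i] for i in nearest) / len(nearest)


# Toy example: the outlier at x = 10 is not among the five nearest neighbours.
X_toy = [[0.0], [1.0], [2.0], [3.0], [4.0], [10.0]]
y_toy = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]
prediction = knn_predict(X_toy, y_toy, [2.0])
```
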
GPR hyperparameter | Selected value |
---|---|
Kernel | RationalQuadratic() * DotProduct() + WhiteKernel() |
Alpha | 1e−10 |
Optimizer | Broyden–Fletcher–Goldfarb–Shanno |
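
The composite GPR kernel in the table can be written out explicitly. The sketch below uses the standard closed forms of the rational quadratic, dot-product and white-noise kernels; the kernel parameters (`length_scale`, `alpha`, `sigma_0`, `noise_level`) are not given in the table, so the value 1.0 used for each is a placeholder assumption, not the tuned value.

```python
def rational_quadratic(x, z, length_scale=1.0, alpha=1.0):
    """k(x, z) = (1 + d^2 / (2 * alpha * l^2)) ** (-alpha)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return (1.0 + d2 / (2.0 * alpha * length_scale ** 2)) ** (-alpha)


def dot_product(x, z, sigma_0=1.0):
    """k(x, z) = sigma_0^2 + <x, z>."""
    return sigma_0 ** 2 + sum(a * b for a, b in zip(x, z))


def white(i, j, noise_level=1.0):
    """White-noise kernel: contributes only on the diagonal of the Gram matrix."""
    return noise_level if i == j else 0.0


def composite_gram(X):
    """Gram matrix of RationalQuadratic * DotProduct + White for the rows of X."""
    n = len(X)
    return [[rational_quadratic(X[i], X[j]) * dot_product(X[i], X[j]) + white(i, j)
             for j in range(n)] for i in range(n)]


K_toy = composite_gram([[0.0], [1.0]])
```

The product term lets a smooth, multi-scale similarity modulate a linear trend, while the white-noise term absorbs measurement scatter on the diagonal.
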
2.2 Statistical Performance Measurement Indices
Parameter | Ideal value | Equation no. |
---|---|---|
\({R}^{2}=1-\frac{\sum_{i=1}^{N}{\left({y}_{i}(a)-{y}_{i}(p)\right)}^{2}}{\sum_{i=1}^{N}{\left({y}_{i}(a)-\overline{{y }_{i}(a)}\right)}^{2}}\) | 1 | (15) |
\(Adj. {R}^{2}=\left[1-\frac{N-1}{N-P-1}\left(1-{R}^{2}\right)\right]\) | 1 | (16) |
\(R=\frac{\sum_{i=1}^{N}\left({y}_{i}(a)-\overline{{y}_{i}(a)}\right)\left({y}_{i}(p)-\overline{{y}_{i}(p)}\right)}{\sqrt{\sum_{i=1}^{N}{\left({y}_{i}(a)-\overline{{y}_{i}(a)}\right)}^{2}\sum_{i=1}^{N}{\left({y}_{i}(p)-\overline{{y}_{i}(p)}\right)}^{2}}}\) | 1 | (17) |
\(MAE=\left[\frac{1}{N}\sum_{i=1}^{N}\left|{y}_{i}(a)-{y}_{i}(p)\right|\right]\) | 0 | (18) |
\(MAPE (\%)=\left[\frac{1}{N}\sum_{i=1}^{N}\left|\frac{{y}_{i}\left(p\right)-{y}_{i}(a)}{{y}_{i}(a)}\right|\right]\times 100\) | 0 | (19) |
\(RMSE=\sqrt{\frac{1}{N}\sum_{i=1}^{N}{\left({y}_{i}(a)-{y}_{i}(p)\right)}^{2}}\) | 0 | (20) |
\(VAF \left(\%\right)=\left[1-\frac{Var({y}_{i}(a)-{y}_{i}(p))}{Var({y}_{i}(a))}\right]\times 100\) | 100 | (21) |
\({I}_{P}=Adj. {R}^{2}+0.01 VAF-RMSE\) | 2 | (22) |
\(IOA=1-\frac{\sum_{i=1}^{N}{\left({y}_{i}(a)-{y}_{i}(p)\right)}^{2}}{\sum_{i=1}^{N}{\left(\left|{y}_{i}(p)-\overline{{y }_{i}(a)}\right|+\left|{y}_{i}(a)-\overline{{y }_{i}(a)}\right|\right)}^{2}}\) | 1 | (23) |
\(IOS=\frac{\sqrt{\frac{1}{N}\sum_{i=1}^{N}{\left({y}_{i}(a)-{y}_{i}(p)\right)}^{2}}}{\overline{{y }_{i}(p)}}\) | 0 | (24) |
\(a20\text{-}index=\frac{n20}{N}\times 100\) | 1 | (25) |
\({S}_{P}=\frac{{\left(Adj. {R}^{2}\right)}_{total}+{\left(0.01 VAF\right)}_{total}-{\left(RMSE\right)}_{total}}{{\left(\frac{Adj. {R}^{2}}{{R}^{2}}\right)}_{training}+{\left(\frac{Adj. {R}^{2}}{{R}^{2}}\right)}_{testing}}\) | 1 | (26) |
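
The indices of Eqs. (15) to (25) can be computed together from paired actual and predicted values, as in the sketch below. Several points are assumptions rather than statements from the table: `n_predictors` stands for P in Eq. (16), n20 is taken as the number of samples whose actual-to-predicted ratio lies in [0.8, 1.2], and the a20-index is returned as a fraction (without the factor 100) to match how it is reported in the results tables. The function name is hypothetical.

```python
import math
from statistics import mean, pvariance


def performance_indices(actual, predicted, n_predictors):
    """Indices of Eqs. (15)-(25) for paired actual/predicted values."""
    n = len(actual)
    a_bar, p_bar = mean(actual), mean(predicted)
    err = [a - p for a, p in zip(actual, predicted)]
    sse = sum(e ** 2 for e in err)                    # sum of squared errors
    sst = sum((a - a_bar) ** 2 for a in actual)       # spread of actual values
    ssp = sum((p - p_bar) ** 2 for p in predicted)    # spread of predictions

    r2 = 1 - sse / sst
    adj_r2 = 1 - (n - 1) / (n - n_predictors - 1) * (1 - r2)
    cov = sum((a - a_bar) * (p - p_bar) for a, p in zip(actual, predicted))
    r = cov / math.sqrt(sst * ssp)
    mae = sum(abs(e) for e in err) / n
    mape = 100 * sum(abs(e / a) for e, a in zip(err, actual)) / n
    rmse = math.sqrt(sse / n)
    vaf = 100 * (1 - pvariance(err) / pvariance(actual))
    ip = adj_r2 + 0.01 * vaf - rmse
    ioa = 1 - sse / sum((abs(p - a_bar) + abs(a - a_bar)) ** 2
                        for a, p in zip(actual, predicted))
    ios = rmse / p_bar
    # Assumption: n20 counts samples with actual/predicted within +/-20 % of 1.
    n20 = sum(1 for a, p in zip(actual, predicted) if 0.8 <= a / p <= 1.2)
    return {"R2": r2, "Adj. R2": adj_r2, "R": r, "MAE": mae, "MAPE": mape,
            "RMSE": rmse, "VAF": vaf, "IP": ip, "IOA": ioa, "IOS": ios,
            "a20-index": n20 / n}
```

For a perfect prediction the function returns the ideal values listed in the table, for example R2 = 1, RMSE = 0, VAF = 100 and IP = 2.
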
2.2.1 Data Preparation and Analysis
2.3 Data Collection and Geographical Location
2.4 Laboratory Experiments
2.5 Statistical Visualization and Correlation Analysis
Statistic | G (%) | S (%) | FC (%) | LL (%) | PL (%) | PI (%) | MDD (g/cc) | OMC (%) | CBR (%) |
---|---|---|---|---|---|---|---|---|---|
Min | 0.00 | 2.25 | 50.65 | 24.40 | 11.81 | 1.93 | 1.455 | 9.50 | 1.00 |
Max | 27.42 | 48.85 | 96.28 | 85.00 | 50.00 | 39.00 | 1.959 | 29.50 | 13.20 |
Mean | 2.83 | 13.99 | 83.18 | 29.85 | 21.60 | 8.25 | 1.866 | 11.96 | 9.02 |
Median | 1.23 | 12.62 | 85.46 | 28.70 | 21.10 | 7.65 | 1.885 | 11.45 | 9.10 |
Mode | 0.00 | 12.79 | 87.00 | 29.00 | 21.30 | 7.75 | 1.900 | 10.70 | 10.00 |
S. D | 4.64 | 6.64 | 7.68 | 6.21 | 2.89 | 3.72 | 0.073 | 2.52 | 1.16 |
Variance | 21.54 | 44.05 | 59.00 | 38.59 | 8.34 | 13.83 | 0.005 | 6.36 | 1.35 |
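
The descriptive statistics in the table can be reproduced for any input column with Python's `statistics` module, as sketched below. Whether the reported S.D. and variance use the sample (n - 1) or population (n) denominator is not stated; the sketch assumes the sample form. The function name and the toy values are hypothetical and are not taken from the dataset.

```python
import statistics


def describe(values):
    """Summary statistics for one variable (sample S.D. and variance assumed)."""
    return {
        "Min": min(values),
        "Max": max(values),
        "Mean": statistics.mean(values),
        "Median": statistics.median(values),
        "Mode": statistics.mode(values),
        "S. D": statistics.stdev(values),
        "Variance": statistics.variance(values),
    }


summary = describe([1.0, 2.0, 2.0, 3.0, 4.0])  # toy values only
```
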
2.6 Data Divisional Approaches
2.6.1 K-Fold Division Approach
Parameter | Min. | Max. | Range | Average | Median | Mode | S.D. | Variance |
---|---|---|---|---|---|---|---|---|
Sand (%) | 2.25 | 48.85 | 46.60 | 14.11 | 12.72 | 12.79 | 6.62 | 43.80 |
FC (%) | 50.65 | 96.28 | 45.63 | 83.06 | 85.44 | 77.00 | 7.71 | 59.39 |
PL (%) | 11.81 | 44.00 | 32.20 | 21.60 | 21.10 | 21.30 | 2.75 | 7.58 |
PI (%) | 1.93 | 39.00 | 37.07 | 8.28 | 7.66 | 7.85 | 3.80 | 14.46 |
MDD (g/cc) | 1.480 | 1.959 | 0.479 | 1.867 | 1.885 | 1.900 | 0.073 | 0.005 |
OMC (%) | 9.50 | 29.50 | 20.00 | 11.98 | 11.45 | 10.70 | 2.53 | 6.38 |
CBR (%) | 3.00 | 13.00 | 10.00 | 9.03 | 9.19 | 10.00 | 1.13 | 1.28 |
Parameter | Min. | Max. | Range | Average | Median | Mode | S.D. | Variance |
---|---|---|---|---|---|---|---|---|
Sand (%) | 4.08 | 48.40 | 44.33 | 13.53 | 12.19 | 10.32 | 6.71 | 44.98 |
FC (%) | 51.60 | 95.61 | 44.01 | 83.66 | 85.54 | 89.57 | 7.58 | 57.41 |
PL (%) | 12.72 | 50.00 | 37.29 | 21.62 | 21.09 | 21.45 | 3.38 | 11.45 |
PI (%) | 5.25 | 35.00 | 29.75 | 8.11 | 7.58 | 8.35 | 3.37 | 11.33 |
MDD (g/cc) | 1.455 | 1.950 | 0.495 | 1.863 | 1.882 | 1.900 | 0.073 | 0.005 |
OMC (%) | 9.50 | 29.30 | 19.80 | 11.91 | 11.43 | 11.70 | 2.52 | 6.33 |
CBR (%) | 1.00 | 13.20 | 12.20 | 8.98 | 9.05 | 8.40 | 1.28 | 1.65 |
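
A K-fold division can be sketched as follows. The number of folds and the use of shuffling are not stated in this section, so `k=5` and the fixed `seed` are illustrative assumptions, and `k_fold_indices` is a hypothetical helper name.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


splits = list(k_fold_indices(10, k=5))
```

Each sample appears in exactly one test fold, so every observation is used for both training and testing across the k rounds.
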
2.6.2 Fuzzy C-Means (FCM) Division Approach
Parameter | Min. | Max. | Range | Average | Median | Mode | S.D. | Variance |
---|---|---|---|---|---|---|---|---|
Sand (%) | 4.01 | 48.85 | 44.85 | 13.96 | 12.46 | 11.00 | 6.76 | 45.67 |
FC (%) | 50.65 | 95.61 | 44.96 | 83.16 | 85.45 | 87.00 | 7.77 | 60.38 |
PL (%) | 11.81 | 50.00 | 38.20 | 21.59 | 21.10 | 21.30 | 2.87 | 8.26 |
PI (%) | 1.93 | 39.00 | 37.07 | 8.29 | 7.65 | 7.90 | 3.86 | 14.90 |
MDD (g/cc) | 1.455 | 1.959 | 0.504 | 1.867 | 1.885 | 1.900 | 0.073 | 0.005 |
OMC (%) | 9.50 | 29.50 | 20.00 | 11.98 | 11.46 | 11.70 | 2.58 | 6.66 |
CBR (%) | 1.00 | 13.20 | 12.20 | 9.01 | 9.10 | 10.00 | 1.18 | 1.40 |
Parameter | Min. | Max. | Range | Average | Median | Mode | S.D. | Variance |
---|---|---|---|---|---|---|---|---|
Sand (%) | 2.25 | 44.76 | 42.51 | 14.14 | 12.96 | 12.96 | 6.14 | 37.67 |
FC (%) | 51.70 | 96.28 | 44.58 | 83.27 | 85.49 | 89.57 | 7.33 | 53.68 |
PL (%) | 18.79 | 42.00 | 23.21 | 21.66 | 21.06 | 20.95 | 2.95 | 8.70 |
PI (%) | 5.20 | 29.00 | 23.80 | 8.08 | 7.57 | 6.50 | 3.08 | 9.50 |
MDD (g/cc) | 1.490 | 1.943 | 0.453 | 1.863 | 1.883 | 1.900 | 0.075 | 0.006 |
OMC (%) | 9.80 | 29.30 | 19.50 | 11.91 | 11.38 | 11.05 | 2.28 | 5.21 |
CBR (%) | 3.00 | 10.90 | 7.90 | 9.05 | 9.20 | 9.60 | 1.09 | 1.18 |
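
The clustering step behind an FCM-based division can be sketched with the standard fuzzy c-means updates: membership-weighted cluster centres, followed by memberships recomputed from the distances to those centres. The number of clusters, the fuzzifier `m = 2` and the iteration count are illustrative assumptions, and how the clusters are then sampled into training and testing sets is not specified here. The function name and toy points are hypothetical.

```python
import random


def fuzzy_c_means(points, n_clusters=2, m=2.0, n_iter=100, seed=0):
    """Return (centres, memberships) after n_iter standard FCM updates."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        U.append([u / total for u in row])
    centres = []
    for _ in range(n_iter):
        # Centre update: mean of the points weighted by membership ** m.
        centres = []
        for c in range(n_clusters):
            w = [U[i][c] ** m for i in range(n)]
            w_sum = sum(w)
            centres.append([sum(w[i] * points[i][d] for i in range(n)) / w_sum
                            for d in range(dim)])
        # Membership update: u_ic = 1 / sum_k (d_ic / d_ik) ** (2 / (m - 1)).
        for i, p in enumerate(points):
            dist = [max(sum((a - b) ** 2 for a, b in zip(p, ctr)) ** 0.5, 1e-12)
                    for ctr in centres]
            U[i] = [1.0 / sum((dist[c] / dist[k]) ** (2.0 / (m - 1.0))
                              for k in range(n_clusters))
                    for c in range(n_clusters)]
    return centres, U


toy_points = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
centres, U = fuzzy_c_means(toy_points)
labels = [row.index(max(row)) for row in U]  # hard label = largest membership
```
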
3 Results
3.1 Statistical Performance of Developed Models
Index (training) | KRRK | KRRF | K-NNK | K-NNF | GPRK | GPRF |
---|---|---|---|---|---|---|
R2 | 0.647 | 0.681 | 0.715 | 0.727 | 0.746 | 0.887 |
Adj. R2 | 0.644 | 0.679 | 0.713 | 0.725 | 0.744 | 0.886 |
R | 0.804 | 0.825 | 0.847 | 0.854 | 0.866 | 0.943 |
MAE | 0.515 | 0.512 | 0.432 | 0.437 | 0.438 | 0.308 |
MAPE | 5.913 | 5.915 | 5.181 | 5.594 | 5.045 | 3.478 |
RMSE | 0.672 | 0.666 | 0.604 | 0.617 | 0.570 | 0.397 |
VAF | 64.705 | 68.131 | 71.530 | 72.743 | 74.551 | 88.690 |
IP | 0.620 | 0.694 | 0.824 | 0.835 | 0.919 | 1.376 |
IOA | 0.882 | 0.897 | 0.906 | 0.911 | 0.917 | 0.968 |
IOS | 0.074 | 0.074 | 0.067 | 0.068 | 0.063 | 0.044 |
a20-index | 0.979 | 0.978 | 0.977 | 0.975 | 0.985 | 1.000 |
Index (testing) | KRRK | KRRF | K-NNK | K-NNF | GPRK | GPRF |
---|---|---|---|---|---|---|
R2 | 0.680 | 0.407 | 0.706 | 0.645 | 0.758 | 0.700 |
Adj. R2 | 0.670 | 0.389 | 0.697 | 0.634 | 0.750 | 0.690 |
R | 0.840 | 0.739 | 0.847 | 0.804 | 0.880 | 0.837 |
MAE | 0.542 | 0.549 | 0.501 | 0.491 | 0.476 | 0.457 |
MAPE | 7.642 | 6.964 | 7.612 | 5.750 | 6.605 | 5.310 |
RMSE | 0.724 | 0.836 | 0.694 | 0.647 | 0.630 | 0.595 |
VAF | 68.036 | 40.758 | 70.749 | 64.584 | 75.822 | 69.956 |
IP | 0.626 | −0.040 | 0.710 | 0.634 | 0.879 | 0.795 |
IOA | 0.915 | 0.851 | 0.899 | 0.883 | 0.918 | 0.904 |
IOS | 0.081 | 0.093 | 0.077 | 0.071 | 0.070 | 0.066 |
a20-index | 0.970 | 0.950 | 0.955 | 0.965 | 0.985 | 0.985 |
3.2 Visual Interpretation of Developed Models
3.2.1 Trend and Error Plot for the Developed Models
3.2.2 Regression Error Characteristics (REC) curve
3.2.3 Accuracy Analysis
Statistical performance measurement parameters (model accuracy, %) | KRRK | K-NNK | GPRK | KRRF | K-NNF | GPRF |
---|---|---|---|---|---|---|
R2 | 64.7 | 71.5 | 74.6 | 68.1 | 72.7 | 88.7 |
Adj. R2 | 64.4 | 71.3 | 74.4 | 67.9 | 72.5 | 88.6 |
R | 80.4 | 84.7 | 86.6 | 82.5 | 85.4 | 94.3 |
MAE | 48.5 | 56.8 | 56.2 | 48.8 | 56.3 | 69.2 |
MAPE | 94.1 | 94.8 | 95.0 | 94.1 | 94.4 | 96.5 |
RMSE | 32.8 | 39.6 | 43.0 | 33.4 | 38.3 | 60.3 |
VAF | 64.7 | 71.5 | 74.6 | 68.1 | 72.7 | 88.7 |
IP | 31.0 | 41.2 | 46.0 | 34.7 | 41.8 | 68.8 |
IOA | 88.2 | 90.6 | 91.7 | 89.7 | 91.1 | 96.8 |
IOS | 92.6 | 93.3 | 93.7 | 92.6 | 93.2 | 95.6 |
a20-index | 97.9 | 97.7 | 98.5 | 97.8 | 97.5 | 100.0 |
Statistical performance measurement parameters (model accuracy, %) | KRRK | K-NNK | GPRK | KRRF | K-NNF | GPRF |
---|---|---|---|---|---|---|
R2 | 68.0 | 70.6 | 75.8 | 40.7 | 64.5 | 70.0 |
Adj. R2 | 67.0 | 69.7 | 75.0 | 38.9 | 63.4 | 69.0 |
R | 84.0 | 84.7 | 88.0 | 73.9 | 80.4 | 83.7 |
MAE | 45.8 | 49.9 | 52.4 | 45.1 | 50.9 | 54.3 |
MAPE | 92.4 | 92.4 | 93.4 | 93.0 | 94.3 | 94.7 |
RMSE | 27.6 | 30.6 | 37.0 | 16.4 | 35.3 | 40.5 |
VAF | 68.0 | 70.8 | 75.8 | 40.8 | 64.6 | 70.0 |
IP | 31.3 | 35.5 | 44.0 | 2.0 | 31.7 | 39.8 |
IOA | 91.5 | 89.9 | 91.8 | 85.1 | 88.3 | 90.4 |
IOS | 91.9 | 92.3 | 93.0 | 90.7 | 92.9 | 93.4 |
a20-index | 97.0 | 95.5 | 98.5 | 95.0 | 96.5 | 98.5 |
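
The accuracy percentages appear to follow simple conversions from the raw indices. The mapping sketched below is inferred by comparing the accuracy tables with the performance tables and is not a stated formula: indices with ideal value 1 are multiplied by 100, VAF is used as is, error indices with ideal value 0 are converted as 100 × (1 − value), MAPE as 100 − value, and IP is scaled by its ideal value of 2. The function name is hypothetical.

```python
def accuracy_percent(index, value):
    """Convert a performance index to an accuracy percentage (inferred mapping)."""
    if index in {"R2", "Adj. R2", "R", "IOA", "a20-index"}:
        return 100.0 * value           # ideal value 1
    if index == "VAF":
        return value                   # already a percentage
    if index in {"MAE", "RMSE", "IOS"}:
        return 100.0 * (1.0 - value)   # ideal value 0
    if index == "MAPE":
        return 100.0 - value           # percentage error, ideal value 0
    if index == "IP":
        return 100.0 * value / 2.0     # ideal value 2
    raise ValueError(f"unknown index: {index}")


# Training values of the GPRF model from the performance table.
gprf_rmse_accuracy = accuracy_percent("RMSE", 0.397)
```
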
3.3 Selection of Best-Fitted CBR Prediction Model
3.3.1 Ranking Analysis (RA)
Model | Phase | R2 | Adj. R2 | R | MAE | MAPE | RMSE | VAF | IP | IOA | IOS | a20-index | Score | Total score | SP value | Score for SP value | Final score | Rank | OR |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
KRRK | TR | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 4 | 15 | 46 | 0.316 | 2 | 48 | 5 | 1.077 |
KRRK | TS | 3 | 3 | 4 | 2 | 1 | 2 | 3 | 2 | 5 | 2 | 4 | 31 | | | | | | |
KRRF | TR | 2 | 2 | 2 | 2 | 1 | 2 | 2 | 2 | 2 | 2 | 3 | 22 | 35 | 0.404 | 3 | 38 | 6 | 1.255 |
KRRF | TS | 1 | 1 | 1 | 1 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 13 | | | | | | |
K-NNK | TR | 3 | 3 | 3 | 5 | 4 | 4 | 3 | 3 | 3 | 4 | 2 | 37 | 77 | 0.459 | 5 | 82 | 3 | 1.149 |
K-NNK | TS | 5 | 5 | 5 | 3 | 2 | 3 | 5 | 4 | 3 | 3 | 2 | 40 | | | | | | |
K-NNF | TR | 4 | 4 | 4 | 4 | 3 | 3 | 4 | 4 | 4 | 3 | 1 | 38 | 71 | 0.288 | 1 | 72 | 4 | 1.049 |
K-NNF | TS | 2 | 2 | 2 | 4 | 5 | 4 | 2 | 3 | 2 | 4 | 3 | 33 | | | | | | |
GPRK | TR | 5 | 5 | 5 | 3 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 53 | 113 | 0.405 | 4 | 117 | 2 | 1.105 |
GPRK | TS | 6 | 6 | 6 | 5 | 4 | 5 | 6 | 6 | 6 | 5 | 5 | 60 | | | | | | |
GPRF | TR | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 66 | 120 | 0.637 | 6 | 126 | 1 | 1.499 |
GPRF | TS | 4 | 4 | 3 | 6 | 6 | 6 | 4 | 5 | 4 | 6 | 6 | 54 | | | | | | |
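
The per-index scores in the ranking table are consistent with giving the worst of the six models a score of 1 and the best a score of 6 for each index. A minimal sketch of that scoring is shown below; the tie-breaking rule (ties keep the input order) and the function name are assumptions, since the treatment of ties is not described.

```python
LOWER_IS_BETTER = {"MAE", "MAPE", "RMSE", "IOS"}


def rank_scores(values, index):
    """Score models from 1 (worst) to n (best) for one performance index."""
    worst_first = sorted(values, key=lambda model: values[model],
                         reverse=index in LOWER_IS_BETTER)
    return {model: position + 1 for position, model in enumerate(worst_first)}


# Training-phase values taken from the performance table.
train_r2 = {"KRRK": 0.647, "KRRF": 0.681, "K-NNK": 0.715,
            "K-NNF": 0.727, "GPRK": 0.746, "GPRF": 0.887}
train_mae = {"KRRK": 0.515, "KRRF": 0.512, "K-NNK": 0.432,
             "K-NNF": 0.437, "GPRK": 0.438, "GPRF": 0.308}
r2_scores = rank_scores(train_r2, "R2")
mae_scores = rank_scores(train_mae, "MAE")
```

Summing such scores over all eleven indices gives the "Score" column for each phase; adding the training and testing scores gives the total score.
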
3.3.2 Overfitting Ratio (OR)
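
A minimal sketch of the overfitting ratio, assuming it is defined as the testing RMSE divided by the training RMSE. That definition is not spelled out in this section, but it reproduces, for example, the value of 1.077 reported for KRRK and 1.499 for GPRF from the RMSE values in the performance tables. The function name is hypothetical.

```python
def overfitting_ratio(rmse_train, rmse_test):
    """OR = RMSE(testing) / RMSE(training); values near 1 indicate little overfitting."""
    return rmse_test / rmse_train


or_krrk = overfitting_ratio(0.672, 0.724)
or_gprf = overfitting_ratio(0.397, 0.595)
```
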
3.4 Influence of ML Algorithms and Data Division Approaches on the Model Performance
3.5 Validation of Literature Study Models
Statistical performance indices | Kin [10] | Taskiran [11] | Yildirim and Gunaydin [12] | Bardhan, Gokceoglu [19] (MARS-L) | Bardhan, Gokceoglu [19] (GP) |
---|---|---|---|---|---|
R2 | −14.58 | −99.38 | −5.41 | −5.09 | −4.79 |
Adj. R2 | −14.58 | −99.38 | −5.41 | −5.09 | −4.79 |
R | 0.20 | 0.52 | 0.11 | 0.00 | 0.10 |
MAE | 3.84 | 9.66 | 2.54 | 1.51 | 1.56 |
MAPE | 42.42 | 106.08 | 31.83 | 15.91 | 16.40 |
RMSE | 4.35 | 11.65 | 2.94 | 1.75 | 1.71 |
VAF | −284.14 | −3035.08 | −64.88 | −94.49 | −8.79 |
IP | −21.76 | −141.37 | −9.00 | −7.79 | −6.59 |
IOA | 0.30 | 0.16 | 0.35 | 0.36 | 0.39 |
IOS | 0.82 | 0.62 | 0.25 | 0.22 | 0.22 |
a20-index | 0.16 | 0.01 | 0.35 | 0.65 | 0.67 |