1 Introduction
2 Related literature
3 Materials and methods
3.1 YAAPT
3.2 MFCCs
3.2.1 Pre-emphasis
3.2.2 Framing
3.2.3 Windowing
3.2.4 FFT
3.2.5 Mel-frequency scale
3.2.6 DCT
3.3 ΔMFCCs
3.4 RNN
3.5 LSTM
3.6 Optimization algorithms
3.6.1 Adaptive gradient descent (AdaGrad)
3.6.2 Adaptive delta (AdaDelta)
3.6.3 Batch size
4 Experimental results
4.1 Data set
4.2 Experimental setups
| Model | Epoch | Batch size = 1: Training acc. (%) | Batch size = 1: Validation acc. (%) | Batch size = 1: Test acc. (%) | Batch size = 4: Training acc. (%) | Batch size = 4: Validation acc. (%) | Batch size = 4: Test acc. (%) | Batch size = 8: Training acc. (%) | Batch size = 8: Validation acc. (%) | Batch size = 8: Test acc. (%) | Batch size = 16: Training acc. (%) | Batch size = 16: Validation acc. (%) | Batch size = 16: Test acc. (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| M11 | 500 | 83.40 | 72.33 | 75.23 | 86.50 | 71.33 | 72.83 | 83.79 | 73.47 | 75.91 | 84.43 | 77.18 | 72.37 |
| | 1000 | 84.18 | 71.90 | 73.63 | 89.58 | 70.90 | 69.86 | 88.79 | 72.75 | 72.49 | 88.11 | 70.76 | 71.46 |
| | 1500 | 88.72 | 70.33 | 72.60 | 90.68 | 70.33 | 70.66 | 90.00 | 72.75 | 73.86 | 89.97 | 70.76 | 71.69 |
| | 2000 | 88.11 | 70.90 | 72.37 | 90.72 | 68.33 | 71.12 | 90.25 | 72.33 | 71.23 | 90.93 | 69.19 | 71.69 |
| M12 | 500 | 83.33 | 71.33 | 76.03 | 83.47 | 71.47 | 76.48 | 82.43 | 72.47 | 73.97 | 81.40 | 73.32 | 73.29 |
| | 1000 | 86.22 | 72.90 | 70.32 | 86.43 | 72.90 | 71.23 | 86.11 | 70.47 | 72.95 | 85.54 | 71.18 | 71.92 |
| | 1500 | 87.25 | 72.33 | 74.89 | 87.00 | 72.18 | 71.92 | 86.83 | 72.90 | 71.58 | 86.36 | 73.18 | 72.03 |
| | 2000 | 87.61 | 71.33 | 73.06 | 87.08 | 72.90 | 75.11 | 88.22 | 72.18 | 72.26 | 88.83 | 69.33 | 71.23 |
| M13 | 500 | 85.79 | 70.47 | 73.17 | 84.22 | 74.18 | 72.60 | 83.36 | 75.46 | 74.43 | 81.65 | 72.47 | 73.74 |
| | 1000 | 88.50 | 74.18 | 71.12 | 88.00 | 68.47 | 73.29 | 86.83 | 72.75 | 74.54 | 87.79 | 72.18 | 70.89 |
| | 1500 | 90.25 | 74.47 | 68.95 | 89.83 | 70.76 | 71.23 | 89.11 | 72.18 | 71.80 | 88.11 | 71.04 | 73.06 |
| | 2000 | 91.22 | 70.47 | 70.21 | 90.82 | 70.90 | 69.98 | 89.36 | 72.75 | 70.43 | 90.79 | 70.76 | 70.89 |

| Model | Epoch | Batch size = 1: Training acc. (%) | Batch size = 1: Validation acc. (%) | Batch size = 1: Test acc. (%) | Batch size = 4: Training acc. (%) | Batch size = 4: Validation acc. (%) | Batch size = 4: Test acc. (%) | Batch size = 8: Training acc. (%) | Batch size = 8: Validation acc. (%) | Batch size = 8: Test acc. (%) | Batch size = 16: Training acc. (%) | Batch size = 16: Validation acc. (%) | Batch size = 16: Test acc. (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| M1 | 500 | 74.83 | 72.33 | 73.86 | 74.76 | 70.04 | 73.52 | 72.94 | 72.90 | 72.95 | 73.12 | 70.19 | 69.06 |
| | 1000 | 80.01 | 73.47 | 71.92 | 76.94 | 75.46 | 74.32 | 77.62 | 72.18 | 71.46 | 74.94 | 68.33 | 73.63 |
| | 1500 | 82.22 | 73.75 | 73.52 | 79.51 | 74.04 | 74.54 | 78.97 | 75.46 | 71.12 | 74.58 | 71.47 | 73.40 |
| | 2000 | 83.72 | 75.18 | 71.35 | 83.68 | 76.18 | 72.83 | 81.44 | 74.18 | 73.06 | 80.01 | 71.61 | 73.29 |
| M2 | 500 | 75.44 | 77.46 | 72.95 | 73.51 | 71.18 | 72.26 | 72.05 | 70.61 | 69.98 | 49.09 | 51.50 | 51.71 |
| | 1000 | 79.54 | 72.33 | 73.17 | 77.12 | 71.90 | 75.23 | 74.26 | 72.75 | 70.32 | 73.22 | 73.18 | 70.89 |
| | 1500 | 81.90 | 70.61 | 75.80 | 76.76 | 74.89 | 75.57 | 77.94 | 72.90 | 74.66 | 75.44 | 71.90 | 71.00 |
| | 2000 | 84.22 | 73.47 | 74.32 | 79.69 | 72.04 | 75.51 | 79.47 | 73.18 | 75.23 | 76.87 | 72.04 | 74.89 |
| M3 | 500 | 75.37 | 73.75 | 73.40 | 75.01 | 68.90 | 74.20 | 72.83 | 72.75 | 71.58 | 73.08 | 72.47 | 71.46 |
| | 1000 | 81.86 | 75.61 | 73.74 | 78.72 | 75.18 | 73.63 | 76.51 | 74.47 | 74.20 | 73.40 | 73.61 | 76.94 |
| | 1500 | 84.01 | 73.32 | 75.23 | 81.94 | 74.32 | 74.77 | 80.04 | 70.61 | 72.72 | 76.54 | 70.90 | 74.43 |
| | 2000 | 90.93 | 71.47 | 69.63 | 83.47 | 71.33 | 72.49 | 83.54 | 70.76 | 72.15 | 79.51 | 71.47 | 73.86 |

| Model | Epoch | Batch size = 1: Training acc. (%) | Batch size = 1: Validation acc. (%) | Batch size = 1: Test acc. (%) | Batch size = 4: Training acc. (%) | Batch size = 4: Validation acc. (%) | Batch size = 4: Test acc. (%) | Batch size = 8: Training acc. (%) | Batch size = 8: Validation acc. (%) | Batch size = 8: Test acc. (%) | Batch size = 16: Training acc. (%) | Batch size = 16: Validation acc. (%) | Batch size = 16: Test acc. (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| M11 | 500 | 82.58 | 72.47 | 74.32 | 80.76 | 70.61 | 74.77 | 81.76 | 69.19 | 76.37 | 79.47 | 77.03 | 72.60 |
| | 1000 | 86.97 | 72.33 | 74.77 | 84.33 | 72.18 | 72.95 | 85.01 | 75.32 | 73.06 | 84.54 | 73.32 | 70.89 |
| | 1500 | 88.58 | 73.47 | 71.35 | 86.79 | 71.47 | 71.23 | 85.58 | 73.04 | 72.60 | 85.29 | 74.04 | 70.09 |
| | 2000 | 89.04 | 73.47 | 70.32 | 87.29 | 72.90 | 73.06 | 87.36 | 71.47 | 74.32 | 86.22 | 70.76 | 72.37 |
| M12 | 500 | 81.61 | 73.32 | 74.43 | 80.86 | 74.61 | 72.72 | 81.29 | 71.04 | 71.69 | 80.76 | 73.04 | 73.97 |
| | 1000 | 85.04 | 70.47 | 74.54 | 84.33 | 72.04 | 73.74 | 82.79 | 72.61 | 73.86 | 82.68 | 75.04 | 74.09 |
| | 1500 | 87.04 | 73.18 | 75.11 | 86.36 | 72.18 | 70.09 | 86.04 | 71.18 | 72.72 | 87.11 | 69.61 | 74.77 |
| | 2000 | 87.65 | 75.18 | 69.41 | 86.00 | 70.19 | 73.17 | 85.79 | 73.18 | 72.60 | 85.83 | 72.47 | 74.54 |
| M13 | 500 | 85.18 | 73.32 | 74.43 | 83.40 | 72.75 | 76.14 | 83.83 | 75.18 | 72.37 | 83.79 | 75.46 | 73.86 |
| | 1000 | 89.29 | 72.04 | 70.32 | 87.58 | 73.47 | 72.26 | 86.93 | 74.32 | 71.12 | 85.04 | 74.04 | 72.26 |
| | 1500 | 91.40 | 69.90 | 70.78 | 87.97 | 71.61 | 72.49 | 88.00 | 73.32 | 70.21 | 88.68 | 72.33 | 72.03 |
| | 2000 | 91.82 | 75.04 | 67.69 | 89.47 | 70.90 | 72.95 | 89.29 | 70.33 | 72.72 | 89.40 | 69.47 | 71.23 |

| Model | Epoch | Batch size = 1: Training acc. (%) | Batch size = 1: Validation acc. (%) | Batch size = 1: Test acc. (%) | Batch size = 4: Training acc. (%) | Batch size = 4: Validation acc. (%) | Batch size = 4: Test acc. (%) | Batch size = 8: Training acc. (%) | Batch size = 8: Validation acc. (%) | Batch size = 8: Test acc. (%) | Batch size = 16: Training acc. (%) | Batch size = 16: Validation acc. (%) | Batch size = 16: Test acc. (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| M1 | 500 | 73.72 | 72.47 | 71.58 | 71.55 | 70.04 | 68.72 | 49.98 | 48.50 | 51.26 | 49.30 | 50.21 | 52.05 |
| | 1000 | 77.40 | 72.18 | 73.40 | 74.04 | 70.61 | 73.40 | 72.37 | 71.75 | 72.49 | 72.55 | 70.04 | 73.74 |
| | 1500 | 79.65 | 75.18 | 72.95 | 75.12 | 74.61 | 69.52 | 74.65 | 74.47 | 72.03 | 67.73 | 63.05 | 65.98 |
| | 2000 | 80.83 | 72.18 | 73.97 | 78.08 | 73.89 | 72.49 | 74.90 | 73.32 | 75.80 | 72.37 | 70.90 | 72.37 |
| M2 | 500 | 73.69 | 73.47 | 71.46 | 72.26 | 72.33 | 68.84 | 50.12 | 46.65 | 52.28 | 50.87 | 48.36 | 48.52 |
| | 1000 | 76.58 | 73.18 | 75.91 | 74.04 | 69.61 | 72.95 | 72.37 | 74.47 | 69.52 | 61.37 | 63.20 | 56.51 |
| | 1500 | 78.86 | 73.75 | 73.74 | 75.04 | 72.47 | 73.97 | 73.69 | 74.18 | 68.84 | 72.08 | 70.90 | 69.29 |
| | 2000 | 81.79 | 72.33 | 74.66 | 77.51 | 72.90 | 73.86 | 75.76 | 72.18 | 72.60 | 73.12 | 74.61 | 69.86 |
| M3 | 500 | 74.01 | 73.04 | 74.09 | 71.80 | 70.61 | 72.83 | 67.76 | 66.33 | 68.72 | 63.44 | 67.62 | 64.61 |
| | 1000 | 78.65 | 74.18 | 75.57 | 74.94 | 73.75 | 73.52 | 72.65 | 73.47 | 73.74 | 72.37 | 71.33 | 70.66 |
| | 1500 | 81.83 | 72.33 | 74.32 | 78.08 | 76.03 | 73.29 | 75.22 | 70.76 | 72.60 | 72.22 | 73.04 | 72.03 |
| | 2000 | 82.72 | 75.75 | 76.03 | 80.86 | 74.61 | 72.49 | 77.29 | 74.89 | 72.15 | 73.94 | 74.32 | 73.06 |

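For readers who want to reproduce the general shape of these experiments, the following sketch is illustrative only and not the authors' implementation: it assumes a Keras/TensorFlow back end (the paper does not name a framework), uses librosa for MFCCs and ΔMFCCs with `librosa.yin` standing in for the YAAPT pitch tracker, and treats the layer sizes, sequence length, and feature dimensions as placeholders. It sweeps the batch sizes and records the accuracies reported in the tables above.

```python
# Illustrative sketch only (not the authors' code); see the assumptions above.
import numpy as np
import librosa
import tensorflow as tf

def fusion_features(wav_path, sr=16000, n_mfcc=13, max_frames=200):
    """Per-frame fusion features: MFCCs + delta-MFCCs + a pitch contour."""
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)       # (n_mfcc, T)
    d_mfcc = librosa.feature.delta(mfcc)                         # (n_mfcc, T)
    f0 = librosa.yin(y, fmin=65, fmax=400, sr=sr)                # pitch stand-in for YAAPT
    T = min(mfcc.shape[1], len(f0), max_frames)
    feats = np.vstack([mfcc[:, :T], d_mfcc[:, :T], f0[None, :T]]).T  # (T, 2*n_mfcc + 1)
    padded = np.zeros((max_frames, feats.shape[1]), dtype=np.float32)
    padded[:T] = feats                                           # zero-pad to a fixed length
    return padded

def build_lstm(input_shape, optimizer="adadelta"):
    """Small binary (original vs. forged) LSTM classifier; sizes are placeholders."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=optimizer, loss="binary_crossentropy", metrics=["accuracy"])
    return model

def sweep(X_train, y_train, X_val, y_val, X_test, y_test, epochs=500):
    """Train one model per batch size, as in the tables above, and report test accuracy."""
    for batch_size in (1, 4, 8, 16):
        model = build_lstm(X_train.shape[1:])
        model.fit(X_train, y_train, validation_data=(X_val, y_val),
                  batch_size=batch_size, epochs=epochs, verbose=0)
        _, test_acc = model.evaluate(X_test, y_test, verbose=0)
        print(f"batch_size={batch_size}: test accuracy = {test_acc:.4f}")
```
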
4.3 Results
| Literature | Features | Decision system | Threshold | Data name | Data count |
|---|---|---|---|---|---|
| Li et al. | Pitch frequency | PCC | Yes | – | 500 audio records |
| Yan et al. | Pitch frequency and formant feature | Dynamic time warping | Yes | TIMIT and WSJ databases | 20,000 duplicated segments, 20,000 non-forged segments |
| Huang et al. | DFT | Comparison of each segment | Yes | – | – |
| Xie et al. | Gammatone feature, MFCCs, pitch feature, and DFT | C4.5 decision tree, PCCs, and average difference | Yes | Self-generated dataset | 1000 forgery, 1000 original records |
| Imran et al. | 1D LBP | Comparison between histograms | Yes | King Saud University Arabic Speech Data | – |
| Wang et al. | DCT, SVD | Distance between singular vectors | Yes | Self-generated dataset | 100 forgery, 100 original audio records |
| Liu et al. | DFT, MFCC | PCC | Yes | Self-generated dataset | 1000 audio records |
| Yan et al. | Pitch feature | PCC, average difference | Yes | Self-generated dataset | 1000 forgery records |
| Xiao et al. | – | Fast convolution algorithm | Yes | – | – |
| Proposed method | Fusion features | RNN LSTM | No | TIMIT | 2189 forgery, 2189 original audio records |

5 Discussion
- Most methods in the literature are traditional. However, the impact of recent deep learning methods on big-data studies in every field is apparent, and the methods in the literature differ considerably from the deep learning approaches that are trending today. To address this gap, we used sequential deep learning models, which are frequently applied to time-series problems but had not previously been tested in this field.
- Our proposed method's results were compared with those of the one-dimensional LBP method [26], the pitch feature method [20], the pitch and formant feature method [7], the DFT method [9], the gammatone feature, MFCC, pitch feature, and DFT method [8], the DCT and SVD method [14], and the DFT and MFCC method [19]. The results appear in Table 10.
- An examination of the results shows that our proposed algorithm offers highly accurate results without requiring a threshold value, because it is trained on hybrid (fusion) feature data.
- AdaGrad is an adaptive gradient algorithm that adapts the learning rate of each parameter; in effect, the algorithm learns the learning rate itself. AdaGrad has a dynamic structure, operating with a different learning coefficient at each step, and its most important advantage is that it does not require manually tuning the learning rate. The algorithm was proposed as a solution to the fixed learning step problem of the SGD and momentum methods. However, because the learning rate decreases monotonically during training, the model can effectively stop learning after some time t, which is AdaGrad's greatest disadvantage [56–59]. (Both update rules are sketched after this list.)
- AdaDelta is a variation of AdaGrad developed to address this learning rate problem; unlike AdaGrad, the method does not require choosing a learning coefficient [56]. In addition, AdaDelta is more robust to noisy gradient information, different model architectures, and different types of data [59, 60].
- In the training phase, the need to select AdaGrad's learning rate manually and its continuous decrease during training reveal the inadequacy of AdaGrad compared with AdaDelta. The results show that AdaDelta achieves higher accuracy than AdaGrad.
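To make the difference between the two optimizers concrete, the following NumPy sketch (illustrative only, not the training code used in the experiments) implements the standard AdaGrad and AdaDelta update rules following Duchi et al. and Zeiler; `theta` denotes a parameter vector and `grad` its gradient.

```python
import numpy as np

def adagrad_step(theta, grad, cache, lr=0.01, eps=1e-8):
    """One AdaGrad update: the squared-gradient sum only grows,
    so the effective step size shrinks monotonically over training."""
    cache = cache + grad ** 2
    theta = theta - lr * grad / (np.sqrt(cache) + eps)
    return theta, cache

def adadelta_step(theta, grad, eg2, edx2, rho=0.95, eps=1e-6):
    """One AdaDelta update: decaying averages replace AdaGrad's growing sum,
    and the ratio of RMS terms removes the hand-picked learning rate."""
    eg2 = rho * eg2 + (1 - rho) * grad ** 2                   # running avg. of squared gradients
    dx = -(np.sqrt(edx2 + eps) / np.sqrt(eg2 + eps)) * grad   # unit-consistent step
    edx2 = rho * edx2 + (1 - rho) * dx ** 2                   # running avg. of squared updates
    return theta + dx, eg2, edx2
```

Because AdaGrad's `cache` never decays, its step size keeps shrinking, which matches the early-termination behaviour described above; AdaDelta's running averages avoid this and remove the manually chosen learning rate.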