1 Introduction
2 Dataset
Split | Characters | Words |
---|---|---|
Train | 37,125 | 18,723 |
Validation | 1,004 | 260 |
Test | 3,941 | 978 |
Total | 42,070 | 19,961 |
3 Methods
3.1 Network architecture
3.2 Spatiotemporal convolutional neural network
3.3 LSTM neural network
3.4 Word embedding model
3.5 Attention model
4 Results and discussion
4.1 Implementation details and results
Method | WER% | Accuracy% |
---|---|---|
MCLRN | 36.33 | 63.67 |
MCLRN + CL | 34.72 | 65.28 |
MCLRN + BS | 29.15 | 70.85 |
MCLRN + CL + BS | 27.68 | 72.32 |
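
In every row of the table above, the reported accuracy equals 100 minus the WER. The following minimal Python sketch checks that relation and computes the absolute WER reduction of each variant over the MCLRN baseline. The complement relation is an assumption inferred from the reported numbers, and the names `accuracy_from_wer` and `wer_reduction` are illustrative helpers, not part of the original implementation.

```python
# WER / accuracy pairs (in percent) copied from the table above.
results = {
    "MCLRN": (36.33, 63.67),
    "MCLRN + CL": (34.72, 65.28),
    "MCLRN + BS": (29.15, 70.85),
    "MCLRN + CL + BS": (27.68, 72.32),
}


def accuracy_from_wer(wer_percent: float) -> float:
    """Assumed relation: accuracy is the complement of WER, in percent."""
    return round(100.0 - wer_percent, 2)


def wer_reduction(baseline: str, variant: str) -> float:
    """Absolute WER reduction (percentage points) of variant over baseline."""
    return round(results[baseline][0] - results[variant][0], 2)


if __name__ == "__main__":
    for name, (wer, acc) in results.items():
        # Check that the reported accuracy matches 100 - WER.
        consistent = abs(accuracy_from_wer(wer) - acc) < 1e-9
        gain = wer_reduction("MCLRN", name)
        print(f"{name}: WER={wer:.2f}, accuracy={acc:.2f}, "
              f"consistent={consistent}, WER reduction={gain:.2f}")
```

Computed this way, the reductions are in percentage points, so they can be compared directly across the four configurations.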
4.2 Discussion
Method | WER% | Accuracy% |
---|---|---|
(a) Test on GRID dataset | | |
Lan et al. [24] | 35.00 | 65.00 |
Wand et al. [25] | 20.40 | 79.60 |
Gergen et al. [26] | 13.60 | 86.40 |
Assael et al. [27] | 11.40 | 88.60 |
Maulana et al. [28] | 3.30 | 96.70 |
MCLRN (ours) | 4.40 | 95.60 |
(b) Test on LRW dataset | | |
Petridis et al. [29] | 18.00 | 82.00 |
Stafylakis et al. [30] | 17.00 | 83.00 |
Wang et al. [31] | 16.66 | 83.34 |
MCLRN (ours) | 11.30 | 88.70 |
(c) Test on LRW-1000 dataset | | |
Wang et al. [31] | 63.09 | 36.91 |
Yang et al. [23] | 61.81 | 38.19 |
MCLRN (ours) | 59.80 | 40.20 |