1 Introduction
Group | Dataset | Content | Device | Writers | Statistics
---|---|---|---|---|---
Characters | – | Japanese characters | Tablet | 120 | \(10,154 \times 120\) char. patterns
 | MRG-OHTC [53] | Tibetan characters | Tablet | 130 | 910 character classes
 | CASIA [98] | Chinese characters | Anoto pen on paper | 1020 | 3.5 m characters
 | OnHW-chars [65] | English characters | Sensor pen | 119 | 31,275 characters, 52 classes
Sequence | UNIPEN [33] | Sentences, words, characters | Pen-based computer | – | >12,000 chars. per writer
 | CROHME [55] | Mathematical expressions | Whiteboard, tablet | >100 | 9507 expressions
 | IRONOFF [95] | French words, chars., digits | Trajectory, images | – | 50,000 words, 32,000 chars.
 | ICROW [78] | Dutch, Irish, Italian words | – | 67 | 13,119 words
 | IAM-OnDB [51] | English sentences | Whiteboard | 197 | 82,272 words
 | LMCA [42] | Arabic words, chars., digits | Tablet | 55 | 30 k digits, 100 k chars., 500 w.
 | ADAB [1] | Arabic words | Tablet | 170 | 20,000+ words
 | IBM_UB_1 [84] | English words | Notepad | 43 | 6654 pages
 | VNOnDB [59] | Vietnamese words, lines, paragr. | Tablet | 200 | 110,746 words
Ours | OnHW-equations | Equations written on paper | Sensor pen | 55 | 10,720 equations, 15 classes
 | OnHW-words500 | Repeated 500 words on paper | Sensor pen | 53 | 25,218 words, 59 classes
 | OnHW-wordsRandom | Random words written on paper | Sensor pen | 54 | 14,645 words, 59 classes
 | OnHW-wordsTraj | Words written on a tablet | Sensor pen on tablet | 2 | 16,752 words, 52 classes
 | OnHW-symbols | Numbers, symbols on paper | Sensor pen | 27 | 2326 characters, 15 classes
2 Background and related work
2.1 Datasets
2.2 Methods
3 Datasets and evaluation methodology
Dataset | Writers | Classes | Maximal length | Samples total | Samples WD (train) | Samples WD (valid.) | Samples WI (train) | Samples WI (valid.) | Total chars.
---|---|---|---|---|---|---|---|---|---
OnHW-equations | 55 | 15 | 15 | 10,713 | 8595 | 2118 | 8610 | 2103 | 106,968
OnHW-words500(R) | 53 | 59 | 19 | 25,218 | 20,176 | 5042 | 19,918 | 5300 | 137,219
OnHW-wordsRandom | 54 | 59 | 27 | 14,641 | 11,744 | 2897 | 11,716 | 2925 | 146,350
OnHW-wordsTraj | 2 | 59 | 10 | 16,752 | 13,250 | 3502 | – | – | 146,512
OnHW-symbols | 27 | 15 | Single | 2326 | 1853 | 473 | 1715 | 611 | 2326
ICROW [78] | 67 | 53 | 15 | 13,119 | 10,500 | 2619 | 10,524 | 2595 | 90,138
IAM-OnDB [51] | 197 | 81 | 64 | 10,773 | 8702 | 2071 | 8624 | 2149 | 265,477
VNOnDB-words [59] | 201 | 147 | 11 | 110,746 | 88,677 | 22,069 | 88,486 | 22,260 | 368,455
OnHW-chars [65] | 119 | 52 | Single | 31,275 | 23,059 | 8216 | 23,059 | 8216 | 31,275
3.1 Recording setup
3.2 Datasets
Dataset | Writers | Maximal length | Samples total | Samples WD (train) | Samples WD (valid.) | Samples WI (train) | Samples WI (valid.) | Total chars.
---|---|---|---|---|---|---|---|---
OnHW-equations-L | 4 | 15 | 843 | 677 | 166 | 543 | 300 | 8438
OnHW-words500-L | 2 | 19 | 1000 | 800 | 200 | 500 | 500 | 5438
OnHW-wordsRandom-L | 2 | 26 | 996 | 798 | 198 | 497 | 499 | 10,029
OnHW-symbols-L | 4 | Single | 361 | 289 | 72 | 271 | 90 | 361
OnHW-chars-L [65] | 9 | Single | 2270 | 1816 | 454 | – | – | 2270
The OnHW-equations dataset covers 15 classes: the numbers 0 to 9 and the operators +, -, \(\cdot \), : and =. The dataset consists of a total of 10,713 samples. While the OnHW-words500 dataset contains only the same 500 words for each writer, every sample of the OnHW-wordsRandom dataset is randomly drawn from a large German and English word list. This allows a comparison between indirectly learning a lexicon of 500 words and completely lexicon-free learning. The OnHW-wordsRandom dataset (14,641 samples) is smaller than the OnHW-words500 dataset (25,218 samples), but contains longer words with a maximal length of 27 labels (19 labels for OnHW-words500). For the WD task, the train/validation split of the OnHW-words500 dataset is based on words, such that the same 400 words per writer are in the train set and the same 100 words per writer are in the validation set. For the WI task, the split is done by writer, such that all 500 words of a single writer are either in the train set or in the validation set. As the model is more likely to overfit on the repeatedly written words, the WD task of OnHW-words500 is more challenging than that of the OnHW-wordsRandom dataset. The OnHW-words500R dataset is a random split of OnHW-words500.
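The writer-dependent (WD) and writer-independent (WI) split strategies described above can be sketched as follows; the record layout (`writer`, `word` keys) is a hypothetical illustration, not the authors' code:

```python
# Sketch of the two split strategies: WD splits by word (every writer appears
# in both sets), WI splits by writer (all samples of a writer go to one set).

def wd_split(samples, train_words):
    """Writer-dependent: hold out words, keep all writers in both sets."""
    train = [s for s in samples if s["word"] in train_words]
    val = [s for s in samples if s["word"] not in train_words]
    return train, val

def wi_split(samples, train_writers):
    """Writer-independent: hold out writers entirely."""
    train = [s for s in samples if s["writer"] in train_writers]
    val = [s for s in samples if s["writer"] not in train_writers]
    return train, val

samples = [
    {"writer": 1, "word": "hello"}, {"writer": 1, "word": "world"},
    {"writer": 2, "word": "hello"}, {"writer": 2, "word": "world"},
]
train, val = wd_split(samples, train_words={"hello"})
assert {s["word"] for s in val} == {"world"}    # held-out words, both writers
train, val = wi_split(samples, train_writers={1})
assert {s["writer"] for s in val} == {2}        # validation writer unseen in training
```

The WI split is the harder generalization test, since the model must cope with an unseen writing style rather than unseen words.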
The OnHW-symbols dataset covers the same 15 classes (numbers 0 to 9 and the operators +, -, \(\cdot \), : and =), written by 27 writers, with a total of 2326 single characters. Figure 5 compares the distribution of sample numbers for the OnHW-chars [65] (characters) and OnHW-symbols as well as the split OnHW-equations (numbers, symbols) datasets. While the samples are equally distributed for small and capital characters (\(\approx \) 600 per character), the numbers and symbols are unevenly distributed for the split OnHW-equations dataset (similar to Fig. 4b).

3.3 Evaluation metrics
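The benchmark reports the word error rate (WER) and character error rate (CER). A minimal sketch, assuming both are defined in the standard way as the Levenshtein edit distance normalized by the reference length (over words for WER, over characters for CER):

```python
# Levenshtein distance via single-row dynamic programming, then
# WER/CER as edit distance divided by the reference length.

def edit_distance(ref, hyp):
    """Minimum number of insertions, deletions and substitutions."""
    d = list(range(len(hyp) + 1))                # distances for empty reference
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i                     # prev holds d[i-1][j-1]
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,         # deletion
                                   d[j - 1] + 1,     # insertion
                                   prev + (r != h))  # substitution (0 if match)
    return d[len(hyp)]

def cer(ref, hyp):
    """Character error rate: edit distance over characters."""
    return edit_distance(list(ref), list(hyp)) / len(ref)

def wer(ref, hyp):
    """Word error rate: edit distance over whitespace-separated tokens."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

assert edit_distance(list("kitten"), list("sitting")) == 3
```

Note that both rates can exceed 1 when the hypothesis is much longer than the reference.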
4 Benchmark methods
We use tanh activations for the BiLSTM layers and ReLU for the TCN and LSTM layers. A dense layer with 100 units together with the CTC loss predicts a sequence of class labels. Second, we implement an attention-based network (see Fig. 7) that consists of an encoder with batch normalization, 1D convolutional and (Bi)LSTM layers. These map the input sequence \(\mathbf {U} \in \mathbb {R}^{m \times l}\) to a sequence of continuous representations \(\mathbf {z}\). A transformer transforms \(\mathbf {z}\) using \(n_{\text {head}}\) stacked multi-head self-attention blocks \(\text {MultiHead}(Q,K,V) = \text {Concat}(\text {head}_{1}, \ldots , \text {head}_{h}) W^{O}\) with \(W^{O} \in \mathbb {R}^{hd_{v} \times d_\text {model}}\) and \(\text {head}_i = \text {Attention}(QW_{i}^{Q}, KW_{i}^{K}, VW_{i}^{V})\), where \(W_{i}^{Q}, W_{i}^{K} \in \mathbb {R}^{d_{\text {model}} \times d_k}\) and \(W_{i}^{V} \in \mathbb {R}^{d_{\text {model}} \times d_v}\). Each block consists of point-wise, fully connected time-distributed layers followed by a scaled dot-product layer and layer normalization [5] with output dimension \(d_{\text {model}}\) [94]. The attention can be described as mapping a set of key-value pairs of dimension \(d_v\) and a query of dimension \(d_k\) to an output, computed as \(\text {Attention}(Q,K,V) = \text {softmax}\Big (\frac{Q K^{T}}{\sqrt{d_k}}\Big )V\), where the matrices Q, K and V are sets of queries, keys and values.

5 Experiments
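As a concrete illustration, the scaled dot-product attention \(\text {Attention}(Q,K,V) = \text {softmax}(QK^{T}/\sqrt{d_k})V\) defined in Sect. 4 can be sketched in plain NumPy; the sequence length and dimensions below are illustrative, not the benchmark configuration:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (l_q, l_k) query/key similarities
    return softmax(scores, axis=-1) @ V  # weighted sum of the values

rng = np.random.default_rng(0)
l, d_k, d_v = 8, 16, 32                  # illustrative sequence length and dims
Q = rng.normal(size=(l, d_k))
K = rng.normal(size=(l, d_k))
V = rng.normal(size=(l, d_v))
out = attention(Q, K, V)
assert out.shape == (l, d_v)             # one d_v-dimensional output per query
```

In the multi-head variant, \(h\) such attentions run in parallel on linearly projected queries, keys and values, and their outputs are concatenated and projected by \(W^{O}\).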
Method | Metric | OnHW-equations | OnHW-words500(R) | OnHW-wordsRandom |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
 | | WD | WI | WD | WI | Random | WD | WI |
 | | WER | CER | WER | CER | WER | CER | WER | CER | WER | CER | WER | CER | WER | CER |
CNN+LSTM | Mean | 22.96 | 3.50 | 69.22 | 18.11 | 80.70 | 28.41 | 93.30 | 48.24 | 76.80 | 23.73 | 82.29 | 17.90 | 93.90 | 46.92 |
STD | 1.83 | 0.38 | 7.91 | 5.20 | 3.32 | 2.50 | 1.13 | 4.59 | 0.34 | 0.23 | 8.49 | 1.66 | 6.00 | 2.88 | |
CNN+BiLSTM | Mean | 13.19 | 1.78 | 55.25 | 12.98 | 51.95 | 17.16 | 60.91 | 27.80 | 18.77 | 5.20 | 41.27 | 7.87 | 84.52 | 35.22 |
STD | 0.52 | 0.13 | 10.56 | 5.23 | 12.72 | 4.98 | 5.16 | 5.97 | 0.87 | 0.31 | 1.18 | 0.35 | 7.53 | 5.07 | |
CNN+TCN | Mean | 28.57 | 4.29 | 82.06 | 23.95 | 63.51 | 21.07 | 90.54 | 49.53 | 62.61 | 19.13 | 83.16 | 19.26 | 96.46 | 51.42 |
STD | 1.16 | 0.23 | 6.14 | 4.44 | 11.81 | 4.37 | 5.56 | 7.93 | 12.03 | 3.90 | 7.82 | 2.42 | 3.14 | 3.73 | |
Attention-based | Mean | 73.69 | 16.45 | 87.48 | 27.45 | 88.34 | 45.70 | 83.53 | 42.42 | 78.53 | 35.05 | 96.33 | 42.14 | 98.39 | 52.23 |
model | STD | 2.55 | 1.04 | 2.19 | 2.33 | 1.74 | 1.46 | 2.42 | 5.21 | 2.12 | 1.96 | 1.73 | 5.27 | 0.32 | 3.70 |
InceptionTime [25] | Mean | 20.72 | 2.92 | 60.24 | 14.71 | 41.92 | 12.08 | 76.84 | 35.07 | 40.18 | 11.39 | 63.04 | 12.81 | 89.18 | 39.59 |
(32, 6) | STD | 0.58 | 0.18 | 8.81 | 4.75 | 2.38 | 0.64 | 2.92 | 5.63 | 0.62 | 0.24 | 0.99 | 0.21 | 7.87 | 3.72 |
InceptionTime [25] | Mean | 19.48 | 2.72 | 60.90 | 14.29 | 53.34 | 16.24 | 78.22 | 36.85 | 47.52 | 13.87 | 65.68 | 13.63 | 89.84 | 41.81 |
(32, 6) +BiLSTM | STD | 0.29 | 0.13 | 7.87 | 4.61 | 4.34 | 0.71 | 3.53 | 6.53 | 2.02 | 0.83 | 1.31 | 0.36 | 8.17 | 3.38 |
InceptionTime [25] | Mean | 12.94 | 1.77 | 52.40 | 12.23 | 37.12 | 12.96 | 62.09 | 26.36 | 21.34 | 5.34 | 42.88 | 7.19 | 84.14 | 32.35 |
(96, 11) | STD | 0.33 | 0.12 | 8.09 | 4.71 | 2.11 | 0.55 | 5.66 | 2.21 | 0.56 | 0.20 | 1.27 | 0.25 | 8.13 | 3.75 |
InceptionTime [25] | Mean | 12.06 | 1.65 | 49.92 | 11.28 | 43.22 | 13.07 | 61.62 | 26.08 | 21.18 | 5.35 | 39.14 | 6.39 | 85.42 | 33.31 |
(96, 11) +BiLSTM | STD | 0.32 | 0.10 | 7.78 | 4.20 | 2.93 | 0.79 | 5.39 | 6.27 | 0.84 | 0.26 | 0.83 | 0.13 | 7.32 | 4.32 |
XceptionTime [72] | Mean | 38.66 | 5.67 | 71.06 | 17.52 | 49.10 | 15.07 | 78.54 | 36.80 | 45.84 | 13.81 | 69.20 | 15.60 | 89.74 | 41.34 |
(144) | STD | 0.80 | 0.20 | 5.70 | 4.56 | 2.79 | 0.57 | 3.55 | 6.14 | 0.48 | 0.14 | 0.55 | 0.21 | 8.05 | 3.25 |
XceptionTime [72] | Mean | 38.40 | 5.71 | 70.56 | 17.47 | 51.62 | 16.24 | 80.00 | 38.06 | 46.44 | 14.26 | 71.74 | 16.77 | 90.92 | 44.43 |
(144) +BiLSTM | STD | 1.14 | 0.21 | 5.07 | 4.32 | 4.00 | 1.37 | 2.96 | 5.55 | 0.45 | 0.11 | 0.72 | 0.31 | 7.91 | 3.59 |
ResNet [103] | Mean | 39.36 | 5.78 | 87.10 | 27.56 | 90.30 | 44.23 | 95.90 | 58.61 | 77.02 | 27.64 | 92.50 | 27.37 | 93.00 | 59.52 |
(144) | STD | 2.44 | 0.61 | 4.77 | 4.10 | 7.29 | 13.13 | 0.95 | 3.35 | 4.28 | 3.38 | 0.53 | 0.38 | 8.29 | 4.95 |
ResNet [103] | Mean | 37.50 | 5.50 | 84.02 | 25.84 | 79.54 | 28.19 | 96.66 | 59.76 | 79.04 | 28.16 | 91.36 | 25.31 | 92.84 | 57.57 |
(144) +BiLSTM | STD | 3.10 | 0.59 | 9.33 | 5.97 | 3.11 | 1.51 | 0.34 | 2.58 | 0.48 | 0.37 | 0.84 | 0.97 | 8.36 | 4.46 |
ResCNN [113] | Mean | 81.92 | 18.20 | 98.50 | 45.59 | 94.42 | 48.01 | 98.92 | 70.68 | 92.32 | 44.81 | 98.68 | 41.78 | 93.14 | 68.26 |
(144) | STD | 1.29 | 0.79 | 0.87 | 4.60 | 1.57 | 2.04 | 0.16 | 2.08 | 1.89 | 2.96 | 0.17 | 0.74 | 8.40 | 5.82 |
ResCNN [113] | Mean | 87.66 | 23.24 | 99.54 | 51.77 | 94.56 | 48.59 | 98.86 | 70.06 | 93.80 | 45.59 | 99.10 | 43.33 | 93.12 | 67.91 |
(144) +BiLSTM | STD | 2.33 | 1.77 | 0.43 | 4.67 | 1.41 | 1.77 | 0.33 | 1.57 | 0.49 | 1.84 | 0.24 | 0.43 | 8.39 | 6.30 |
FCN [103] | Mean | 91.62 | 24.66 | 99.46 | 53.84 | 96.82 | 54.89 | 99.34 | 75.46 | 96.74 | 54.58 | 99.54 | 48.54 | 98.36 | 74.18 |
STD | 0.92 | 1.04 | 0.37 | 2.73 | 0.66 | 0.88 | 0.14 | 2.80 | 0.14 | 0.55 | 0.08 | 0.70 | 2.04 | 4.41 | |
LSTM-FCN [39] | Mean | 90.82 | 24.47 | 99.44 | 52.49 | 96.18 | 52.53 | 99.48 | 76.94 | 95.82 | 51.50 | 99.48 | 50.06 | 98.22 | 75.70 |
STD | 1.40 | 1.44 | 0.40 | 3.96 | 1.06 | 1.52 | 0.07 | 1.80 | 0.51 | 1.33 | 0.07 | 0.64 | 2.32 | 4.64 | |
GRU-FCN [21] | Mean | 89.12 | 23.03 | 99.32 | 52.01 | 96.78 | 55.11 | 99.46 | 76.05 | 96.66 | 54.32 | 99.60 | 51.97 | 98.16 | 76.05 |
STD | 1.53 | 1.08 | 0.53 | 3.93 | 0.93 | 1.55 | 0.10 | 1.40 | 0.51 | 1.43 | 0.11 | 1.18 | 2.22 | 4.31 | |
MLSTM-FCN [40] | Mean | 87.18 | 21.75 | 99.28 | 48.82 | 98.46 | 70.02 | 99.30 | 77.03 | 97.66 | 63.19 | 99.36 | 47.88 | 97.64 | 72.07 |
STD | 1.67 | 0.96 | 0.35 | 3.68 | 2.08 | 10.30 | 0.11 | 1.70 | 1.89 | 10.21 | 0.05 | 0.93 | 2.85 | 4.35 | |
MGRU-FCN [40] | Mean | 88.64 | 22.50 | 99.34 | 50.56 | 96.80 | 55.22 | 99.4 | 74.34 | 96.16 | 53.02 | 99.38 | 49.32 | 98.00 | 74.23 |
STD | 0.99 | 0.90 | 0.60 | 4.43 | 0.89 | 2.05 | 0.11 | 2.21 | 0.64 | 1.26 | 0.13 | 1.14 | 2.45 | 5.43 |
Method | Metric | OnHW-wordsTraj\(^1\) | IAM-OnDB [51] | VNOnDB-words [59] |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
 | | Camera\(^2\) | IMU | Trajectory | WD | WI | WD | WI |
 | | WER | CER | WER | CER | WER | CER | WER | CER | WER | CER | WER | CER | WER | CER |
CNN+LSTM | Mean | 60.50 | 14.93 | 57.00 | 8.95 | 61.10 | 10.66 | 83.14 | 11.23 | 84.56 | 12.96 | 60.54 | 26.12 | 66.17 | 29.13 |
STD | – | – | 3.03 | 0.95 | 8.74 | 2.50 | 1.30 | 0.65 | 1.66 | 1.35 | 13.77 | 7.57 | 8.62 | 4.33 | |
CNN+BiLSTM | Mean | 26.22 | 8.54 | 16.52 | 2.79 | 11.77 | 2.07 | 65.91 | 6.94 | 72.42 | 9.11 | 15.54 | 6.71 | 18.67 | 8.00 |
STD | – | – | 2.38 | 0.61 | 2.23 | 0.64 | 1.08 | 0.27 | 2.75 | 1.22 | 0.67 | 0.25 | 1.24 | 0.72 | |
CNN+TCN | Mean | 64.00 | 16.10 | 67.47 | 11.40 | 69.40 | 23.94 | 87.07 | 12.97 | 87.18 | 14.32 | 41.70 | 16.98 | 74.70 | 42.33 |
STD | – | – | 7.12 | 4.80 | 16.61 | 27.28 | 3.00 | 1.49 | 2.91 | 1.74 | 8.43 | 3.57 | 25.31 | 19.44 | |
Attention-based | Mean | 60.99 | 17.21 | 74.80 | 16.74 | 33.50 | 5.78 | – | – | – | – | – | – | – | – |
model | STD | – | – | 2.09 | 0.74 | 4.45 | 1.01 | – | – | – | – | – | – | – | – |
InceptionTime [25] | Mean | 59.30 | 51.91 | 34.64 | 2.70 | 12.32 | 2.14 | 73.50 | 8.72 | 78.36 | 10.99 | 19.84 | 7.79 | 23.36 | 9.30 |
(96, 11) | STD | – | – | 1.74 | 0.47 | 1.86 | 0.56 | 2.71 | 0.82 | 3.35 | 1.47 | 1.61 | 0.61 | 0.97 | 0.60 |
InceptionTime [25] | Mean | 99.75 | 75.76 | 16.35 | 2.56 | 11.34 | 2.00 | 71.46 | 8.23 | 75.14 | 9.91 | 23.02 | 9.35 | 26.32 | 10.95 |
(96, 11)+BiLSTM | STD | – | – | 2.23 | 0.53 | 1.55 | 0.47 | 1.87 | 0.42 | 2.2 | 1.04 | 5.25 | 2.12 | 3.33 | 1.37 |
5.1 Seq2seq task evaluation
 | | WD | | | | WI | | | |
---|---|---|---|---|---|---|---|---|---|
Augmentation technique | Sensors | WER (Mean) | WER (STD) | CER (Mean) | CER (STD) | WER (Mean) | WER (STD) | CER (Mean) | CER (STD) |
None | All | 22.96 | 1.83 | 3.50 | 0.38 | 69.21 | 7.91 | 18.11 | 5.20 |
Scaling (S) | All | 22.70 | 0.40 | 3.43 | 0.22 | 69.70 | 7.90 | 18.80 | 5.84 |
Time Warping (TW) | All | 20.90 | 0.83 | 3.18 | 0.27 | 64.10 | 5.51 | 15.26 | 2.27 |
Jittering (J) | All | 22.87 | 0.75 | 3.47 | 0.33 | 68.14 | 10.03 | 18.68 | 7.18 |
Magnitude Warping (MW) | All | 22.88 | 1.21 | 3.53 | 0.29 | 76.80 | 8.35 | 18.47 | 5.21 |
Shifting (SH) | All | 22.40 | 1.12 | 3.43 | 0.24 | 69.81 | 7.59 | 18.80 | 4.88 |
Interpolation | All | 25.04 | 0.92 | 3.96 | 0.32 | 70.50 | 8.30 | 19.42 | 5.96 |
Normalization | All | 55.26 | 2.04 | 7.97 | 0.51 | 82.48 | 8.74 | 22.71 | 5.04 |
None | w/o Magnetometer | 22.60 | 1.51 | 3.44 | 0.36 | 63.48 | 8.32 | 16.07 | 4.73 |
None | w/o Front Accelerometer | 21.36 | 0.60 | 3.28 | 0.29 | 70.24 | 8.25 | 19.55 | 5.52 |
None | w/o Rear Accelerometer | 23.20 | 0.86 | 3.57 | 0.26 | 68.30 | 8.14 | 16.64 | 5.40 |
None | w/o Mag., w/o Front Acc. | 22.46 | 1.55 | 3.41 | 0.38 | 69.12 | 8.40 | 17.31 | 4.02 |
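Three of the augmentation techniques compared above (scaling, jittering and shifting) can be sketched as follows; these are simple illustrative variants rather than the exact implementations benchmarked, and the 13-channel input shape is only an example:

```python
import numpy as np

def scale(x, sigma=0.1, rng=np.random.default_rng()):
    """Scaling: multiply each channel by a random factor close to 1."""
    return x * rng.normal(1.0, sigma, size=(1, x.shape[1]))

def jitter(x, sigma=0.05, rng=np.random.default_rng()):
    """Jittering: add Gaussian noise to every time step and channel."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def shift(x, max_shift=5, rng=np.random.default_rng()):
    """Shifting: circularly shift the sequence along the time axis."""
    s = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(x, s, axis=0)

x = np.ones((100, 13))                      # (time steps, sensor channels), e.g. 13 channels
for aug in (scale, jitter, shift):
    assert aug(x).shape == x.shape          # all augmentations preserve the shape
```

All three operate on the raw multivariate time series and leave the label sequence unchanged, which is why they can be applied on the fly during training.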
5.2 Single character task evaluation
Method | OnHW- | OnHW- | OnHW-sym.\(^1\) | OnHW-chars\(^3\) [65] | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
(\({\mathcal {L}}_{CCE}\) loss) | symbols\(^1\) | equations\(^{1,2}\) | + equations\(^{1,2}\) | lower | upper | combined | | | | | | |
 | WD | WI | WD | WI | WD | WI | WD | WI | WD | WI | WD | WI |
CNN+LSTM | **96.44** | **80.00** | 95.43 | 84.22 | **95.65** | 85.11 | 88.85 | 79.48 | 92.15 | 85.60 | 78.17 | 68.06 |
CNN+BiLSTM | 96.20 | 79.51 | 95.70 | 83.88 | 95.50 | 84.55 | **89.66** | **80.00** | **92.58** | **85.64** | **78.98** | **68.44** |
CNN+TCN | 94.21 | 76.83 | **96.70** | **84.91** | 95.48 | **86.30** | 88.32 | 78.80 | 90.80 | 84.54 | 77.90 | 67.96 |
LSTM (2 layers) | 81.18 | 62.85 | 91.05 | 74.11 | 90.64 | 74.70 | 74.76 | 65.63 | 80.46 | 73.86 | 58.88 | 51.41 | |
LSTM (3 layers) | 83.51 | 64.48 | 92.08 | 75.77 | 91.52 | 76.17 | 76.05 | 66.14 | 82.10 | 74.82 | 61.58 | 52.80 | |
BiLSTM (2 layers) | 83.30 | 63.01 | 91.39 | 73.43 | 91.48 | 76.60 | 75.80 | 66.28 | 81.88 | 75.50 | 61.19 | 53.60 | |
BiLSTM (3 layers) | 83.09 | 59.74 | 92.46 | 76.60 | 91.93 | 77.05 | 77.17 | 67.20 | 83.48 | 75.99 | 63.52 | 54.21 | |
GRU [15] | 47.57 | 33.22 | 70.80 | 45.73 | 68.36 | 52.96 | 35.12 | 33.98 | 45.69 | 44.90 | 30.72 | 29.22 | |
TCN [6] | 85.41 | 70.21 | 91.64 | 77.44 | 92.02 | 79.18 | 75.36 | 68.30 | 79.14 | 74.27 | 60.14 | 54.28 | |
FCN [103] | 92.18 | 74.63 | 94.03 | 81.46 | 94.22 | 82.56 | 81.62 | 71.48 | 85.37 | 77.24 | 67.41 | 58.00 | |
RNN-FCN [40] | 93.23 | 74.63 | 94.24 | 81.56 | 94.52 | 82.74 | 81.74 | 71.03 | 85.32 | 77.28 | 67.78 | 57.88 | |
LSTM-FCN [39] | 92.39 | 73.32 | 93.95 | 81.47 | 94.33 | 82.24 | 81.43 | 71.41 | 85.43 | 77.07 | 67.34 | 57.93 | |
GRU-FCN [21] | 92.39 | 73.32 | 94.29 | 81.18 | 94.49 | 82.05 | 81.71 | 71.57 | 85.26 | 77.30 | 67.22 | 58.10 | |
MRNN-FCN | 92.60 | 74.30 | 94.24 | 81.30 | 94.36 | 82.58 | 82.35 | 72.06 | 85.81 | 77.83 | 68.01 | 58.57 | |
MLSTM- | SE | 89.22 | 70.38 | 93.78 | 82.49 | 94.04 | 82.70 | 79.39 | 71.90 | 85.08 | 77.44 | 69.33 | 60.14 |
FCN [40] | SE, Att. | 89.43 | 69.07 | 93.92 | 80.56 | 93.59 | 82.48 | 79.71 | 71.43 | 85.25 | 77.34 | 69.29 | 59.84 |
LSTM | 87.74 | 71.85 | 94.12 | 80.13 | 90.14 | 82.10 | 80.21 | 71.26 | 84.68 | 76.69 | 68.63 | 59.25 | |
Att. | 88.37 | 70.54 | 93.95 | 81.18 | 94.14 | 82.78 | 79.97 | 70.92 | 84.57 | 76.71 | 68.76 | 58.84 | |
MGRU-FCN [40] | 92.60 | 74.30 | 94.21 | 81.28 | 94.43 | 82.25 | 82.17 | 71.90 | 85.81 | 77.92 | 68.22 | 58.79 | |
ResCNN (64) [113] | 92.23 | 77.41 | 94.58 | 80.95 | 94.55 | 82.07 | 82.52 | 72.00 | 86.91 | 78.64 | 67.55 | 58.67 | |
ResNet (64) [103] | 94.50 | 76.76 | 94.68 | 83.45 | 94.74 | 83.43 | 83.01 | 71.93 | 86.41 | 78.03 | 68.56 | 58.74 | |
XResNet (18) [34] | 93.45 | 74.14 | 94.80 | 81.51 | 94.73 | 82.91 | 81.21 | 69.57 | 86.02 | 76.91 | 66.69 | 56.64 | |
XResNet (34) [34] | 93.45 | 74.63 | 94.64 | 81.77 | 94.74 | 82.29 | 81.40 | 69.47 | 85.74 | 77.03 | 66.53 | 55.59 | |
XResNet (50) [34] | 93.66 | 74.47 | 94.63 | 81.74 | 94.83 | 82.76 | 80.99 | 69.14 | 86.05 | 76.69 | 64.98 | 54.38 | |
XResNet (101) [34] | 92.60 | 75.29 | 93.64 | 80.95 | 93.48 | 82.74 | 80.88 | 69.53 | 85.83 | 76.47 | 64.53 | 54.20 | |
XResNet (152) [34] | 92.18 | 73.16 | 93.47 | 80.00 | 92.58 | 81.64 | 80.71 | 69.06 | 85.17 | 76.70 | 64.30 | 53.72 | |
XceptionTime (16) [72] | 91.54 | 72.34 | 94.03 | 82.24 | 93.95 | 81.84 | 81.41 | 70.76 | 85.94 | 78.23 | 66.70 | 56.92 | |
InceptionTime (32,6) [25] | 91.33 | 76.10 | 94.05 | 81.39 | 93.88 | 82.37 | 80.98 | 72.22 | 85.20 | 78.24 | 66.94 | 58.34 | |
InceptionTime (47,9) [25] | 92.60 | 75.94 | 94.49 | 83.42 | 94.20 | 81.25 | 82.11 | 72.40 | 85.93 | 79.49 | 67.72 | 59.53 | |
InceptionTime (62,9) [25] | 91.97 | 78.07 | 94.83 | 81.57 | 95.01 | 81.74 | 82.15 | 72.76 | 86.05 | 79.81 | 67.89 | 59.62 | |
InceptionTime (64,12) [25] | 91.97 | 76.92 | 94.87 | 84.35 | 95.06 | 83.33 | 84.14 | 75.28 | 87.80 | 81.62 | 70.43 | 61.68 | |
MultiIncep.Time (32,6) [25] | 91.12 | 75.29 | 93.91 | 80.57 | 93.61 | 81.67 | 80.96 | 72.25 | 85.12 | 78.21 | 66.76 | 58.32 | |
MiniRocket [87] | 69.77 | 58.76 | 75.91 | 45.34 | 75.58 | 46.46 | 46.01 | 72.25 | 51.38 | 44.64 | 33.65 | 27.63 | |
OmniScaleCNN [89] | 84.78 | 68.09 | 91.76 | 75.46 | 92.23 | 77.49 | 73.70 | 64.13 | 79.54 | 71.23 | 60.58 | 51.88 | |
XEM [24] | 85.84 | 67.10 | 92.13 | 77.04 | 91.42 | 77.90 | 74.39 | 68.12 | 81.67 | 74.32 | 58.18 | 51.99 | |
TapNet [111] | 67.02 | 48.12 | 66.38 | OOM | 65.96 | OOM | 45.62 | 37.86 | 46.04 | 38.76 | OOM | OOM | |
mWDN [99] | 88.58 | 67.43 | 92.37 | 77.30 | 92.02 | 78.60 | 75.69 | 63.44 | 82.91 | 73.01 | 59.80 | 47.48 | |
Perceiver [36] | 67.40 | 48.10 | 89.60 | 58.10 | 89.30 | 61.10 | 56.20 | 39.70 | 57.08 | 42.89 | 42.72 | 30.28 | |
Sinkhorn [90] | 61.10 | 50.90 | 76.80 | 66.40 | 75.70 | 69.80 | 47.26 | 45.56 | 53.04 | 51.36 | 36.84 | 34.52 | |
Performer [13] | 55.40 | 47.80 | 76.10 | 68.30 | 74.90 | 66.80 | 47.54 | 46.32 | 53.48 | 51.76 | 36.62 | 34.56 | |
Reformer [44] | 56.90 | 47.80 | 75.80 | 70.10 | 75.40 | 70.20 | 47.26 | 47.28 | 53.80 | 51.78 | 35.98 | 34.66 | |
Linformer [101] | 53.90 | 42.90 | 75.20 | 67.40 | 74.90 | 68.80 | 48.90 | 44.92 | 53.80 | 51.24 | 34.92 | 34.00 | |
TST [110] (Gaussian) | 91.12 | 71.85 | 93.07 | 80.40 | 93.16 | 80.33 | 80.10 | 70.75 | 84.81 | 78.34 | 66.12 | 57.56 | |
MultiTST [110] | 87.53 | 71.19 | 92.36 | 78.82 | 91.96 | 79.46 | 74.19 | 66.59 | 81.81 | 75.18 | 60.81 | 53.95 | |
TSiT [110] | 84.99 | 68.09 | 93.30 | 78.98 | 92.91 | 80.28 | 79.56 | 69.90 | 84.55 | 77.21 | 64.81 | 55.73 | |
CNN (from [65]) | – | – | – | – | – | – | **84.62** | **76.85** | **89.89** | **83.01** | **70.50** | 64.01 |
LSTM (from [65]) | – | – | – | – | – | – | 79.83 | 73.03 | 88.68 | 81.91 | 67.83 | 60.29 |
CNN+LSTM (from [65]) | – | – | – | – | – | – | 82.64 | 74.25 | 88.55 | 82.96 | 69.42 | **64.13** |
BiLSTM (from [65]) | – | – | – | – | – | – | 82.43 | 75.72 | 89.15 | 81.09 | 69.37 | 63.38 |
Loss function | OnHW- | OnHW- | OnHW-sym.\(^1\) | OnHW-chars\(^3\) [65] | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
(CNN+BiLSTM architecture) | symbols\(^1\) | equations\(^{1,2}\) | + equations\(^{1,2}\) | Lower | Upper | Combined | | | | | | |
 | WD | WI | WD | WI | WD | WI | WD | WI | WD | WI | WD | WI |
Categorical CE (CCE) | 96.20 | 79.51 | 95.57 | 83.88 | 95.50 | 84.55 | 89.66 | 80.00 | 92.58 | 85.64 | 78.98 | 68.44 |
Focal loss (FL) [50] | 95.78 | 79.67 | 95.42 | 84.53 | 95.25 | 85.20 | 88.56 | 78.88 | 91.91 | 85.62 | 77.48 | 68.15 |
Label smoothing (LSR) [67] | 96.22 | 81.83 | **95.86** | **87.09** | **95.74** | 86.52 | **89.74** | **80.96** | **92.72** | 86.13 | **79.09** | **69.43** |
Boot soft (SBS) [73] | 96.00 | 79.00 | 95.70 | 84.87 | 95.65 | 85.91 | 89.08 | 79.76 | 92.12 | 85.79 | 78.19 | 68.47 |
Boot hard (HBS) [73] | 96.22 | 79.17 | 95.63 | 85.27 | 95.60 | **87.11** | 89.20 | 80.00 | 92.29 | 85.82 | 78.28 | 68.41 |
Generalized CE (GCE) [112] | 96.44 | 80.83 | 95.81 | 86.46 | 95.64 | 86.69 | 88.18 | 79.34 | 91.51 | 85.49 | 76.91 | 67.76 |
Symmetric CE (SCE) [102] | 96.44 | 81.00 | 95.76 | 85.15 | 95.58 | 85.43 | 89.24 | 79.90 | 92.09 | 85.84 | 78.11 | 68.65 |
Joint optimization (JO) [88] | **97.33** | **82.17** | 95.67 | 85.40 | 95.60 | 85.87 | 89.71 | 80.14 | 92.65 | **86.56** | 79.07 | 69.26 |
NaN loss (see Fig. 25, Appendix 7), and hence is not robust for our datasets. The improvement of the SCE loss is less significant than that of the other losses and even turns into a decrease on the OnHW-chars dataset. JO leads to an improvement on all OnHW-chars datasets, further outperforms all losses on the WI upper task, and achieves only marginally lower accuracies than the LSR loss on the lower and combined datasets. JO also achieves the highest accuracies on the OnHW-symbols WD (97.33%) and WI (82.17%) datasets. In summary, all loss variants can improve on the results of the CCE loss for the OnHW-symbols, split OnHW-equations and combined datasets, as these are not equally distributed; LSR, SCE and JO outperform the other techniques most significantly. For more detailed accuracy plots, see Appendix 7, Fig. 25.

Dataset | WD | | | | WI | | | |
---|---|---|---|---|---|---|---|---|
(CNN+BiLSTM architecture) | WER (Mean) | WER (STD) | CER (Mean) | CER (STD) | WER (Mean) | WER (STD) | CER (Mean) | CER (STD) |
OnHW-equations-L | 8.56 | 1.59 | 1.24 | 0.25 | 95.73 | 3.13 | 32.16 | 5.16 |
OnHW-words500-L | 47.90 | 17.25 | 15.32 | 6.03 | 97.90 | 1.10 | 81.43 | 11.66 |
OnHW-wordsRandom-L | 32.73 | 3.43 | 5.40 | 1.15 | 99.70 | 0.30 | 72.27 | 15.55 |

Dataset (CNN+BiLSTM architecture) | CRR (WD, Mean) | CRR (WI, Mean) |
---|---|---|
OnHW-symbols-L\(^1\) | 92.00 | 54.00 |
OnHW-equations-L\(^{1,2}\) | 92.02 | 51.50 |
OnHW-chars-L\(^3\) [65], Lower | 94.70 | – |
OnHW-chars-L\(^3\) [65], Upper | 91.90 | – |
OnHW-chars-L\(^3\) [65], Combined | 82.80 | – |
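Label smoothing regularization (LSR), which yields the best results on most splits in the loss comparison, replaces the one-hot target of the categorical cross entropy with a mixture of the one-hot vector and a uniform distribution. A minimal sketch, where the smoothing factor \(\epsilon = 0.1\) and the logit values are illustrative assumptions:

```python
import numpy as np

def smoothed_cross_entropy(logits, label, eps=0.1):
    """Cross entropy against a label-smoothed target distribution."""
    num_classes = logits.shape[-1]
    log_p = logits - np.log(np.sum(np.exp(logits)))   # log-softmax of the logits
    target = np.full(num_classes, eps / num_classes)  # spread eps uniformly
    target[label] += 1.0 - eps                        # remaining mass on the true class
    return -np.sum(target * log_p)

logits = np.array([2.0, 0.5, -1.0])
# eps = 0 recovers the plain categorical cross entropy
assert np.isclose(smoothed_cross_entropy(logits, 0, eps=0.0),
                  -(logits[0] - np.log(np.exp(logits).sum())))
```

By keeping a small amount of probability mass on the wrong classes, LSR penalizes over-confident predictions, which matches its robustness on the unevenly distributed symbol and equation datasets.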