2021 | OriginalPaper | Chapter

PyraD-DCNN: A Fully Convolutional Neural Network to Replace BLSTM in Offline Text Recognition Systems

Authors: Jonathan Jouanne, Quentin Dauchy, Ahmad Montaser Awal

Published in: Pattern Recognition. ICPR International Workshops and Challenges

Publisher: Springer International Publishing

Abstract

We present a fast and efficient multi-task fully convolutional neural network (FCNN). The proposed architecture uses a multi-resolution Pyramid of Densely connected Dilated Convolution (PyraD-DCNN). The design also implements optimized convolutional building blocks that enable high-dimensional representations at a low computational cost. Besides performing semantic image segmentation on its own as an auto-encoder, it can be coupled with a signal encoder to build an end-to-end signal-to-sequence system without recurrent layers (RNN). In this work, we present the PyraD-DCNN through an application to the Optical Character Recognition task and show how it compares with a Bidirectional Long Short-Term Memory (BLSTM) RNN. The pyramid-like structure of dilated kernels handles both short- and long-term context without recurrence. On our own datasets, inference on CPU is up to three times faster than with a classical CNN-LSTM, with slight accuracy improvements and faster training cycles (up to 24 times faster). Furthermore, the lightness of this structure makes it naturally suited to mobile applications without any accuracy loss.
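
To illustrate why stacked dilated kernels can cover long-range context without recurrence, the sketch below computes the receptive field of a stack of stride-1 one-dimensional convolutions using the standard formula 1 + sum((k - 1) * d). The function name, the kernel size of 3, and the dilation schedules are hypothetical choices for illustration only; the abstract does not state the configuration actually used in PyraD-DCNN.

```python
from typing import Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field, in input steps, of stacked stride-1 dilated 1D convolutions."""
    field = 1  # a single output position sees itself before any layer is applied
    for dilation in dilations:
        # Each layer widens the field by (kernel_size - 1) * dilation input steps.
        field += (kernel_size - 1) * dilation
    return field


# Hypothetical schedules, not taken from the paper.
plain = receptive_field(3, [1, 1, 1, 1])    # four undilated layers
pyramid = receptive_field(3, [1, 2, 4, 8])  # four layers with doubling dilation
print(plain, pyramid)  # prints "9 31"
```

With the same number of layers and weights, the doubling schedule sees 31 input steps instead of 9. This is the general mechanism by which dilated stacks reach wide context at a fixed cost per layer.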
Metadata
Title
PyraD-DCNN: A Fully Convolutional Neural Network to Replace BLSTM in Offline Text Recognition Systems
Authors
Jonathan Jouanne
Quentin Dauchy
Ahmad Montaser Awal
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-68763-2_49
