Pattern Analysis and Applications

13.02.2023 | Industrial and Commercial Application

Script identification of ancient books by Chinese ethnic minorities using multi-branch DCNN and SPP

Authors: Hai Guo, Doudou Yang, Yifan Liu, Jingying Zhao

Published in: Pattern Analysis and Applications | Issue 2/2023

Abstract

Automatic classification of ancient books is an important component of digital platforms for ancient books, but it remains a challenging task. To identify the scripts of ancient books written by different ethnic minorities in China, this paper proposes MbSPPVGG, a deep convolutional neural network (CNN) method that combines a multi-branch structure with spatial pyramid pooling (SPP). We build a dataset of ancient handwritten books of Chinese ethnic minorities, and preprocess the book images by cropping and standardizing them. To improve identification accuracy and the CNN's ability to perceive multi-scale changes in an image, the multi-branch structure merges the low-level and high-level features of the CNN to strengthen the network's representational ability; SPP then reduces the dimensionality of the convolutional features at multiple scales, which increases the CNN's spatial scale invariance. Together, the multi-branch structure and SPP yield a new model for identifying ancient books. Experimental results show that the precision, recall and F1-score of the MbSPPVGG model are all 99.94%. Comparison experiments show that the classification accuracy of MbSPPVGG exceeds that of state-of-the-art deep learning methods, including GhostNet, CSPDenseNet and MixNet, and its effectiveness is verified on multiple datasets.
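The spatial pyramid pooling step named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the pyramid levels (1, 2, 4), the use of max pooling, the bin-edge rule, and the names `spp` and `make_map` are all assumptions chosen for illustration. The point it shows is that feature maps of different spatial sizes are reduced to vectors of the same length.

```python
import math


def spp(feature_map, levels=(1, 2, 4)):
    """Max-pool each channel into n x n bins for every n in `levels`.

    feature_map: list of channels, each a list of H rows of W floats.
    Returns a flat list of length C * sum(n * n for n in levels),
    independent of H and W. The levels are an illustrative assumption.
    """
    pooled = []
    for n in levels:
        for channel in feature_map:
            height, width = len(channel), len(channel[0])
            for i in range(n):
                # Floor/ceil bin edges guarantee non-empty bins that
                # together cover the whole map.
                top = (i * height) // n
                bottom = math.ceil((i + 1) * height / n)
                for j in range(n):
                    left = (j * width) // n
                    right = math.ceil((j + 1) * width / n)
                    pooled.append(max(
                        channel[r][c]
                        for r in range(top, bottom)
                        for c in range(left, right)
                    ))
    return pooled


def make_map(channels, height, width):
    """Hypothetical helper: build a toy feature map with distinct values."""
    return [[[float(ch * 100 + r * width + c) for c in range(width)]
             for r in range(height)]
            for ch in range(channels)]


# Two maps with different spatial sizes give equally long vectors.
small = spp(make_map(2, 6, 9))
large = spp(make_map(2, 13, 7))
print(len(small), len(large))  # prints "42 42"
```

With 2 channels and levels (1, 2, 4), each vector has 2 × (1 + 4 + 16) = 42 entries. A practical model would do the same pooling on tensors inside a deep learning framework rather than on nested lists.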

Title: Script identification of ancient books by Chinese ethnic minorities using multi-branch DCNN and SPP
Authors: Hai Guo, Doudou Yang, Yifan Liu, Jingying Zhao
Publication date: 13.02.2023
Publisher: Springer London
Published in: Pattern Analysis and Applications / Issue 2/2023
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI: https://doi.org/10.1007/s10044-023-01146-y
