Published in: Pattern Analysis and Applications 2/2023

13 February 2023 | Industrial and Commercial Application

Script identification of ancient books by Chinese ethnic minorities using multi-branch DCNN and SPP

Authors: Hai Guo, Doudou Yang, Yifan Liu, Jingying Zhao

Abstract

Automatic classification of ancient books is an important component of digital platforms for ancient books, but it remains a challenging task. To identify the scripts of ancient books written by different ethnic minorities in China, this paper proposes a deep convolutional neural network (CNN) script identification method with a multi-branch structure and spatial pyramid pooling (SPP), called MbSPPVGG. We build a dataset of ancient handwritten books of Chinese ethnic minorities and preprocess the book images by cropping and standardization. To improve identification accuracy and the ability of the CNN to perceive multi-scale changes in an image, a multi-branch structure merges the low-level and high-level features of the CNN to enhance the network's expressive ability; SPP then reduces the dimensionality of the convolutional features at multiple scales, which increases the spatial scale invariance of the CNN. Introducing the multi-branch structure and SPP into the CNN model yields a new ancient-book identification model. Experimental results show that the precision, recall and F1-score of the MbSPPVGG model are all 99.94%. Comparison experiments show that the classification accuracy of the MbSPPVGG model is better than that of state-of-the-art deep learning methods such as GhostNet, CSPDenseNet and MixNet, and its effectiveness is verified on multiple datasets.
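
As a rough illustration of how SPP maps convolutional feature maps of different spatial sizes to a vector of fixed length, the following pure-Python sketch max-pools each channel over grids of several resolutions and concatenates the results. The function name, the pyramid levels (1, 2, 4) and the toy feature maps are assumptions made for this example; the abstract does not state the configuration used in MbSPPVGG, and a practical implementation would operate on framework tensors rather than nested lists.

```python
import math


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a C x H x W feature map into a fixed-length vector.

    feature_map: list of C channels, each a list of H rows of W floats.
    levels: hypothetical pyramid grid sizes (not taken from the paper).
    """
    pooled = []
    for n in levels:
        for channel in feature_map:
            height, width = len(channel), len(channel[0])
            for i in range(n):
                # Bin i spans rows [floor(i*H/n), ceil((i+1)*H/n)), which is never empty.
                r0 = (i * height) // n
                r1 = math.ceil((i + 1) * height / n)
                for j in range(n):
                    c0 = (j * width) // n
                    c1 = math.ceil((j + 1) * width / n)
                    pooled.append(
                        max(channel[r][c] for r in range(r0, r1) for c in range(c0, c1))
                    )
    return pooled


def make_map(channels, height, width):
    """Build a toy feature map with distinct, increasing values per channel."""
    return [
        [[float(ch * 100 + r * width + c) for c in range(width)] for r in range(height)]
        for ch in range(channels)
    ]


# Two inputs with different spatial sizes give vectors of the same length:
# channels * (1*1 + 2*2 + 4*4) = 2 * 21 = 42.
small = spatial_pyramid_pool(make_map(2, 5, 7))
large = spatial_pyramid_pool(make_map(2, 13, 9))
print(len(small), len(large))
```

The output length depends only on the number of channels and the chosen levels, not on the input height or width, which is what allows a fixed-size classifier head to follow feature maps of varying size.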

Metadata
Title
Script identification of ancient books by Chinese ethnic minorities using multi-branch DCNN and SPP
Authors
Hai Guo
Doudou Yang
Yifan Liu
Jingying Zhao
Publication date
13 February 2023
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 2/2023
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-023-01146-y
