Published in: Pattern Recognition and Image Analysis 2/2023

1 June 2023 | SELECTED CONFERENCE PAPERS

Predictors Based on Convolutional Neural Networks for the Movement Strategy of Trainable Agents for Building Customized Image Descriptors

Authors: A. Samarin, A. Savelev, A. Toropov, A. Dzestelova, V. Malykh, E. Mikhailova, A. Motyko

Abstract

We describe several modifications of custom image descriptors used in a classification pipeline for images that contain text elements. The task is to classify images of commercial facades by the type of service provided. Some of the proposed descriptor types are presented for the first time and demonstrate state-of-the-art performance on open datasets. For image areas that contain text, we use a special type of descriptor built from the traces left by moving agents. These traces are generated by parameterized movement strategies, which are presented and compared in this article.
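
To make the idea of a trace-based descriptor concrete, the following is a minimal toy sketch, not the authors' method. It assumes a grayscale image stored as nested lists, a single agent, and a hypothetical greedy movement strategy parameterized only by the start position and the number of steps; the name `trace_descriptor` and all parameters are illustrative assumptions that do not come from the article.

```python
from typing import List, Tuple

# Assumption: a grayscale image is a list of rows of integer intensities.
Image = List[List[int]]


def trace_descriptor(image: Image, start: Tuple[int, int], steps: int) -> List[int]:
    """Toy agent trace (illustrative only, not the published algorithm).

    At each step the agent moves to the unvisited 4-neighbour whose intensity
    is closest to the current pixel. The sequence of visited intensities is
    returned as the descriptor.
    """
    height, width = len(image), len(image[0])
    row, col = start
    visited = {(row, col)}
    trace = [image[row][col]]
    for _ in range(steps):
        # Collect in-bounds neighbours that the agent has not visited yet.
        candidates = [
            (r, c)
            for r, c in ((row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1))
            if 0 <= r < height and 0 <= c < width and (r, c) not in visited
        ]
        if not candidates:
            break  # the agent is stuck: no unvisited neighbour remains
        current = image[row][col]
        # Hypothetical strategy: prefer the neighbour with the most similar intensity.
        row, col = min(candidates, key=lambda p: abs(image[p[0]][p[1]] - current))
        visited.add((row, col))
        trace.append(image[row][col])
    return trace


toy_image = [
    [10, 12, 50],
    [11, 90, 52],
    [80, 85, 51],
]
print(trace_descriptor(toy_image, start=(0, 0), steps=4))
```

In this sketch the start position and step count play the role of strategy parameters; a learned predictor, as the title suggests, would presumably replace the fixed greedy rule, but how it does so is not specified in the abstract.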

Metadata
Title
Predictors Based on Convolutional Neural Networks for the Movement Strategy of Trainable Agents for Building Customized Image Descriptors
Authors
A. Samarin
A. Savelev
A. Toropov
A. Dzestelova
V. Malykh
E. Mikhailova
A. Motyko
Publication date
1 June 2023
Publisher
Pleiades Publishing
Published in
Pattern Recognition and Image Analysis, Issue 2/2023
Print ISSN: 1054-6618
Electronic ISSN: 1555-6212
DOI
https://doi.org/10.1134/S105466182302013X
