
2019 | OriginalPaper | Book chapter

Deep Learning-Based Concept Detection in vitrivr

Authors: Luca Rossetto, Mahnaz Amiri Parian, Ralph Gasser, Ivan Giangreco, Silvan Heller, Heiko Schuldt

Published in: MultiMedia Modeling

Publisher: Springer International Publishing


Abstract

This paper presents the most recent additions to the vitrivr retrieval stack, which will be put to the test at the 2019 Video Browser Showdown (VBS). The vitrivr stack has been extended with approaches for detecting, localizing, or describing concepts and actions in video scenes using various convolutional neural networks. Leveraging these additions, we have added support for searching the video collection based on semantic sketches. Furthermore, vitrivr offers new types of labels for text-based retrieval. In the same vein, we have improved vitrivr's pre-existing capabilities for extracting text from video through scene text recognition. Finally, the user interface has received a major overhaul to make it more accessible to novice users, especially for query formulation and result exploration.
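The abstract does not describe how semantic sketches are matched against the collection. As a purely illustrative sketch, assume a query is a coarse grid in which the user paints some cells with concept labels, and that each keyframe has already been reduced to a grid of the same shape holding the dominant label from a semantic segmentation network. The names `Grid`, `semantic_sketch_score`, and `rank_keyframes`, as well as the exact-match scoring rule, are hypothetical and are not vitrivr's actual method or API.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: a coarse grid where each cell holds a concept
# label (e.g. "sky", "water") or None if the user left the cell blank.
Grid = List[List[Optional[str]]]


def semantic_sketch_score(query: Grid, keyframe: Grid) -> float:
    """Fraction of user-labeled query cells whose label equals the keyframe's
    label in the same cell. Both grids must have identical dimensions."""
    if len(query) != len(keyframe) or any(
        len(q_row) != len(k_row) for q_row, k_row in zip(query, keyframe)
    ):
        raise ValueError("grids must have the same shape")
    labeled = 0
    matches = 0
    for q_row, k_row in zip(query, keyframe):
        for q_label, k_label in zip(q_row, k_row):
            if q_label is None:
                continue  # unsketched cells do not constrain the result
            labeled += 1
            if q_label == k_label:
                matches += 1
    # An empty sketch matches nothing rather than everything.
    return matches / labeled if labeled else 0.0


def rank_keyframes(query: Grid, keyframes: Dict[str, Grid]) -> List[Tuple[str, float]]:
    """Return (keyframe id, score) pairs sorted by descending sketch score."""
    scored = [(kid, semantic_sketch_score(query, grid)) for kid, grid in keyframes.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sketch = [["sky", "sky"], [None, "water"]]
    frames = {
        "beach": [["sky", "sky"], ["sand", "water"]],
        "forest": [["sky", "tree"], ["sand", "sand"]],
    }
    for kid, score in rank_keyframes(sketch, frames):
        print(kid, round(score, 2))
```

Unsketched cells are ignored, so the user only constrains the regions they care about. Exact label equality is the simplest choice here; a softer per-cell similarity between related concepts would be a natural refinement.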


Metadata
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-05716-9_55
