Skip to main content

2018 | OriginalPaper | Buchkapitel

Zero-Annotation Object Detection with Web Knowledge Transfer

verfasst von : Qingyi Tao, Hao Yang, Jianfei Cai

Erschienen in: Computer Vision – ECCV 2018

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Object detection is one of the major problems in computer vision, and has been extensively studied. Most of the existing detection works rely on labor-intensive supervision, such as ground truth bounding boxes of objects or at least image-level annotations. On the contrary, we propose an object detection method that does not require any form of human annotation on target tasks, by exploiting freely available web images. In order to facilitate effective knowledge transfer from web images, we introduce a multi-instance multi-label domain adaption learning framework with two key innovations. First of all, we propose an instance-level adversarial domain adaptation network with attention on foreground objects to transfer the object appearances from web domain to target domain. Second, to preserve the class-specific semantic structure of transferred object features, we propose a simultaneous transfer mechanism to transfer the supervision across domains through pseudo strong label generation. With our end-to-end framework that simultaneously learns a weakly supervised detector and transfers knowledge across domains, we achieved significant improvements over baseline methods on the benchmark datasets.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Al-Halah, Z., Stiefelhagen, R.: Automatic discovery, association estimation and learning of semantic attributes for a thousand categories. arXiv preprint arXiv:1704.03607 (2017) Al-Halah, Z., Stiefelhagen, R.: Automatic discovery, association estimation and learning of semantic attributes for a thousand categories. arXiv preprint arXiv:​1704.​03607 (2017)
2.
Zurück zum Zitat Bilen, H., Pedersoli, M., Tuytelaars, T.: Weakly supervised object detection with posterior regularization. In: Proceedings BMVC 2014, pp. 1–12 (2014) Bilen, H., Pedersoli, M., Tuytelaars, T.: Weakly supervised object detection with posterior regularization. In: Proceedings BMVC 2014, pp. 1–12 (2014)
3.
Zurück zum Zitat Bilen, H., Pedersoli, M., Tuytelaars, T.: Weakly supervised object detection with convex clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1081–1089 (2015) Bilen, H., Pedersoli, M., Tuytelaars, T.: Weakly supervised object detection with convex clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1081–1089 (2015)
4.
Zurück zum Zitat Bilen, H., Vedaldi, A.: Weakly supervised deep detection networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2846–2854 (2016) Bilen, H., Vedaldi, A.: Weakly supervised deep detection networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2846–2854 (2016)
5.
Zurück zum Zitat Bousmalis, K., Silberman, N., Dohan, D., Erhan, D., Krishnan, D.: Unsupervised pixel-level domain adaptation with generative adversarial networks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, p. 7 (2017) Bousmalis, K., Silberman, N., Dohan, D., Erhan, D., Krishnan, D.: Unsupervised pixel-level domain adaptation with generative adversarial networks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, p. 7 (2017)
6.
Zurück zum Zitat Chen, X., Gupta, A.: Webly supervised learning of convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1431–1439 (2015) Chen, X., Gupta, A.: Webly supervised learning of convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1431–1439 (2015)
7.
Zurück zum Zitat Chen, X., Shrivastava, A., Gupta, A.: Neil: extracting visual knowledge from web data. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1409–1416 (2013) Chen, X., Shrivastava, A., Gupta, A.: Neil: extracting visual knowledge from web data. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1409–1416 (2013)
8.
Zurück zum Zitat Cinbis, R.G., Verbeek, J., Schmid, C.: Weakly supervised object localization with multi-fold multiple instance learning. IEEE Trans. Pattern Anal. Mach. Intell. 39(1), 189–203 (2017)CrossRef Cinbis, R.G., Verbeek, J., Schmid, C.: Weakly supervised object localization with multi-fold multiple instance learning. IEEE Trans. Pattern Anal. Mach. Intell. 39(1), 189–203 (2017)CrossRef
9.
Zurück zum Zitat Divvala, S.K., Farhadi, A., Guestrin, C.: Learning everything about anything: webly-supervised visual concept learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3270–3277 (2014) Divvala, S.K., Farhadi, A., Guestrin, C.: Learning everything about anything: webly-supervised visual concept learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3270–3277 (2014)
10.
Zurück zum Zitat Ferrari, V., Zisserman, A.: Learning visual attributes. In: Advances in Neural Information Processing Systems, pp. 433–440 (2008) Ferrari, V., Zisserman, A.: Learning visual attributes. In: Advances in Neural Information Processing Systems, pp. 433–440 (2008)
11.
Zurück zum Zitat Fu, Z., Xiang, T., Kodirov, E., Gong, S.: Zero-shot object recognition by semantic manifold distance. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2635–2644 (2015) Fu, Z., Xiang, T., Kodirov, E., Gong, S.: Zero-shot object recognition by semantic manifold distance. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2635–2644 (2015)
12.
Zurück zum Zitat Ganin, Y., Lempitsky, V.: Unsupervised domain adaptation by backpropagation. In: International Conference on Machine Learning, pp. 1180–1189 (2015) Ganin, Y., Lempitsky, V.: Unsupervised domain adaptation by backpropagation. In: International Conference on Machine Learning, pp. 1180–1189 (2015)
13.
Zurück zum Zitat Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
14.
Zurück zum Zitat Jie, Z., Wei, Y., Jin, X., Feng, J., Liu, W.: Deep self-taught learning for weakly supervised object localization. arXiv preprint arXiv:1704.05188 (2017) Jie, Z., Wei, Y., Jin, X., Feng, J., Liu, W.: Deep self-taught learning for weakly supervised object localization. arXiv preprint arXiv:​1704.​05188 (2017)
16.
Zurück zum Zitat Kumar Singh, K., Xiao, F., Jae Lee, Y.: Track and transfer: watching videos to simulate strong human supervision for weakly-supervised object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3548–3556 (2016) Kumar Singh, K., Xiao, F., Jae Lee, Y.: Track and transfer: watching videos to simulate strong human supervision for weakly-supervised object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3548–3556 (2016)
17.
Zurück zum Zitat Lampert, C.H., Nickisch, H., Harmeling, S.: Learning to detect unseen object classes by between-class attribute transfer. In: IEEE Conference on Computer Vision and Pattern Recognition CVPR 2009, pp. 951–958. IEEE (2009) Lampert, C.H., Nickisch, H., Harmeling, S.: Learning to detect unseen object classes by between-class attribute transfer. In: IEEE Conference on Computer Vision and Pattern Recognition CVPR 2009, pp. 951–958. IEEE (2009)
18.
Zurück zum Zitat Lampert, C.H., Nickisch, H., Harmeling, S.: Attribute-based classification for zero-shot visual object categorization. IEEE Trans. Pattern Anal. Mach. Intell. 36(3), 453–465 (2014)CrossRef Lampert, C.H., Nickisch, H., Harmeling, S.: Attribute-based classification for zero-shot visual object categorization. IEEE Trans. Pattern Anal. Mach. Intell. 36(3), 453–465 (2014)CrossRef
19.
Zurück zum Zitat Li, L.J., Fei-Fei, L.: Optimol: automatic online picture collection via incremental model learning. Int. J. Comput. Vis. 88(2), 147–168 (2010)MathSciNetCrossRef Li, L.J., Fei-Fei, L.: Optimol: automatic online picture collection via incremental model learning. Int. J. Comput. Vis. 88(2), 147–168 (2010)MathSciNetCrossRef
20.
Zurück zum Zitat Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936–944 (2017). IEEE Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936–944 (2017). IEEE
22.
Zurück zum Zitat Long, M., Zhu, H., Wang, J., Jordan, M.I.: Deep transfer learning with joint adaptation networks. arXiv preprint arXiv:1605.06636 (2016) Long, M., Zhu, H., Wang, J., Jordan, M.I.: Deep transfer learning with joint adaptation networks. arXiv preprint arXiv:​1605.​06636 (2016)
23.
Zurück zum Zitat Maaten, L.V.D., Hinton, G.: Visualizing data using T-SNE. J. Mach. Learn. Res. 9(Nov), 2579–2605 (2008) Maaten, L.V.D., Hinton, G.: Visualizing data using T-SNE. J. Mach. Learn. Res. 9(Nov), 2579–2605 (2008)
24.
Zurück zum Zitat Pennington, J., Socher, R., Manning, C.: Glove: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014) Pennington, J., Socher, R., Manning, C.: Glove: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)
25.
Zurück zum Zitat Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015) Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)
26.
Zurück zum Zitat Sultani, W., Shah, M.: What if we do not have multiple videos of the same action? video action localization using web images. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1077–1085. IEEE (2016) Sultani, W., Shah, M.: What if we do not have multiple videos of the same action? video action localization using web images. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1077–1085. IEEE (2016)
27.
Zurück zum Zitat Tang, P., Wang, X., Bai, X., Liu, W.: Multiple instance detection network with online instance classifier refinement. In: CVPR (2017) Tang, P., Wang, X., Bai, X., Liu, W.: Multiple instance detection network with online instance classifier refinement. In: CVPR (2017)
28.
29.
Zurück zum Zitat Tzeng, E., Hoffman, J., Darrell, T., Saenko, K.: Simultaneous deep transfer across domains and tasks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4068–4076 (2015) Tzeng, E., Hoffman, J., Darrell, T., Saenko, K.: Simultaneous deep transfer across domains and tasks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4068–4076 (2015)
30.
Zurück zum Zitat Tzeng, E., Hoffman, J., Saenko, K., Darrell, T.: Adversarial discriminative domain adaptation. In: Computer Vision and Pattern Recognition (CVPR), vol. 1, p. 4 (2017) Tzeng, E., Hoffman, J., Saenko, K., Darrell, T.: Adversarial discriminative domain adaptation. In: Computer Vision and Pattern Recognition (CVPR), vol. 1, p. 4 (2017)
31.
Zurück zum Zitat Uijlings, J.R., Van De Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. Int. J. Comput. Vis. 104(2), 154–171 (2013)CrossRef Uijlings, J.R., Van De Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. Int. J. Comput. Vis. 104(2), 154–171 (2013)CrossRef
32.
Zurück zum Zitat Wang, G., Luo, P., Lin, L., Wang, X.: Learning object interactions and descriptions for semantic image segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5859–5867 (2017) Wang, G., Luo, P., Lin, L., Wang, X.: Learning object interactions and descriptions for semantic image segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5859–5867 (2017)
33.
Zurück zum Zitat Wei, Y., et al.: STC: a simple to complex framework for weakly-supervised semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(11), 2314–2320 (2017)CrossRef Wei, Y., et al.: STC: a simple to complex framework for weakly-supervised semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(11), 2314–2320 (2017)CrossRef
34.
35.
Zurück zum Zitat Xu, Z., Huang, S., Zhang, Y., Tao, D.: Augmenting strong supervision using web data for fine-grained categorization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2524–2532 (2015) Xu, Z., Huang, S., Zhang, Y., Tao, D.: Augmenting strong supervision using web data for fine-grained categorization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2524–2532 (2015)
36.
Zurück zum Zitat Yang, X., Zhang, H., Cai, J.: Shuffle-then-assemble: learning object-agnostic visual relationship features. In: ECCV (2018) Yang, X., Zhang, H., Cai, J.: Shuffle-then-assemble: learning object-agnostic visual relationship features. In: ECCV (2018)
Metadaten
Titel
Zero-Annotation Object Detection with Web Knowledge Transfer
verfasst von
Qingyi Tao
Hao Yang
Jianfei Cai
Copyright-Jahr
2018
DOI
https://doi.org/10.1007/978-3-030-01252-6_23

Premium Partner