Published in: International Journal of Computer Vision, Issue 4/2021

09.01.2021

A Shape Transformation-based Dataset Augmentation Framework for Pedestrian Detection

Authors: Zhe Chen, Wanli Ouyang, Tongliang Liu, Dacheng Tao



Abstract

Deep learning-based computer vision is usually data-hungry, and many researchers attempt to augment datasets with synthesized data to improve model robustness. However, augmenting popular pedestrian datasets such as Caltech and Citypersons can be extremely challenging, because the real pedestrians in these datasets are commonly of low quality. Owing to factors like occlusion, blur, and low resolution, it is significantly difficult for existing augmentation approaches, which generally synthesize data using 3D engines or generative adversarial networks (GANs), to generate realistic-looking pedestrians. To obtain more natural-looking pedestrians instead, we propose to augment pedestrian detection datasets by transforming real pedestrians from the same dataset into different shapes. Accordingly, we propose the Shape Transformation-based Dataset Augmentation (STDA) framework. The framework is composed of two subsequent modules: shape-guided deformation and environment adaptation. In the first module, we introduce a shape-guided warping field that deforms the shape of a real pedestrian into a different one. In the second module, we propose an environment-aware blending map that better adapts the deformed pedestrians to their surrounding environments, yielding more realistic-looking pedestrians and more beneficial augmentation results for pedestrian detection. Extensive empirical studies on different pedestrian detection benchmarks show that the proposed STDA framework consistently produces much better augmentation results than other pedestrian synthesis approaches that use low-quality pedestrians. By augmenting the original datasets, our framework also improves the baseline pedestrian detector by up to 38% on the evaluated benchmarks, achieving state-of-the-art performance.
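The two-stage pipeline described in the abstract — deform a real pedestrian patch with a dense warping field, then blend it into the scene with a soft blending map — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the warping field and blending map are assumed to be given (in STDA they are predicted by learned modules), the warp uses simple nearest-neighbor sampling, and the function names `warp_patch` and `blend_into_scene` are hypothetical.

```python
import numpy as np

def warp_patch(patch, flow):
    """Apply a dense warping field to a pedestrian patch.

    patch: (H, W, 3) image crop of a real pedestrian.
    flow:  (H, W, 2) per-pixel displacements (dy, dx), standing in for
           the shape-guided warping field. Nearest-neighbor sampling
           keeps the sketch dependency-free.
    """
    h, w = patch.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, w - 1)
    return patch[src_y, src_x]

def blend_into_scene(scene, patch, blend_map, top, left):
    """Composite a (deformed) pedestrian patch into the scene.

    blend_map: (H, W) weights in [0, 1], playing the role of the
    environment-aware blending map: 1 keeps the pedestrian pixel,
    0 keeps the background, intermediate values mix the two.
    """
    h, w = patch.shape[:2]
    out = scene.copy()
    region = out[top:top + h, left:left + w]
    alpha = blend_map[..., None]
    out[top:top + h, left:left + w] = alpha * patch + (1.0 - alpha) * region
    return out
```

With a zero flow field the warp is the identity, and a blend map of all ones pastes the patch verbatim; in the actual framework both would be predicted so that the deformed pedestrian's shape and boundary appearance match the target location.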


Footnotes
1
https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix
 
Metadata
Title
A Shape Transformation-based Dataset Augmentation Framework for Pedestrian Detection
Authors
Zhe Chen
Wanli Ouyang
Tongliang Liu
Dacheng Tao
Publication date
09.01.2021
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 4/2021
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-020-01412-0
