Abstract
An extensive and diverse dataset is a crucial requirement for the successful training of a deep neural network. Compared to on-site data collection, 3D modeling allows generating large datasets faster and at lower cost. Still, the diversity and perceptual realism of synthetic images remain limited by the 3D artist's experience. Moreover, hard sample mining with 3D modeling poses an open question: which synthetic images are challenging for an object detection model? We present an Adversarial 3D modeling framework for training an object detection model against a reinforcement learning-based adversarial controller. The controller alters the 3D simulator parameters to generate challenging synthetic images, aiming to minimize the score of the object detection model during training. We hypothesize that this controller objective maximizes the score of the detection model during inference on real-world data. We evaluate our approach by training a YOLOv3 object detection model using our adversarial framework. A comparison with a similar model trained on random synthetic and real images shows that our framework achieves better performance than training on randomly sampled real or synthetic data.
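The adversarial loop described above can be sketched in a few lines. The sketch below is illustrative only: the simulator, the detector, and the controller are stand-ins (the paper uses a real 3D renderer, YOLOv3, and a reinforcement learning controller), and all names and parameter choices are assumptions. A greedy hill-climbing search substitutes for the RL controller to keep the example self-contained: it perturbs simulator parameters and keeps a perturbation whenever it lowers the detector's score, i.e. yields a harder synthetic image.

```python
import random

def render_scene(params):
    """Stub 3D simulator: maps a parameter vector (e.g. lighting,
    camera angle, object scale) to a synthetic 'scene'."""
    return {"lighting": params[0], "camera": params[1], "scale": params[2]}

def detector_score(scene):
    """Stub detector: scores highest on the nominal scene (all
    parameters at 0.5) and drops as parameters drift away from it."""
    return max(0.0, 1.0 - sum(abs(v - 0.5) for v in scene.values()))

def adversarial_controller(n_steps=200, step=0.05, seed=0):
    """Greedy stand-in for the RL controller: accept a random
    perturbation of the simulator parameters whenever it does not
    increase the detector's score (a harder or equally hard scene)."""
    rng = random.Random(seed)
    params = [0.5, 0.5, 0.5]                    # start at the nominal scene
    best = detector_score(render_scene(params))
    history = [best]                            # detector score over time
    for _ in range(n_steps):
        cand = [min(1.0, max(0.0, p + rng.uniform(-step, step)))
                for p in params]
        s = detector_score(render_scene(cand))
        if s <= best:                           # harder scene: keep it
            params, best = cand, s
        history.append(best)
    return params, history

params, history = adversarial_controller()
print(round(history[-1], 3))
```

In the full framework, each accepted parameter set would be rendered and fed to the detector as a training image, so the controller continually supplies the hard samples that random scene generation would only rarely produce.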
Acknowledgments
The reported study was funded by the Russian Foundation for Basic Research (RFBR) under project No. 17-29-03185, and by the Russian Science Foundation (RSF) under research project No. 19-11-11008.
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Kniaz, V.V., Knyaz, V.A., Mizginov, V., Papazyan, A., Fomin, N., Grodzitsky, L. (2021). Adversarial Dataset Augmentation Using Reinforcement Learning and 3D Modeling. In: Kryzhanovsky, B., Dunin-Barkowski, W., Redko, V., Tiumentsev, Y. (eds) Advances in Neural Computation, Machine Learning, and Cognitive Research IV. NEUROINFORMATICS 2020. Studies in Computational Intelligence, vol 925. Springer, Cham. https://doi.org/10.1007/978-3-030-60577-3_38
Print ISBN: 978-3-030-60576-6
Online ISBN: 978-3-030-60577-3
eBook Packages: Intelligent Technologies and Robotics