
2016 | Original Paper | Book Chapter

Playing for Data: Ground Truth from Computer Games

Authors: Stephan R. Richter, Vibhav Vineet, Stefan Roth, Vladlen Koltun

Published in: Computer Vision – ECCV 2016

Publisher: Springer International Publishing


Abstract

Recent progress in computer vision has been driven by high-capacity models trained on large datasets. Unfortunately, creating large datasets with pixel-level labels has been extremely costly due to the amount of human effort required. In this paper, we present an approach to rapidly creating pixel-accurate semantic label maps for images extracted from modern computer games. Although the source code and the internal operation of commercial games are inaccessible, we show that associations between image patches can be reconstructed from the communication between the game and the graphics hardware. This enables rapid propagation of semantic labels within and across images synthesized by the game, with no access to the source code or the content. We validate the presented approach by producing dense pixel-level semantic annotations for 25 thousand images synthesized by a photorealistic open-world computer game. Experiments on semantic segmentation datasets show that using the acquired data to supplement real-world images significantly increases accuracy and that the acquired data enables reducing the amount of hand-labeled real-world data: models trained with game data and just \(\tfrac{1}{3}\) of the CamVid training set outperform models trained on the complete CamVid training set.
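
To make the propagation idea concrete, the following is a minimal sketch, not the authors' released tooling: assume each image patch is keyed by a hash of the mesh, texture, and shader that rendered it, recorded by observing the game's draw calls, so that annotating one patch labels every patch sharing that signature, within a frame and across frames. All names and the data layout below are hypothetical.

```python
from collections import defaultdict

def propagate_labels(patches, annotations):
    """Spread sparse annotations to every patch sharing a resource signature.

    patches     -- list of (frame_id, patch_id, mts_signature) triples, where
                   mts_signature hashes the mesh, texture, and shader that
                   produced the patch (recorded from intercepted draw calls)
    annotations -- dict: (frame_id, patch_id) -> semantic label
    returns     -- dict: (frame_id, patch_id) -> propagated label
    """
    # Look up each patch's signature, and group patches by signature.
    sig_of = {(f, p): sig for f, p, sig in patches}
    groups = defaultdict(list)
    for f, p, sig in patches:
        groups[sig].append((f, p))

    # A label attached to any one patch applies to its whole signature
    # group, both within the same frame and across frames.
    sig_label = {sig_of[key]: label for key, label in annotations.items()}
    return {member: sig_label[sig]
            for sig, members in groups.items() if sig in sig_label
            for member in members}

# Toy usage: labeling one "car" patch in frame 0 also labels the patch in
# frame 1 rendered from the same mesh/texture/shader (signature 0xA1).
patches = [(0, 0, 0xA1), (0, 1, 0xB2), (1, 0, 0xA1)]
print(propagate_labels(patches, {(0, 0): "car"}))
# {(0, 0): 'car', (1, 0): 'car'}
```

Grouping by rendering resources rather than by image appearance is what makes the propagation independent of the game's source code: only the stream of communication between the game and the graphics hardware needs to be observed.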

Metadata
Title
Playing for Data: Ground Truth from Computer Games
Authors
Stephan R. Richter
Vibhav Vineet
Stefan Roth
Vladlen Koltun
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-46475-6_7