Published in: Machine Vision and Applications 1/2019

01.08.2018 | Original Paper

A hybrid image dataset toward bridging the gap between real and simulation environments for robotics

Annotated desktop objects real and synthetic images dataset: ADORESet

Authors: Ertugrul Bayraktar, Cihat Bora Yigit, Pinar Boyraz


Abstract

The primary motivation of computer vision in robotics is to achieve a level of perception as close as possible to the human visual system. This requires large datasets, sometimes including infrequent and seemingly irrelevant data, to increase system robustness. To minimize the effort and time needed to build such extensive datasets from the real world, the preferred approach is to use simulation environments that replicate real-world conditions as closely as possible. Following this path, machine vision problems in robotics (e.g., object detection, recognition, and manipulation) often employ synthetic images in their datasets but do not mix them with real-world images. Systems trained only on synthetic images and tested within the simulated world can accomplish robotic tasks that require object recognition; however, systems trained in this way cannot be used directly in real-world experiments or end-user products because of the inconsistencies between real and simulation environments. We therefore propose a hybrid image dataset of annotated desktop objects from real and synthetic worlds (ADORESet). This hybrid dataset provides purposeful object categories with a sufficient number of real and synthetic images. ADORESet consists of color images of \(300\times 300\) pixels in 30 categories. Each class contains 2500 real-world images collected from the web and 750 synthetic images generated within the Gazebo simulation environment. The dataset enables researchers to evaluate their own algorithms under both real-world and simulation conditions. ADORESet is fully annotated: object boundaries are specified manually, and bounding-box coordinates are provided. Successor objects are also labeled to give statistical information about the likelihood of object co-occurrence within the dataset. To demonstrate the benefits of the dataset, it is tested on object recognition tasks by fine-tuning state-of-the-art deep convolutional neural networks such as VGGNet, InceptionV3, ResNet, and Xception. The possible combinations of data types for these models are compared in terms of training time, accuracy, and loss. In the object recognition experiments, training on all-real images yields approximately \(49\%\) validation accuracy on simulation images, whereas training on all-synthetic images and validating on all-real images yields an accuracy below \(10\%\). When the complete ADORESet is used for both training and validation, the hybrid validation accuracy reaches approximately \(95\%\). These results show that including real and synthetic images together in training and validation increases the overall accuracy and reliability of the system.
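To illustrate the kind of fine-tuning experiment summarized above, the sketch below shows how a pre-trained network could be adapted to a 30-class dataset of \(300\times 300\)-pixel images such as ADORESet. It is a minimal example under stated assumptions, not the authors' pipeline: the directory layout (adoreset/train and adoreset/val with one sub-folder per class) and all hyperparameters are hypothetical, and the Keras API with an Xception backbone is used only because Xception is one of the architectures named in the paper.

```python
# A minimal sketch of fine-tuning a pre-trained CNN on a 30-class image set,
# in the spirit of the experiments described in the abstract. NOT the
# authors' code: the directory layout and hyperparameters are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

IMG_SIZE = (300, 300)   # ADORESet image dimension
NUM_CLASSES = 30        # ADORESet object categories


def build_finetune_model():
    # Xception backbone pre-trained on ImageNet; its classifier head is
    # replaced by a fresh 30-way softmax, and the backbone is frozen first.
    base = tf.keras.applications.Xception(
        include_top=False, weights="imagenet", input_shape=IMG_SIZE + (3,))
    base.trainable = False

    inputs = layers.Input(shape=IMG_SIZE + (3,))
    x = layers.Rescaling(1.0 / 127.5, offset=-1.0)(inputs)  # Xception expects [-1, 1]
    x = base(x, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(256, activation="relu")(x)
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = models.Model(inputs, outputs)
    model.compile(optimizer=optimizers.Adam(1e-4),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model


def load_split(path):
    # One sub-folder per class; mixing real and synthetic images inside each
    # class folder corresponds to the "hybrid" training condition.
    return tf.keras.utils.image_dataset_from_directory(
        path, image_size=IMG_SIZE, batch_size=32, label_mode="categorical")


if __name__ == "__main__":
    train_ds = load_split("adoreset/train")  # hypothetical paths
    val_ds = load_split("adoreset/val")
    model = build_finetune_model()
    model.fit(train_ds, validation_data=val_ds, epochs=10)
```

Keeping real and synthetic images in separate directories would instead reproduce the real-only and synthetic-only baselines that the paper compares against the hybrid condition.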

Metadata
Title
A hybrid image dataset toward bridging the gap between real and simulation environments for robotics
Annotated desktop objects real and synthetic images dataset: ADORESet
Authors
Ertugrul Bayraktar
Cihat Bora Yigit
Pinar Boyraz
Publication date
01.08.2018
Publisher
Springer Berlin Heidelberg
Published in
Machine Vision and Applications / Issue 1/2019
Print ISSN: 0932-8092
Electronic ISSN: 1432-1769
DOI
https://doi.org/10.1007/s00138-018-0966-3
