
2021 | Original Paper | Book Chapter

Top-1 CORSMAL Challenge 2020 Submission: Filling Mass Estimation Using Multi-modal Observations of Human-Robot Handovers

Authors: Vladimir Iashin, Francesca Palermo, Gökhan Solak, Claudio Coppola

Published in: Pattern Recognition. ICPR International Workshops and Challenges

Publisher: Springer International Publishing


Abstract

Human-robot object handover is a key skill for the future of human-robot collaboration. The CORSMAL 2020 Challenge focuses on the perception part of this problem: the robot needs to estimate the filling mass of a container held by a human. Although powerful methods exist for image processing and audio processing individually, solving this problem requires processing data from multiple sensors jointly. The appearance of the container, the sound of the filling, and the depth data all provide essential information. We propose a multi-modal method to predict three key indicators of the filling mass: filling type, filling level, and container capacity. These indicators are then combined to estimate the filling mass of the container. Our method obtained Top-1 overall performance among all submissions to the CORSMAL 2020 Challenge on both the public and private subsets while showing no evidence of overfitting. Our source code is publicly available: github.com/v-iashin/CORSMAL.
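
As a rough illustration of how the three predicted indicators can yield a mass estimate, the sketch below combines them under the commonly used relation filling mass ≈ container capacity × filling level × density of the predicted filling type. This is a minimal, hypothetical example, not the authors' implementation: the function name estimate_filling_mass, the FILLING_DENSITY_G_PER_ML table, and its density values are illustrative assumptions.

# Minimal sketch: combining predicted filling type, filling level, and
# container capacity into a filling mass estimate.
# Class names and density values below are illustrative assumptions.

FILLING_DENSITY_G_PER_ML = {
    "empty": 0.0,
    "water": 1.00,   # approximate density of water
    "rice": 0.85,    # illustrative bulk density
    "pasta": 0.41,   # illustrative bulk density
}

def estimate_filling_mass(filling_type: str,
                          filling_level: float,
                          container_capacity_ml: float) -> float:
    """Return the estimated filling mass in grams.

    filling_type: predicted content class (a key of FILLING_DENSITY_G_PER_ML)
    filling_level: predicted filled fraction of the container, in [0, 1]
    container_capacity_ml: predicted container capacity in millilitres
    """
    density = FILLING_DENSITY_G_PER_ML[filling_type]
    return container_capacity_ml * filling_level * density

if __name__ == "__main__":
    # Example: a 500 ml container predicted to be half full of water.
    print(estimate_filling_mass("water", 0.5, 500.0))  # -> 250.0 grams

Errors in any of the three indicators propagate multiplicatively into the mass estimate, which is why the challenge evaluates each indicator as well as the combined mass.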


Metadata
Title
Top-1 CORSMAL Challenge 2020 Submission: Filling Mass Estimation Using Multi-modal Observations of Human-Robot Handovers
Authors
Vladimir Iashin
Francesca Palermo
Gökhan Solak
Claudio Coppola
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-68793-9_31