
2018 | Original Paper | Book Chapter

Liquid Pouring Monitoring via Rich Sensory Inputs

Authors: Tz-Ying Wu, Juan-Ting Lin, Tsun-Hsuang Wang, Chan-Wei Hu, Juan Carlos Niebles, Min Sun

Published in: Computer Vision – ECCV 2018

Publisher: Springer International Publishing


Abstract

Humans have the amazing ability to perform very subtle manipulation tasks using a closed-loop control system with imprecise mechanics (i.e., our body parts) but rich sensory information (e.g., vision, tactile, etc.). In such a closed-loop system, the ability to monitor the state of the task via rich sensory information is important but often less studied. In this work, we take liquid pouring as a concrete example and aim at learning to continuously monitor whether liquid pouring is successful (e.g., no spilling) or not via rich sensory inputs. We mimic humans' rich senses using synchronized observations from a chest-mounted camera and a wrist-mounted IMU sensor. Given many success and failure demonstrations of liquid pouring, we train a hierarchical LSTM with late fusion for monitoring. To improve the robustness of the system, we propose two auxiliary tasks during training: (1) inferring the initial state of containers and (2) forecasting the one-step future 3D trajectory of the hand with an adversarial training procedure. These tasks encourage our method to learn representations sensitive to container states and to how objects are manipulated in 3D. With these novel components, our method achieves ~8% and ~11% better monitoring accuracy than the baseline method without auxiliary tasks on unseen containers and unseen users, respectively.
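The abstract outlines the full pipeline: two modality-specific recurrent encoders (chest-camera features and wrist-mounted IMU readings) fused late into a higher-level LSTM, with a monitoring head plus two auxiliary heads. Below is a minimal sketch of how such a hierarchical LSTM with late fusion could be wired up in PyTorch. The feature dimensions, hidden sizes, head layouts, and the use of per-frame CNN features are illustrative assumptions, not the authors' released implementation; the adversarial discriminator used for trajectory forecasting is omitted.

```python
import torch
import torch.nn as nn


class PouringMonitor(nn.Module):
    """Hierarchical LSTM with late fusion over camera and IMU streams.

    Sketch of the architecture described in the abstract; dimensions,
    hidden sizes, and head layouts are illustrative assumptions.
    """

    def __init__(self, vis_dim=2048, imu_dim=6, hidden=256,
                 num_container_states=4):
        super().__init__()
        # Lower-level, modality-specific LSTMs.
        self.vis_lstm = nn.LSTM(vis_dim, hidden, batch_first=True)
        self.imu_lstm = nn.LSTM(imu_dim, hidden, batch_first=True)
        # Higher-level LSTM over the concatenated (late-fused) modality states.
        self.fusion_lstm = nn.LSTM(2 * hidden, hidden, batch_first=True)
        # Main task: per-timestep success/failure monitoring.
        self.monitor_head = nn.Linear(hidden, 2)
        # Auxiliary task 1: initial container state classification.
        self.container_head = nn.Linear(hidden, num_container_states)
        # Auxiliary task 2: one-step future 3D hand trajectory forecasting.
        self.trajectory_head = nn.Linear(hidden, 3)

    def forward(self, vis_feats, imu_feats):
        # vis_feats: (B, T, vis_dim) per-frame CNN features from the chest camera.
        # imu_feats: (B, T, imu_dim) synchronized wrist-mounted IMU readings.
        v, _ = self.vis_lstm(vis_feats)
        m, _ = self.imu_lstm(imu_feats)
        fused, _ = self.fusion_lstm(torch.cat([v, m], dim=-1))  # late fusion
        return {
            "monitor": self.monitor_head(fused),             # (B, T, 2)
            "container": self.container_head(fused[:, 0]),   # (B, num_states)
            "next_hand_xyz": self.trajectory_head(fused),    # (B, T, 3)
        }


if __name__ == "__main__":
    model = PouringMonitor()
    vis = torch.randn(2, 30, 2048)   # 30 frames of per-frame visual features
    imu = torch.randn(2, 30, 6)      # 30 synchronized IMU samples
    out = model(vis, imu)
    print({k: tuple(v.shape) for k, v in out.items()})
```

In this reading, "late fusion" means each modality is first encoded by its own LSTM and only their hidden states are concatenated for the fusion LSTM, rather than concatenating raw camera and IMU inputs up front.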


Metadata
Title
Liquid Pouring Monitoring via Rich Sensory Inputs
Authors
Tz-Ying Wu
Juan-Ting Lin
Tsun-Hsuang Wang
Chan-Wei Hu
Juan Carlos Niebles
Min Sun
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-030-01252-6_21
