
2018 | Original Paper | Book Chapter

Liquid Pouring Monitoring via Rich Sensory Inputs

Authors: Tz-Ying Wu, Juan-Ting Lin, Tsun-Hsuang Wang, Chan-Wei Hu, Juan Carlos Niebles, Min Sun

Published in: Computer Vision – ECCV 2018

Publisher: Springer International Publishing


Abstract

Humans have the amazing ability to perform very subtle manipulation tasks using a closed-loop control system with imprecise mechanics (i.e., our body parts) but rich sensory information (e.g., vision, tactile, etc.). In such a closed-loop system, the ability to monitor the state of the task via rich sensory information is important but often less studied. In this work, we take liquid pouring as a concrete example and aim at learning to continuously monitor whether liquid pouring is successful (e.g., no spilling) or not via rich sensory inputs. We mimic humans' rich senses using synchronized observations from a chest-mounted camera and a wrist-mounted IMU sensor. Given many success and failure demonstrations of liquid pouring, we train a hierarchical LSTM with late fusion for monitoring. To improve the robustness of the system, we propose two auxiliary tasks during training: (1) inferring the initial state of containers and (2) forecasting the one-step future 3D trajectory of the hand with an adversarial training procedure. These tasks encourage our method to learn representations sensitive to container states and to how objects are manipulated in 3D. With these novel components, our method achieves ~8% and ~11% better monitoring accuracy than the baseline method without auxiliary tasks on unseen containers and unseen users, respectively.
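The abstract outlines the full pipeline: two modality-specific recurrent encoders (chest-camera features and wrist-mounted IMU readings) fused late into a higher-level LSTM, with a monitoring head plus two auxiliary heads. Below is a minimal sketch of how such a hierarchical LSTM with late fusion could be wired up in PyTorch. The feature dimensions, hidden sizes, head layouts, and the use of per-frame CNN features are illustrative assumptions, not the authors' released implementation; the adversarial discriminator used for trajectory forecasting is omitted.

```python
import torch
import torch.nn as nn


class PouringMonitor(nn.Module):
    """Hierarchical LSTM with late fusion over camera and IMU streams.

    Sketch of the architecture described in the abstract; dimensions,
    hidden sizes, and head layouts are illustrative assumptions.
    """

    def __init__(self, vis_dim=2048, imu_dim=6, hidden=256,
                 num_container_states=4):
        super().__init__()
        # Lower-level, modality-specific LSTMs.
        self.vis_lstm = nn.LSTM(vis_dim, hidden, batch_first=True)
        self.imu_lstm = nn.LSTM(imu_dim, hidden, batch_first=True)
        # Higher-level LSTM over the concatenated (late-fused) modality states.
        self.fusion_lstm = nn.LSTM(2 * hidden, hidden, batch_first=True)
        # Main task: per-timestep success/failure monitoring.
        self.monitor_head = nn.Linear(hidden, 2)
        # Auxiliary task 1: initial container state classification.
        self.container_head = nn.Linear(hidden, num_container_states)
        # Auxiliary task 2: one-step future 3D hand trajectory forecasting.
        self.trajectory_head = nn.Linear(hidden, 3)

    def forward(self, vis_feats, imu_feats):
        # vis_feats: (B, T, vis_dim) per-frame CNN features from the chest camera.
        # imu_feats: (B, T, imu_dim) synchronized wrist-mounted IMU readings.
        v, _ = self.vis_lstm(vis_feats)
        m, _ = self.imu_lstm(imu_feats)
        fused, _ = self.fusion_lstm(torch.cat([v, m], dim=-1))  # late fusion
        return {
            "monitor": self.monitor_head(fused),             # (B, T, 2)
            "container": self.container_head(fused[:, 0]),   # (B, num_states)
            "next_hand_xyz": self.trajectory_head(fused),    # (B, T, 3)
        }


if __name__ == "__main__":
    model = PouringMonitor()
    vis = torch.randn(2, 30, 2048)   # 30 frames of per-frame visual features
    imu = torch.randn(2, 30, 6)      # 30 synchronized IMU samples
    out = model(vis, imu)
    print({k: tuple(v.shape) for k, v in out.items()})
```

In this reading, "late fusion" means each modality is first encoded by its own LSTM and only their hidden states are concatenated for the fusion LSTM, rather than concatenating raw camera and IMU inputs up front.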


Metadata
Title
Liquid Pouring Monitoring via Rich Sensory Inputs
Authors
Tz-Ying Wu
Juan-Ting Lin
Tsun-Hsuang Wang
Chan-Wei Hu
Juan Carlos Niebles
Min Sun
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-030-01252-6_21
