ABSTRACT
We present a first study of using RGB-D (Kinect-style) cameras for fine-grained recognition of kitchen activities. Our prototype system combines depth (shape) and color (appearance) to solve a number of perception problems crucial for smart space applications: locating hands, identifying objects and their functionalities, recognizing actions, and tracking object state changes through actions. Our proof-of-concept results demonstrate the great potential of RGB-D perception: without the need for instrumentation, our system can robustly track and accurately recognize detailed steps through cooking activities, for instance, how many spoons of sugar are in a cake mix, or how long it has been mixed. A robust RGB-D based solution to fine-grained activity recognition in real-world conditions will bring the intelligence of pervasive and interactive systems to the next level.
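The abstract's core idea, combining a depth (shape) cue with a color (appearance) cue into one descriptor for recognition, can be illustrated with a minimal sketch. All function names here are hypothetical and the features (simple histograms) are far simpler than the descriptors the paper would actually use (e.g., depth kernel descriptors); this only shows the fusion pattern, assuming 8-bit RGB images and depth maps in millimeters.

```python
import numpy as np

# Hypothetical minimal sketch of RGB-D feature fusion: concatenate an
# appearance feature (color histogram) with a shape feature (depth
# histogram) into one normalized descriptor for a recognition system.

def color_histogram(rgb, bins=8):
    """Per-channel color histogram over an 8-bit RGB patch (appearance cue)."""
    hist = [np.histogram(rgb[..., c], bins=bins, range=(0, 255))[0]
            for c in range(3)]
    return np.concatenate(hist).astype(float)

def depth_histogram(depth, bins=8, max_mm=4000):
    """Histogram of depth values in millimeters (coarse shape cue)."""
    return np.histogram(depth, bins=bins, range=(0, max_mm))[0].astype(float)

def rgbd_feature(rgb, depth):
    """Fuse appearance and shape cues into one L2-normalized descriptor."""
    f = np.concatenate([color_histogram(rgb), depth_histogram(depth)])
    n = np.linalg.norm(f)
    return f / n if n > 0 else f

# Example: a random 48x64 RGB-D patch yields a 3*8 + 8 = 32-dim descriptor.
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(48, 64, 3))
depth = rng.integers(500, 3000, size=(48, 64))
feat = rgbd_feature(rgb, depth)
print(feat.shape)  # (32,)
```

Descriptors like this could feed any standard classifier; the key design point the abstract highlights is that depth and color are complementary, so fusing both channels is more robust than either alone.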