ABSTRACT
We present a new method to classify human activities by leveraging the cues available from depth images alone. To this end, we propose a descriptor that couples the depth and spatial information of the segmented body to describe a human pose. Unique poses (i.e., codewords) are then identified by a spatial clustering step. Given a video sequence of depth images, we segment the humans from the depth images and represent the segmented bodies as a sequence of codewords. We exploit the unique poses of an activity and their temporal ordering to learn subsequences of codewords that are strongly discriminative for that activity. Each discriminative subsequence acts as a classifier, and we learn a boosted ensemble of such subsequences to assign a confidence score to the activity label of a test sequence. Unlike existing methods, which demand accurate tracking of 3D joint locations or couple depth with color image information as recognition cues, our method requires only the segmentation masks from depth images to recognize an activity. Experimental results on the publicly available Human Activity Dataset (comprising 12 challenging activities) demonstrate the validity of our method: we attain a precision/recall of 78.1%/75.4% when the person was not seen in the training set, and 94.6%/93.1% when the person was seen.
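The codeword pipeline the abstract describes can be illustrated with a minimal sketch. The clustering and subsequence-matching routines below are simplified stand-ins (the paper uses a spatial-based clustering step and a boosted ensemble of mined subsequences); the toy k-means, the deterministic initialization, and all function names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def build_codebook(descriptors, k, iters=20):
    """Toy k-means: cluster per-frame pose descriptors into k codewords.
    Illustrative stand-in for the paper's spatial-based clustering."""
    # Deterministic init: seed centers from evenly spaced descriptors.
    idx = np.linspace(0, len(descriptors) - 1, k).astype(int)
    centers = descriptors[idx].astype(float).copy()
    for _ in range(iters):
        # Assign each descriptor to its nearest center.
        dists = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its assigned descriptors.
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def encode(descriptors, centers):
    """Map each frame's pose descriptor to its nearest codeword ID,
    turning a depth video into a sequence of codewords."""
    dists = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
    return dists.argmin(axis=1)

def contains_subsequence(seq, sub):
    """True if `sub` occurs in `seq` in order (not necessarily
    contiguously) -- the basic test a subsequence classifier applies."""
    it = iter(seq)
    return all(any(c == s for s in it) for c in sub)
```

In this sketch, each mined discriminative subsequence would vote via `contains_subsequence` on a test video's codeword sequence, and a boosted combination of those votes would produce the activity confidence score.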
Index Terms
- Human activity recognition using depth images