Human activities recognition using depth images

Authors:
Raj Gupta (Nanyang Technological University, Singapore)
Alex Yong-Sang Chia (Rakuten Inc, Tokyo, Japan)
Deepu Rajan (Nanyang Technological University, Singapore)

MM '13: Proceedings of the 21st ACM international conference on Multimedia, October 2013, Pages 283–292. https://doi.org/10.1145/2502081.2502099

Published: 21 October 2013

ABSTRACT

We present a new method to classify human activities by leveraging the cues available from depth images alone. To this end, we propose a descriptor which couples depth and spatial information of the segmented body to describe a human pose. Unique poses (i.e., codewords) are then identified by a spatial-based clustering step. Given a video sequence of depth images, we segment humans from the depth images and represent these segmented bodies as a sequence of codewords. We exploit unique poses of an activity and the temporal ordering of these poses to learn subsequences of codewords which are strongly discriminative for the activity. Each discriminative subsequence acts as a classifier, and we learn a boosted ensemble of discriminative subsequences to assign a confidence score for the activity label of the test sequence. Unlike existing methods that demand accurate tracking of 3D joint locations or couple depth with color image information as recognition cues, our method requires only the segmentation masks from depth images to recognize an activity. Experimental results on the publicly available Human Activity Dataset (which comprises 12 challenging activities) demonstrate the validity of our method, where we attain a precision/recall of 78.1%/75.4% when the person was not seen before in the training set, and 94.6%/93.1% when the person was seen before.
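The last stage described in the abstract, where each discriminative subsequence of codewords acts as a classifier inside a boosted ensemble, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: codewords are plain integers, a subsequence "fires" when its codewords appear in order with gaps allowed, candidate subsequences are enumerated exhaustively up to a small length, and the booster is a textbook discrete AdaBoost. The function names (contains_subsequence, candidate_patterns, train_adaboost, score) are hypothetical.

```python
import math
from itertools import combinations


def contains_subsequence(seq, pattern):
    """Return True if `pattern` occurs in `seq` in order (gaps allowed)."""
    it = iter(seq)
    # `code in it` advances the iterator, so later codes must appear later.
    return all(code in it for code in pattern)


def candidate_patterns(sequences, max_len=2):
    """Enumerate every ordered codeword tuple of length <= max_len seen in training."""
    patterns = set()
    for seq in sequences:
        for n in range(1, max_len + 1):
            for idx in combinations(range(len(seq)), n):
                patterns.add(tuple(seq[i] for i in idx))
    return sorted(patterns)


def train_adaboost(sequences, labels, rounds=3, max_len=2):
    """Discrete AdaBoost over subsequence-presence weak classifiers.

    `labels` are +1 (activity present) or -1. Returns a list of
    (alpha, pattern, polarity) triples.
    """
    n = len(sequences)
    weights = [1.0 / n] * n
    patterns = candidate_patterns(sequences, max_len)
    ensemble = []
    for _ in range(rounds):
        best = None
        for pattern in patterns:
            for polarity in (1, -1):
                preds = [polarity if contains_subsequence(s, pattern) else -polarity
                         for s in sequences]
                err = sum(w for w, y, h in zip(weights, labels, preds) if h != y)
                if best is None or err < best[0]:
                    best = (err, pattern, polarity, preds)
        err, pattern, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, pattern, polarity))
        # Re-weight: misclassified sequences gain weight, then normalise.
        weights = [w * math.exp(-alpha * y * h)
                   for w, y, h in zip(weights, labels, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def score(ensemble, seq):
    """Signed confidence: positive favours the activity label."""
    return sum(alpha * (pol if contains_subsequence(seq, pattern) else -pol)
               for alpha, pattern, pol in ensemble)


# Toy data: the positive class shows codeword 1 followed later by codeword 3.
train = [[0, 1, 2, 3], [1, 0, 3, 2], [1, 3, 3, 0],
         [3, 1, 0, 2], [3, 2, 1, 0], [0, 2, 2, 0]]
labels = [1, 1, 1, -1, -1, -1]
ensemble = train_adaboost(train, labels)
print(score(ensemble, [2, 1, 2, 3]) > 0)  # sequence with 1 then 3
print(score(ensemble, [3, 3, 1, 2]) > 0)  # same codewords, wrong order
```

In this toy setting the temporal ordering carries the signal: both test sequences contain codewords 1 and 3, and only the ordering separates them. Exhaustive enumeration is feasible only for short sequences and small pattern lengths; a practical system would need a more selective way to propose candidate subsequences.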

Supplemental Material

mm018.zip (6.4 MB)

Index Terms

Human activities recognition using depth images
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Interest point and salient region detections
      2. Computer vision tasks
        Scene understanding

Published in
MM '13: Proceedings of the 21st ACM international conference on Multimedia
October 2013
1166 pages
ISBN: 9781450324045
DOI: 10.1145/2502081
General Chairs: Alejandro (Alex) Jaimes (Yahoo!, Spain), Nicu Sebe (University of Trento, Italy), Nozha Boujemaa (INRIA, France)
Program Chairs: Daniel Gatica-Perez (IDIAP & EPFL, Switzerland), David A. Shamma (Yahoo!, USA), Marcel Worring (University of Amsterdam, The Netherlands), Roger Zimmermann (National University of Singapore, Singapore)
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
depth image segmentation
human activity detection
Qualifiers
- research-article
Acceptance Rates
MM '13 Paper Acceptance Rate: 47 of 235 submissions, 20%
Overall Acceptance Rate: 995 of 4,171 submissions, 24%