DOI: 10.1145/2502081.2502099
research-article

Human activities recognition using depth images

Published: 21 October 2013

ABSTRACT

We present a new method for classifying human activities using only the cues available from depth images. To this end, we propose a descriptor that couples depth and spatial information of the segmented body to describe a human pose. Unique poses (i.e., codewords) are then identified by a spatial-based clustering step. Given a video sequence of depth images, we segment humans from the depth images and represent these segmented bodies as a sequence of codewords. We exploit the unique poses of an activity and the temporal ordering of these poses to learn subsequences of codewords that are strongly discriminative for the activity. Each discriminative subsequence acts as a classifier, and we learn a boosted ensemble of discriminative subsequences to assign a confidence score to the activity label of the test sequence. Unlike existing methods, which demand accurate tracking of 3D joint locations or couple depth with color image information as recognition cues, our method requires only the segmentation masks from depth images to recognize an activity. Experimental results on the publicly available Human Activity Dataset (which comprises 12 challenging activities) demonstrate the validity of our method: we attain a precision/recall of 78.1%/75.4% when the person was not seen in the training set, and 94.6%/93.1% when the person was seen before.
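The pipeline described in the abstract (pose descriptors mapped to codewords, then scored by an ensemble of subsequence classifiers) can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid assignment stands in for the spatial-based clustering step, subsequence matching is assumed to allow gaps, and the function names, toy descriptors, and hand-picked weights are all hypothetical. In the paper the subsequences and their weights are learned by boosting.

```python
from typing import List, Sequence, Tuple


def nearest_codeword(descriptor: Sequence[float],
                     codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codebook entry closest to the descriptor.

    Assumption: squared Euclidean distance; the paper's own similarity
    measure is not given in the abstract.
    """
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)),
               key=lambda k: sq_dist(descriptor, codebook[k]))


def encode(frames: Sequence[Sequence[float]],
           codebook: Sequence[Sequence[float]]) -> List[int]:
    """Represent a sequence of per-frame descriptors as codeword indices."""
    return [nearest_codeword(f, codebook) for f in frames]


def contains_subsequence(sequence: Sequence[int],
                         pattern: Sequence[int]) -> bool:
    """True if the pattern occurs in order within the sequence.

    Assumption: gaps between matched codewords are allowed.
    """
    it = iter(sequence)
    # "symbol in it" advances the iterator until the symbol is found.
    return all(symbol in it for symbol in pattern)


def ensemble_score(sequence: Sequence[int],
                   weak_learners: Sequence[Tuple[Sequence[int], float]]) -> float:
    """Weighted vote: +weight if the pattern matches, -weight otherwise."""
    return sum(w if contains_subsequence(sequence, p) else -w
               for p, w in weak_learners)


# Toy data (hypothetical): three codewords in a 2-D descriptor space.
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
frames = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [2.1, 0.1]]
codes = encode(frames, codebook)

# Two hand-picked (pattern, weight) pairs standing in for learned ones.
learners = [([0, 2], 0.75), ([2, 0], 0.25)]
score = ensemble_score(codes, learners)
```

A positive score would be read as support for the activity label and a negative one as evidence against it; with one such ensemble per activity, the label with the highest score could be selected.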

      • Published in

        MM '13: Proceedings of the 21st ACM international conference on Multimedia
        October 2013
        1166 pages
        ISBN: 9781450324045
        DOI: 10.1145/2502081

        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        MM '13 Paper Acceptance Rate: 47 of 235 submissions, 20%
        Overall Acceptance Rate: 995 of 4,171 submissions, 24%
