Multi-task Self-Supervised Learning for Human Activity Detection

Published: 21 June 2019

Abstract

Deep learning methods are used successfully in applications pertaining to ubiquitous computing, pervasive intelligence, health, and well-being. In particular, the area of human activity recognition (HAR) has been transformed primarily by convolutional and recurrent neural networks, thanks to their ability to learn semantic representations directly from raw input. However, extracting generalizable features requires massive amounts of well-curated data, which is notoriously challenging to obtain due to privacy issues and annotation costs. Unsupervised representation learning (i.e., learning without manually labeling the instances) is therefore of prime importance for leveraging the vast amounts of unlabeled data produced by smart devices. In this work, we propose a novel self-supervised technique for learning features from sensory data that does not require access to any form of semantic labels, i.e., activity classes. We train a multi-task temporal convolutional network to recognize transformations applied to an input signal. By exploiting these transformations, we demonstrate that simple binary-classification auxiliary tasks provide a strong supervisory signal for extracting features useful for the downstream task. We extensively evaluate the proposed approach on several publicly available datasets for smartphone-based HAR in unsupervised, semi-supervised, and transfer learning settings. Our method achieves performance superior or comparable to fully-supervised networks trained directly with activity labels, and it performs significantly better than unsupervised learning through autoencoders. Notably, in the semi-supervised case, the self-supervised features substantially boost the detection rate, attaining a kappa score between 0.7 and 0.8 with only 10 labeled examples per class. We obtain similarly impressive performance even when the features are transferred from a different data source.
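To make the idea concrete, the following is a hedged sketch (not the authors' exact code; function names and parameter values are illustrative assumptions) of typical sensor-signal transformations and how each one yields a binary "was this transformation applied?" task:

```python
import numpy as np

# Illustrative transformations on a window of shape (timesteps, channels).
# Each defines one auxiliary binary task: predict whether it was applied.

def add_noise(x, sigma=0.05):
    """Jitter: add Gaussian noise to each sample."""
    return x + np.random.normal(0.0, sigma, size=x.shape)

def scale(x, factor=1.5):
    """Scaling: multiply the signal by a constant factor."""
    return x * factor

def negate(x):
    """Negation: flip the sign of the signal."""
    return -x

def time_flip(x):
    """Reversal: play the window backwards along the time axis."""
    return x[::-1, :]

def permute(x, n_segments=4):
    """Permutation: slice the window into segments and shuffle them."""
    segments = np.array_split(x, n_segments, axis=0)
    np.random.shuffle(segments)
    return np.concatenate(segments, axis=0)

def make_task_batch(windows, transform):
    """Build a balanced binary batch: label 1 if transformed, else 0."""
    transformed = np.stack([transform(w) for w in windows])
    inputs = np.concatenate([windows, transformed], axis=0)
    labels = np.concatenate([np.zeros(len(windows)), np.ones(len(windows))])
    return inputs, labels

# Example: 8 windows of 128 timesteps x 3 accelerometer axes.
windows = np.random.randn(8, 128, 3)
x, y = make_task_batch(windows, negate)
print(x.shape, y.shape)  # (16, 128, 3) (16,)
```

No activity labels appear anywhere: the supervisory signal comes entirely from knowing which windows were transformed.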
Self-supervision drastically reduces the need for labeled activity data, effectively narrowing the gap between supervised and unsupervised techniques for learning meaningful representations. While this paper focuses on HAR as the application domain, the proposed approach is general and could be applied to a wide variety of problems in other areas.
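The multi-task network described above can be sketched as a shared temporal-convolutional trunk with one binary head per transformation-recognition task. This is a minimal illustration with assumed layer sizes, not the architecture reported in the paper:

```python
import torch
import torch.nn as nn

class MultiTaskTCN(nn.Module):
    """Shared 1D-conv trunk with one binary classification head per task."""

    def __init__(self, in_channels=3, n_tasks=6):
        super().__init__()
        # Shared trunk: stacked temporal convolutions over the time axis.
        self.trunk = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=24), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=16), nn.ReLU(),
            nn.Conv1d(64, 96, kernel_size=8), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),  # global max pooling over time
        )
        # One independent binary head per auxiliary task.
        self.heads = nn.ModuleList([nn.Linear(96, 1) for _ in range(n_tasks)])

    def forward(self, x):
        # x: (batch, channels, time)
        z = self.trunk(x).squeeze(-1)            # shared features (batch, 96)
        return [head(z) for head in self.heads]  # one logit per task

model = MultiTaskTCN()
logits = model(torch.randn(4, 3, 128))
print(len(logits), logits[0].shape)  # 6 heads, each (4, 1)
```

After self-supervised training, the trunk's features can be reused for the downstream activity classifier, e.g., by freezing the trunk and training a small head on the few available labeled examples.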



Published in

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 3, Issue 2
June 2019
802 pages
EISSN: 2474-9567
DOI: 10.1145/3341982

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 21 June 2019
        • Accepted: 1 April 2019
        • Received: 1 February 2019


        Qualifiers

        • research-article
        • Research
        • Refereed
