
## About this book

This book constitutes the refereed proceedings of the Second International Workshop on Deep Learning for Human Activity Recognition, DL-HAR 2020, held in conjunction with IJCAI-PRICAI 2020, in Kyoto, Japan, in January 2021. Due to the COVID-19 pandemic, the workshop was postponed to 2021 and held in a virtual format.
The 10 presented papers were thoroughly reviewed and included in the volume. They present recent research on applications of human activity recognition in areas such as healthcare services and smart home applications.

## Table of Contents

### Human Activity Recognition Using Wearable Sensors: Review, Challenges, Evaluation Benchmark

Abstract
Recognizing human activity plays a significant role in the advancements of human-interaction applications in healthcare, personal fitness, and smart devices. Many papers presented various techniques for human activity representation that resulted in distinguishable progress. In this study, we conduct an extensive literature review on recent, top-performing techniques in human activity recognition based on wearable sensors. Due to the lack of standardized evaluation and to assess and ensure a fair comparison between the state-of-the-art techniques, we applied a standardized evaluation benchmark on the state-of-the-art techniques using six publicly available data-sets: MHealth, USCHAD, UTD-MHAD, WISDM, WHARF, and OPPORTUNITY. Also, we propose an experimental, improved approach that is a hybrid of enhanced handcrafted features and a neural network architecture which outperformed top-performing techniques with the same standardized evaluation benchmark applied concerning MHealth, USCHAD, UTD-MHAD data-sets.

### Wheelchair Behavior Recognition for Visualizing Sidewalk Accessibility by Deep Neural Networks

Abstract
This paper introduces our methodology to estimate sidewalk accessibilities from wheelchair behavior via a triaxial accelerometer in a smartphone installed under a wheelchair seat. Our method recognizes sidewalk accessibilities from environmental factors, e.g. gradient, curbs, and gaps, which influence wheelchair bodies and become a burden for people with mobility difficulties. This paper developed and evaluated a prototype system that visualizes sidewalk accessibility information by extracting knowledge from wheelchair acceleration using deep neural networks. Firstly, we created a supervised convolutional neural network model to classify road surface conditions using wheelchair acceleration data. Secondly, we applied a weakly supervised method to extract representations of road surface conditions without manual annotations. Finally, we developed a self-supervised variational autoencoder to assess sidewalk barriers for wheelchair users. The results show that the proposed method estimates sidewalk accessibilities from wheelchair accelerations and extracts knowledge of accessibilities by weakly supervised and self-supervised approaches.
Takumi Watanabe, Hiroki Takahashi, Goh Sato, Yusuke Iwasawa, Yutaka Matsuo, Ikuko Eguchi Yairi

### Toward Data Augmentation and Interpretation in Sensor-Based Fine-Grained Hand Activity Recognition

Abstract
Recognizing fine-grained hand activities has widely attracted the research community’s attention in recent years. However, rather than enriched sensor-based datasets of whole-body activities, there are limited data available for accelerator-based fine-grained hand activities. In this paper, we propose a purely convolution-based Generative Adversarial Networks (GAN) approach for data augmentation on accelerator-based temporal data of fine-grained hand activities. The approach consists of 2D-Convolution discriminator and 2D-Transposed-Convolution generator that are shown capable of learning the distribution of re-shaped sensor-based data and generating synthetic instances that well reserve the cross-axis co-relation. We evaluate the usability of synthetic data by expanding existing datasets and improving the state-of-the-art classifier’s test accuracy. The in-nature unreadable sensor-based data is interpreted by introducing visualization methods including axis-wise heatmap and model-oriented decision explanation. The experiments show that our approach can effectively improve the classifier’s test accuracy by GAN-based data augmentation while well preserving the authenticity of synthetic data.
Jinqi Luo, Xiang Li, Rabih Younes

### Personalization Models for Human Activity Recognition with Distribution Matching-Based Metrics

Abstract
Building activity recognition systems conventionally involves training a common model from all data of training users and utilizing this model to recognize activities of unseen subjects. However, participants come from diverse demographics, so that different users can perform the same actions in diverse ways. Each subject might exhibit user-specific signal patterns, yet a group of users may perform activities in similar manners and share analogous patterns. Leveraging this intuition, we explore Frechet Inception Distance (FID) as a distribution matching-based metric to measure the similarity between users. From that, we propose the nearest-FID-neighbors and the FID-graph clustering techniques to develop user-specific models that are trained with data from the community the testing user likely belongs to. Verified on a series of benchmark wearable datasets, the proposed techniques significantly outperform the model trained with all users.
Huy Thong Nguyen, Hyeokhyen Kwon, Harish Haresamudram, Andrew F. Peterson, Thomas Plötz
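
The nearest-neighbor idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each user is summarized by a small set of feature vectors, models each dimension as an independent Gaussian (a diagonal-covariance simplification of the Fréchet distance that FID is built on), and uses hypothetical names (`frechet_distance_diag`, `nearest_neighbors`) and toy data.

```python
from statistics import fmean, pstdev


def gaussian_stats(features):
    """Per-dimension mean and standard deviation of a list of feature vectors."""
    columns = list(zip(*features))
    return [fmean(c) for c in columns], [pstdev(c) for c in columns]


def frechet_distance_diag(feats_a, feats_b):
    """Squared Frechet distance between two diagonal-covariance Gaussians.

    With diagonal covariances the trace term reduces to the sum of
    squared differences between the standard deviations.
    """
    mu_a, sd_a = gaussian_stats(feats_a)
    mu_b, sd_b = gaussian_stats(feats_b)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    cov_term = sum((a - b) ** 2 for a, b in zip(sd_a, sd_b))
    return mean_term + cov_term


def nearest_neighbors(target, users, k):
    """Names of the k users whose feature distribution is closest to the target's."""
    ranked = sorted(users, key=lambda name: frechet_distance_diag(target, users[name]))
    return ranked[:k]


# Hypothetical toy features: u1 and u3 move alike, u2 moves differently.
users = {
    "u1": [[0.0, 1.0], [0.2, 1.2], [0.1, 0.9]],
    "u2": [[5.0, 4.0], [5.5, 4.2], [4.8, 3.9]],
    "u3": [[0.1, 1.1], [0.3, 1.0], [0.0, 1.2]],
}
target = [[0.05, 1.0], [0.25, 1.1], [0.1, 1.05]]
neighbors = nearest_neighbors(target, users, k=2)
```

In this sketch, a user-specific model would then be trained only on data from the users in `neighbors`; the full distance with non-diagonal covariances additionally needs a matrix square root.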

### Resource-Constrained Federated Learning with Heterogeneous Labels and Models for Human Activity Recognition

Abstract
One of the most significant applications in pervasive computing for modeling user behavior is Human Activity Recognition (HAR). Such applications necessitate us to characterize insights from multiple resource-constrained user devices using machine learning techniques for effective personalized activity monitoring. On-device Federated Learning proves to be an extremely viable option for distributed and collaborative machine learning in such scenarios, and is an active area of research. However, there are a variety of challenges in addressing statistical (non-IID data) and model heterogeneities across users. In addition, in this paper, we explore a new challenge of interest – to handle heterogeneities in labels (activities) across users during federated learning. To this end, we propose a framework with two different versions for federated label-based aggregation, which leverage overlapping information gain across activities – one using Model Distillation Update, and the other using Weighted $\alpha$-update. Empirical evaluation on the Heterogeneity Human Activity Recognition (HHAR) dataset (with four activities for effective elucidation of results) indicates an average deterministic accuracy increase of at least $\sim$11.01% with the model distillation update strategy and $\sim$9.16% with the weighted $\alpha$-update strategy. We demonstrate the on-device capabilities of our proposed framework by using Raspberry Pi 2, a single-board computing platform.
Gautham Krishna Gudur, Satheesh Kumar Perepu

### ARID: A New Dataset for Recognizing Action in the Dark

Abstract
The task of action recognition in dark videos is useful in various scenarios, e.g., night surveillance and self-driving at night. Though progress has been made in the action recognition task for videos in normal illumination, few have studied action recognition in the dark, partly due to the lack of sufficient datasets for such a task. In this paper, we explore the task of action recognition in dark videos. We bridge the gap of the lack of data by collecting a new dataset: the Action Recognition in the Dark (ARID) dataset. It consists of 3,784 video clips with 11 action categories. To the best of our knowledge, it is the first dataset focused on human actions in dark videos. To gain further understanding of our ARID dataset, we analyze it in detail and show its necessity over synthetic dark videos. Additionally, we benchmark the performance of current action recognition models on our dataset and explore potential methods for increasing their performance. We show that current action recognition models and frame enhancement methods may not be effective solutions for the task of action recognition in dark videos (data available at https://xuyu0010.github.io/arid).
Yuecong Xu, Jianfei Yang, Haozhi Cao, Kezhi Mao, Jianxiong Yin, Simon See

### Single Run Action Detector over Video Stream - A Privacy Preserving Approach

Abstract
This paper takes initial strides at designing and evaluating a vision-based system for privacy-ensured activity monitoring. The proposed technology utilizes Artificial Intelligence (AI)-empowered proactive systems that offer continuous monitoring, behavioral analysis, and modeling of human activities. To this end, this paper presents Single Run Action Detector (S-RAD), a real-time privacy-preserving action detector that performs end-to-end action localization and classification. It is based on Faster-RCNN combined with temporal shift modeling and segment-based sampling to capture human actions. Results on the UCF-Sports and UR Fall datasets show accuracy comparable to state-of-the-art approaches, with significantly lower model size and computation demand and the ability to run in real time on an embedded edge device (e.g. Nvidia Jetson Xavier).
Anbumalar Saravanan, Justin Sanchez, Hassan Ghasemzadeh, Aurelia Macabasco-O’Connell, Hamed Tabkhi

### Efficacy of Model Fine-Tuning for Personalized Dynamic Gesture Recognition

Abstract
Dynamic hand gestures are usually unique to individual users in terms of style, speed, and magnitude of the gestures’ performance. A gesture recognition model trained with data from a group of users may not generalize well for unseen users and its performance is likely to be different for different users. To address these issues, this paper investigates the approach of fine-tuning a global model using user-specific data locally for personalizing dynamic hand gesture recognition. Using comprehensive experiments with state-of-the-art convolutional neural network architectures for video recognition, we evaluate the impact of four different choices on personalization performance - fine-tuning the earlier vs the later layers of the network, number of user-specific training samples, batch size, and learning rate. The user-specific data is collected from 11 users performing 7 gesture classes. Our findings show that with proper selection of fine-tuning strategy and hyperparameters, improved model performance can be achieved on personalized models for all users by only fine-tuning a small portion of the network weights and using very few labeled user-specific training samples.
Junyao Guo, Unmesh Kurup, Mohak Shah

### Fully Convolutional Network Bootstrapped by Word Encoding and Embedding for Activity Recognition in Smart Homes

Abstract
Activity recognition in smart homes is essential when we wish to propose automatic services for the inhabitants. However, it is a challenging problem in terms of environments’ variability, sensory-motor systems, user habits, but also sparsity of signals and redundancy of models. Therefore, end-to-end systems fail at automatically extracting key features, and need to access context and domain knowledge. We propose to tackle feature extraction for activity recognition in smart homes by merging methods of Natural Language Processing (NLP) and Time Series Classification (TSC) domains.
We evaluate the performance of our method with two datasets issued from the Center for Advanced Studies in Adaptive Systems (CASAS). We analyze the contributions of the use of embedding based on term frequency encoding, to improve automatic feature extraction. Moreover, we compare the classification performance of Fully Convolutional Network (FCN) from TSC, applied for the first time for activity recognition in smart homes, to Long Short Term Memory (LSTM). The method we propose shows good performance in offline activity classification. Our analysis also shows that FCNs outperform LSTMs, and that domain knowledge gained by event encoding and embedding significantly improves the performance of classifiers.
Damien Bouchabou, Sao Mai Nguyen, Christophe Lohr, Benoit LeDuc, Ioannis Kanellos
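
The word-encoding step mentioned in this abstract can be pictured with a small sketch. It rests on assumptions not stated in the abstract: sensor events are treated as word-like tokens, each token gets an integer index ordered by how often it occurs, and sequences are padded to a fixed length before being fed to an embedding layer. The event names and the helpers `build_vocabulary` and `encode` are hypothetical.

```python
from collections import Counter


def build_vocabulary(sequences):
    """Map each event token to an integer index, most frequent token first (index 1)."""
    counts = Counter(token for seq in sequences for token in seq)
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    return {token: i for i, token in enumerate(ordered, start=1)}


def encode(sequence, vocab, length):
    """Encode a sequence as indices, truncated or zero-padded to a fixed length.

    Index 0 is used both for padding and for tokens missing from the vocabulary.
    """
    ids = [vocab.get(token, 0) for token in sequence][:length]
    return ids + [0] * (length - len(ids))


# Hypothetical smart-home event logs (sensor id plus state).
logs = [
    ["M003_ON", "M003_OFF", "D001_OPEN"],
    ["M003_ON", "M003_OFF", "M003_ON"],
]
vocab = build_vocabulary(logs)
encoded = [encode(seq, vocab, length=4) for seq in logs]
```

The resulting integer sequences are the kind of input an embedding layer followed by an FCN or LSTM classifier could consume.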

### Towards User Friendly Medication Mapping Using Entity-Boosted Two-Tower Neural Network

Abstract
Recent advancements in medical entity linking have been applied in the area of scientific literature and social media data. However, with the adoption of telemedicine and conversational agents such as Alexa in healthcare settings, medical name inference has become an important task. Medication name inference is the task of mapping user friendly medication names from a free-form text to a concept in a normalized medication list. This is challenging due to the differences in the use of medical terminology from health care professionals and user conversations coming from the lay public. We begin with mapping descriptive medication phrases (DMP) to standard medication names (SMN). Given the prescriptions of each patient, we want to provide them with the flexibility of referring to the medication in their preferred ways. We approach this as a ranking problem which maps SMN to DMP by ordering the list of medications in the patient’s prescription list obtained from pharmacies. Furthermore, we leveraged the output of intermediate layers and performed medication clustering. We present the Medication Inference Model (MIM) achieving state-of-the-art results. By incorporating medical entities based attention, we have obtained further improvement for ranking models.
Shaoqing Yuan, Parminder Bhatia, Busra Celikkaya, Haiyang Liu, Kyunghwan Choi

### Backmatter
