1 Introduction
2 Related work
3 Methodology for human activity data processing and feature representation
- Data are input into the system from a dataset containing 3D skeleton information of human joints. These data are captured using an RGB-D sensor and pre-processed before being used to train the activity classifier ensemble model.
- Features representing activities are computed from the data. This step also includes selecting the optimal features relevant for learning activities.
- The selected classifier models are trained through supervised learning of activities. The output of this step is the learned classifier ensemble model, ready to be used in activity classification.
- Data input in this stage is similar to that described in the model learning stage. However, the data must be unseen in order to validate the performance of the learned models. The data can be obtained from a dataset or on the fly from an RGB-D sensor.
- The same kinds of features are extracted from the data to be classified. This stage differs from the model learning stage in that it uses unlabelled activity data, whereas the model learning stage is based on labelled activity data. The features extracted from the unlabelled activity data are passed into the learned classifier ensemble model to identify the activity classes.
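
The two stages described above (model learning on labelled data, then classification of unlabelled data) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the pairwise-distance feature extractor, the two toy ensemble members (nearest centroid and 1-nearest-neighbour), the majority vote, and the function names `learn` and `classify` are hypothetical and are not the classifiers or features used in this work.

```python
from collections import Counter, defaultdict
from math import dist
from statistics import fmean


def extract_features(frame):
    """Toy feature extractor (assumption): distance between every pair of joints."""
    return [dist(a, b) for i, a in enumerate(frame) for b in frame[i + 1:]]


def train_centroid(X, y):
    """Nearest-centroid member: one mean feature vector per activity label."""
    groups = defaultdict(list)
    for features, label in zip(X, y):
        groups[label].append(features)
    centroids = {
        label: [fmean(column) for column in zip(*rows)]
        for label, rows in groups.items()
    }
    return lambda f: min(centroids, key=lambda label: dist(f, centroids[label]))


def train_1nn(X, y):
    """1-nearest-neighbour member: label of the closest training sample."""
    samples = list(zip(X, y))
    return lambda f: min(samples, key=lambda s: dist(f, s[0]))[1]


def learn(frames, labels):
    """Model learning stage: labelled frames -> list of trained members."""
    X = [extract_features(frame) for frame in frames]
    return [train(X, labels) for train in (train_centroid, train_1nn)]


def classify(ensemble, frame):
    """Classification stage: unlabelled frame -> majority-voted activity.

    Ties are resolved in favour of the label voted for first.
    """
    features = extract_features(frame)
    votes = Counter(member(features) for member in ensemble)
    return votes.most_common(1)[0][0]


# Hypothetical three-joint skeleton frames, each joint an (x, y, z) tuple.
train_frames = [
    [(0, 0, 0), (0, 1, 0), (0, 2, 0)],
    [(0, 0, 0), (0, 1.1, 0), (0, 2.1, 0)],
    [(0, 0, 0), (0, 0.5, 0), (0, 1, 0)],
    [(0, 0, 0), (0, 0.4, 0), (0, 0.9, 0)],
]
train_labels = ["stand", "stand", "sit", "sit"]
ensemble = learn(train_frames, train_labels)
```

Calling `classify(ensemble, frame)` on an unseen frame then returns one activity label. The same feature extractor is applied in both stages, which mirrors the requirement that the features extracted at classification time match those used for learning.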
3.1 3D activity data pre-processing
3.2 Extraction and representation of 3D features
3.2.1 Displacement-based features
3.2.2 Statistical features in time domain
3.3 Feature normalisation
3.4 Feature selection
4 Classifier ensemble model
5 Experiments and evaluation
5.1 Experimental setup
Activity | Actor 1 (frames) | Actor 2 (frames) | Actor 3 (frames) |
---|---|---|---|
Brushing teeth | 2202 | 1876 | 1781 |
Pick up object | 1804 | 1663 | 1355 |
Sit on sofa | 1489 | 1672 | 2736 |
Stand up | 2126 | 2059 | 2100 |
Total | 7621 | 7270 | 7972 |
Feature description | Feature label |
---|---|
Spatial displacement \(\delta \) between both hands, hands and head, hands and feet, shoulders and feet, hip and feet | 1–9 |
Temporal joint coordinate displacement \(t_{cp}\) | 10–54 |
Temporal joint coordinate displacement \(t_{ci}\) | 55–99 |
Joint coordinate-mean difference \(j_{(i,\mathrm{mean})}\) | 100–144 |
Joint coordinate-variance difference \(j_{(i,\mathrm{var})}\) | 145–189 |
Joint coordinate-standard deviation difference \(j_{(i,\mathrm{std})}\) | 190–234 |
Joint coordinate-skewness difference \(j_{(i,\mathrm{skw})}\) | 235–279 |
Joint coordinate-kurtosis difference \(j_{(i,\mathrm{kur})}\) | 280–324 |
Total number of computed features | 324 |
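
The label ranges in the table follow from laying the feature groups end to end. The sketch below recomputes them, assuming the groups are contiguous and non-overlapping, with sizes of 9 and 45 read off the table. The short group keys are hypothetical identifiers introduced here.

```python
# Group names and sizes read off the feature table; the keys are
# hypothetical identifiers introduced for this sketch.
FEATURE_GROUPS = [
    ("spatial_displacement", 9),
    ("t_cp", 45),
    ("t_ci", 45),
    ("j_mean", 45),
    ("j_var", 45),
    ("j_std", 45),
    ("j_skw", 45),
    ("j_kur", 45),
]


def label_ranges(groups):
    """Assign contiguous 1-based (first, last) label ranges to each group."""
    ranges, first = {}, 1
    for name, size in groups:
        ranges[name] = (first, first + size - 1)
        first += size
    return ranges


RANGES = label_ranges(FEATURE_GROUPS)
TOTAL = sum(size for _, size in FEATURE_GROUPS)
```

Under these assumptions the final group occupies labels 280 to 324, and the total of 9 + 7 × 45 = 324 agrees with the last row of the table.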
5.2 CAD-60 dataset and experiment
5.3 Evaluation and discussion
Activity | Precision (%) | Recall (%) |
---|---|---|
Brushing teeth | 40.38 | 62.19 |
Pick up object | 100 | 94.69 |
Sit on sofa | 100 | 100 |
Stand up | 54.10 | 35.13 |
Average | 70.65 | 68.43 |
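
For reference, per-activity precision and recall of the kind reported in the table can be computed from true and predicted labels as sketched below. This uses the standard definitions (precision = TP / predicted count, recall = TP / true count). The function name and the toy labels are hypothetical, and the averaging scheme behind the reported "Average" row is not assumed here.

```python
from collections import Counter


def precision_recall(y_true, y_pred):
    """Return {label: (precision %, recall %)} from paired label sequences."""
    true_count = Counter(y_true)
    pred_count = Counter(y_pred)
    # True positives: samples whose predicted label equals the true label.
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    results = {}
    for label in sorted(set(y_true) | set(y_pred)):
        precision = 100 * tp[label] / pred_count[label] if pred_count[label] else 0.0
        recall = 100 * tp[label] / true_count[label] if true_count[label] else 0.0
        results[label] = (precision, recall)
    return results


# Toy labels for illustration only; these are not the experimental data.
y_true = ["sit", "sit", "stand", "stand", "stand"]
y_pred = ["sit", "stand", "stand", "stand", "sit"]
scores = precision_recall(y_true, y_pred)
```

In the toy example, "sit" is predicted twice and is correct once, so both its precision and recall are 50%.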
5.3.1 Experimental dataset results and evaluation
Location | Activity | Proposed HAL system Prec. (\(\%\)) | Proposed HAL system Rec. (\(\%\)) |
---|---|---|---|
Bathroom | Rinsing mouth | 100 | 99.97 |
Bathroom | Brushing teeth | 96.97 | 75.16 |
Bathroom | Wearing contact lens | 54.48 | 92.68 |
Bathroom | Random + still | 99.98 | 100 |
Bathroom | Average | 95.72 | 93.41 |
Bedroom | Talking on phone | 98.58 | 74.55 |
Bedroom | Drinking water | 91.47 | 60.99 |
Bedroom | Opening pill container | 15.39 | 66.55 |
Bedroom | Random + still | 100 | 100 |
Bedroom | Average | 94.37 | 84.01 |
Kitchen | Drinking water | 92.96 | 74.81 |
Kitchen | Cooking (chopping) | 31.04 | 66.67 |
Kitchen | Cooking (stirring) | 78.43 | 77.52 |
Kitchen | Opening pill container | 74.49 | 75.49 |
Kitchen | Random + still | 100 | 100 |
Kitchen | Average | 86.85 | 84.76 |
Living room | Talking on phone | 82.36 | 88.29 |
Living room | Drinking water | 86.93 | 74.14 |
Living room | Talking on couch | 94.27 | 100 |
Living room | Relaxing on couch | 100 | 100 |
Living room | Random + still | 100 | 100 |
Living room | Average | 94.37 | 94.41 |
Office | Talking on phone | 67.06 | 93.42 |
Office | Writing on board | 87.36 | 73.19 |
Office | Drinking water | 100 | 83.84 |
Office | Working on computer | 100 | 100 |
Office | Random + still | 100 | 100 |
Office | Average | 93.28 | 91.71 |
Overall average | | 92.32 | 89.66 |
Location | Precision (%) | Recall (%) |
---|---|---|
Bathroom | 91.36 | 90.37 |
Bedroom | 86.72 | 83.43 |
Kitchen | 86.38 | 83.54 |
Living room | 95.95 | 94.36 |
Office | 94.41 | 90.92 |
Overall average | 90.96 | 88.52 |
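
The "Overall average" row of this table is consistent with an unweighted mean over the five locations, as the short check below shows. The dictionary simply restates the table rows. The function name `macro_average` is a hypothetical helper, and treating the average as unweighted is an assumption that happens to reproduce the reported figures.

```python
from statistics import fmean

# (precision %, recall %) per location, copied from the table above.
PER_LOCATION = {
    "Bathroom": (91.36, 90.37),
    "Bedroom": (86.72, 83.43),
    "Kitchen": (86.38, 83.54),
    "Living room": (95.95, 94.36),
    "Office": (94.41, 90.92),
}


def macro_average(results):
    """Unweighted mean of precision and recall, rounded to two decimals."""
    precisions, recalls = zip(*results.values())
    return round(fmean(precisions), 2), round(fmean(recalls), 2)


overall = macro_average(PER_LOCATION)  # (90.96, 88.52)
```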
5.3.2 CAD-60 dataset results and evaluation
Method | Prec. (%) | Rec. (%) | Extended modality |
---|---|---|---|
| | 67.9 | 55.5 | \(\checkmark \) |
Piyathilaka and Kodagoda (2013) | 70.0 | 78.0 | – |
Yang and Tian (2014) | 71.9 | 66.6 | \(\checkmark \) |
Ni et al. (2013) | 75.9 | 69.5 | \(\checkmark \) |
Gaglio et al. (2015) | 77.3 | 76.7 | – |
Gupta et al. (2013) | 78.1 | 75.4 | \(\checkmark \) |
Koppula et al. (2013) | 80.8 | 71.4 | \(\checkmark \) |
Nunes et al. (2017) | 81.83 | 80.02 | – |
Zhang and Tian (2012) | 86.0 | 84.0 | \(\checkmark \) |
Proposed HAL system (with all features) | 90.96 | 88.52 | – |
Faria et al. (2014) | 91.1 | 91.9 | – |
Parisi et al. (2015) | 91.9 | 90.2 | – |
Proposed HAL system (with selected features) | 92.32 | 89.66 | – |
Zhu et al. (2014) | 93.2 | 84.6 | \(\checkmark \) |
Shan and Akella (2014) | 93.8 | 94.5 | – |
Cippitelli et al. (2016) | 93.9 | 93.5 | – |
Proposed by | Method | Prec. (\(\%\)) | Rec. (\(\%\)) |
---|---|---|---|
Yang and Tian (2014) | Naive Bayes Nearest Neighbour | 71.9 | 66.6 |
Ni et al. (2013) | Latent SVM | 75.9 | 69.5 |
Gaglio et al. (2015) | SVM | 77.3 | 76.7 |
Koppula et al. (2013) | Structural SVM | 80.8 | 71.4 |
Nunes et al. (2017) | RF | 81.83 | 80.02 |
Zhang and Tian (2012) | SVM | 86.0 | 84.0 |
Parisi et al. (2015) | Neural network | 91.9 | 90.2 |
Proposed HAL system | Classifier ensemble | 92.32 | 89.66 |