Abstract

Human activity recognition (HAR) is the analysis of human gestures and actions captured by sources such as depth or RGB cameras. In this work, we design a dynamic and robust feature selection algorithm for a HAR system, through which the system accurately recognizes various kinds of activities. The proposed approach employs a mutual information algorithm that selects the most prominent of the extracted features. The algorithm extends the max-relevance and min-redundancy methods and can combine the strengths of various feature extraction algorithms. However, the selection procedure may be biased because of the imbalance between the classification power and the redundancy of the features. To resolve this bias, the proposed algorithm balances both terms by using an upper limit of the mutual information function that is independent of the features. For feature extraction and recognition, we use the symlet wavelet transform and the hidden Markov model, respectively. The proposed algorithm is validated through a comprehensive set of experiments on a depth-based database containing thirteen kinds of activities. The proposed feature selection method achieved the best classification accuracy among the compared existing works.

1. Introduction

Human activity recognition (HAR) refers to techniques by which a computer automatically classifies the action or actions being performed, given a series of activity frames. Video-based activity classification has attracted considerable attention in healthcare and telemedicine over the last two decades [1–6], partly because of its applications in areas such as video surveillance, activity analysis in healthcare, and human-computer interaction (HCI) [7]. Activity recognition may be used in healthcare and telemedicine to improve the quality of life of elderly patients: clinicians and medical experts can remotely monitor the daily routines of their patients and make suitable recommendations for them.

Commonly, a HAR system classifies activities performed by various subjects in front of RGB or depth cameras. Activities recorded by color cameras are affected by lighting effects, occlusion, and dynamic backgrounds in real environments [8]. Activity recognition through wearable sensors such as accelerometers or gyroscopes performs well in healthcare and telemedicine [9]. However, collecting activities with accelerometers and gyroscopes reduces the comfort of the wearer and undermines the naturalness of HCI [10]. Therefore, this study employs a Kinect depth camera for data collection.

HAR is one of the key modules in personalized healthcare and telemedicine systems for aged and disabled people [11]. To monitor the daily routine of aged people, various types of cameras may be deployed in smart homes or hospitals to capture activities from videos. According to a WHO survey, the number of aged people is growing quickly worldwide, and their care homes require substantial resources such as staff and healthcare expenses. Hence, intensive care services are required to manage this heavy use of resources and to improve the living conditions of aged people [12]. Many studies have recommended intensive care services that might reduce death rates among aged people. For instance, in Europe, it is estimated that the survival ratio of aged people who receive intensive care services is higher than that of those who receive care at home [13]. Therefore, HAR systems require a robust and efficient depth-based approach that monitors the actions of aged people day and night and provides them with an intelligent living environment that makes their lives at home easier.

A recent method based on a spatial temporal transformer network was proposed in [14], which models the dependencies among joints using the spatial temporal transformer operator. In this model, the intra-state interactions between various parts of the body are captured by a spatial self-attention (SSA) module, whereas a temporal self-attention (TSA) module captures the inter-state associations. However, it is very difficult for such a system to embed the skeleton data in a graph structure, which means that this approach does not consider the graph convolution of the skeleton sequences when processing the specified motion features [15]. Similarly, an integrated method was designed in [16] for a HAR system, which combined a recurrent neural network with a heuristic feature selection method (meta-heuristic optimization). However, a recurrent neural network is very difficult to train and cannot process long sequences of frames. Moreover, this approach shares a common limitation, namely the violation of problem constraints, which has led the research community to address the optimization concerns with distinct variables [17]. An improved version of the spatiotemporal-based method was proposed in [18] for activity recognition systems, in which a string detector was used to perceive the spatial position in every frame. However, the results of this approach may degrade because it considers short time sequences, which makes the statistical reliability of the correlation doubtful; this is one of the major limitations of this approach [19]. On the other hand, a novel approach based on motion information was proposed in [20] to categorize various kinds of activities. It employed Laplacian pyramid depth motion images to produce a multi-scale representation of activities. However, the biggest limitation of this approach is its failure to preserve the topological structure of the frame, which no longer matches the original frames because of the contraction procedure [21].

In this work, we design a dynamic and robust feature selection algorithm for a HAR system, through which the system accurately recognizes various kinds of activities. The proposed approach employs a mutual information algorithm that selects the most prominent of the extracted features. The algorithm extends the max-relevance and min-redundancy methods and can combine the strengths of various feature extraction algorithms. However, the selection procedure may be biased because of the imbalance between the classification power and the redundancy of the features. To resolve this bias, the proposed algorithm balances both terms by using an upper limit of the mutual information function that is independent of the features. We also thoroughly estimated the β value by taking the enlargement and diminishing issue into account, and the proposed algorithm follows the filter methodology in order to take advantage of its low computational cost. The proposed feature selection algorithm therefore overcomes the limitations of the existing approaches and delivers better performance than the existing works, which makes it suitable for inclusive healthcare domains. We validated the designed approach on a realistic depth-based activity database that was recorded by a Kinect depth camera in a static and controlled setting. Based on comprehensive experiments, the proposed feature selection algorithm achieved higher accuracy than the existing works.

The rest of the paper is organized as follows. In Section 2, we briefly describe the latest related works with their limitations. In Section 3, we comprehensively explain the concept of the proposed feature selection method. In Section 4, we describe the dataset and the experimental procedure for the proposed HAR system. In Section 5, we present a series of experiments followed by some discussion. Finally, in Section 6, we summarize the paper and outline some future directions.

2. Literature Review

A human activity is the movement of one or more parts of the human body; it may be static or composed of numerous primitive actions performed in some successive order. Much work has been done on feature extraction; however, most methods have their own limitations. Moreover, redundant features may decrease the accuracy of a typical HAR system; therefore, feature selection plays an important role in HAR systems and helps them achieve high accuracy.

The authors of [22] developed a random projection-based approach for a HAR system using depth information. This approach can classify human activities along the view direction and is not affected by environmental factors such as illumination. However, the random projection-based transformation matrix is generated without taking the underlying structure of the original data into account, which commonly leads to comparatively high distortion [23]. Similarly, a new model was developed in [24] for the categorization of human activities. It derives new features from the angles between human body parts, relying on a weight descriptor computed from activity frames across various poses. These extracted features are then used as the input to the classifier. However, this approach fails to perform well on depth activities such as clapping and boxing, which cannot be differentiated in RGB cameras [25].

A state-of-the-art hierarchical approach was developed in [26] for depth-based sequences, which can simultaneously combine spatial and temporal information at various temporal scales. However, a wrapper interacts with the classifier, which carries a risk of overfitting the model [27]. Likewise, a wrapper-based feature selection method was developed in [28] that claimed the highest recognition rate; however, it did not consider the global optimum and mainly relied on local optima [29]. Furthermore, the authors of [30, 31] used the wavelet transform along with hidden conditional random fields for HAR systems. They utilized a full covariance matrix in order to tackle the existing limitations. However, this approach was time consuming because of the full covariance matrix. On the other hand, a recent study compared various kinds of classifiers for HAR in order to show their respective performances, and claimed that radial basis kernels and the support vector classifier achieved the highest accuracies [32]. In practice, this method is simple and strong but very difficult to train [33]; moreover, the SVM neglects the temporal information among the activity frames, so every frame is assumed to be statistically independent of the others [34].

Another recent approach was proposed in [35], where human silhouettes were extracted from various activity frames and a specific sequence was then generated from them using motion detection and tracking techniques. The histogram of oriented gradients was used to recognize the human activities. However, the histogram of oriented gradients is very sensitive to frame variation; hence, this method is not a good option for activity recognition [36]. Similarly, a two-stage approach was suggested in [37] for activity recognition. In the first stage, a deep neural network-based gesture detector finds the correct location of the pixels of significant key points of the body, while in the second stage a neural architecture search algorithm determines an optimal network structure, which is then used in the model for the spatiotemporal evolution of the corresponding gestures. However, a deep neural network has a common disadvantage: it is very expensive to train because of its complex data models. Moreover, modeling the spatiotemporal evolution may degrade the results of the system because it considers short time sequences, which makes the statistical reliability of the correlation doubtful; this is one of the major limitations of this approach [19].

A novel convolutional neural network-based HAR system was designed in [38], which employed two types of input descriptors: the depth motion image, which accumulates successive depth frames, and the moving joint descriptor, which indicates the movement of joints over time. However, depth motion images are less accurate because of real-time latency. Moreover, the major limitations of the moving joint descriptor are its inflexibility and lower accuracy [39]. Similarly, a novel approach was designed in [40] that extracts high-level movement features with the scale invariant feature transform and deep features to classify the activities. However, there were some errors in the corresponding feature points obtained by this method, and because of this the accuracy decreased [41].

Therefore, we have designed a dynamic and robust feature selection algorithm for a HAR system, through which the system accurately recognizes various kinds of activities. The proposed approach employs a mutual information algorithm that selects the most prominent of the extracted features. The algorithm extends the max-relevance and min-redundancy methods and can combine the strengths of various feature extraction algorithms.

3. Proposed Feature Selection Approach

The overall procedure of the proposed technique is presented in Figure 1.

The proposed technique is based on normalized mutual information, which is derived from the maximum-relevance and minimum-redundancy approach. That approach has a limitation: it does not prevent either the redundancy or the relevance from dominating the selection. Therefore, we extend this method with a scheme that normalizes the mutual information used in this technique. For this purpose, we define an upper bound on the mutual information of the random variables. Assume that (I, J) is a pair of input random variables with their marginal and joint distributions; their mutual information is calculated as in (1), where M represents the mutual information of I and J and E denotes the entropy function, which is given in (2).

Substituting (2) into (1) expresses the mutual information directly in terms of the marginal and joint distributions.
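For orientation, the standard identities that match the description of M and E above can be sketched as follows. The symbol p for the marginal and joint distributions is an assumption, and the exact form used in (1) may differ (for example, a conditional-entropy form); only the textbook relations are shown.

```latex
\begin{align*}
M(I;J) &= E(I) + E(J) - E(I,J), \\
E(I)   &= -\sum_{i} p(i)\,\log p(i), \qquad
E(I,J) = -\sum_{i}\sum_{j} p(i,j)\,\log p(i,j), \\
M(I;J) &= \sum_{i}\sum_{j} p(i,j)\,\log\frac{p(i,j)}{p(i)\,p(j)}
        \;\le\; \min\{E(I),\,E(J)\}.
\end{align*}
```

The final inequality is the usual reason a normalizing bound exists: the mutual information of two variables can never exceed the entropy of either one.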

The proposed approach obtains the expected quantization error by quantizing every feature with the same number of levels (N), as described in Algorithm 1.

The number of quantization levels is increased gradually until the quantization error falls below the predefined threshold. Based on multiple experiments, the threshold is set to 0.08 in our assessment. During evaluation, a threshold lower than 0.08 increased the computational cost without improving the performance. Hence, we set the threshold to 0.08, which lets the algorithm keep its accuracy without extra computational cost. For each feature I, the mutual information is bounded by an upper limit that does not depend on I and J. This limit is employed to avoid unsuitable normalizing weights. Hence, we estimate the normalized pairwise mutual information as given in (5).

Algorithm 1
Input: Y, I(1...K), and the quantization-error threshold
Output: Z and J(1...K)
begin
  Let Z = 2
  while 1 do
    MaxEr = -1e+8
    for x = 1 to K do
      lower = min(I(x))
      upper = max(I(x))
      Step = (upper - lower)/Z
      Divider = [lower:Step:upper]
      BC = [lower - Step, lower:Step:upper]
      [J(x), QEr] = Quantiz(I(x), Divider, BC)
      if QEr > MaxEr then
        MaxEr = QEr
      end
    end
    if MaxEr < threshold then
      Break
    end
    Z = Z + 1
  end
end
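The loop in Algorithm 1 can be illustrated with a minimal Python sketch. It is written under stated assumptions and is not the authors' MATLAB implementation: the helper names `quantize` and `quantize_features` are hypothetical, bin centres are used as reconstruction values, the mean squared error stands in for the quantization error, and `max_levels` is an added safety cap that does not appear in the source.

```python
def quantize(values, levels):
    """Uniformly quantize `values` into `levels` bins.

    Returns (bin indices, mean squared quantization error).
    Assumption: bin centres serve as the reconstruction values.
    """
    lower, upper = min(values), max(values)
    if upper == lower:  # constant feature: a single bin, no error
        return [0] * len(values), 0.0
    step = (upper - lower) / levels
    indices, squared_error = [], 0.0
    for v in values:
        idx = min(int((v - lower) / step), levels - 1)  # clamp the maximum
        centre = lower + (idx + 0.5) * step
        indices.append(idx)
        squared_error += (v - centre) ** 2
    return indices, squared_error / len(values)


def quantize_features(features, threshold=0.08, max_levels=256):
    """Increase the shared number of levels until the worst error < threshold."""
    levels = 2
    while True:
        quantized, worst = [], 0.0
        for feature in features:
            idx, err = quantize(feature, levels)
            quantized.append(idx)
            worst = max(worst, err)
        if worst < threshold or levels >= max_levels:
            return levels, quantized
        levels += 1  # one more level for every feature


levels, quantized = quantize_features(
    [[0.0, 0.25, 0.5, 0.75, 1.0], [-1.0, 0.0, 1.0]]
)
```

All features share one level count, so the entropy of every quantized feature has the same upper bound, which is what makes a feature-independent normalizer possible.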

The normalized mutual information in (5) lies within the range [0, 1]. Therefore, the feature-class mutual information (FCMI) is normalized as well in order to attain a balance between the redundancy and the relevance, as described in (6). The potential of a feature, fp, is calculated by combining (5) and (6), where S = {I1, I2, …, Ix} is the feature set. In this work, we also integrated the normalized FCMI along with the β value in order to evaluate the consequences of the imbalance between the relevance and the redundancy. The best features are therefore those that maximize fp.

The entire selection procedure follows the greedy forward selection approach, which is described in Algorithm 2.

Algorithm 2
Input: Zs, K, Ixy, Cy,
Output: Sk
begin
  S = ∅
  for k = 1 to K do
    σk = standard deviation of Ik
    μk = average value of Ik
    Ik = Ik - μk
    In = Ik/σk
  end
  J = Quantiz(I)
  for i = 1 to K do
    for j = 1 to Z do
      Calculate fp(j)
    end
    s = argmax of fp(i) over i ∉ S
    S = S ∪ {s}
  end
end
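The greedy forward selection of Algorithm 2 can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: the score is taken to be the normalized relevance minus the mean normalized redundancy with the already selected features, both terms are divided by log2 of the number of quantization levels (an upper bound on the entropy of any quantized feature), the β weighting is omitted, and the features are assumed to be standardized and quantized beforehand. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """M(X;Y) = E(X) + E(Y) - E(X,Y) for discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def greedy_select(features, labels, k, levels):
    """Pick k feature indices by greedy forward selection.

    Assumed score: relevance(i) - mean redundancy(i, selected),
    each normalized by log2(levels).
    """
    bound = math.log2(levels)
    relevance = [mutual_information(f, labels) / bound for f in features]
    selected, remaining = [], set(range(len(features)))

    def score(i):
        if not selected:
            return relevance[i]
        redundancy = sum(
            mutual_information(features[i], features[s]) for s in selected
        ) / (len(selected) * bound)
        return relevance[i] - redundancy

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)  # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected


# Two informative, mutually independent binary features plus one duplicate.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
labels = [2 * x + y for x, y in zip(a, b)]
chosen = greedy_select([a, list(a), b], labels, k=2, levels=2)
```

In this toy case the duplicate of the first feature is as relevant as the original, yet it is skipped in the second step because its redundancy cancels its relevance.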

As described before, we used the symlet wavelet transform and the hidden Markov model (HMM) for feature extraction and recognition, respectively, following [30].

4. System Justification

The developed feature selection method was verified and validated on a depth-based database. All experiments were performed in MATLAB in a controlled environment (under laboratory settings) with fixed camera settings. Every subject performed the corresponding activities according to the training provided by the instructor. The experiments were performed in the following settings.

4.1. Depth-Based Dataset

The depth-based database contains a total of 670 video sequences that were collected using a Kinect depth camera. The database was created with the help of 70 subjects (male and female), who were university students aged 25 to 60. They performed thirteen activities: bending, jacking, place-jumping, running, side movement, skipping, walking, one hand waving, two hand waving, jumping, clapping, boxing, and sitting and standing. The subjects performed the activities under the instructions provided by the instructor. Some activities in this database were recorded in hospitals from stroke patients. The frames were of different sizes, so we normalized them by resizing them to 100 × 100.

4.2. Procedure of Experiments

(i) In the first experiment, the proposed feature selection method was tested on the depth-based database to check its significance. Based on multiple experiments, we selected the 5-fold cross-validation scheme for this experiment, which means that the data were divided into five subsets, of which four were used for training and the remaining one for testing. This procedure was repeated five times so that every subset was used once for testing.
(ii) In the second experiment, we employed different existing feature selection methods in the HAR system in place of the proposed feature selection method. The purpose of these comprehensive experiments is to show the importance of the proposed feature selection method.
(iii) Finally, in the third experiment, we compared the proposed HAR system, with the proposed feature selection method in place, against the latest activity recognition approaches.
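The 5-fold scheme of the first experiment can be sketched as below. This is a minimal illustration that assumes the 670 sequences are shuffled and split at the sequence level; the source does not state whether the folds were formed per sequence or per subject, and the function name and the `seed` parameter are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for repeatability
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(670, k=5))
```

Each sequence appears in exactly one test fold, so averaging the five fold accuracies uses every sequence once for testing.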

5. Results and Discussion

5.1. First Experiment

As described before, this experiment demonstrates the significance of the proposed feature selection method on the depth-based database. The results are shown in Table 1.

As illustrated in Table 1, the proposed HAR system with the proposed feature selection method showed the best recognition rate on the depth-based database. The reason is that the proposed feature selection technique calculates the potential of the features through an information measure. Likewise, the greedy forward selection method adds each feature to the feature set according to its value, which confirms that the proposed technique is more robust than existing works in terms of recognition rate. In conclusion, the proposed feature selection algorithm overcomes the limitations of the existing approaches and hence delivers better performance than the existing works.

5.2. Second Experiment

As described before, this experiment demonstrates the importance of the proposed feature selection method in the HAR system. We therefore used various existing feature selection methods in place of the proposed technique. We employed the wrapper method, pointwise mutual information, relief-based method, greedy forward selection, greedy backward selection, particle swarm optimization, targeted projection pursuit, scatter search, variable neighborhood search, probabilistic distance, entropy, consistency-based feature selection, correlation-based feature selection, recursive feature elimination, and genetic algorithm. The results of this set of experiments on the depth-based database are shown in Tables 2–16, respectively.

As these experiments illustrate, the proposed feature selection technique plays a significant part in the high classification accuracy of the HAR system. When the proposed method is removed from the system, the accuracies decrease sharply. The results shown in Tables 2–16 confirm the issue of high similarity between the features of various activities. These results support our analysis and offer strong evidence that the proposed algorithm selects the best set of features, which improves the recognition rate of HAR systems.

5.3. Third Experiment

Lastly, in this experiment, the weighted average classification accuracy of the proposed work is compared with that of the latest existing studies. Most of the existing systems were implemented using the same settings as explained in their corresponding articles. The overall results of this experiment are presented in Table 17.
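One common reading of a weighted average classification accuracy is the per-class accuracy weighted by the number of samples in each class. The sketch below follows that reading as an assumption; the source does not define the weighting, and the function name is hypothetical.

```python
from collections import Counter


def weighted_average_accuracy(y_true, y_pred):
    """Per-class accuracy weighted by class support.

    Assumption: each weight is the fraction of samples in that class.
    """
    support = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    return sum(
        (support[c] / total) * (correct[c] / support[c]) for c in support
    )


score = weighted_average_accuracy(
    ["walk", "walk", "walk", "run"],
    ["walk", "walk", "run", "run"],
)
```

With support-proportional weights this quantity equals the overall fraction of correctly classified samples; other weightings would give a different figure.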

Table 17 indicates that the proposed algorithm achieved the highest accuracy compared with the latest existing approaches. This is because the proposed algorithm selects the informative features from the activity frames in the presence of various environmental factors, such as occlusion, that may cause misclassification.

6. Conclusions

Activity recognition is an important concern in the field of healthcare and telemedicine. A common HAR system involves three basic modules: segmentation, feature extraction, and recognition. In this research, we studied some of the latest mutual information feature selection techniques and described the shortcomings of each. We then designed an algorithm that is derived from normalized mutual information-based feature selection with two enhancements: the normalization of the mutual information and feature-independent normalizing weights. The suggested method uses a mutual information algorithm to choose the most prominent of the extracted features. It combines two criteria, max-relevance and min-redundancy, and can combine the strengths of a variety of extraction algorithms. However, because of the disparity between classification power and feature redundancy, the selection may be biased. As a novel solution to this problem of biased selection, we balance both terms using the suggested approach, which has an independent upper limit of the mutual information function. We also comprehensively calculated the β value by considering the enlargement and diminishing issue, and the proposed algorithm uses the filter methodology to benefit from its low computing cost. As a result, the suggested feature selection algorithm overcomes the limitations of the existing techniques and performs better than previous studies, which makes it suitable for inclusive healthcare domains. We validated the designed approach on a realistic depth-based activity database captured by a Kinect depth camera in a static and controlled setting. For feature extraction and recognition, we employed the symlet wavelet transform and the hidden Markov model, respectively. Based on extensive testing, the suggested feature selection algorithm outperformed previous work in classification accuracy.

In real domains, factors such as dynamic backgrounds, lighting effects, blurring, and spontaneous behavior may reduce the accuracy of the proposed feature selection algorithm. Therefore, future work will investigate how the proposed algorithm can keep the same accuracy in real domains in the presence of these environmental factors. Moreover, we will employ the proposed technique in healthcare domains in order to address the privacy concern.

Data Availability

The data used for this study and simulation will be provided on demand.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was funded by the Deanship of Scientific Research at Jouf University under grant no. DSR–2021–02–0343.