1 Introduction
2 Materials and methods
2.1 Materials and dataset
2.2 Dataset preprocessing
2.3 Cricket pose estimation
2.3.1 Human-in-the-loop labeling
Hyperparameter | Value |
---|---|
Max stride | 64 |
Filters | 64 |
Filters rate | 2 |
Middle block | True |
Up interpolate | True |
Sigma | 2.5 |
Output stride | 2 |
Input scaling | 0.7 |
Batch size | 8 |
Epochs | 400 |
Plateau min. delta | 1e-08 |
Plateau patience | 20 |
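The first eight rows configure the pose-estimation backbone (the names resemble the UNet backbone options of pose-estimation toolkits such as SLEAP), while the last four are standard training settings; the two plateau rows describe a reduce-learning-rate-on-plateau schedule. As a minimal sketch, assuming a Keras training loop and that the validation loss is the monitored quantity (neither the toolchain nor the monitored metric is stated in the table), the plateau settings map onto Keras's `ReduceLROnPlateau` callback:

```python
import tensorflow as tf

# Optimization settings from the table above.
BATCH_SIZE = 8
EPOCHS = 400

# Lower the learning rate when the monitored metric improves by less than
# `min_delta` for `patience` consecutive epochs. Monitoring "val_loss" and
# keeping the default reduction factor are our assumptions.
plateau = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss",
    min_delta=1e-08,   # Plateau min. delta
    patience=20,       # Plateau patience
)

# model.fit(train_data, validation_data=val_data,
#           batch_size=BATCH_SIZE, epochs=EPOCHS, callbacks=[plateau])
```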
2.3.2 Grid search for parameter optimization
2.3.3 Performance metric for pose estimation
2.4 Stimuli classification network
2.4.1 Sequence preprocessing
2.4.2 Genetic algorithm for neural network construction
-
Initialization. An initial population of 250 individuals, i.e., chromosomes, was randomly generated, a size chosen to trade off execution time against convergence [28]. We distinguished two types of chromosomes according to the target architecture: one encoding a one-dimensional convolutional network and one encoding a recurrent neural network (RNN). The chromosomes encoding the one-dimensional convolutional network consist of 56 real-coded genes divided into two blocks (6 × 5 + 1 + 5 × 5 = 56). The first block consists of six genes repeated five times, indicating: (1) whether the convolutional block is present (0 if absent, 1 if present); (2) the number of filters in the one-dimensional convolutional layer (ranging from 16 to 1024); (3) the presence of a batch normalization layer (0 if absent, 1 if present); (4) the activation function (0: sigmoid, 1: swish, 2: tanh, 3: relu, 4: gelu, 5: elu, 6: leaky relu); (5) the presence of dropout (0 if absent, 1 if present); and (6) the dropout rate (ranging from 0 to 0.5, restricted to multiples of 0.05). A single additional gene then indicates the type of connection between the convolutional and fully connected layers, with a value of 0 indicating `Flatten` and 1 indicating `GlobalAveragePooling1D`. The second block, also repeated five times, comprises five genes indicating: (1) whether the fully connected block is present (0 if absent, 1 if present); (2) the number of units (ranging from 3 to 512); (3) the activation function; (4) the presence of dropout; and (5) the dropout rate. The chromosomes encoding the RNN, on the other hand, consist of 50 real-coded genes with a slightly different configuration. Their first block, also repeated five times, includes five genes that indicate the presence of the RNN block, the use of a bidirectional layer, the type of RNN cell (LSTM or GRU), the number of units (16 to 1024), and the activation function. Unlike the one-dimensional convolutional chromosome, the RNN chromosome needs no gene for the connection between convolutional and fully connected layers. A decoding sketch for the convolutional part is given after this list.
-
Evaluation. In a genetic algorithm, evaluation is performed through an objective function called the fitness function. In our experiment, we proposed the following fitness function, which is to be maximized:

$$\begin{aligned} \text{fit(gene)} = {\left\{ \begin{array}{ll} -10 \cdot (1 - \text{train\_accuracy}) & \text{if } a \\ -15 & \text{if } b \\ -20 & \text{if } c \\ -\text{val\_loss} & \text{otherwise} \end{array}\right. } \end{aligned}$$ (3)

where a stands for "the training or validation accuracy is less than or equal to 1 over the number of classes, or the training accuracy is less than the validation accuracy"; b stands for "the training accuracy is less than 0.1"; and c stands for "no convolutional or RNN layers are present". The decision to devise a fitness function, as opposed to solely minimizing the validation loss, stems from the recognition that in experiments with limited data and inherent complexity, such as ours, a model can achieve a validation loss similar to, or even lower than, that of models whose validation accuracy is better and exceeds random guessing. To address this challenge, the fitness function also considers the training accuracy, providing an additional metric of network quality: higher training accuracy yields fitness values closer to those derived from the validation loss, a quality signal that can be exploited in subsequent genetic algorithm iterations. Moreover, the $\text{train\_accuracy} < \text{val\_accuracy}$ check is incorporated to prevent the genetic algorithm from overfitting to the validation accuracy, which may impede the ability to generalize and harm training. Lastly, we verify whether the training accuracy is below 0.1 and assign a default value in such cases; this prevents the first case of Equation 3 from falsely indicating a good model. A runnable sketch of this fitness function is given after this list.
-
Selection. The selection algorithm is tournament selection [29], which randomly draws a fixed number of individuals, in our case two, from the population and adds the fittest of this group to the mating pool. In addition to tournament selection, the genetic algorithm employs elitism: the top 10 individuals of the current population are preserved in the succeeding generation. Elitism ensures that the most exceptional individuals can pass their favorable traits on to future generations, which enhances the chances of reaching the desired solution. A sketch of the selection, crossover, and mutation operators is given after this list.
-
Crossover. The crossover operator is bounded Simulated Binary Crossover (bSBX), a bounded variant of the Simulated Binary Crossover (SBX) introduced by [30]. The crossover probability is set to 0.9.
-
Mutation. The mutation operator is bounded polynomial mutation with a probability of 0.5; it draws perturbations from a polynomial probability distribution and is bounded so that mutated genes remain within their allowed ranges.
-
Termination. Each genetic algorithm run terminated after 50 generations.
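To make the chromosome layout of the Initialization step concrete, the following is a minimal sketch that decodes the convolutional half of a 56-gene chromosome into Keras layers. The gene order and value ranges follow the description above; the rounding convention, the kernel size, and the helper names are our assumptions, as they are not specified.

```python
import tensorflow as tf

# Activation encoding from the Initialization step:
# 0: sigmoid, 1: swish, 2: tanh, 3: relu, 4: gelu, 5: elu, 6: leaky relu.
ACTIVATIONS = ["sigmoid", "swish", "tanh", "relu", "gelu", "elu", "leaky_relu"]

def apply_activation(x, index):
    if ACTIVATIONS[index] == "leaky_relu":
        # older Keras releases do not accept "leaky_relu" as a string
        return tf.keras.layers.LeakyReLU()(x)
    return tf.keras.layers.Activation(ACTIVATIONS[index])(x)

def decode_conv_part(genes, inputs):
    """Decode genes 0-30 of a 56-gene chromosome into Conv1D blocks.

    Per-repeat layout (6 genes x 5 repeats, then 1 connection gene):
    [present, n_filters, batch_norm, activation, dropout, dropout_rate]
    """
    x = inputs
    for i in range(5):
        present, n_filt, bn, act, has_do, do_rate = genes[6 * i: 6 * i + 6]
        if round(present) == 0:
            continue  # convolutional block absent
        # kernel size is not part of the encoding; 3 is our placeholder
        x = tf.keras.layers.Conv1D(int(round(n_filt)), 3, padding="same")(x)
        if round(bn) == 1:
            x = tf.keras.layers.BatchNormalization()(x)
        x = apply_activation(x, int(round(act)))
        if round(has_do) == 1:
            # rate is restricted to multiples of 0.05 in [0, 0.5]
            x = tf.keras.layers.Dropout(round(do_rate / 0.05) * 0.05)(x)
    # gene 30: 0 -> Flatten, 1 -> GlobalAveragePooling1D
    if round(genes[30]) == 0:
        return tf.keras.layers.Flatten()(x)
    return tf.keras.layers.GlobalAveragePooling1D()(x)
```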
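Equation 3 translates directly into code. A minimal sketch, assuming the accuracies and the validation loss are obtained by training the decoded network; note that case b is checked before case a, so that a very low training accuracy cannot appear favorable through the first case, as discussed above:

```python
def fitness(train_accuracy, val_accuracy, val_loss,
            num_classes, has_conv_or_rnn_layers):
    """Fitness of a decoded network, to be maximized (Equation 3)."""
    if not has_conv_or_rnn_layers:                       # case c
        return -20.0
    if train_accuracy < 0.1:                             # case b
        return -15.0
    chance = 1.0 / num_classes                           # random-guess accuracy
    if (train_accuracy <= chance or val_accuracy <= chance
            or train_accuracy < val_accuracy):           # case a
        return -10.0 * (1.0 - train_accuracy)
    return -val_loss                                     # otherwise
```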
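The paper does not name a GA framework, but the operators described (tournament selection of size two, elitism of the top 10, bounded SBX with probability 0.9, and bounded polynomial mutation with probability 0.5) map directly onto DEAP's built-in operators. A minimal sketch of one generation follows; the distribution indices (eta), the per-gene mutation probability (indpb), and the uniform [0, 1] bounds are our placeholders, since the real per-gene bounds follow the encoding above and the indices are not reported.

```python
import random
from deap import base, creator, tools

N_GENES = 56                                   # convolutional chromosome
LOW, UP = [0.0] * N_GENES, [1.0] * N_GENES     # placeholder per-gene bounds

creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

toolbox = base.Toolbox()
toolbox.register("attr", random.random)
toolbox.register("individual", tools.initRepeat, creator.Individual,
                 toolbox.attr, n=N_GENES)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
# Tournament selection with tournament size 2, as in the Selection step.
toolbox.register("select", tools.selTournament, tournsize=2)
# Bounded SBX crossover; eta=15 is our assumption (not reported).
toolbox.register("mate", tools.cxSimulatedBinaryBounded,
                 eta=15.0, low=LOW, up=UP)
# Bounded polynomial mutation; eta=20 and indpb are our assumptions.
toolbox.register("mutate", tools.mutPolynomialBounded,
                 eta=20.0, low=LOW, up=UP, indpb=1.0 / N_GENES)

def step(population, cx_prob=0.9, mut_prob=0.5, n_elite=10):
    """One generation: elitism + tournament selection + crossover + mutation."""
    elite = tools.selBest(population, n_elite)
    offspring = [toolbox.clone(ind)
                 for ind in toolbox.select(population, len(population) - n_elite)]
    for c1, c2 in zip(offspring[::2], offspring[1::2]):
        if random.random() < cx_prob:
            toolbox.mate(c1, c2)
            del c1.fitness.values, c2.fitness.values
    for mutant in offspring:
        if random.random() < mut_prob:
            toolbox.mutate(mutant)
            del mutant.fitness.values
    return elite + offspring
```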
2.4.3 Performance metric for classification
3 Results
3.1 Pose estimation results
Max stride | Filters | Input scaling | Train mAP | Val mAP |
---|---|---|---|---|
32 | 32 | 0.7 | 0.837606 | 0.776263 |
32 | 32 | 0.8 | 0.825532 | 0.782753 |
32 | 32 | 0.9 | 0.825385 | 0.764668 |
32 | 32 | 1.0 | 0.845399 | 0.800341 |
32 | 64 | 0.7 | 0.864492 | 0.795866 |
32 | 64 | 0.8 | 0.865248 | 0.809098 |
32 | 64 | 0.9 | 0.863264 | 0.799638 |
32 | 64 | 1.0 | 0.885996 | 0.829130 |
64 | 32 | 0.7 | 0.824700 | 0.748723 |
64 | 32 | 0.8 | 0.843466 | 0.775747 |
64 | 32 | 0.9 | 0.777974 | 0.643496 |
64 | 32 | 1.0 | 0.849048 | 0.793634 |
64 | 64 | 0.7 | 0.867076 | 0.804768 |
64 | 64 | 0.8 | 0.805469 | 0.732222 |
64 | 64 | 0.9 | 0.863854 | 0.808308 |
64 | 64 | 1.0 | 0.892736 | 0.837392 |
3.2 Performance of the generated classifiers
Model | Iter 1 | Iter 2 | Iter 3 | Iter 4 | Iter 5 | Iter 6 | Iter 7 | Iter 8 | Iter 9 | Iter 10 | Avg |
---|---|---|---|---|---|---|---|---|---|---|---|
cnn | 0.58 | 0.40 | 0.37 | 0.48 | 0.52 | 0.45 | 0.43 | 0.45 | 0.43 | 0.42 | 0.45 |
rnn | 0.60 | 0.38 | 0.36 | 0.36 | 0.47 | 0.48 | 0.43 | 0.45 | 0.40 | 0.45 | 0.44 |
RN18 | 0.46 | 0.41 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.35 |
RN34 | 0.38 | 0.33 | 0.33 | 0.33 | 0.41 | 0.33 | 0.38 | 0.33 | 0.33 | 0.33 | 0.35 |
RN50 | 0.33 | 0.33 | 0.36 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.45 | 0.36 | 0.36 |