Experimental design
This section describes the experimental setup used for both proposed pipelines, i.e., the image enhancement pipeline and the classification pipeline.
As the original images were too large and slowed down training considerably, they were resized to 400 × 300 pixels.
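Downscaling of this kind can be sketched in a few lines of nearest-neighbour sampling; this is an illustration only (the function name is ours, and in practice a library routine such as PIL's `Image.resize` would be used):

```python
# Hypothetical nearest-neighbour downscaling sketch (illustrative only,
# not the authors' implementation; a library resize would be used in practice).
def resize_nearest(image, new_w, new_h):
    """Downscale an image, given as a nested list of pixel rows, by
    nearest-neighbour sampling, e.g. from 1600 x 1200 to 400 x 300."""
    old_h, old_w = len(image), len(image[0])
    return [[image[y * old_h // new_h][x * old_w // new_w]
             for x in range(new_w)]
            for y in range(new_h)]

# Example: a 4 x 4 image reduced to 2 x 2 keeps one pixel per 2 x 2 block.
small = resize_nearest([[r * 4 + c for c in range(4)] for r in range(4)], 2, 2)
```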
The implemented residual CNN had a 14-layer structure, organised into different blocks. The network for image enhancement was configured to train for a maximum of 50 epochs of 100 iterations each. An epoch is one complete presentation of the dataset to be learned, while an iteration corresponds to the processing of a single batch, so the number of iterations per epoch equals the number of batches needed to traverse the training set once. The batch size is the number of training examples processed together in a single batch; the higher this parameter, the more memory space is needed. In addition, the training procedure was designed to save the model every time the loss value decreased, and to stop training if the loss value did not improve within 5 epochs (to avoid over-fitting).
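The checkpoint-and-stop rule described above (save on every loss improvement, stop after 5 epochs without improvement) can be sketched in plain Python; this is a minimal illustration of the logic, not the authors' Keras implementation:

```python
# Minimal sketch of the stopping rule described above (illustrative only):
# keep the best model whenever the loss improves, and stop training once
# the loss has not improved for `patience` consecutive epochs.
def train_with_early_stopping(losses, patience=5):
    """Given per-epoch loss values, return the epoch index at which
    training stops and the best loss seen up to that point."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses):
        if loss < best_loss:
            best_loss = loss              # here the model would be saved
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_loss   # no improvement in `patience` epochs
    return len(losses) - 1, best_loss

# Example: the loss stops improving after epoch 2, so training halts
# 5 epochs later (at epoch 7), keeping the best loss of 0.7.
stop_epoch, best = train_with_early_stopping(
    [1.0, 0.8, 0.7, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9])
```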
To train the residual CNN that was part of the image enhancement process, a dataset of 13,548 images was used to create the training and test sets. 80% of the images (10,838 images) constituted the training set, while the remainder was used to test the network. Different datasets were generated incrementally, to test which techniques most improved the final classification results. To simplify the names of the datasets used and generated in this paper, Table 2 lists all the names together with their descriptions.
Table 2
Simplified names of datasets used and their description
Simplified name | Dataset | Description |
RAW | Dataset0 | Dataset with the original images |
DBY | Dataset1 | Dataset with the original images, but in colour (after the application of the debayering technique) |
DBY + WB | Dataset2 | Dataset containing images of the DBY dataset to which the White Balance (WB) method has been applied |
DBY + WB + GC | Dataset3 | Dataset containing images of the DBY + WB dataset to which the Gamma Correction (GC) method has been applied |
DBY + WB + GC + CLAHE | Dataset4 | Dataset containing images of the DBY + WB + GC dataset to which the CLAHE algorithm has been applied |
DBY + WB + ResidualCNN | Dataset5 | Dataset generated by the trained Residual CNN, using as input the colour images of the DBY dataset to which the WB method has been applied |
DBY + WB + GC + ResidualCNN | Dataset6 | Dataset generated by the trained Residual CNN, using as input the images of the DBY + WB + GC dataset |
DBY + WB + GC + CLAHE + ResidualCNN | Dataset7 | Dataset generated by the trained Residual CNN, using as input the images of the DBY + WB + GC + CLAHE dataset |
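The 80/20 training/test split described above (13,548 images, of which 10,838 for training) can be reproduced with a few lines; this is an illustrative sketch (the function name and seed are our assumptions, not the authors' code):

```python
# Illustrative 80/20 split as described above (helper name and seed are
# placeholders, not the authors' implementation).
import random

def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle a dataset and split it into training and test subsets."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# 80% of 13,548 images -> 10,838 for training, 2,710 for testing.
train, test = split_dataset(range(13548))
```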
To evaluate the CNN-generated underwater images, we chose the structural similarity index (SSIM), the peak signal-to-noise ratio (PSNR), the underwater image quality measure (UIQM) and the underwater colour image quality evaluation (UCIQE).
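Of these metrics, PSNR is simple enough to sketch directly (SSIM, UIQM and UCIQE involve more structure and are usually computed with library implementations, e.g. `skimage.metrics` for SSIM/PSNR); the following is an illustration under the assumption of 8-bit intensities, not the authors' evaluation code:

```python
# Illustrative PSNR computation (assumes 8-bit pixel intensities with a
# maximum value of 255; not the authors' evaluation code).
import math

def psnr(original, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images,
    given here as flat lists of pixel intensities."""
    mse = sum((o - g) ** 2 for o, g in zip(original, generated)) / len(original)
    if mse == 0:
        return float("inf")  # identical images
    return 20 * math.log10(max_value) - 10 * math.log10(mse)
```

Higher PSNR means the generated image is closer to the reference; identical images give an infinite PSNR, and a worst-case black-vs-white pair gives 0 dB.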
As for the classification, the collected set contained only 6972 elements. Since this was relatively few images, we decided to apply data augmentation techniques to the 80% of the images (5573 in total) that made up the training set. After applying these techniques, the training set grew from 5573 to 35,020 images, with 3502 samples per class.
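Typical augmentation transforms of this kind can be sketched on a toy image; the specific transforms shown (horizontal flip and 180° rotation) are common examples and an assumption on our part, not necessarily the authors' exact augmentation pipeline:

```python
# Illustrative augmentation sketch (the chosen transforms are examples,
# not necessarily the authors' exact pipeline): generate flipped and
# rotated variants of an image represented as a nested list of rows.
def horizontal_flip(image):
    """Mirror each row left-to-right."""
    return [row[::-1] for row in image]

def rotate_180(image):
    """Reverse both the row order and each row."""
    return [row[::-1] for row in image[::-1]]

def augment(image):
    """Return the original image plus two augmented variants."""
    return [image, horizontal_flip(image), rotate_180(image)]

img = [[1, 2],
       [3, 4]]
variants = augment(img)  # one original + two augmented copies
```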
All the selected classifiers were tested by tenfold cross-validation, with the folds stratified so that the elements of each class were distributed evenly across them [
45,
58,
90]. The performance of the models was evaluated by the accuracy, the loss (both training and testing) and the average AUROC scores [
34], as well as by the confusion matrix. The accuracy and AUROC values were calculated with the multiclass implementation from Scikit-learn, which estimates the metrics for each label without taking label imbalance into account.
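The stratified fold assignment described above can be illustrated with a minimal round-robin scheme (a sketch of the idea only; in practice scikit-learn's `StratifiedKFold` provides this):

```python
# Minimal sketch of stratified tenfold assignment (illustrative; in
# practice sklearn.model_selection.StratifiedKFold would be used):
# samples of each class are dealt round-robin into the folds, so every
# fold receives an even share of each class.
from collections import defaultdict

def stratified_folds(labels, n_folds=10):
    """Assign each sample index to a fold, class by class."""
    fold_of = [0] * len(labels)
    seen_per_class = defaultdict(int)
    for idx, label in enumerate(labels):
        fold_of[idx] = seen_per_class[label] % n_folds
        seen_per_class[label] += 1
    return fold_of

# 20 samples of class "a" and 20 of class "b" -> 2 of each class per fold.
labels = ["a"] * 20 + ["b"] * 20
folds = stratified_folds(labels)
```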
All experiments were conducted in Python. The implementation of all the classical algorithms used is within the Scikit‐learn library [
74] (
https://scikit-learn.org), while the neural networks were implemented with the Keras and TensorFlow libraries. The environment used for training the selected algorithms and the defined models was Google Colaboratory (also known as Colab). At the time of our experiments, it ran under Ubuntu 18.04 (64 bits) and provided an Intel Xeon processor, 12 GB of RAM, and Nvidia K80, T4, P4 and P100 GPUs.
First, a classification of 4 datasets (the original-coloured dataset, Dataset1, and the three generated by the network, Dataset5, Dataset6 and Dataset7), each comprising 7 classes, was carried out. The dataset for which the lowest loss value and the highest test accuracy and AUROC values were obtained was then selected for a second classification, in which three more classes were added to make a total of 10.
Discussion
In this study, we presented a novel pipeline for the enhancement of dark deep-sea images and the automated classification of visible fauna in footage taken by a crawler, a moving benthic platform operating against a changing background. We elaborated an enhancement procedure that improved the animal classification capability, extending the functionalities previously achieved with static cameras at cabled observatories [
57]. For this purpose, different image enhancement techniques were first investigated and then applied to generate different datasets. Then, a residual network was modelled and trained with these datasets in order to generate a new set of enhanced images. Although the evaluation metrics of the image sets generated by the residual network leave room for improvement, the best test accuracy, loss and AUROC values in classification were achieved with one of the datasets generated by the neural network, which was the principal objective.
The residual convolutional network shows some problems with certain hues when generating new images. For example, orange colours were generated with a bluish hue, turning them pink. This is probably because these colours do not appear very often in the whole set of images. The UIQM and UCIQE values were slightly higher for the input images of the network. This may be because the images generated by the network are more blurred than the input images, as are the images transformed by techniques such as white balance, gamma correction and CLAHE. Similar studies, e.g. [
88,
89], applied similar pre-processing methods, such as CLAHE, in order to obtain a mask for Norway lobster (
Nephrops norvegicus) detection, and then applied CV techniques and a Mask R-CNN to compare detection and segmentation. In other studies in which the dataset was also obtained by an ROV, as in [
69], we can observe that although the images are not as dark as those in the present paper, they do have the characteristic blue-green colour of the water. The method proposed in [
16] provides colour enhancement and restoration for marine images, and although the authors tested it on not very dark images to which turbidity was added, it is intended for AUVs and ROVs. They obtained a PSNR value of 21.840 dB, while ours was 20.117 dB. In [
78], the authors present an image enhancement method in which they also apply CLAHE, as well as other techniques, such as gray-level co-occurrence matrix (GLCM) feature extraction. However, the images they used come from a dataset whose characteristics are totally different from the one used here: it was collected by static cameras at a shallower depth, so the images have natural light.
For the classification of marine species, two different types of methods were used in this study, i.e., classical algorithms and DL techniques. Data augmentation techniques were applied to the species with the most elements, while classes with an insufficient number of elements were discarded. Similar studies also noted the advantages of DL over ML methods in marine environments [
19,
80,
83,
95]. However, the datasets used by these studies were obtained in coral reef areas, where there is still some sunlight, while the dataset we used was obtained at depths of more than 800 m, where visibility is low. For deeper water applications mimicking environmental conditions similar to those where the crawler is deployed, [
88,
89] evidenced that advanced DL techniques, such as segmentation networks, can be an efficient tool for monitoring catches in pelagic fisheries. In addition, the crawler generated clouds of sand that prevented the observation of species and objects on several occasions, which would not happen with a fixed camera.
For the test values, there are notable differences both among datasets and among neural networks. Regarding the classical algorithms, RF2 was clearly the model that obtained the best results on all the datasets. With regard to the neural networks, the CNNs did not obtain good results: CNN-1 and CNN-2 reached quite low validation accuracy and AUC values, while CNN-3 and CNN-4, judging by their loss values, were probably over-trained. The deep neural networks (DNN-1 to DNN-4) achieved better AUROC, accuracy and loss test values than the other algorithms. However, the sequential networks DNN-1 and DNN-2 performed rather poorly on the 7-class and 10-class datasets, reaching loss values as high as 1.2662 and 1.4685, respectively. On the other hand, the other two deep networks (DNN-3 and DNN-4) performed well, obtaining the best test accuracy (0.6644) and the best test loss (1.1330) for the 7-class datasets, while for the 10-class dataset DNN-4 obtained a test accuracy of 0.7578 and a test loss of 0.8389.
In [
88,
89], the authors also used CV techniques for the classification of marine species, which they compared with the results obtained by a Mask R-CNN. In their case, they obtained better results with the CV techniques, although in a later work they improved their classification with the segmentation network on a dataset of four classes [
88,
89]. In [
78], the authors performed classification on the enhanced images using SVM, DT and k-NN, among others. The SVM achieved an accuracy of 79.66%, the k-NN 72.96% and the DT 64.03%. They also used a backpropagation neural network (BPNN), which achieved an accuracy of 93.73%. The values achieved in the present study were quite close to those, even though our dataset is totally different, more complex and darker.
If we compare the results obtained for the 7-class dataset and the 10-class dataset, we can see that the best results were obtained for the dataset with more classes. This may be because the classification pipeline in [
57] detects the elements that move across the different images, and these elements were previously misclassified because they did not correspond to any class. In the 10-class dataset, these extra classes have been added, so those elements can be assigned to a class and then correctly classified.
Compared to the results of [
57], the results obtained here achieved better metrics. As for the ML methods, RF2 was the algorithm that obtained the best test values for accuracy and AUROC in both investigations. In this paper, test values of 0.7568 for accuracy and 0.8691 for AUROC were reached, while in [
57] the accuracy value for this algorithm was 0.6527 and that for AUROC was 0.8210. Regarding DL techniques, in [
57] the network that obtained the highest values was DNN-4, with a test accuracy of 0.7140 and an AUROC value of 0.8503, whereas here higher values have been achieved with several networks: DNN-4 obtained a test accuracy of 0.7578 and an AUROC value of 0.8389. DNN-3 likewise obtained higher values than in the previous paper. DNN-1 and DNN-2 also outperformed the previous results, but obtained high error values. We can therefore state that the results obtained in this paper outperform those of [
57].
The next technological application scenario
Marine robotics is creating platforms that can be transformed into intelligent tools to meet autonomous ecosystem monitoring needs [
4], as is nowadays required for monitoring an increasing number of marine activities, e.g., oil extraction and mining [
46,
87,
91]. Implementing routines for automated individual tracking and classification, and later integrating all those component routines into an operational hardware and software product, is a key aspect of improving the ecological monitoring functionality of all mobile platforms, including the crawler [
8,
35,
63,
98,
104]. The proposed approaches and results represent a first step toward the establishment of autonomous image-processing software to be installed on board the crawler. The absence of such software currently represents a critical bottleneck for fully autonomous monitoring of deep-sea ecosystem functions and services by this class of IOVs.
Extracting the essential information on species presence, counts (as an indicator of abundance) and the derived spatiotemporal changes that picture community dynamics is a time-consuming manual process. The tasks proposed in this research with state-of-the-art CNN algorithms indicate that embedded pre-processing of acquired images for object tracking is feasible, via image quality enhancement/rendering with CNN approaches. At the same time, the species classification procedure can be refined by post-processing the imaging products on a server with the help of newly added morphological descriptors [
1,
2]. Moreover, integrating all the detection and classification steps with the video capture processes would make it possible to transmit and store only the frames in which the algorithms detect some kind of labelled species.
Conclusions and future work
The designed neural network, in combination with the detection and classification pipeline, generated enhanced underwater images leading to a more accurate classification process. The improvement and enhancement of underwater images also play an important role in feature detection, since a clear improvement of the images can reduce the subsequent work of feature detection and yield better classification rates. We demonstrated that a neural network is a good option for generating enhanced images automatically, without the need to apply multiple techniques to each image. Due to their particular characteristics, the enhancement of underwater images prior to detection and classification is indispensable for improving classification results, regardless of whether traditional classifiers or DL approaches are used.
As future work in this line of research, the currently developed CNN for image enhancement could be modified by adding or removing layers, changing the number of units in each layer, or applying different parameter settings, i.e., modifying the number of epochs or the batch size, or using different activation functions. Another step of interest for practical applications would be the optimization of image quality vs. computational cost when applying these procedures to the original-sized (1600 × 1200 pixel) images, in order to minimize the processing time without compromising the amount of valuable information extracted. As for classification, other strategies such as transfer learning, or even object detection and segmentation networks, could be used to improve the results. However, the amount of floating particles in some images, and the small size of some species, could hinder the performance of this type of network.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.