
Open Access 21-04-2023

Enhanced aerial vehicle system techniques for detection and tracking in fog, sandstorm, and snow conditions

Authors: Amira Samy Talaat, Shaker El-Sappagh

Published in: The Journal of Supercomputing | Issue 14/2023


Abstract

Unmanned aerial vehicles (UAVs) are rapidly being adopted for surveillance and traffic monitoring because of their high mobility and their capacity to cover regions at various elevations and positions. Detecting vehicles is challenging because of their varied shapes, textures, and colors, and one of the most difficult tasks is correctly detecting and counting aerial-view vehicles in real time for traffic monitoring using aerial images and videos. This research presents strategies for improving the detection ability of self-driving vehicles in tough conditions and for traffic monitoring and vehicle surveillance, covering classification, trajectory tracking, and movement calculation under challenging fog, sandstorm (dust), and snow conditions. First, image enhancement methods are applied to improve unclear road images. The improved images are then passed to an object detection and classification algorithm to detect vehicles. Finally, new methods (Corrected Optical Flow and Corrected Kalman Filter) are evaluated to obtain trajectories with the least error, and features such as vehicle count, vehicle type, tracking trajectories (by Optical Flow, Kalman Filter, and Euclidean Distance), and relative movement are extracted from the coordinates of the observed objects. These techniques aim to improve vehicle detection, tracking, and movement estimation over aerial views of roads, especially in bad weather. For aerial-view vehicles in bad weather, the proposed method deviates by less than 5 pixels from the actual value and gives the best results, improving detection and tracking performance in bad weather conditions.
Notes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Abbreviations
Yolo
You only look once
MSR
Multi-scale Retinex
MSRCR
Multiscale Retinex with color restoration
MSRCP
MSR with chromaticity preservation

1 Introduction

An important application of computer vision is aerial-based object detection [1]. Object detection remains a difficult topic in computer vision despite recent breakthroughs in the literature [1]. Existing object detection algorithms perform even worse on aerial images because object recognition in aerial images is harder than detection in ground-based images. There are several reasons for this, including: (1) the lack of large datasets with high object variance; (2) the higher scale and orientation variance in aerial images; (3) the shape and texture differences between ground and aerial images; and (4) in ground-based images objects generally appear in a narrow range of orientations (mostly upright, perpendicular to the ground), whereas in aerial images they can appear in any direction [2]. Researchers, inspired by advances in computer vision technology, began to seek an efficient and effective aerial-view vehicle detection model based on image processing [2]. Object detection, as a fundamental task in computer vision, has advanced tremendously, but it remains difficult from the perspective of a UAV because of the small scale of the objects: feature extraction and recognition become harder owing to the small size of the features as well as the viewing angle. UAVs are increasingly being used for monitoring and surveillance tasks because of their flexibility and mobility [3], as well as their ability to cover large areas at varying heights while providing high-resolution images and videos. There are several uses for UAVs, including tracking [4], disaster management [5], Intelligent Transportation Systems (ITS), and smart parking [6]. UAVs produce real-time, high-resolution images with large viewing fields at a low cost.
Several strategies for dealing with the problem of vehicle detection in aerial images and the associated difficulties have been presented in the literature. The fundamental challenge with aerial views is the tiny size and vast number of objects to identify, which can lead to information loss during convolution processes, as well as the difficulty in distinguishing features due to the angle of view.
Deep learning, particularly convolutional neural network (CNN) advancements in computer vision tasks, has led to an impressive improvement in classification and object recognition accuracy. In this regard, the authors of [7] solved the vehicle detection problem in Google Earth images using a technique based on a hybrid deep convolutional neural network (HDNN) with a sliding window search. To detect and count vehicles in high-resolution UAV images of urban areas, the authors of [8] employed a pre-trained CNN in conjunction with a linear support vector machine (SVM) classifier. Several CNN algorithms and architectures have been proposed, including Yolo and its variants [9–11] and R-CNN and its variants [12–14]. Girshick et al. [12] proposed R-CNN, a region-based CNN that combines a region-proposal system with a CNN. The same authors later enhanced their technique by overcoming R-CNN's limitation, computing the convolutional feature map from the whole image rather than from the individual regions; the feature map is then used to identify the region proposals. Wang [15] presents a vehicle detection method using the SSD model after HSV transformation. The authors of [6] discussed the difficulties associated with using aerial images for car detection, notably small objects and complicated backgrounds; to overcome them, they suggested a Multi-task Cost-sensitive Convolutional Neural Network based on Faster R-CNN. Other researchers tackled the problem by applying deep learning to aerial images in a number of scenarios, such as object recognition and classification [16, 17], semantic segmentation [18–20], and generative adversarial networks (GANs) [21]. The authors of [22] used Yolov5 for vehicle detection and speed identification in smart cities. In [23], the authors explored automated car counting in CCTV images obtained from four datasets of varying resolutions, investigating both classic image processing algorithms and deep learning networks based on Yolov2 [11] and FCRN [24]. Their findings suggest that, when applied to higher-resolution datasets, deep learning algorithms produce significantly superior detection outcomes. Overall, little work has addressed aerial-view vehicle detection and tracking in bad weather.
Humayun et al. [25] detect vehicles in multiple weather scenarios, including haze, dust and sandstorms, and snowy and rainy weather, in both daytime and nighttime, using YOLOv4 with a spatial pyramid pooling network. Their architecture uses CSPDarknet53 as the baseline, modified with a spatial pyramid pooling (SPP-NET) layer and reduced batch normalization layers. They augment the dataset with several techniques, including hue, saturation, exposure, brightness, darkness, blur, and noise, which not only increases the dataset size but also makes detection more challenging. The model obtained a mean average precision of 81% during training and detected the smallest vehicles present in the images.
Punagin and Iyer [26] performed vehicle detection on unstructured roads based on transfer learning, using the Berkeley Deep Drive (BDD) dataset with a YOLOv2 network. The dataset covers different weather conditions, such as snowy, rainy, and foggy, in distinct scene types such as highways, residential areas, and city streets. The model gives an mAP of 72.4% on BDD, and when the same model is tested in an unstructured environment using the Indian Driving Dataset (IDD), it gives an mAP of 56.76%.
Hnewa and Radha [27] studied object detection under rainy conditions for autonomous vehicles using Faster R-CNN and YOLO-V3 combined with a generative adversarial network called “DeRaindrop” that removes raindrops from images. The main shortcoming is the lack of data, especially annotated data, that captures the truly diverse nature of rainy conditions for moving vehicles, which is arguably the most critical and fundamental issue in this area. The mAP is 52.62% for Yolo and 44.45% for Faster R-CNN.
Ajinkya [28] presented car detection with the Yolo algorithm in normal conditions for five classes (cars, trucks, pedestrians, traffic signs, and traffic lights). The neural network was trained for 120 epochs, giving an mAP of 46.6%. The accuracy of this algorithm could be increased by training on bigger and more diverse datasets that cover different weather and lighting conditions.
Uzar et al. [29] presented a performance analysis of YOLO versions for automatic vehicle detection from UAV images for three classes (car, bus, and minibus). YOLOv5m provided the highest mAP0.5 value of 84%. The drawbacks were incorrect or missing vehicle detections when trees covered the vehicles, errors due to shadows, and failures when dark-colored vehicles such as black and gray ones glowed under sunlight and had the same pixel gray values as the road.
In this study, we aim to improve the detection ability of self-driving vehicles in tough conditions and to support traffic monitoring, vehicle surveillance, and tracking; a drone is used to recognize and count vehicles from aerial video streams. A new method for detecting small objects from the UAV perspective is presented, based on an enhanced Yolov5 pipeline. Our contribution is to combine color correction techniques with deep learning to improve object detection in bad weather, and then to apply this combination to improve trajectory tracking precision and relative movement calculation. The current work improves on the literature in several ways:
  • We used two Retinex image/video enhancement methods with two Yolov5 model sizes for vehicle detection in tough weather conditions such as fog, sandstorm (dust), and snow, using several hyperparameter settings, and compared the results in detail across the tested weather conditions for videos and images, examining the tradeoffs between Yolov5 models and Retinex enhancement.
  • We tracked vehicle movements in videos and drew vehicle trajectories of traffic paths in bad weather by combining the Yolov5 detected objects with five different techniques (Optical flow, Corrected Optical flow, Kalman filter, Corrected Kalman filter, and Euclidean Distance), comparing the results for bad-weather and enhanced image/video techniques.
  • New methods were evaluated (Corrected Optical flow/Corrected Kalman filter) to get the least error of trajectories.
  • We calculated the movement of each vehicle on the road in both bad and enhanced weather conditions.
  • We employed three datasets with varied features for training and testing, whereas earlier research stated above evaluated their approach on a single proprietary dataset. We show that annotation mistakes in the dataset have a large influence on detection performance.
The rest of this paper is organized as follows. Section 2 defines our system's methodology. Section 3 describes the experiments that were carried out and the results that were obtained. Finally, Sect. 4 concludes the paper.

2 Methodology

This section describes our approach (MSRCR-Yolov5) in detail. Our method combines the Multi-Scale Retinex (MSR) color enhancement methodology [30] with the Yolov5 algorithm to improve vehicle recognition. Once vehicles have been detected, they are tracked to extract features such as vehicle trajectories. Finally, vehicle trajectories are combined with the Yolov5 detections to obtain each vehicle's movement.

2.1 Dataset

Three datasets are used with augmentation to improve aerial view vehicle detection. The proposed method is for improving foggy, sandstorm (dust), and snowy road images.

2.1.1 Data collection

The dataset was collected from different resources with varied features. The datasets contain sequences of images for each scene. Due to the limited resources of the Colab training platform, we took several images from each scene and annotated them carefully. We then increased the images by augmentation, as shown in Table 1.
Table 1
Number of objects before and after augmentation

Class     Before augmentation   After augmentation
Car       29,368                47,236
Bus       1,249                 2,099
Minibus   1,903                 3,690
Total     32,520                53,025
Aerial-view vehicle videos in bad weather are rarely available for tracking, so we collected aerial-view vehicle images from several datasets and applied Yolov5 (You Only Look Once), which can be trained and tested on both images and videos. We test on images and videos of bad weather.
The datasets are: (1) images from the UAV-benchmark-S dataset [31], which is captured by UAVs in a variety of complex scenarios; (2) the PSU aerial-car-dataset, generated from images captured by a UAV flying above the Prince Sultan University campus [32]; and (3) the Stanford dataset, a dataset of aerial images of a university campus [33]. The collected training data covers different flying altitudes, referring to the flying heights of the UAVs (high, low, medium), with different camera views (side, front, bird-view), see Fig. 1. The approximate UAV flight height ranges from less than 30 m to more than 70 m.
After collecting all the images, they were annotated. The annotation format is (Xcenter, Ycenter, classId, height, width). The images were then resized to 640 × 640.

2.1.2 Data augmentation

The number of images and objects is insufficient to build a robust model. As a result, one of the most important methods for enhancing performance is data augmentation, which aims to increase variance in the training data. We use augmentation techniques such as flip, rotate, blur, and mosaic. The dataset has a total of 2418 images, split into 70% training (1692) and 30% validation (726), and has been enhanced to 5709 total augmented images with 4983 training and 726 validation images, where the augmentation techniques are applied only to the training set. The vehicle category includes bus, car, and minibus. The number of objects is shown in Table 1.
The augmentation parameters are as follows:
  • Flip horizontal and flip vertical: to assist the model in becoming indifferent to the subject's orientation.
  • Rotation by 45 and − 45 degrees: to help the model be more camera roll resistant.
  • Blur 1.25px: add random Gaussian blur to help the model be more resilient to camera focus.
  • 90-degree rotation (clockwise, counter-clockwise, and upside down): to help the model be insensitive to camera orientation.
  • Mosaic: it is a new data augmentation method that combines four training images. As a result, four separate contexts are blended. This enables the detection of items that are not in their typical environment.
In the augmentation method, we did not add any brightness, saturation, or hue, so the model was not affected by other color changes than Retinex enhancement.
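The paper does not name the augmentation tool it used, so the following is only a minimal sketch of a pipeline matching the parameters above, written with the Albumentations library (an assumption); the file name and sample box are hypothetical, and mosaic is left to Yolov5's own data loader.

```python
# Hypothetical augmentation pipeline approximating the settings listed above.
# Albumentations is assumed here; the paper does not state which tool was used.
import albumentations as A
import cv2

augment = A.Compose(
    [
        A.HorizontalFlip(p=0.5),                    # flip horizontal
        A.VerticalFlip(p=0.5),                      # flip vertical
        A.Rotate(limit=45, p=0.5),                  # rotation within +/- 45 degrees
        A.RandomRotate90(p=0.5),                    # 90-degree rotations
        A.GaussianBlur(blur_limit=(3, 3), p=0.5),   # mild blur (~1.25 px)
    ],
    # YOLO-format boxes: normalized (x_center, y_center, width, height)
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

image = cv2.imread("frame_0001.jpg")                # hypothetical training image
boxes = [[0.52, 0.47, 0.08, 0.05]]                  # one car box (YOLO format)
labels = ["car"]
out = augment(image=image, bboxes=boxes, class_labels=labels)
aug_image, aug_boxes = out["image"], out["bboxes"]
# Mosaic augmentation is applied by Yolov5's built-in loader (its `mosaic`
# hyperparameter), so it is not reproduced here.
```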

2.1.3 Testing data

Testing data (videos/images) was collected from the UAVDT [31] and also from searching Google images for images that contain special weather conditions that affect the appearance and representation of vehicles. It consists of fog, sandstorm (dust), snow, and a night scene with dim streetlamp lighting that provides little texture information. Meanwhile, frames captured in fog and bad weather lack sharp details, causing object contours to vanish in the background and making it hard for Yolov5 to detect objects.

2.2 Data pre‑processing

The primary goal of this approach is to improve aerial view vehicle detection and tracking. The dataset was enhanced with Retinex color enhancement methods. We had three separate datasets for training and validation by the Yolov5 technique (Original dataset and two Retinex datasets), as shown in Fig. 2. The flowchart of tracking methods is also shown. First, the images were enhanced with Retinex MSRCR and MSRCP techniques. Second, the objects in the Original and enhanced images were processed and detected with Yolov5. Lastly, we applied vehicle tracking techniques and calculated relative movement as described in the next sections:

2.2.1 Retinex enhancement of image

Improving the videos/images in the system is beneficial because it reveals unclear roads. To accomplish this, we employ the MSR algorithm (multi-scale Retinex) [30]. The Retinex idea was first proposed by Land and McCann [34]. Retinex is an image enhancement method founded on a model of the human perception of light and color. Image sharpening and color consistency improvement, as well as dynamic range compression, are possible through this process. Retinex filtering is based on Land's image perception theory, which was proposed to explain the perceived color constancy of objects under varying lighting conditions. There are several approaches to implementing the Retinex principles; the multiscale Retinex is one of them, and it is utilized to bridge the gap between color images and human scene observation [35]. The steps for this technique are as follows. First, the image is processed by the Single-Scale Retinex (SSR), which subtracts the logarithm of the Gaussian-filtered image from the logarithm of the image. The first step is depicted in the equation below.
$$Y\left(x,f\right)=\mathrm{log}\left(I\left(x,f\right)\right)-\mathrm{log}(F(x,f,\propto )\times I(x,f))$$
(1)
The initial image is defined by \(I(x,f)\), and the image of Gaussian Filter is defined by \(F(x,f,\propto )\). The second and final step is for the image to be sent to Multi-Scale Retinex (\(\mathrm{MSR}\)), which produces better image enhancing results. The following equation depicts this stage as well.
$$Y\left(\mathrm{MSR}\right)=\sum_{n=1}^{N}\left(\mathrm{log}\left(I\left(x,f\right)\right)-\mathrm{log}\left({F}_{n}\left(x,f,\propto\right)\times I\left(x,f\right)\right)\right)$$
(2)
where \(Y\left(\mathrm{MSR}\right)\) is the enhanced image, \(n\) indexes the scales in the MSR, \(N\) is the number of scales, and \({F}_{n}\) is the Gaussian filter at the \(n\)th scale.
MSRCR (multiscale Retinex with color restoration) applies MSR to each spectral channel and has proved to be a very flexible automated image enhancement method that combines color constancy and dynamic range compression with local contrast enhancement to render images in a manner similar to how human vision is thought to operate. MSRCR generates images with much better contrast and sharper colors. There is also the MSRCP algorithm, another variant of MSR that is applied to the intensity channel without color restoration and is known as MSR with chromaticity preservation [30].
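The sketch below follows Eqs. (1) and (2) with NumPy and OpenCV. The scale values, the color-restoration constants, and the final range stretch are illustrative assumptions, not the exact settings used in this work, and the input file name is hypothetical.

```python
# Minimal multi-scale Retinex / MSRCR sketch following Eqs. (1)-(2).
# Scales, alpha/beta, and the output normalization are assumed values.
import cv2
import numpy as np

def single_scale_retinex(img, sigma):
    # Eq. (1): log(I) - log(GaussianBlur(I))
    img = img.astype(np.float64) + 1.0             # avoid log(0)
    blur = cv2.GaussianBlur(img, (0, 0), sigma)
    return np.log(img) - np.log(blur)

def multi_scale_retinex(img, sigmas=(15, 80, 250)):
    # Eq. (2): sum of single-scale Retinex outputs over N scales
    return sum(single_scale_retinex(img, s) for s in sigmas)

def msrcr(img, sigmas=(15, 80, 250), alpha=125.0, beta=46.0):
    # Color restoration: weight the MSR output by a log-chromaticity term,
    # then stretch the result back to the displayable 0-255 range.
    img_f = img.astype(np.float64) + 1.0
    msr = multi_scale_retinex(img, sigmas)
    color = beta * (np.log(alpha * img_f) - np.log(img_f.sum(axis=2, keepdims=True)))
    out = msr * color
    out = (out - out.min()) / (out.max() - out.min() + 1e-12) * 255.0
    return out.astype(np.uint8)

enhanced = msrcr(cv2.imread("foggy_road.jpg"))     # hypothetical input image
cv2.imwrite("foggy_road_msrcr.jpg", enhanced)
```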

2.3 Processing with Yolov5

Following image enhancement, the first step is to detect a vehicle using the Yolov5 algorithm. The vehicle is then tracked to finally combine vehicle trajectories with Yolov5 to obtain each detected vehicle movement.

2.3.1 Yolov5: deep learning object detection model

Yolov5 is the first Yolo model designed with the PyTorch framework; it is more lightweight, simpler to use, and faster than prior Yolo models. For real-time object detection, Yolov5 is built on an efficient CNN. The algorithm divides the image into regions and calculates bounding boxes and probabilities for each region, weighting the bounding boxes by the predicted probabilities. The approach needs only one forward propagation pass through the neural network to produce predictions, thus it "only looks once" at the image. It outputs the recognized objects together with bounding boxes after non-max suppression (which ensures that each object is detected only once). Yolov5 comes in four versions: Yolov5s, Yolov5m, Yolov5l, and Yolov5x; they can be found at [36]. The current implementation employs the smallest model, Yolov5s, and the next-larger model, Yolov5m. The performance of the network may increase as it grows in size, but at the cost of longer processing times [37]. We exclude the largest models due to Colab's time and space constraints. The data was divided into training and validation sets, and the collected testing data was then added. The images were enhanced using the image enhancement techniques (MSRCR and MSRCP) and converted to the Yolov5 PyTorch format. The data were then referenced in a data.yaml file, which defines the classes used to train, validate, and test the model. The models were trained on a Google Colab virtual machine for a total of 200 epochs.
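As an illustration of how such a trained checkpoint can be loaded and run on an enhanced frame, here is a short sketch using the Ultralytics yolov5 repository's torch.hub interface. The weight path, confidence threshold, and image name are assumptions; the training command in the comment mirrors the batch size and epoch count reported later in the paper but is not claimed to be the authors' exact command.

```python
# Sketch: load a trained Yolov5 checkpoint and run it on an MSRCR-enhanced frame.
# Paths, threshold, and file names are illustrative assumptions.
import torch

# Training (run separately, e.g. on Colab), roughly:
#   python train.py --img 640 --batch 32 --epochs 200 --data data.yaml --weights yolov5m.pt
model = torch.hub.load("ultralytics/yolov5", "custom",
                       path="runs/train/exp/weights/best.pt")   # hypothetical weight path
model.conf = 0.4                                  # confidence threshold (assumed value)

results = model("foggy_road_msrcr.jpg")           # MSRCR-enhanced test image
detections = results.xyxy[0]                      # tensor rows: [x1, y1, x2, y2, conf, class]
for x1, y1, x2, y2, conf, cls in detections.tolist():
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2         # center point later fed to the trackers
    print(model.names[int(cls)], round(conf, 2), (round(cx), round(cy)))
```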

2.3.2 Performance evaluation of Yolov5 detected objects

Evaluating the effectiveness of an object detector is challenging, since a bounding box must be drawn around each identified object in the image. Precision, recall, and mAP, some of the most commonly used metrics for evaluating detection performance, are defined in Eqs. (3) through (5).
$${\text{Precision}} = \frac{{{\text{TPs}}}}{{{\text{TPs}} + {\text{FPs}}}} = \frac{{{\text{TPs}}}}{{\text{Number of detections}}}$$
(3)
$${\text{Recall}} = \frac{{{\text{TPs}}}}{{{\text{TPs}} + {\text{FNs}}}} = \frac{{{\text{TPs}}}}{{\text{Number of objects}}}$$
(4)
$${\text{mAP}} = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} {\text{AP}}_{i}$$
(5)
A true positive (TP) is an accurate detection of an actual object in an image. A false positive (FP) is an incorrect detection, which occurs when the model marks an object in the image that does not exist. An object that is visible in the image but is not recognized by the model is a false negative (FN). AP and mAP are the average and mean average precisions, \(N\) is the number of classes, and \({\mathrm{AP}}_{i}\) is the AP value of class \(i\).
The intersection over union (IoU) method in object detection calculates the overlap region between the predicted bounding box and the ground truth bounding box of the real item. When the IoU is compared to a certain threshold, detection is categorized as correct or wrong.
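The following is a small worked example of Eqs. (3)-(4) and the IoU test; the sample boxes, counts, and the 0.5 IoU threshold are illustrative assumptions.

```python
# Worked example of the IoU check and Eqs. (3)-(4); sample values are illustrative.
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp)      # Eq. (3): TPs / number of detections
    recall = tp / (tp + fn)         # Eq. (4): TPs / number of objects
    return precision, recall

# A prediction counts as correct when its IoU with a ground-truth box >= 0.5
print(iou((10, 10, 50, 50), (15, 15, 55, 55)) >= 0.5)    # -> True (IoU ~ 0.62)
print(precision_recall(tp=90, fp=10, fn=20))             # -> (0.9, ~0.818)
```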

2.4 Vehicle tracking methods

Object tracking is a critical task in computer vision. There are three crucial processes in video analysis: detecting interesting moving objects, tracking such objects from frame to frame, and evaluating object tracks to recognize their behaviors. The complexity of object tracking stems from image noise, changes in scene illumination, complex object motion, and partial or complete object occlusion [38]. Several techniques for tracking multiple objects have been proposed. In this section, we describe three trajectory estimation techniques: Kalman filtering, optical flow, and Euclidean distance.

2.4.1 Tracking using Kalman filters

The Kalman filter, also known as linear quadratic estimation, is a set of mathematical equations that provide an effective computational method for determining the state of a process from its prior state by minimizing the mean squared error. Prediction and correction are the two steps of the Kalman filter [39]. The filter can estimate future states even when the nature of the modeled system is unclear. In the prediction stage, it first calculates the current state variables with their uncertainties; these estimates are then updated after receiving the next measurement, using weighted averages [40], with more weight given to estimates of higher certainty. The filter is recursive, so it can operate in real time using only the current input measurements, the previously estimated state, and the associated uncertainty matrix. It is based on linear operators perturbed by Gaussian noise. The Kalman filter uses feedback control to estimate process parameters: it estimates the process state and then receives feedback in the form of noisy measurements.

2.4.2 Tracking using optical flow

Optical flow is the movement of objects between consecutive frames of a sequence induced by the relative movement of the object and camera. Chen et al. [41] propose a new optical flow-based approach for tracking any moving object. In complicated scenes, tracking an object's contour is always tough. To begin, they employ a method to obtain the velocity vector. The object contour is then obtained by calculating the location of moving pixels between frames. Lastly, they use the location values to calculate the object's location and speed.
Optical flow techniques are classified into two types: sparse and dense [42]. Sparse optical flow provides flow vectors for some "interesting features" within the image and only needs to process a subset of the pixels; this is the approach used in this paper. Dense optical flow provides the flow at all points in the frame and is slower but more accurate; sparse accuracy is often sufficient for real-time applications.
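Below is a minimal sparse (Lucas-Kanade) optical flow sketch with OpenCV of the kind used for this step. The video path, the initial point coordinates, and the window/pyramid parameters are assumed values, not the paper's exact settings; in the full pipeline the tracked points come from Yolov5 box centers.

```python
# Minimal Lucas-Kanade sparse optical flow sketch (OpenCV); parameters are assumptions.
import cv2
import numpy as np

cap = cv2.VideoCapture("fog_traffic.mp4")          # hypothetical aerial video
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

# Points to track (here two made-up vehicle centers), shaped (N, 1, 2) float32
# as required by calcOpticalFlowPyrLK.
points = np.array([[[320.0, 240.0]], [[410.0, 250.0]]], dtype=np.float32)

lk_params = dict(winSize=(21, 21), maxLevel=3,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    new_points, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, points, None, **lk_params)
    for old, new, st in zip(points, new_points, status):
        if st[0] == 1:                              # point successfully tracked
            ox, oy = old.ravel()
            nx, ny = new.ravel()
            cv2.line(frame, (int(ox), int(oy)), (int(nx), int(ny)), (0, 255, 0), 2)
    prev_gray, points = gray, new_points
cap.release()
```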

2.4.3 Tracking using Euclidean distance

The Euclidean Distance method determines the shortest distance between two points. In the Euclidean technique, unlike Kalman and Optical flow, no prediction is made. We can only create a trajectory between two points by using the Euclidean method to obtain the shortest distance. The Euclidean approach is used to find the new positions of points in this procedure. The points detected by Yolo are originally stored in a 2D-array. The next positions of these points are determined after a specified number of frames. The array is updated by using the Euclidean Method to discover the shortest distance between old and new points, and a trajectory is drawn between them.

3 Results and discussion

This system aims to reduce the cost of traffic officers' manual tasks and to assist them with labor- and time-intensive work such as manually evaluating vehicle paths, so that they can focus on other safety-related traffic solutions. We describe a vehicle monitoring system for difficult weather conditions (fog, sandstorm (dust), and snow) based on a combination of algorithms for obtaining vehicle counts and trajectories. First, we enhance road scene images using the Multi-scale Retinex algorithm, making the subsequent steps easier. Then, to detect vehicles, we use the Yolov5 model trained on our own dataset. Finally, vehicle trajectories are obtained by merging the Yolov5 object detections with the Kalman filter, Optical Flow, or Euclidean Distance algorithms to track vehicle activity in each frame of the video. When capturing videos, the weather conditions affect the appearance and representation of objects. Small and medium Yolov5 models are used for training. Training took 200 epochs with batch size 32 and 16 GB of RAM on Google Colab. Yolov5s, the smallest and fastest version, was chosen for the proposed system's first experiment; the second experiment employed Yolov5m, the medium version. Both were trained for 200 epochs.

3.1 Performance of enhanced images with Yolov5 models

Original images are the images before enhancement, while MSRCR and MSRCP are the images after enhancement; we therefore consider three separate datasets (Original, MSRCR, and MSRCP). Table 2 displays the metrics produced with the YOLOv5s model for all classes, and Table 3 shows the same metrics for the second model, Yolov5m. The columns show the detector's precision, recall, and mean average precision on the Original, MSRCR, and MSRCP datasets. As the tables show, Yolov5m performs better than Yolov5s, and the MSRCR enhancement method performs much better than MSRCP. The metrics and validation losses of the models are shown in Fig. 3. Among the Original, MSRCR, and MSRCP datasets, there are no major differences between the performance of MSRCR and Original images, with a slight advantage for the Original dataset, and both obtain higher mAP, precision, and recall than MSRCP. Training and validation are performed in clear weather conditions, which is why the Original dataset accuracy is slightly better than the MSRCR dataset accuracy: the Original images are clear images with clear objects. The opposite holds for bad-weather images and videos. Comparing the performances in Table 3 and Fig. 3 between the Yolov5s and Yolov5m models gives the advantage to the medium model (Yolov5m), because the larger model corresponds to lower loss and higher mAP and was pretrained with a larger number of parameters. Figure 4 shows mAP50:95; the trend is the same as for mAP50, with the Original dataset slightly outperforming MSRCR.
Table 2
Yolov5s model performance on the dataset

          Precision                     Recall                        mAP 0.5
Class     Original  MSRCR   MSRCP      Original  MSRCR   MSRCP       Original  MSRCR   MSRCP
All       0.939     0.911   0.863      0.882     0.854   0.698       0.915     0.892   0.792
Bus       0.973     0.925   0.912      0.891     0.891   0.739       0.941     0.894   0.81
Car       0.956     0.947   0.913      0.937     0.914   0.825       0.964     0.946   0.901
Minibus   0.889     0.86    0.764      0.816     0.755   0.53        0.838     0.837   0.665
Table 3
Yolov5m model performance on the dataset

          Precision                     Recall                        mAP 0.5
Class     Original  MSRCR   MSRCP      Original  MSRCR   MSRCP       Original  MSRCR   MSRCP
All       0.933     0.914   0.902      0.898     0.888   0.786       0.922     0.911   0.857
Bus       0.977     0.949   0.959      0.935     0.891   0.783       0.954     0.947   0.869
Car       0.96      0.956   0.928      0.943     0.935   0.84        0.966     0.964   0.921
Minibus   0.863     0.837   0.818      0.816     0.836   0.734       0.846     0.82    0.781
The following figures show a visual comparison of detections by Yolov5s and Yolov5m on the (Original, MSRCR, and MSRCP) test sets for fog, sandstorm (dust), and snow. Figures 5 and 6 show fog and sandstorm conditions, while Figs. 7 and 8 show snow. In each figure, the first row compares detections by the Yolov5s model and the second row compares detections by the Yolov5m model; (i) is the original image, (ii) is the MSRCR-enhanced image, and (iii) is the MSRCP-enhanced image. Figures 5, 6, 7 and 8 show the number of detected objects in each class: the MSRCR images (ii) give good detection accuracy and more detected vehicles than the other dataset images, and the Yolov5m rows (2nd row) give better detection than the Yolov5s rows. As a result, images enhanced with MSRCR and processed by Yolov5m (Figs. 5, 6, 7 and 8, (ii), 2nd row) form the best method combination. We also test on videos: another visual comparison of detection is performed on four testing videos, as shown in Figs. 9 and 10. Figure 9 shows fog in daylight from a high altitude, and Fig. 10 shows fog at night. Yolov5m is applied to the Original and MSRCR versions of the same testing video; (i) is the original video and (ii) is the MSRCR-enhanced video. The figures show that the MSRCR testing videos (ii) have better detection results than the original testing videos (i) with the Yolov5m model.

3.2 Relative movement calculation of objects in video

Velocity is normally used to determine how fast or slow a vehicle is going, but measuring velocity requires a camera fixed in one position and knowledge of quantities such as distance and time. We therefore used a different approach and calculate the relative movement of objects, as shown in Fig. 2, by measuring how much a point moves within a certain number of frames. This lets us establish whether an object is moving quickly or slowly with respect to the other objects in the frame.
In both Optical flow and Kalman, the movement is shown as a number. This number is the distance between the new points and the old points calculated after five frames. After every five frames, the distance between the new positions of the points and the old positions of the points is measured and divided by the frame count.
$$\mathrm{Movement}= \frac{\sqrt{{\left({X}_{pn}-{X}_{po}\right)}^{2}+{\left({Y}_{pn}-{Y}_{po}\right)}^{2}}}{\mathrm{Frame\, Count}}$$
(6)
\({X}_{pn}, {Y}_{pn}=\) new point predicted by Optical Flow or Kalman, \({X}_{po}, {Y}_{po}=\) old point predicted by Optical Flow or Kalman, and \(\mathrm{Frame\, Count}=\) 5. This gives the number of pixels a car moved in 5 frames. If the number is large, we can assume that the car is moving faster than the other detected cars, as shown in Fig. 11.
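A direct transcription of Eq. (6) is shown below; the sample coordinates are illustrative only.

```python
# Relative movement per Eq. (6); the example coordinates are made up.
import math

def relative_movement(new_pt, old_pt, frame_count=5):
    """Pixels moved per frame between a point's old and new predicted positions."""
    dx, dy = new_pt[0] - old_pt[0], new_pt[1] - old_pt[1]
    return math.hypot(dx, dy) / frame_count

# A point that moved from (120, 300) to (150, 340) over 5 frames:
print(relative_movement((150, 340), (120, 300)))   # -> 10.0 pixels per frame
```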
In the next sections, we will describe in detail how we applied the trajectory part.

3.3 Optical flow tracking trajectories with Yolov5

The cars detected by Yolo in the first frame are stored in a 2D matrix where the rows represent the number of points, and the columns represent the central \(\left( {x,y} \right)\) coordinates.
$$\left[ {\begin{array}{*{20}c} {x_{1} } & {y_{1} } \\ {x_{2} } & {y_{2} } \\ {x_{3} } & {y_{3} } \\ \end{array} } \right]$$
(7)
The Yolo-detected points stored in the first frame as a 2D array are shown in Eq. (7). These points are fed into the Optical flow as old points, and in return the Optical flow predicts new points for the next frame. This process is repeated until a new point is detected by Yolo, in which case the new point is added to the Optical Flow point array. No correction is made in this process; only the trajectory predicted by Optical flow is displayed, as shown in Fig. 12b, d, where the trajectory lines appear as straight lines. New points are added by searching along the x-axis and y-axis. A point gap of 30 px is chosen to accommodate the difference between the Optical flow predicted points and the Yolo detected points. If any Yolo point discovered during the search is not in the Optical flow predicted points array, that point is vertically stacked into the Optical flow points array.
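The point-association step just described can be sketched as follows: a Yolo detection is appended to the optical-flow point array only if it lies more than 30 px (the chosen point gap) from every point already being tracked. The array shapes and sample values are assumptions for illustration.

```python
# Sketch of adding newly detected Yolo centers to the optical-flow point array.
import numpy as np

POINT_GAP = 30.0   # px, the point gap described above

def add_new_detections(flow_points, yolo_centers, gap=POINT_GAP):
    """flow_points: (N, 2) tracked points; yolo_centers: (M, 2) detected centers."""
    for center in yolo_centers:
        if flow_points.size == 0:
            flow_points = center.reshape(1, 2)
            continue
        dists = np.linalg.norm(flow_points - center, axis=1)
        if np.all(dists > gap):                       # matches no tracked point -> new vehicle
            flow_points = np.vstack([flow_points, center])
    return flow_points

tracked = np.array([[320.0, 240.0], [410.0, 250.0]])
detected = np.array([[322.0, 243.0],      # within 30 px of a tracked point -> ignored
                     [600.0, 180.0]])     # new vehicle -> appended
print(add_new_detections(tracked, detected))
```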

3.4 Corrected optical flow tracking trajectories with Yolov5

After every 6th frame, the positions of the Optical flow points are re-adjusted to be equal to the Yolo points; in other words, the error is reset to 0 after every 6th frame. New points are added by searching in the same way as before. Deletion of points is also implemented in the Corrected Optical flow: if a point discovered during the search has not been repeated within 35 frames, it is deleted from the Optical flow point array. The result is shown in Fig. 12a, c, where the trajectory lines appear as zigzag lines due to re-adjustment and smoothing.

3.5 Kalman tracking trajectories with Yolov5

The Kalman prediction takes the form of the mean and variance of a Gaussian distribution, where the mean is the value we want to measure and the variance is the confidence level. The Kalman state variables are the values we want to estimate based on the values we are receiving. This is the state matrix:
$$x=\left[\begin{array}{c}\begin{array}{c}x\\ y\\ \dot{x}\end{array}\\ \dot{y}\end{array}\right]$$
(8)
Position and velocity in the x and y coordinates make up the state matrix. \(x,y\) is the position of the vehicle to be predicted by the Kalman filter and is initialized to zero. \(\dot{x},\dot{y}\) are set from the video frame rate.

3.5.1 Error covariance matrix P

This matrix changes during filter processing. The covariance matrix controls how fast the filter converges to the correct measured values. It is initialized based on the accuracy of the sensor (which in our case is Yolo): if the sensor is very accurate, small values should be used; otherwise, large values should be used.
$${\text{P}} = \left[ {\begin{array}{*{20}c} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{array} } \right]$$
(9)

3.5.2 Transition matrix A

The core of the filter is this transition matrix. It is a dynamic matrix. It depends upon the video frames per second, and it is assumed to be constant during filter calculations.
$${\text{A}} = \left[ {\begin{array}{*{20}c} 1 & 0 & {{\text{d}}t} & 0 \\ 0 & 1 & 0 & {{\text{d}}t} \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{array} } \right]$$
(10)
dt = Frame per second
The dot product of the state matrix and transition matrix gives us the predicted value. The State matrix prediction step:
$$\left[ {\begin{array}{*{20}c} {\begin{array}{*{20}c} x \\ y \\ {\dot{x}} \\ \end{array} } \\ {\dot{y}} \\ \end{array} } \right]_{t + 1} = \left[ {\begin{array}{*{20}c} 1 & 0 & {{\text{d}}t} & 0 \\ 0 & 1 & 0 & {{\text{d}}t} \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{array} } \right].\left[ {\begin{array}{*{20}c} {\begin{array}{*{20}c} x \\ y \\ {\dot{x}} \\ \end{array} } \\ {\dot{y}} \\ \end{array} } \right]_{t}$$
(11)

3.5.3 Process noise covariance matrix Q

This matrix describes how the system state can jump from one step to the next. It introduces noise into the system coming from different physical conditions or parameters, such as a changing camera fps. The matrix is a covariance matrix containing the following elements [47]. In our experiments, the process noise covariance matrix is set to:
$${\text{Q}} = \left[ {\begin{array}{*{20}c} {\sigma_{x}^{2} } & {\sigma_{xy} } & {\sigma_{{x\dot{x}}} } & {\sigma_{{x\dot{y}}} } \\ {\sigma_{yx} } & {\sigma_{y}^{2} } & {\sigma_{{y\dot{x}}} } & {\sigma_{{y\dot{y}}} } \\ {\sigma_{{\dot{x}x}} } & {\sigma_{{\dot{x}y}} } & {\sigma_{{\dot{x}}}^{2} } & {\sigma_{{\dot{x}\dot{y}}} } \\ {\sigma_{{\dot{y}x}} } & {\sigma_{{\dot{y}y}} } & {\sigma_{{\dot{y}\dot{x}}} } & {\sigma_{{\dot{y}}}^{2} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {10^{-5} } & 0 & 0 & 0 \\ 0 & {10^{-5} } & 0 & 0 \\ 0 & 0 & {0.009} & 0 \\ 0 & 0 & 0 & {0.009} \\ \end{array} } \right]$$
(12)

3.5.4 Measurement noise co-variance matrix R

This matrix represents measurement uncertainty. Since the sensor (Yolov5) is accurate, small values are used here.
$$R=\left[\begin{array}{cc}{10}^{-1}& 0\\ 0& {10}^{-1}\end{array}\right]$$
(13)
The Kalman tracking is implemented by adding the Yolov5 points to the Kalman point array. In normal Kalman tracking, the first 10 Yolo-detected values of each point are fed into the Kalman algorithm before predictions are taken. Any new point detected by Yolo in the video is added to the Kalman point array, and its first 10 values are taken from Yolo before predictions are taken from Kalman. Kalman trajectories are shown in Fig. 13b, d, where the trajectory lines appear as straight lines.
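A minimal sketch of a constant-velocity Kalman tracker configured with the matrices of Eqs. (9)-(13) is given below, using OpenCV's KalmanFilter. Interpreting dt as 1/fps and the 30 fps value are assumptions on our part, as are the sample vehicle centers.

```python
# Constant-velocity Kalman tracker configured with Eqs. (9)-(13); dt = 1/fps is an assumption.
import cv2
import numpy as np

def make_kalman(dt):
    kf = cv2.KalmanFilter(4, 2)                        # state (x, y, vx, vy), measurement (x, y)
    kf.transitionMatrix = np.array([[1, 0, dt, 0],     # A, Eq. (10)
                                    [0, 1, 0, dt],
                                    [0, 0, 1,  0],
                                    [0, 0, 0,  1]], dtype=np.float32)
    kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                     [0, 1, 0, 0]], dtype=np.float32)
    kf.errorCovPost = np.eye(4, dtype=np.float32)                                   # P, Eq. (9)
    kf.processNoiseCov = np.diag([1e-5, 1e-5, 0.009, 0.009]).astype(np.float32)     # Q, Eq. (12)
    kf.measurementNoiseCov = np.diag([1e-1, 1e-1]).astype(np.float32)               # R, Eq. (13)
    return kf

kf = make_kalman(dt=1 / 30)                            # assumed 30 fps video
for cx, cy in [(320, 240), (323, 241), (326, 243)]:    # first Yolo centers of one vehicle
    kf.correct(np.array([[cx], [cy]], dtype=np.float32))
    predicted = kf.predict()                           # next (x, y, vx, vy)
print(predicted[:2].ravel())                           # predicted center for the next frame
```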

3.6 Corrected Kalman tracking trajectories with Yolov5

In the Corrected Kalman method, the error is reset to 0 every 6 frames: the positions of the points in the Kalman array are set equal to the Yolo detected points. New points are searched for and added on every 6th frame, and points that are not repeated within 25 frames are deleted. This deletion removes points that are falsely detected by Yolo or points that are no longer present in the video, i.e., cars leaving the frame. Corrected Kalman trajectories are shown in Fig. 13a, c.

3.7 Error calculation for tracking trajectories methods

The error graph is plotted using the predictions made by the Optical flow and Kalman methods: the Euclidean distance between each predicted point and the corresponding Yolo-detected point is taken.
The error is measured after 10 frames for both the Optical flow and Kalman methods. The Euclidean norms between all points predicted by Optical flow or Kalman and the Yolo detected points in that 10th frame are summed. To keep the number smaller, the error value is divided by the frame count, which is 10 in our case.
$$\mathrm{Error}= \sum \left(\sqrt{{\left({X}_{\mathrm{p}}-{X}_{\mathrm{y}}\right)}^{2}+{\left({Y}_{\mathrm{p}}-{Y}_{\mathrm{y}}\right)}^{2}}\right)/\mathrm{Frame\, Count}$$
(14)
\({X}_{\mathrm{p}}, {Y}_{\mathrm{p}}=\) Predicted point by Optical flow or Kalman, \({X}_{\mathrm{y}}, {Y}_{\mathrm{y}}=\) Yolo Detected Point, \(\mathrm{Frame\, Count}=10\).
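A direct transcription of Eq. (14) is given below; the sample point lists are illustrative only.

```python
# Trajectory error per Eq. (14); example points are made up.
import math

def trajectory_error(predicted_pts, yolo_pts, frame_count=10):
    """Sum of distances between predicted and Yolo-detected points, divided by frame count."""
    total = sum(math.hypot(xp - xy, yp - yy)
                for (xp, yp), (xy, yy) in zip(predicted_pts, yolo_pts))
    return total / frame_count

predicted = [(150, 340), (400, 210)]        # points predicted by Optical flow or Kalman
detected  = [(152, 338), (405, 214)]        # corresponding Yolo detections in the 10th frame
print(round(trajectory_error(predicted, detected), 2))
```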
As seen in Fig. 14a, the Optical Flow method works better on the enhanced weather video, where the error remains below 10 (red and green lines). In both the enhanced and bad weather cases (Fig. 14a, b), the Corrected Optical Flow method has less error, remaining almost equal to 5 (green line in Fig. 14a, b). This error is calculated by taking the norm of the predicted points against the Yolo detected points and dividing by the frame count, which equals 10; the green curve stays within a distance error of 5 pixels from the actual (Yolo) value. The line graph also shows that the Corrected Kalman method (blue line) works well on the enhanced video, where the error remains below 15 in Fig. 14a.
Overall, the best method for drawing trajectories that have a minimum error is the Corrected Optical flow method (green line), followed by the Optical flow method (red line) for both enhanced Fig. 14a and bad weather in Fig. 14b.
As Table 4 shows for the average error over the 40 frames of Fig. 14, the smallest errors belong to the Corrected Optical Flow (green) plot, with 1.53 for enhanced weather and 1.76 for bad weather. Enhanced weather has the smaller error because its trajectories are more accurate than those in bad weather.
Table 4
Average error calculation for trajectory drawing for the frames of Fig. 14

Method                         Enhanced weather average error   Bad weather average error
Kalman error                   13.24                            13.34
Kalman corrected error         5.48                             8.43
Optical flow error             4.26                             6.82
Corrected optical flow error   1.53                             1.76

3.8 Tracking trajectories with Euclidean distance

In the Euclidean Distance tracking algorithm, each identified vehicle center point in each frame is sent to an array that holds the coordinates from the previous frame. The distance between the vehicle's current coordinates and the previous-frame coordinates is then determined (the Euclidean distance), and the identified vehicle either gets a new ID or keeps the one from the previous frame. Yolov5 finds the objects in the image and returns each object's coordinates. Object tracking is built using the previous frame as the reference frame: the object coordinates from the previous frame are recorded and compared to the current frame. The straight-line distance between two locations in Euclidean space is known as the "Euclidean distance" or "Euclidean metric". For each object, the distance between the current frame and the reference frame is calculated using the Euclidean distance. If the distance between two object values is less than 50 pixels, they are the same object; otherwise, a new label is created and applied, as shown in Fig. 15. A comparative trajectory line between the enhanced and bad weather videos is shown in Fig. 16, where the enhanced video has better and longer trajectories due to the higher number and greater stability of detected vehicles.
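The ID assignment just described can be sketched as follows: a detection keeps the previous frame's ID if it lies within 50 px of a stored center, otherwise it receives a new ID. The data structures and sample coordinates are assumptions for illustration.

```python
# Sketch of Euclidean-distance ID assignment with a 50 px threshold.
import math

class EuclideanTracker:
    def __init__(self, max_dist=50):
        self.max_dist = max_dist
        self.centers = {}            # id -> (x, y) from the previous (reference) frame
        self.next_id = 0

    def update(self, detections):
        """detections: list of (x, y) centers from Yolov5 for the current frame."""
        assigned = {}
        for cx, cy in detections:
            match = None
            for obj_id, (px, py) in self.centers.items():
                if math.hypot(cx - px, cy - py) < self.max_dist:
                    match = obj_id                   # same vehicle as in the previous frame
                    break
            if match is None:                        # new vehicle -> new label
                match = self.next_id
                self.next_id += 1
            assigned[match] = (cx, cy)
        self.centers = assigned                      # current frame becomes the reference
        return assigned

tracker = EuclideanTracker()
print(tracker.update([(320, 240), (410, 250)]))      # frame 1: IDs 0 and 1
print(tracker.update([(324, 244), (600, 180)]))      # frame 2: ID 0 kept, new ID 2
```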
Figure 13a, c depicts the trajectory lines. The Corrected Optical Flow trajectory lines of the enhanced weather videos have the least error, with smooth rather than scattered lines, and give better tracking of the detected vehicles.
As a result, in bad weather, MSRCR-enhanced images and videos improve object detection and trajectory lines; in particular, with the combination of MSRCR, the Yolov5m model, and Corrected Optical Flow tracking trajectories, the error is less than 5 pixels from the Yolo actual value. The table below compares the proposed method with related studies.
Paper          Year   mAP%    Method used
Proposed       2022   92.2    YOLOv5m Original
Proposed       2022   91.1    YOLOv5m MSRCR
Humayun [25]   2022   89      Yolov4
Punagin [26]   2022   72.4    YOLOv2
Hnewa [27]     2020   52.62   YOLO-V3
Hnewa [27]     2020   44.45   Faster R-CNN
Ajinkya [28]   2021   46.6    Yolo
Uzar [29]      2021   84      YOLOv5m aerial view

4 Conclusions

The current work develops a deep learning pipeline capable of detecting, counting, and tracking vehicles from aerial video streams, enhancing the recognition rate of self-driving vehicles as well as vehicle surveillance, traffic monitoring, and tracking in bad weather, by combining the Retinex color enhancement algorithm with the Yolov5 object detection algorithm in a new manner. Three separate datasets were collected to train the model. Two Yolov5 versions were examined, and the MSRCR-enhanced Yolov5m performed better than the MSRCR-enhanced Yolov5s for the intended detection problem. Yolov5m with the MSRCR Retinex enhancement proved able to recognize aerial-view vehicle types correctly with the largest count in the test images and videos. Corrected Optical Flow trajectories on enhanced-weather videos have the least error and give better tracking of the detected vehicles. Together, these combinations enable the best vehicle detection and trajectory drawing. All the tests and results suggest that the proposed technique is reliable enough to be used in tough weather conditions such as fog, sandstorm (dust), and snow, and the improvements make it possible for the model to perform well in real-world road and traffic applications. Overall, the research presented in this paper has increased self-driving cars' recognition abilities in adverse weather. In the future, we will run the system online, expand the detection target types, extract more features from the video frames to obtain better detection and accuracy, and enlarge the dataset with more training data.

Declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Ethical approval

Yes.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Literature
1. Albaba BM, Ozer S (2021) SyNet: An ensemble network for object detection in UAV images. In 2020 25th International Conference on Pattern Recognition (ICPR) (pp. 10227–10234). IEEE
2. Sahin O, Ozer S (2021) Yolodrone: Improved yolo architecture for object detection in drone images. In 2021 44th International Conference on Telecommunications and Signal Processing (TSP) (pp. 361–365). IEEE
3. Benjdira B, Khursheed T, Koubaa A, Ammar A, Ouni K (2019) Car detection using unmanned aerial vehicles: Comparison between faster R-CNN and YOLOv3. In 2019 1st International Conference on Unmanned Vehicle Systems-Oman (UVS) (pp. 1–6). IEEE
4. Koubâa A, Qureshi B (2018) Dronetrack: Cloud-based real-time object tracking using unmanned aerial vehicles over the internet. IEEE Access 6:13810–13824
5. Alotaibi ET, Alqefari SS, Koubaa A (2019) Lsar: Multi-UAV collaboration for search and rescue missions. IEEE Access 7:55817–55832
6. Xi X, Yu Z, Zhan Z, Yin Y, Tian C (2019) Multi-task cost-sensitive-convolutional neural network for car detection. IEEE Access 7:98061–98068
7. Chen X, Xiang S, Liu CL, Pan CH (2014) Vehicle detection in satellite images by hybrid deep convolutional neural networks. IEEE Geosci Remote Sens Lett 11(10):1797–1801
8. Ammour N, Alhichri H, Bazi Y, Benjdira B, Alajlan N, Zuair M (2017) Deep learning approach for car detection in UAV imagery. Remote Sens 9(4):312
9. Farhadi A, Redmon J (2018) Yolov3: An incremental improvement. In Computer Vision and Pattern Recognition (Vol. 1804, pp. 1–6). Berlin/Heidelberg, Germany: Springer
10. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 779–788)
11. Redmon J, Farhadi A (2017) YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 7263–7271)
12. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 580–587)
13. Girshick R (2015) Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1440–1448)
15. Wang X (2022) Vehicle image detection method using deep learning in UAV video. Computational Intelligence and Neuroscience. Article ID 8202535
16. Ševo I, Avramović A (2016) Convolutional neural network based automatic object detection on aerial images. IEEE Geosci Remote Sens Lett 13(5):740–744
17. Ochoa KS, Guo Z (2019) A framework for the management of agricultural resources with automated aerial imagery detection. Comput Electron Agric 162:53–69
18. Kampffmeyer M, Salberg AB, Jenssen R (2016) Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 1–9)
19. Azimi SM, Fischer P, Körner M, Reinartz P (2018) Aerial LaneNet: Lane-marking semantic segmentation in aerial imagery using wavelet-enhanced cost-sensitive symmetric fully convolutional neural networks. IEEE Trans Geosci Remote Sens 57(5):2920–2938
20. Mou L, Zhu XX (2018) Vehicle instance segmentation from aerial image and video using a multitask learning residual fully convolutional network. IEEE Trans Geosci Remote Sens 56(11):6699–6711
21. Benjdira B, Bazi Y, Koubaa A, Ouni K (2019) Unsupervised domain adaptation using generative adversarial networks for semantic segmentation of aerial images. Remote Sens 11(11):1369
22. Tra HTH, Trung HD, Trung NH (2022) YOLOv5 based deep convolutional neural networks for vehicle recognition in smart university campus. In Hybrid Intelligent Systems: 21st International Conference on Hybrid Intelligent Systems (HIS 2021), December 14–16, 2021 (pp. 3–12). Cham: Springer International Publishing
23. Hardjono B, Tjahyadi H, Rhizma MG, Widjaja AE, Kondorura R, Halim AM (2018) Vehicle counting quantitative comparison using background subtraction, Viola-Jones and deep learning methods. In 2018 IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON) (pp. 556–562). IEEE
24. Tayara H, Soo KG, Chong KT (2017) Vehicle detection and counting in high-resolution aerial images using convolutional regression neural network. IEEE Access 6:2220–2230
25. Humayun M, Ashfaq F, Jhanjhi NZ, Alsadun MK (2022) Traffic management: Multi-scale vehicle detection in varying weather conditions using YOLOv4 and spatial pyramid pooling network. Electronics 11(17):2748
26. Punagin S, Iyer N (2022) Vehicle detection on unstructured roads based on transfer learning
27. Hnewa M, Radha H (2020) Object detection under rainy conditions for autonomous vehicles: A review of state-of-the-art and emerging techniques. IEEE Signal Process Mag 38(1):53–67
28. Marode A, Ambadkar A, Kale A, Mangrudkar T (2021) Car detection using YOLO algorithm. Int Res J Modernization Eng Technol Sci 03(05)
29. Uzar M et al (2021) Performance analysis of YOLO versions for automatic vehicle detection from UAV images. Adv Remote Sens 1(1):16–30
30. Petro AB, Sbert C, Morel JM (2014) Multiscale Retinex. Image Process On Line 71–88
33. Robicquet A, Sadeghian A, Alahi A, Savarese S (2016) Learning social etiquette: Human trajectory understanding in crowded scenes. In Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part VIII (pp. 549–565). Springer International Publishing
34.
35. Rahman ZU, Woodell GA, Jobson DJ (1997) A comparison of the multiscale Retinex with other image enhancement techniques
36. Jocher G, Stoken A, Borovec J, et al. (2021) Ultralytics/yolov5: v4.0 – nn.SiLU() activations, Weights & Biases logging, PyTorch Hub integration (v4.0). Zenodo
38. Shantaiya S, Verma K, Mehta K (2015) Multiple object tracking using Kalman filter and optical flow. Eur J Adv Eng Technol 2(2):34–39
39. Bar-Shalom Y, Fortmann T (1988) Tracking and data association. Academic Press Inc
40. Costa EP (2014) Human tracking using the Kalman filter
41. Chen Z, Cao J, Tang Y, Tang L (2011) Tracking of moving object based on optical flow detection. In Proceedings of 2011 International Conference on Computer Science and Network Technology (Vol. 2, pp. 1096–1099). IEEE
42. Balasundaram A, Ashok Kumar S, Magesh Kumar S (2019) Optical flow based object movement tracking. Int J Eng Adv Technol 9(1):3913–3916
Metadata
Title
Enhanced aerial vehicle system techniques for detection and tracking in fog, sandstorm, and snow conditions
Authors
Amira Samy Talaat
Shaker El-Sappagh
Publication date
21-04-2023
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 14/2023
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-023-05245-9
