
Open Access 10-03-2023 | Original Article

Fire and smoke precise detection method based on the attention mechanism and anchor-free mechanism

Authors: Yu Sun, Jian Feng

Published in: Complex & Intelligent Systems | Issue 5/2023


Abstract

Fires cause substantial damage to the natural environment and economic losses, so automatic fire-smoke detection and identification are needed. Vision-based fire-smoke detection methods still face a significant challenge: balancing model complexity and accuracy. We propose an improved YOLOv3 fire-smoke detection and identification method to address these problems and contribute a fire and smoke dataset. The method (1) adds an attention mechanism to the neck module to enhance the ability to extract features from images, (2) replaces the anchor-box mechanism with an anchor-free mechanism to handle the significant variance of smoke texture, shape, and color in real applications, and (3) uses a lightweight backbone to reduce the model complexity. The proposed dataset follows the VOC format and contains images of complex, highly diverse scenes, including pictures that (1) combine fire with smoke, (2) contain only smoke or only fire objects, and (3) contain a single cloud object. The experimental results demonstrate that the method achieves 50.8 AP, outperforming the second-best method by 3.8. Moreover, the inference speed of our method on the GPU is 13% faster than that of the second-best method.
Notes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Introduction

Approximately 250 thousand fires occur in China every year, resulting in considerable economic losses and environmental damage: approximately 2 thousand people suffer from fire each year, and the resulting economic losses reach approximately 40 billion. It is well known that the early stage of a fire produces smoke and flame. Flame is difficult to detect because burning is slow in the early stage, so smoke detection is more important for extinguishing fires. To prevent fire spread from causing larger losses, rapid and accurate fire detection is crucial.
In the past few years, numerous studies have been conducted on contact sensors for fire-smoke detection, including smoke sensors, temperature sensors, and particle sensors. However, sensor-based fire-smoke detection has apparent disadvantages: it works well in indoor spaces but is unsuitable for larger open scenes. Compared with sensor-based methods, vision-based methods respond faster, cover vast spaces, and are more robust. Therefore, vision-based methods have drawn increasing attention.
Traditional vision-based fire-smoke detection methods [5, 14, 32] usually rely on manual feature extraction from images [29]. However, designing an appropriate feature is very difficult, because (1) designing an effective feature requires specialized knowledge, (2) the features of flame and smoke are notably unstable, and (3) different inflammables produce different flames and smoke. Consequently, it is difficult for traditional vision-based methods to obtain high accuracy and robustness.
In recent years, CNN-based methods have achieved success in computer vision tasks, including image classification [13], object detection [11], and image enhancement. To overcome the difficulty of extracting flame and smoke features, several CNN-based methods [22, 23] have recently been proposed. Compared with traditional vision-based methods, CNN-based methods can automatically extract features from images. However, these methods still have problems in special scenes, as shown in Fig. 1. Additionally, their inference speed cannot satisfy the real-time requirement.
In this paper, we propose an outdoor fire-smoke detection method based on a channel attention mechanism and an anchor-free mechanism. In particular, to enhance the power to discriminate fire-like and smoke-like objects, we use the channel attention mechanism [35] to selectively emphasize the contributions of different feature maps. Then, we replace the anchor-box mechanism with an anchor-free mechanism to enhance the generalization ability of the model for the unstable shapes of flame and smoke. Moreover, the anchor-free design dramatically decreases the complexity of the detection head.
The main contributions of this paper can be summarized as follows:
(1)
We propose an outdoor fire-smoke detection method based on an attention mechanism and an anchor-free mechanism to overcome the problem of confusing fire and smoke with similar-looking objects. In addition, we propose a lightweight backbone to reduce the model complexity. The method clearly improves on the original method and has advantages over state-of-the-art methods in detection performance. Our method achieves 50.8 AP, outperforming the second-best method by 3.8. Our method has 48.8 M parameters, 12.7 M fewer than the baseline. At a comparable detection performance level, our method runs at 73.0 FPS, the fastest among the compared methods. The optimization strategies in this paper also provide a useful reference for similar detection tasks.
 
(2)
We establish a fire-smoke detection dataset of 16,714 images. All images are labeled with an annotation platform. The dataset comprises fire-smoke images from both normal and difficult scenes, as well as many fire-like and smoke-like images, such as the sun, lighting, clouds, and steam. The field of fire-smoke detection lacks an effective evaluation standard comparable to VOC and COCO, the benchmarks of vision-based object detection, which makes it difficult to compare the performance of different methods. Our dataset will contribute to vision-based fire-smoke detection research. It is available at https://drive.google.com/file/d/1kWjy1msWm3DiHMs_MGE1KFU0lMmQ0Tsl/view?usp=sharing.
 
The remainder of the paper is organized as follows. Section “Related works” reviews the related work. Section “Proposed methods” describes the methodology used in this study. Section “Experiment and discussion” analyzes the experimental results. Section “Conclusion” summarizes the paper and discusses further work.

Related works

Existing fire detection methods can be categorized into conventional methods and deep-learning-based methods. Traditional fire detection methods rely on handcrafted feature extraction, such as color, shape, texture, and motion features. The method in [5] utilized different color models to discriminate between fire and nonfire pixels. Celik et al. [4] proposed a generic color model for fire detection in the YCbCr color space. A method using color information and wavelet transform coefficients to detect smoke in open space was proposed in [3]. To exploit shape information, [7] proposed an improved smoke detection method with RGB contrast images and shape constraints. Foggia et al. [9] presented an approach using the YUV color space instead of RGB and obtained better results.
However, using a single feature results in a high false alarm rate. To avoid this problem, [24] explored static characteristics such as color and shape together with dynamic characteristics such as the disorder of smoke and fire. However, the false alarm rate remains high because fire-like objects appear in the images. Töreyin et al. [34] utilized hidden Markov models to discriminate between real fires and fire-like objects. The spatiotemporal wavelet transform was used to analyze the dynamic behavior of fire in [33]. Nevertheless, too many thresholds reduce the generalization of such models.
Heuristic rule-based methods still have high false alarm rates, so some researchers use statistical learning methods to avoid the effect of personal experience on detection. For example, Zhao et al. [42] proposed a method based on texture analysis and used an SVM to classify the texture. Moreover, [39] proposed a novel dynamic texture descriptor based on a surfacelet transform and a hidden Markov tree model, and the extracted texture was used to train an SVM classifier. In [15], an SVM was used to classify covariance-based features extracted from spatial-temporal blocks. Chen et al. [6] and Han et al. [12] investigated the Gaussian mixture model (GMM) to detect fire. Ref. [1] proposed RGB local binary co-occurrence patterns (RGB-LBCoP), utilized fuzzy C-means (FCM) to extract features from smoke regions, and used a support vector machine (SVM) for classification based on these features. To distinguish fire from fire-like objects, [40] utilized a BP network based on the dynamic characteristics of fire.
Conventional methods rely on handcrafted features and require rich experience in this field, and they have difficulty maintaining a low false alarm rate. With the rapid development of deep learning, CNN-based methods have been explored for fire and smoke detection. For example, in [10] and [20], LeNet-5 was used to classify images into fire and nonfire, achieving higher accuracy than traditional methods. Muhammad et al. [21] proposed an early fire detection method using a CNN structure by modifying the AlexNet classifier from 1000 categories to two categories. However, the model size is comparatively large, which restricts its application. Thus, GoogLeNet [22], SqueezeNet [21] and MobileNet [23] were introduced to detect fire with lower complexity and better performance. Sharma et al. compared ResNet50 and VGG16 and found that ResNet50 performed better than VGG16 in fire detection; however, the dataset in that work is too small (containing only 651 images). These classification networks perform well in normal scenes but cannot effectively distinguish between real fires and fire-like objects. To address this problem, [18] built a complicated large-scale dataset containing 25,000 fire images and 25,000 fire-like images and proposed a model named EFDNet.
Moreover, some researchers have extended smoke and fire recognition to object detection. Object detection methods not only distinguish between fire and nonfire scenes but can also locate fire and nonfire objects in the image. In Ref. [37], Wu et al. used YOLO, R-CNN, and SSD for forest fire detection and built a forest fire benchmark. Reference [17] used YOLO for fire detection, and Ref. [28] used YOLO for flame detection; however, these methods can only detect fire. Reference [16] first used the mean squared error (MSE) to obtain candidate flame and smoke regions from camera frames and then extracted the flame and smoke areas using Faster R-CNN, but the method does not satisfy the real-time requirement. To solve this problem, Saponara et al. [27] used YOLOv2 [25] in a real-time fire and smoke detection pipeline; however, their dataset was too small, and the model failed to detect fire-like objects. Reference [38] built a dataset containing 10,581 images and presented a detection system based on ensemble learning, but the method cannot distinguish between smoke and smoke-like objects.

Proposed methods

YOLOv3 has good performance in general object detection [43] and is easily modified to meet the requirements of other application scenarios [36]. However, the original model cannot efficiently distinguish real objects from look-alike objects, which leads to a high number of false alarms. To solve this problem, we use an attention mechanism to enhance the feature extraction capacity. In addition, the original method uses an anchor mechanism that fixes a set of anchor sizes before training, which harms the generalization of the model, and the anchor mechanism requires more complicated detection heads than an anchor-free mechanism. Therefore, we replace the anchor mechanism with an anchor-free mechanism.

Overview

As shown in Fig. 2, the fire and smoke detection network comprises a backbone network, a multi-scale feature extraction neck with attention mechanism modules, and detection heads. The input is a \(416 \times 416 \times 3\) RGB image. In the first step, we feed the image into the backbone to generate detailed features. To retain more information, the model divides the output into three branches, whose output feature shapes are \(13 \times 13 \times 1024\), \(26 \times 26 \times 512\), and \(52 \times 52 \times 256\). In the second step, the three features flow through three detection necks to generate contextual information, as shown in the neck part of Fig. 2. In the neck, a feature pyramid structure fuses output features of different scales, and at the start of every branch, a CBAM module selectively enhances the features. In the final step, each detection head, composed of convolutional layers, batch normalization layers, and ReLU activations, generates a feature map containing the classification prediction, bounding box prediction, and confidence information. The spatial sizes of the three output feature maps are the input size divided by 32, 16, and 8, respectively.
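To make the shape bookkeeping above concrete, the following PyTorch sketch builds a toy three-branch detector with the same input size, output strides, and channel counts. The backbone and head layers are simplified stand-ins (the class and function names are ours, not the authors'), and the CBAM modules and feature-pyramid neck are omitted here; they are sketched in the following subsections.

```python
# A minimal, runnable sketch of the three-branch layout, not the authors' exact network.
import torch
import torch.nn as nn


def conv_bn_relu(c_in, c_out, k=3, s=1):
    """Conv + BatchNorm + ReLU block used throughout this sketch."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )


class ToyBackbone(nn.Module):
    """Downsamples a 416x416 input to 52x52x256, 26x26x512 and 13x13x1024 maps."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(conv_bn_relu(3, 32), conv_bn_relu(32, 64, s=2),
                                  conv_bn_relu(64, 128, s=2), conv_bn_relu(128, 256, s=2))
        self.stage4 = conv_bn_relu(256, 512, s=2)    # stride 16
        self.stage5 = conv_bn_relu(512, 1024, s=2)   # stride 32

    def forward(self, x):
        c3 = self.stem(x)        # 52 x 52 x 256
        c4 = self.stage4(c3)     # 26 x 26 x 512
        c5 = self.stage5(c4)     # 13 x 13 x 1024
        return c3, c4, c5


class ToyDetector(nn.Module):
    """Backbone -> three 1x1 detection heads (CBAM and the neck are omitted here)."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.backbone = ToyBackbone()
        out_ch = 4 + 1 + num_classes  # box (4) + confidence (1) + classes (fire, smoke)
        self.heads = nn.ModuleList([nn.Conv2d(c, out_ch, 1) for c in (1024, 512, 256)])

    def forward(self, x):
        c3, c4, c5 = self.backbone(x)
        return [head(f) for head, f in zip(self.heads, (c5, c4, c3))]


if __name__ == "__main__":
    outs = ToyDetector()(torch.zeros(1, 3, 416, 416))
    print([tuple(o.shape) for o in outs])  # strides 32, 16, 8 -> 13, 26, 52
```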

Attention mechanism

Detecting fire-like and smoke-like objects is a difficult task in fire and smoke detection. Because these look-alike objects have similar shapes and colors, the extracted features are also similar, which leads to recognition errors between two similar objects. To solve this problem, we utilize channel attention and spatial attention modules, as shown in Fig. 5, in the neck branches rather than in the backbone. First, channel attention enhances the channels that represent targets and suppresses the other channels. Then, spatial attention emphasizes the position of the object. Therefore, the network focuses on the area where the target is present and obtains more detailed information, which contributes to distinguishing similar objects.

Channel attention module

As shown in Fig. 3, applying both average pooling and max pooling generates two descriptors, \({\textbf{F}}_{\textrm{avg}}^{\textrm{c}}\) and \({\textbf{F}}_{\textrm{max}}^{\textrm{c}}\), which represent the average-pooled and max-pooled features, respectively. Both descriptors are fed into a shared network, a multi-layer perceptron (MLP) with one hidden layer, to produce the channel attention map \({\textbf{M}} _{\textrm{c}} \in {\textbf{R}}^{C \times 1 \times 1}\). The two output features of the shared network are then summed, which can be expressed as
$$\begin{aligned} \begin{aligned} {\textbf{M}}_{{\textbf{c}}}({\textbf{F}})&=\sigma ({\text {MLP}}({\text {AvgPool}}({\textbf{F}}))+{\text {MLP}}({\text {MaxPool}}({\textbf{F}})))\\&=\sigma \left( {\textbf{W}}_{{\textbf{1}}}\left( {\textbf{W}}_{{\textbf{0}}}\left( {\textbf{F}}_{\textrm{avg}}^{{\textbf{c}}}\right) \right) +{\textbf{W}}_{{\textbf{1}}}\left( {\textbf{W}}_{{\textbf{0}}}\left( {\textbf{F}}_{\textrm{max}}^{{\textbf{c}}}\right) \right) \right) ; \end{aligned} \end{aligned}$$
(1)
\(\sigma \) denotes the sigmoid activation function, \(\delta \) denotes the ReLU activation function applied after \({\textbf{W}}_{{\textbf{0}}}\) inside the MLP, and \({\textbf{W}}_{{\textbf{0}}}\) and \({\textbf{W}}_{{\textbf{1}}}\) are the weight matrices of the shared MLP.
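As a concrete illustration of Eq. (1), the following PyTorch sketch implements a channel attention module under the standard CBAM assumptions (a shared two-layer MLP with a reduction ratio, here 16). The module and parameter names are ours, not taken from the paper's code.

```python
# Minimal sketch of the channel attention of Eq. (1).
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared MLP: W0 (C -> C/r), ReLU, then W1 (C/r -> C).
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))              # F_avg^c through the shared MLP
        mx = self.mlp(x.amax(dim=(2, 3)))               # F_max^c through the shared MLP
        return torch.sigmoid(avg + mx).view(b, c, 1, 1)  # M_c, broadcastable over H x W


# Example: one weight per channel for a 26 x 26 x 512 feature map.
weights = ChannelAttention(512)(torch.randn(2, 512, 26, 26))
print(weights.shape)  # torch.Size([2, 512, 1, 1])
```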

Spatial attention module

As illustrated in Fig. 4, applying two pooling operations along the channel dimension generates two 2D maps, \(\textbf{F}_{\textrm{avg}}^{\textrm{s}} \in R ^{1 \times H \times W}\) and \({\textbf{F}} _{\max }^{\textrm{s}} \in R ^{1 \times H \times W}\), which denote the average-pooled and max-pooled features across the channel dimension. These two maps are then concatenated and convolved by a convolution layer to produce a 2D spatial attention map. The spatial attention can be expressed by
$$\begin{aligned} \begin{aligned} {\textbf{M}}_{{\textbf{s}}}({\textbf{F}})&=\sigma \left( f^{7 \times 7}([{\text {AvgPool}}({\textbf{F}}); {\text {MaxPool}}({\textbf{F}})])\right) \\&=\sigma \left( f^{7 \times 7}\left( \left[ {\textbf{F}}_{\textrm{avg}}^{\textrm{s}}; {\textbf{F}}_{\textrm{max}}^{\textrm{s}}\right] \right) \right) ; \end{aligned} \end{aligned}$$
(2)
\(\sigma \) denotes the sigmoid activation function, \(f^{7 \times 7}\) denotes a convolution layer with filter size \(7 \times 7\), and \([\cdot ;\cdot ]\) denotes the concatenation operation. As illustrated in Fig. 5, we connect the spatial attention module behind the channel attention module. Let a feature map \({\textbf{F}} \in {\textbf{R}}^{C \times H \times W}\) be the input; the channel attention module infers a 1D weight matrix \({\textbf{M}}_{c} \in R^{C \times 1 \times 1}\), and the spatial attention module infers a 2D weight matrix \({\textbf{M}}_{s} \in R^{1 \times H \times W}\). The overall computation can be expressed as follows:
$$\begin{aligned} \begin{aligned} {\textbf{F}}_{{\textbf{c}}}&= {\textbf{M}}_{{\textbf{c}}}({\textbf{F}}) \otimes {\textbf{F}} \\ {\textbf{F}}_{{\textbf{o}}}&= {\textbf{M}}_{{\textbf{s}}}({\textbf{F}}_{{\textbf{c}}}) \otimes {\textbf{F}}_{{\textbf{c}}}. \end{aligned} \end{aligned}$$
(3)
In the equation, \(\otimes \) represents element-wise multiplication: the channel attention values are broadcast along the spatial dimensions, and the spatial attention values are broadcast along the channel dimension. \(F_{\text {o}}\) is the final refined output.
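The spatial attention of Eq. (2) and the sequential refinement of Eq. (3) can be sketched in the same style. The \(7 \times 7\) convolution and the channel-then-spatial ordering follow the standard CBAM design (an assumption on our part), and ChannelAttention refers to the module sketched above.

```python
# Minimal sketch of Eqs. (2)-(3): spatial attention, then channel-then-spatial refinement.
import torch
import torch.nn as nn


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)   # F_avg^s, shape 1 x H x W per sample
        mx = x.amax(dim=1, keepdim=True)    # F_max^s, shape 1 x H x W per sample
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # M_s, Eq. (2)


class CBAM(nn.Module):
    """Applies channel attention, then spatial attention, as in Eq. (3)."""
    def __init__(self, channels: int):
        super().__init__()
        self.channel_att = ChannelAttention(channels)  # defined in the previous sketch
        self.spatial_att = SpatialAttention()

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        f_c = self.channel_att(f) * f       # F_c = M_c(F) (x) F
        f_o = self.spatial_att(f_c) * f_c   # F_o = M_s(F_c) (x) F_c
        return f_o
```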

Anchor-free mechanism

The anchor mechanism causes some problems for fire detection. First, to optimize detection performance, clustering analysis is used to determine a set of optimal anchors before training. However, the shapes of smoke and flame are highly variable, so the clustered anchors generalize poorly. Second, the anchor mechanism increases the complexity of the detection head, whereas fire detection requires high real-time performance. For these reasons, we use an anchor-free mechanism instead of the anchor mechanism.
We replace the anchor mechanism with an anchor-free mechanism to decrease the model complexity and improve the generalization ability. We reduce the number of predictions per grid cell from 3 to 1, as shown in Fig. 6. Each prediction directly outputs four values: the predicted box's height and width and the offsets of the box relative to the top-left corner of the grid cell. Then, the prediction whose box center is located inside the ground truth is assigned as a positive sample.
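A minimal sketch of how such per-cell predictions might be decoded into boxes is given below. The exact parameterization (for example, whether width and height pass through an exponential) is not specified here, so the sketch assumes a common YOLOX-style convention purely for illustration.

```python
# Hedged sketch: decode per-cell (dx, dy, w, h) predictions into pixel-space boxes.
import torch


def decode_grid(pred: torch.Tensor, stride: int) -> torch.Tensor:
    """pred: (H, W, 4) with (dx, dy, w, h) per cell -> (H*W, 4) boxes as (cx, cy, w, h)."""
    h, w, _ = pred.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    cx = (xs + pred[..., 0]) * stride       # offset from the cell's top-left corner
    cy = (ys + pred[..., 1]) * stride
    bw = pred[..., 2].exp() * stride        # assumed exponential size encoding
    bh = pred[..., 3].exp() * stride
    return torch.stack([cx, cy, bw, bh], dim=-1).reshape(-1, 4)


# Example: decode the 13 x 13 map (stride 32) of a 416 x 416 input.
boxes = decode_grid(torch.zeros(13, 13, 4), stride=32)
print(boxes.shape)  # torch.Size([169, 4])
```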
To maintain the assignment rule of the baseline method, the above anchor-free strategy selects only one positive sample for each object, which ignores some high-quality predictions. However, high-quality predictions can also contribute useful gradients and may mitigate the extreme imbalance between positive and negative samples during training. Therefore, using the ground-truth center point as a reference, a \(5 \times 5\) square is set up, and the prediction boxes whose center points fall inside the square are assigned as positive samples. As shown in Fig. 7, we use a picture of the sun as an example; the sun is the ground truth, and the blue rectangles are prediction boxes. A high-quality label assignment strategy is important for training models to achieve better performance. The label assignment strategy is applied to the positive samples selected by the above rule. First, the strategy calculates the cost between each positive sample and the ground truth \(gt_{i}\), which can be calculated as
$$\begin{aligned} c_{ij} = L^{\text {cls}}_{ij} + \lambda L^{\text {reg}}_{ij}, \end{aligned}$$
(4)
where \(\lambda \) is a weight, and \(L^{\text {cls}}_{ij}\) and \(L^{\text {reg}}_{ij}\) are the classification and regression losses, respectively, between ground truth \(gt_{i}\) and sample \(s_{j}\). Second, for \(gt_{i}\), the top \(k_{i}\) samples with the lowest cost are chosen as its positive samples, where \(k_{i}\) is calculated dynamically using the following equation:
$$\begin{aligned} k_{i} = \left\lfloor \sum _{j=1}^{n} iou_{ij} \right\rfloor . \end{aligned}$$
(5)
Here, \(iou_{ij}\) denotes the IoU between \(gt_{i}\) and sample \(s_{j}\), and the sum runs over the candidate samples. Finally, the selected samples are assigned as positive, and the remaining samples are negative.
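The following sketch illustrates this dynamic top-k assignment. The cost and IoU matrices are assumed to be precomputed for the candidate samples, and the floor-of-summed-IoU estimate of \(k_{i}\) is our reading of Eq. (5), not a verbatim reproduction of the authors' implementation.

```python
# Simplified sketch of the dynamic top-k label assignment (Eqs. (4)-(5)).
import torch


def dynamic_k_assign(cost: torch.Tensor, ious: torch.Tensor, min_k: int = 1) -> torch.Tensor:
    """cost, ious: (num_gt, num_candidates). Returns a boolean positive-sample mask."""
    num_gt, num_cand = cost.shape
    positive = torch.zeros_like(cost, dtype=torch.bool)
    # k_i is estimated from the summed IoU of each ground truth's candidates.
    ks = ious.sum(dim=1).floor().clamp(min=min_k).long()
    for i in range(num_gt):
        k = min(int(ks[i]), num_cand)
        _, idx = torch.topk(cost[i], k, largest=False)  # lowest-cost candidates
        positive[i, idx] = True
    return positive


# Example with 2 ground truths and 6 candidate predictions.
print(dynamic_k_assign(torch.rand(2, 6), torch.rand(2, 6)))
```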

Lightweight backbone

Fire detection is a real-time visual task. However, the backbone of the baseline “DarkNet-53” is heavy for real-time tasks. To further reduce the complexity, we propose a new backbone to replace the original backbone.
We use a lightweight cross-stage module for the new backbone, as shown in Fig. 8. First, we remove the bottleneck structure; as shown in Table 1, the new structure has fewer parameters than the bottleneck structure. Then, a \(1 \times 1\) convolution layer is used to reduce the duplication of gradient information.
The backbone contains 51 layers, as described in Table 2. We use \(3 \times 3\) convolutions with a stride of 2 to increase the number of channels and decrease the spatial size \(h \times w\). To maintain the original head structure, the backbone still outputs three feature maps with 256, 512, and 1024 channels.
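A hedged sketch of such a cross-stage module is shown below: the residual unit keeps both convolutions at \(C/2\) channels as in Table 1, and a split with \(1 \times 1\) transitions merges the two paths. The exact layer counts and channel routing in the paper's backbone may differ; this is illustrative only.

```python
# Illustrative sketch of a lightweight cross-stage module without the bottleneck expansion.
import torch
import torch.nn as nn


def conv(c_in, c_out, k=1, s=1):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )


class SlimResidual(nn.Module):
    """Residual unit whose two convolutions both stay at C/2 channels (Table 1)."""
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(conv(channels, channels, 1), conv(channels, channels, 3))

    def forward(self, x):
        return x + self.block(x)


class CrossStage(nn.Module):
    """Splits the input into two C/2 paths; only one passes the residual units."""
    def __init__(self, channels, n_blocks):
        super().__init__()
        half = channels // 2
        self.path_a = conv(channels, half, 1)                       # shortcut path
        self.path_b = nn.Sequential(conv(channels, half, 1),
                                    *[SlimResidual(half) for _ in range(n_blocks)])
        self.fuse = conv(channels, channels, 1)                     # merge both paths

    def forward(self, x):
        return self.fuse(torch.cat([self.path_a(x), self.path_b(x)], dim=1))


# Example: one stage with 8 residual units on a 52 x 52 x 256 feature map.
stage = CrossStage(256, n_blocks=8)
print(stage(torch.zeros(1, 256, 52, 52)).shape)  # torch.Size([1, 256, 52, 52])
```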

Experiment and discussion

Table 1
Comparison of the bottleneck and our block

| Bottleneck | Our block |
| \(\frac{C}{2} \times 1 \times 1\) convolution | \(\frac{C}{2} \times 1 \times 1\) convolution |
| \(C \times 3 \times 3\) convolution | \(\frac{C}{2} \times 3 \times 3\) convolution |
Table 2
The lightweight backbone structure

| Input | Operator | n |
| \(416 \times 416 \times 3\) | Conv2d | – |
| \(416 \times 416 \times 32\) | Conv2d | – |
| \(208 \times 208 \times 64\) | Cross-stage | 1 |
| \(208 \times 208 \times 64\) | Conv2d | – |
| \(104 \times 104 \times 128\) | Cross-stage | 2 |
| \(104 \times 104 \times 128\) | Conv2d | – |
| \(52 \times 52 \times 256\) | Cross-stage | 8 |
| \(52 \times 52 \times 256\) | Conv2d | – |
| \(26 \times 26 \times 512\) | Cross-stage | 8 |
| \(26 \times 26 \times 512\) | Conv2d | – |
| \(13 \times 13 \times 1024\) | Cross-stage | 4 |

The n indicates the number of residual blocks in the cross-stage module
In this section, we first describe the dataset details, implementation, and evaluation metric. Then, we perform various ablation experiments to verify the performance. Finally, the proposed method is compared with the state-of-the-art methods.

Experimental setup

Datasets

To evaluate the proposed method, we create novel fire and smoke datasets. We refer to the dataset of [8], which has only 226 images with and without fire. To enrich the data, we collected images with and without fire smoke from the Internet and established a fire and smoke dataset of 10,029 images covering many scenes, such as cars, forests, buildings, and grasslands. In the remainder of the article, it is named Dataset 1. Moreover, we added images of fire-like and smoke-like objects, such as glare lights, burning clouds, sunsets, water, steam, and sand dust; this extended dataset is named Dataset 2. As shown in Fig. 9, our self-built dataset contains three types of images: fire, fire-like, and nonfire. In Fig. 9, picture (a) shows some common images of fire scenarios, containing house fires, wildfires, and smoke at the scene of a fire. Picture (b) shows some images of fire-like scenarios, containing clouds, dust droplets, lighting, and fog. The detailed statistics of our dataset are shown in Table 3.
Table 3
Detailed statistics of the datasets

| | | Training set | Validation set | Testing set |
| Dataset 1 | Fire scene | 10,029 | 1002 | 1002 |
| | Smoke scene | 10,029 | 1002 | 1002 |
| Dataset 2 | Fire-like | 3136 | 313 | 313 |
| | Smoke-like | 3548 | 354 | 354 |

Implementation details

We use PyTorch to implement our method. We choose the SGD optimizer with an initial learning rate of 1e−3 and a momentum of 0.9. The batch size is set to 16, and the model is trained for 300 epochs. We use the Mosaic [2] and MixUp [41] data augmentation strategies. After every epoch, we validate the model on the validation set.
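A minimal sketch of this training configuration is shown below; the model, dataset, and loss are placeholders, and only the optimizer settings, batch size, and epoch count follow the values stated above.

```python
# Hedged sketch of the stated training setup (SGD, lr 1e-3, momentum 0.9, batch 16, 300 epochs).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Conv2d(3, 7, 1)                       # stand-in for the detector
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
dataset = TensorDataset(torch.randn(32, 3, 64, 64), torch.randn(32, 7, 64, 64))
loader = DataLoader(dataset, batch_size=16, shuffle=True)

for epoch in range(300):
    for images, targets in loader:               # Mosaic and MixUp would be applied here
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(images), targets)  # placeholder loss
        loss.backward()
        optimizer.step()
    # after every epoch the model would be evaluated on the validation set
```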
The hardware platform is an NVIDIA RTX 3060 graphics card with 12 GB of memory, whose single-precision floating-point computing capability is 12.74 TFLOPS. We implement our method on the Windows 10 system.

Evaluation metrics

The VOC metric is a significant standard for assessing the detection method performance. In object detection, the average precision (AP) is a common evaluation metric. The specific description is shown as follows.
For a given threshold, a true positive is a detection box with \(\text {IoU} > \text {threshold}\); a false positive is a detection box with \(\text {IoU} \le \text {threshold}\); and a false negative is a ground truth with no matching detection box.
Precision is the ratio of the number of correctly detected positive samples to the number of detected positive samples, and recall is the ratio of the number of correctly detected positive samples to the number of ground-truth objects. The mathematical expressions are
$$\begin{aligned} \begin{aligned} {\text {Precision}}&= \frac{TP}{TP + FP} \\ {\text {Recall}}&= \frac{TP}{TP + FN}. \end{aligned} \end{aligned}$$
(6)
Using precision as the y-axis and recall as the x-axis, we build the precision-recall curve. Then, we calculate the AP (average precision) as the average of 11 values sampled from the curve, which can be expressed by
$$\begin{aligned} \begin{aligned} {\text {AP}}=\frac{1}{11} \sum _{r \in \{0,0.1, \ldots , 0.9,1\}} {\text {area}}(r). \end{aligned} \end{aligned}$$
(7)
Here, \({\textbf{r}}\) is the recall. Assume that there are n classes. The MAP is the mean AP of all classes, which can be calculated by
$$\begin{aligned} \begin{aligned} {\text {MAP}} = \frac{1}{n} \sum _{i=1}^{n} {\text {AP}}(\mathbf {classes_{i}}). \end{aligned} \end{aligned}$$
(8)
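The following sketch computes the 11-point AP and the mean AP over classes under the usual interpretation of Eq. (7), in which the value at each recall level is the maximum precision at or beyond that recall; the function names and the toy data are ours.

```python
# Minimal sketch of the 11-point AP and mAP of Eqs. (7)-(8).
import numpy as np


def voc_ap_11point(recall: np.ndarray, precision: np.ndarray) -> float:
    """Average the maximum precision at recall >= r for r in {0, 0.1, ..., 1.0}."""
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):
        mask = recall >= r
        p = precision[mask].max() if mask.any() else 0.0
        ap += p / 11.0
    return ap


def mean_ap(per_class_ap: list) -> float:
    """mAP is the mean AP over all classes (fire and smoke here)."""
    return float(np.mean(per_class_ap))


# Example with a toy precision-recall curve and two per-class APs.
rec = np.array([0.1, 0.4, 0.6, 0.8])
prec = np.array([1.0, 0.9, 0.7, 0.5])
print(voc_ap_11point(rec, prec), mean_ap([0.62, 0.77]))
```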
Table 4
Performance of different structures on Dataset 1

| Methods | \(\mathrm AP_{50:95}\) | \(\mathrm AP_{50}\) | \(\mathrm AP_{60}\) | \(\mathrm AP_{70}\) | Parameters | FPS |
| Baseline | 39.9 | 68.9 | 60.2 | 45.9 | 61.5 M | 62.9 |
| CBAM | 39.7 | 69.2 | 60.2 | 46.9 | 61.8 M | 58.8 |
| Anchor-free | 42.8 | 69.3 | 62.6 | 50.4 | 61.5 M | 83.3 |
| CBAM + anchor-free | 43.7 | 73.3 | 64.4 | 51.9 | 61.8 M | 73.0 |
| CBAM + anchor-free + lightweight backbone | 43.8 | 70.1 | 63.9 | 53.3 | 48.8 M | 73.0 |

CBAM denotes the convolution block attention module. Anchor-free indicates the anchor-free mechanism. Lightweight backbone denotes the proposed backbone. All models are tested at \(416 \times 416\) resolution.
Bold indicates the best performance.

Ablation study

In this section, we compare the improved method with the baseline method. Our method is based on YOLOv3 and introduces two mechanisms to enhance the performance of the base method. To confirm the effectiveness of the two mechanisms, we present experiments with different settings: (1) only the YOLOv3 baseline network, (2) applying the CBAM (convolution block attention module) in three branches that are on different scales, (3) switching the anchor mechanisms to anchor-free mechanisms, (4) simultaneously adding the CBAM and switching the anchor mechanisms, and (5) replacing the original backbone with a lightweight backbone.
The experimental results of the ablation study are presented in Table 4. Compared with the YOLOv3 baseline, the structure that applies the CBAM (channel attention and spatial attention module) achieves better performance in \(\mathrm AP_{50}\) and \(\mathrm AP_{70}\), but its AP is slightly lower than the baseline. It can be concluded that CBAM enhances the classification ability but slightly weakens the localization ability of the model. In addition, the inference speed is reduced from 62.9 FPS to 58.8 FPS due to the increase in model complexity. Applying the anchor-free mechanism individually outperforms the YOLOv3 baseline by 0.4 in \(\mathrm AP_{50}\), reaching 69.3, and achieves significant improvements in AP, \(\mathrm AP_{60}\), and \(\mathrm AP_{70}\). It can be concluded that removing the anchor mechanism improves the positioning ability of the model. Additionally, the speed of the model increases from 58.8 FPS to 83.3 FPS owing to the decrease in the number of predicted boxes. When we apply these two mechanisms simultaneously, the AP further improves to 43.7. Finally, replacing “DarkNet-53” with the lightweight backbone decreases the model parameters from 61.8 M to 48.8 M. This structure achieves 43.8 AP and 53.3 \(\mathrm AP_{70}\), outperforming the “DarkNet-53” structure, although “DarkNet-53” still performs better in \(\mathrm AP_{50}\). The reason may be that the classification ability becomes slightly weaker while the localization ability becomes stronger. In addition, the model maintains a high FPS, which increases from 62.9 to 73.0 compared with the baseline method.
To further evaluate the effectiveness of the improved method, we conduct a comparison on Dataset 2, and the results are shown in Table 5. Our method clearly improves the AP, \(\mathrm AP_{50}\), \(\mathrm AP_{60}\), and \(\mathrm AP_{70}\). In particular, our method achieves 50.8 AP, which outperforms the baseline by 4.7.
Table 5
Performance of the baseline method and our method on Dataset 2

| Method | Backbone | Size | \(\mathrm AP_{50:95}\) | \(\mathrm AP_{50}\) | \(\mathrm AP_{60}\) | \(\mathrm AP_{70}\) |
| Baseline | DarkNet-53 | 416 | 46.1 | 73.1 | 67.8 | 56.2 |
| Our method | LightNet | 416 | 50.8 | 74.6 | 69.4 | 60.4 |

Bold indicates the best performance.
Table 6
Quantitative results on Dataset 1 without fire-like and smoke-like objects

| Methods | Backbone | Size | \(\mathrm AP_{50:95}\) | \(\mathrm AP_{50}\) | \(\mathrm AP_{60}\) | \(\mathrm AP_{70}\) |
| YOLOv2 [25] | ResNet-50 | 416 | 37.3 | 68.7 | 58.4 | 44.9 |
| YOLOv3 [26] | DarkNet-53 | 416 | 39.9 | 68.9 | 60.2 | 45.9 |
| YOLOv4 [2] | CSPDarkNet-53 | 416 | 39.2 | 70.6 | 62.4 | 48.4 |
| CenterNet [44] | ResNet-34 | 512 | 33.0 | 63.2 | 52.3 | 38.7 |
| CenterNet [44] | ResNet-50 | 512 | 38.3 | 68.8 | 57.6 | 44.6 |
| SSD [19] | VGG | 300 | 31.2 | 63.4 | 53.2 | 37.8 |
| SSD [19] | VGG | 512 | 37.9 | 68.2 | 58.5 | 45.5 |
| EfficientDet-D0 [30] | EfficientNet-B0 | 512 | 25.4 | 59.7 | 46.5 | 27.7 |
| EfficientDet-D1 [30] | EfficientNet-B1 | 512 | 27.9 | 61.9 | 56.4 | 48.6 |
| Our method | LightNet | 416 | 43.8 | 70.1 | 63.9 | 53.3 |

Bold indicates the best performance.
Table 7
Quantitative results on Dataset 2, which includes fire-like and smoke-like objects

| Methods | Backbone | Size | \(\mathrm AP_{50:95}\) | \(\mathrm AP_{50}\) | \(\mathrm AP_{60}\) | \(\mathrm AP_{70}\) |
| YOLOv2 [25] | ResNet-50 | 416 | 42.0 | 72.6 | 64.4 | 51.7 |
| YOLOv3 [26] | DarkNet-53 | 416 | 46.1 | 73.1 | 67.8 | 56.2 |
| YOLOv4 [2] | CSPDarkNet-53 | 416 | 47.0 | 75.2 | 68.4 | 58.1 |
| CenterNet [44] | ResNet-34 | 512 | 37.2 | 67.2 | 58.0 | 38.7 |
| CenterNet [44] | ResNet-50 | 512 | 42.1 | 72.1 | 62.6 | 50.7 |
| SSD [19] | VGG | 300 | 38.1 | 67.8 | 58.0 | 45.1 |
| SSD [19] | VGG | 512 | 39.0 | 67.2 | 59.8 | 47.8 |
| EfficientDet-D0 [30] | EfficientNet-B0 | 512 | 27.2 | 53.7 | 45.5 | 32.9 |
| EfficientDet-D1 [30] | EfficientNet-B1 | 640 | 26.5 | 54.4 | 44.8 | 31.3 |
| Our method | LightNet | 416 | 50.8 | 74.6 | 69.4 | 60.4 |

Bold indicates the best performance.

Comparison with CNN-based detection methods

In this section, we compare the performance of our method and existing CNN-based methods. Traditional fire disaster detection methods are mainly based on manual features, which are easily influenced by complex environments, so their detection performance is lower than that of features extracted by networks. Thus, a comparison between the proposed method and traditional methods would be unfair, and we restrict the comparison to CNN-based detectors. First, we conduct a comparison experiment on Dataset 1. Then, we conduct a series of experiments on Dataset 2 to further analyze the performance of our method. Finally, based on the AP indicator, we present a broad comparison of model size and inference speed.

AP comparison using dataset 1

In this part, the experiment is performed on the benchmark Dataset 1 to evaluate the performance of the proposed method and compare it with some representative detection methods based on CNNs (convolution neural networks), including [19, 26, 30, 44]. The comparison indices are AP(\(\mathrm AP_{50:95}\)), \(\mathrm AP_{50}\), \(\mathrm AP_{60}\), and \(\mathrm AP_{70}\).
We reimplement all compared methods on Dataset 1 and use the same testing set to evaluate their performance. The results are shown in Table 6. Our method achieves 43.8 AP, which outperforms the second-best method [26] by 3.9, indicating that our method has good localization ability. Method [30] has the worst AP on Dataset 1, which is 25.4. Moreover, our method achieves the second-best \(\mathrm AP_{50}\) of 70.1, which indicates good classification performance, and the best \(\mathrm AP_{70}\) of 53.3, which indicates better positioning ability than the other methods. [26] achieves 39.9 AP on Dataset 1, making it the second-best method, and [2] achieves 39.2 AP, slightly lower than [26]. However, [2] performs better than [26] in terms of \(\mathrm AP_{50}\), \(\mathrm AP_{60}\), and \(\mathrm AP_{70}\), because [26] has better precise positioning ability at higher IoU thresholds than [2]. From the perspective of comprehensive detection performance, our method achieves the best AP, \(\mathrm AP_{60}\), and \(\mathrm AP_{70}\).

Detection performance in dataset 2

Considering that fire-like and smoke-like objects will affect fire disaster management, detection systems must be robust against interference from such objects. To analyze the detection performance of different methods under this interference and to enhance robustness against it, we build a dataset that includes fire-like and smoke-like objects based on Dataset 1.
All methods are trained on Dataset 2, and the experimental results are shown in Table 7. From the results, our model has the best detection performance, and [30] has the worst. The AP of method [30] only reaches 26.5 on Dataset 2, which is 24.3 lower than that of our method. In addition to AP, our method shows the best performance on the other evaluation metrics, particularly \(\mathrm AP_{70}\), where it achieves 60.4 and outperforms the worst and second-best methods by 29.1 and 2.3, respectively. This is because removing the anchor boxes enhances the localization of the model. In terms of \(\mathrm AP_{60}\), our method achieves 69.4 and outperforms the second-best method by 1.0. Method [2] achieves the best \(\mathrm AP_{50}\) of 75.2, which is 0.6 higher than ours; the reason may be that method [2] has better classification ability.
Furthermore, to explore the effect of similar targets on detection, we validate the performance of the model trained on each dataset. As shown in the first row of Fig. 10, some nonfire scenes are recognized as fire accidents when the model is trained only on Dataset 1; when Dataset 2 is used for training, the model can correctly identify the similar targets. The proposed method is also tested on images including fire and smoke, and the results are shown in Fig. 11. The first row includes images containing fire mixed with smoke; the model can basically identify fire and smoke, but in some areas, it is difficult for the model to distinguish fire from smoke. In the second and bottom rows, the model clearly recognizes the fire and smoke, although some small objects are still difficult to detect, as shown in the last images of Fig. 11. For fairness, none of these test images are included in Dataset 1 or Dataset 2. We can see that using a dataset containing similar targets for training can effectively reduce the false alarm rate.

Comparison of the experimental results of fire and smoke

The AP of fire and of smoke is an important indicator of detection performance. In this subsection, we present more detailed comparative results for each category in Dataset 2, reporting the \(\mathrm AP_{50}\) of fire and smoke for methods with similar overall detection performance. The experimental results are shown in Table 8.
Table 8
Experimental results of each category in Dataset 2

| Method | Backbone | Size | \(\mathrm AP_\textrm{fire}\) | \(\mathrm AP_\textrm{smoke}\) |
| YOLOv2 [25] | ResNet-50 | 416 | 60.6 | 74.1 |
| YOLOv3 [26] | DarkNet-53 | 416 | 62.1 | 74.7 |
| YOLOv4 [2] | CSPDarkNet-53 | 416 | 62.5 | 76.1 |
| CenterNet [44] | ResNet-50 | 512 | 60.3 | 74.3 |
| SSD [19] | VGG | 512 | 58.6 | 74.4 |
| Our method | LightNet | 416 | 62.7 | 77.2 |

Bold indicates the best performance.
Table 9
Comparison of the model complexity and speed under similar AP values

| Method | Backbone | Size | \(\mathrm AP_{50:95}\) | FPS | Parameters |
| YOLOv2 [25] | ResNet-50 | 416 | 42.4 | 61.8 | 44.6 M |
| YOLOv3 [26] | DarkNet-53 | 416 | 46.1 | 62.9 | 61.5 M |
| YOLOv4 [2] | CSPDarkNet-53 | 416 | 47.0 | 62.0 | 57.7 M |
| CenterNet [44] | ResNet-50 | 512 | 42.1 | 51.5 | 28.5 M |
| SSD [19] | VGG | 512 | 39.0 | 15 | 30.6 M |
| Our method | LightNet | 416 | 50.8 | 73.0 | 48.8 M |

Bold indicates the best performance.
Table 8 shows that \(\mathrm AP_{smoke}\) is higher than \(\mathrm AP_{fire}\) for all methods because smoke targets are usually larger than flames. For smoke detection, [19, 26, 44] have very similar \(\mathrm AP_{smoke}\) values, all exceeding 70, which is also due to the relatively large area of smoke. However, [19] achieves the worst \(\mathrm AP_{fire}\), only 58.6; the reason may be that VGG does not contain a residual structure, which makes fire harder to detect. A low \(\mathrm AP_{fire}\) implies a high fire false alarm rate. For \(\mathrm AP_{fire}\), the second-best method achieves 62.5. Our method obtains the best results for both indices, outperforming the second-best methods by 0.2 and 1.1, respectively. From the experimental results, it can be concluded that our method achieves the lowest false alarm rate.

Comparison of the model size, inference speed, and AP

Smaller fire detection models are more feasible to deploy on hardware with limited computing resources. However, the inference speed of a method is not fully controlled, since it varies with software and hardware. Thus, in this section, we provide a detailed experimental analysis of the feasibility of our method and the state-of-the-art CNN-based methods. Experiments are performed in two settings: (1) an NVIDIA RTX 3060 supporting deep learning acceleration with 12 GB of on-board memory and a floating-point computing capability of 12.74 TFLOPS, and (2) a CPU setting with 16 GB of RAM and a 4.0-GHz 64-bit Intel Core i5-12400.
AP is a fundamental property of an object detection method, so the comparison is conducted at similar AP levels. The parameters of the methods under similar AP are compared in Table 9. First, in terms of model parameters, [44] appears to be the best method, but it has a lower AP than ours and reaches only 51.5 FPS; the reason is that its higher input resolution increases the inference time. Our method has 48.8 M parameters, similar to [25], but method [25] reaches only 61.8 FPS. In terms of inference speed, the second-best method achieves 62.9 FPS, which is 13% lower than that of our method, and Ref. [19] is the slowest among these methods. Our method achieves the best inference speed on the GPU. Furthermore, Refs. [44] and [19] have fewer parameters than the other methods but achieve lower speeds, because processing the predicted data takes additional time. According to the above experimental results, our method reaches a good balance between detection performance and inference speed.

Conclusion

In this paper, we discuss the successful application of current CNN-based methods in computer vision tasks and the possibility of enhancing vision-based fire detection. Nevertheless, in some complicated scenes, the performance of current CNN-based methods remains limited: most existing methods have difficulty recognizing similar objects and lack generalization ability. To address these limitations, an attention mechanism module that combines channel attention and spatial attention is proposed to improve the discrimination of smoke, fire, and similar objects. Then, an anchor-free mechanism is adopted to address the model's limited generalization ability caused by the variable shapes of fire and smoke. Moreover, we use a lightweight backbone to reduce the complexity of our method. Compared with existing CNN-based methods on two datasets, our method achieves a better tradeoff among AP, model size, and FPS. Further work will focus on expanding the detection dataset and using semantic segmentation [31] to precisely mark the fire area in images.

Acknowledgements

This work was supported by National Natural Science Foundation of China (Grant Nos. 61763002 and 62072124), and Guangxi Major Projects of Science and Technology (Grant No. 2020AA21077007).

Declarations

Conflict of interest

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, this manuscript. On behalf of all authors, the corresponding author states that there is no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Literature
1. Alamgir N, Nguyen K, Chandran V, Boles W (2018) Combining multi-channel color space with local binary co-occurrence feature descriptors for accurate smoke detection from surveillance videos. Fire Saf J 102:1–10
2. Bochkovskiy A, Wang CY, Liao HYM (2020) YOLOv4: Optimal speed and accuracy of object detection
3. Calderara S, Piccinini P, Cucchiara R (2011) Vision based smoke detection system using image energy and color information. Mach Vis Appl 22(4):705–719
4. Celik T, Demirel H (2009) Fire detection in video sequences using a generic color model. Fire Saf J 44(2):147–158
5. Celik T, Ozkaramanli H, Demirel H (2007) Fire pixel classification using fuzzy logic and statistical color model. In: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'07), vol 1, pp I-1205. IEEE
6. Chen J, He Y, Wang J (2010) Multi-feature fusion based fast video flame detection. Build Environ 45(5):1113–1122
7. Chen J, Wang Y, Tian Y, Huang T (2013) Wavelet based smoke detection method with RGB contrast-image and shape constrain. In: Visual Communications and Image Processing (VCIP), 2013
8. Chino D, Avalhais L, Jr J, Traina A (2015) BoWFire: Detection of fire in still images by integrating pixel color and texture analysis. In: 2015 28th SIBGRAPI Conference on Graphics, Patterns and Images
9. Foggia P, Saggese A, Vento M (2015) Real-time fire detection for video-surveillance applications using a combination of experts based on color, shape, and motion. IEEE Trans Circuits Syst Video Technol 25(9):1545–1556
11. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 580–587
12. Han XF, Jin JS, Wang MJ, Jiang W, Gao L, Xiao LP (2017) Video fire detection based on Gaussian mixture model and multi-color features. Signal Image Video Process
13. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 770–778
14. Healey G, Slater D, Lin T, Drda B, Goedeke AD (1993) A system for real-time fire detection. In: IEEE Conference on Computer Vision and Pattern Recognition
15. Ko BC, Cheong KH, Nam JY (2009) Fire detection based on vision sensor and support vector machines. Fire Saf J 44(3):322–329
16. Lee Y, Shim J (2019) False positive decremented research for fire and smoke detection in surveillance camera using spatial and temporal features based on deep learning. Electronics 8(10):1167
17. Lestari DP, Kosasih R, Handhika T, Sari I, Fahrurozi A, et al (2019) Fire hotspots detection system on CCTV videos using you only look once (YOLO) method and tiny YOLO model for high buildings evacuation. In: 2019 2nd International Conference of Computer and Informatics Engineering (IC2IE), pp 87–92. IEEE
18. Li S, Yan Q, Liu P (2020) An efficient fire detection method based on multiscale feature extraction, implicit deep supervision and channel attention mechanism. IEEE Trans Image Process 29:8467–8475
19. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) SSD: Single shot multibox detector. Springer, Cham
20. Mao W, Wang W, Dou Z, Li Y (2018) Fire recognition based on multi-channel convolutional neural network. Fire Technol 54(2):531–554
21. Muhammad K, Ahmad J, Lv Z, Bellavista P, Yang P, Baik SW (2018) Efficient deep CNN-based fire detection and localization in video surveillance applications. IEEE Trans Syst Man Cybern Syst 49(7):1419–1434
22. Muhammad K, Ahmad J, Mehmood I, Rho S, Baik SW (2018) Convolutional neural networks based fire detection in surveillance videos. IEEE Access 6:18174–18183
23. Muhammad K, Khan S, Elhoseny M, Ahmed SH, Baik SW (2019) Efficient fire detection for uncertain surveillance environment. IEEE Trans Industr Inf 15(5):3113–3122
24. Rafiee A, Dianat R, Jamshidi M, Tavakoli R, Abbaspour S (2011) Fire and smoke detection using wavelet analysis and disorder characteristics. In: 2011 3rd International Conference on Computer Research and Development, vol 3, pp 262–265. IEEE
25. Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7263–7271
27. Saponara S, Elhanashi A, Gagliardi A (2021) Real-time video fire/smoke detection based on CNN in antifire surveillance systems. J Real-Time Image Proc 18(3):889–900
28. Shen D, Chen X, Nguyen M, Yan WQ (2018) Flame detection using deep learning. In: 2018 4th International Conference on Control, Automation and Robotics (ICCAR), pp 416–420. IEEE
29. Sun Y, Wu F (2022) DESAC: differential evolution sample consensus algorithm for image registration. Appl Intell 1–24
30. Tan M, Pang R, Le QV (2020) EfficientDet: Scalable and efficient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 10781–10790
31. Tian Z, He T, Shen C, Yan Y (2020) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
32. Toreyin BU, Dedeoglu Y, Cetin AE (2005) Flame detection in video using hidden Markov models. In: IEEE International Conference on Image Processing 2005, vol 2, pp II-1230. IEEE
33. Töreyin B, Dedeoǧlu Y, Güdükbay U, Çetin AE (2006) Computer vision based method for real-time fire and flame detection. Pattern Recognit Lett
34. Töreyin B, Dedeoǧlu Y, Çetin A (2005) Flame detection in video using hidden Markov models. In: IEEE International Conference on Image Processing
35. Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, Wang X, Tang X (2017) Residual attention network for image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3156–3164
36. Wang K, Liu M, Ye Z (2021) An advanced YOLOv3 method for small-scale road object detection. Appl Soft Comput 112:107846
37. Wu S, Zhang L (2018) Using popular object detection methods for real time forest fire detection. In: 2018 11th International Symposium on Computational Intelligence and Design (ISCID), vol 1, pp 280–284. IEEE
38. Xu R, Lin H, Lu K, Cao L, Liu Y (2021) A forest fire detection system based on ensemble learning. Forests 12(2):217
39. Ye W, Zhao J, Wang S, Wang Y, Zhang D, Yuan Z (2015) Dynamic texture based smoke detection using surfacelet transform and HMT model. Fire Saf J 73:91–101
40. Zhang D, Han S, Zhao J, Zhang Z, Qu C, Ke Y, Chen X (2009) Image based forest fire detection using dynamic characteristics with artificial neural networks. In: 2009 International Joint Conference on Artificial Intelligence, pp 290–293. https://doi.org/10.1109/JCAI.2009.79
41. Zhang H, Cisse M, Dauphin YN, Lopez-Paz D (2018) mixup: Beyond empirical risk minimization. In: International Conference on Learning Representations
42. Zhao Y, Zhao JH, Huang J, Han SZ, Long CJ, Yuan ZY, Zhang DY (2011) Contourlet transform based texture analysis for smoke and fog classification. Appl Mech Mater 88–89:537–542
Metadata
Title
Fire and smoke precise detection method based on the attention mechanism and anchor-free mechanism
Authors
Yu Sun
Jian Feng
Publication date
10-03-2023
Publisher
Springer International Publishing
Published in
Complex & Intelligent Systems / Issue 5/2023
Print ISSN: 2199-4536
Electronic ISSN: 2198-6053
DOI
https://doi.org/10.1007/s40747-023-00999-4
