
Open Access 04.03.2024 | Original Article

A lightweight contour detection network inspired by biology

Authors: Chuan Lin, Zhenguang Zhang, Jiansheng Peng, Fuzhang Li, Yongcai Pan, Yuwei Zhang

Published in: Complex & Intelligent Systems


Abstract

In recent years, the field of bionics has attracted the attention of numerous scholars, and models incorporating biological vision have achieved excellent performance in computer vision and image processing tasks. In this paper, we propose a new bio-inspired lightweight contour detection network (BLCDNet) by combining the parallel processing mechanisms of biological visual information with convolutional neural networks. The backbone network of BLCDNet simulates the parallel pathways from ganglion cells through the lateral geniculate nucleus to the primary visual cortex (V1), realizing parallel processing and step-by-step extraction of the input information and effectively extracting local and detailed features in images, thus improving the overall performance of the model. In addition, we design a depth feature extraction module combining depth separable convolution and residual connections in the decoding network to integrate the output of the backbone network, which further improves the performance of the model. We conducted extensive experiments on the BSDS500 and NYUD datasets, and the results show that the proposed BLCDNet achieves the best performance compared with traditional methods and previous biologically inspired contour detection methods. In addition, BLCDNet still outperforms some VGG-based contour detection methods despite using no pre-training and fewer parameters, and it is competitive among all of them. This research also provides a new idea for combining biological vision with convolutional neural networks.

Introduction

As one of the low-level tasks in the field of computer vision [1], contour detection plays a crucial role in enhancing the performance of various mid-level and advanced vision tasks. These tasks include object detection [2], semantic segmentation [3], saliency detection [4], and occlusion reasoning [5], among others.
Traditional edge detection methods, such as Prewitt [6], Sobel [7], and Canny [1], primarily extract edges by calculating local gray-level changes in the image using differential operators. During edge extraction, these methods concentrate on low-level image features [8] but often struggle to distinguish meaningful contours from background and texture. This limitation results in lower accuracy and performance in contour extraction, failing to meet the requirements of certain mid-level and advanced visual tasks. Hence, many experts and scholars have started to explore high-performance contour detection methods. In addition, as a new research hotspot, contour detection has also attracted attention from the field of biology [9].
Inspired by the early discovery and suggestion by Hubel and Wiesel [10] that primary visual cortex (V1) neurons have the function of detecting edges and lines, several experts and scholars have proposed many bionic contour detection models based on the biological visual mechanisms effective for contour detection [11, 12]. For example, Grigorescu et al. [13] used the Gabor operator, the Gabor energy operator, and the difference of Gaussian (DOG) operator to simulate simple cell responses, complex cell responses, and the inhibitory effect of the non-classical receptive field (nCRF) on the classical receptive field (CRF), and proposed a new contour detection model. Yang et al. [14] proposed a biomimetic contour detection model, double-opponency and spatial sparseness constraint (SCO), based on the color antagonism mechanism and a spatial sparseness constraint (SSC) strategy. Akbarinia et al. [15] realized target edge extraction based on the color opponency mechanism from the retina to the visual cortex (V1) and the surround modulation characteristics of the receptive fields of cells in the V1 area. Although contour detection models that simulate biological vision mechanisms achieve better performance than traditional methods by suppressing background and texture to a certain extent, some issues remain worthy of investigation. In previous methods, researchers typically employed mathematical formulas to simulate the visual mechanisms or biological characteristics effective for contour detection in biological vision systems. However, interactions between neurons in biological vision systems are typically complex and diverse, so relying solely on a single mathematical function to simulate them is evidently inappropriate [8]. To this end, Tang et al. [8] proposed a method combining biological vision with deep learning: they designed a learnable contour detection model that uses convolution kernels of different sizes to simulate the processing of feature maps by the nCRF and CRF. At the same time, the combination with image pyramids achieves the fusion of feature information at different scales, which further increases the complexity and diversity of the model and also provides new ideas for the design of bionic contour detection models. Later, Lin et al. [16], inspired by the mechanisms effective for contour detection in the biological vision system, combined them with convolutional neural networks and self-attention mechanisms to propose a multi-level interactive contour detection model, MI-Net, achieving good performance.
In the past few years, end-to-end contour detection models based on convolutional neural networks [17–22] have made breakthrough progress. For example, on BSDS500 [23], detection performance has been boosted from 0.598 [24] to 0.828 [22] in ODS (optimal dataset scale) F-measure. Recently, a transformer-based edge detection model [25] achieved even higher performance, with an ODS of 0.848. However, although these methods achieve the best performance, they generally have high complexity, a large number of parameters, and slow processing speed, occupying considerable computing resources. Furthermore, researchers incorporate parameters pre-trained on ImageNet [26] into their models to achieve enhanced performance through transfer learning. To minimize computational resource consumption and increase processing speed, some researchers have begun to investigate how to achieve high-performance contour extraction with a simple model, few parameters, fast operation, and low resource consumption, reexamining contour detection models that rely on transfer learning. Inspired by lightweight models in other visual tasks [27–29], some researchers proposed lightweight models for contour detection. For example, Wibisono et al. [30] proposed a lightweight edge detection model called fast inference network for edge detection (FINED) by using dilated convolution to design the backbone network. Inspired by the steps of edge extraction in traditional contour detection methods, the lightweight traditional method inspired deep neural network (TIN2) [31] was proposed. Su et al. [32] proposed the pixel difference network (PiDiNet), a simple and efficient edge detection network based on pixel differences, and achieved the best results among lightweight models.
To sum up, the design of lightweight models is becoming a new research hotspot, attracting the attention of more and more researchers. For contour detection, although lightweight models have achieved better performance than traditional methods and some CNN-based models [17–19], some problems remain to be solved. As we know, the emergence of CNNs was inspired by the biological vision system [35], while current lightweight models are mainly designed based on the experience of researchers, lacking the guidance of relevant biological vision mechanisms. Therefore, this paper proposes a new bio-inspired lightweight contour detection network (BLCDNet) combining biological vision and deep learning technology. Our backbone network simulates the three parallel channels formed by ganglion cells, the lateral geniculate nucleus (LGN), and the primary visual cortex (V1) in the biological visual system [33, 34], and models the different characteristics of these three parallel channels to achieve visual information processing and feature extraction. The transmission of visual information from the retina to the LGN to V1 is shown in Fig. 1. In addition, we design a depth feature extraction module using the depth separable convolution [29] widely adopted in lightweight networks. By further processing the output of the backbone network, we can comprehensively extract feature information and enhance the overall performance of the model. It is worth noting that our method achieves state-of-the-art performance among bionic contour detection models, and our way of combining biological vision with deep learning also provides new ideas for future research. Our contributions are summarized as follows:
1.
We simulate the three parallel pathways formed by ganglion cells, the LGN, and the primary visual cortex (V1) in the biological visual system and design corresponding backbone networks. These include the large receptive field network simulating the large-cell pathway from ganglion cells to the V1 area, the small receptive field network simulating the small-cell pathway, and the hybrid network simulating the color pathway from ganglion cells to the V1 area. Finally, we combine the outputs of these three pathways to comprehensively extract and fuse the feature information.
 
2.
We design the depth feature extraction module (DFEM) using depth separable convolutions. By further processing the features output by the backbone network, contextual information is fully integrated and the overall detection performance of the model is improved.
 
3.
We combine the backbone network that models the parallel pathways with the designed depth feature extraction module to propose a biologically inspired lightweight contour detection network with a simple structure, high efficiency, and high accuracy.
 
Related work

This paper mainly involves contour detection, biological vision mechanisms, and lightweight networks. We briefly review the work in these three areas.

Contour detection

Existing contour detection methods can be divided into traditional contour detection methods, bio-inspired contour detection methods, and learnable contour detection methods. Among them, the learnable contour detection methods can be further divided into traditional machine learning methods and deep learning methods. Traditional contour detection methods [1, 6, 7] mainly compute local gradient changes of the image with differential operators to detect edges. While these early contour detection methods can extract contours in images, their performance and accuracy are limited. They struggle to precisely differentiate between the background and the image contours, making them susceptible to noise interference. In contrast, bionic contour detection methods [13–15] simulate the characteristics of a specific area or cell type in the biological vision system using mathematical formulas. To some extent, these methods achieve background and texture suppression in the image, resulting in commendable performance. Methods based on traditional machine learning [23, 36–38] use supervised learning and hand-designed features to extract contours. They regard contour detection as a binary classification task, classify the target image at the pixel level using the designed features, and extract the target contour from the image. For example, the oriented edge forests (OEF) algorithm based on a random forest classifier proposed by Hallman et al. [38] fuses edge probabilities at the pixel level and then obtains image edges. Deep learning-based methods [17–22] utilize the excellent feature extraction capability of convolutional neural networks to fully extract feature information and achieve better contour detection performance. Xie et al. [17] first proposed an end-to-end detection model based on CNNs, which extracted the target contour by outputting the features of the intermediate layers and fusing features of different scales. Liu et al. [18] improved on this basis and proposed RCF, a contour detection model with richer features. He et al. [22] raised model performance to a higher level by designing a cascade network and scale enhancement modules. Amaren et al. [39] proposed a new framework based on VGG16 by designing a fire module and incorporating residual learning; the framework achieves a significant reduction in network complexity and can increase the depth of the network while preserving its low-complexity characteristics. Fang et al. [40] designed a novel local contrast loss to learn edge maps as representations of local contrast, addressing the edge ambiguity problem of current methods; this design results in clear edge extraction and achieves good performance. Recently, Pu et al. [25] proposed a new contour detection model using a transformer as the backbone network, which achieved better performance than the bi-directional cascade network (BDCN) [22].

Biological visual mechanisms

In the biological visual system, the processing and transmission of visual information from the retina to the LGN to V1 is called the first visual pathway [41]. In this pathway, visual information is transformed and processed by the retina before being transmitted through ganglion cells to the LGN. The LGN receives and processes visual information from the retina, subsequently transmitting the processed information to area V1. Within area V1, the visual information received from the LGN undergoes further processing and integration [33, 34]. As a research hotspot in the field of computer vision, contour detection has also received much attention in physiology. Studies have shown that many biological visual mechanisms in the first visual pathway are important for contour detection, for example, the modulation of the CRF by the nCRF in V1 neurons [13], the color antagonism mechanism from the retina to V1 [14], and the dynamic modulation of the receptive field after V1 neurons are stimulated [42]. In addition, Hubel and Wiesel [10] found in an early study that neurons in the V1 region of the biological visual system have the function of detecting edges and lines. At present, research on bionic contour detection models mostly focuses on the first visual pathway. Recently, Fan et al. [43] proposed a hierarchical scale convolutional neural network for facial expression recognition. In this method, they not only use enhanced kernel scale information extraction and high-level semantic features to guide low-level learning, but also propose a method that mimics human cognitive learning with knowledge transfer learning (KTL). The KTL process shares similarities with human cognition in that it can be progressively enhanced by knowledge acquired from other tasks. In contrast, our approach takes inspiration from the parallel processing in biological vision and the step-by-step handling of visual information. By integrating the characteristics of convolutional neural networks, we design a new backbone network that achieves good contour detection performance by extracting and fusing feature information step by step.

Lightweight network

Recently, to address the problems of deep learning-based contour detection methods [17–22], such as complex models, a large number of parameters, and slow computation, researchers have proposed lightweight networks for contour detection [30–32]. They design the backbone network using existing experience or by drawing on traditional contour detection methods, thus reducing model complexity and parameter count and increasing computational speed. For example, Wibisono et al. [31] designed a convolutional neural network framework corresponding to the traditional edge detection pipeline, inspired by the edge extraction steps in traditional methods. Su et al. [32] combined traditional central difference, angular difference, and radial difference with 2D convolution to propose a differential convolution operation and constructed the pixel difference network (PiDiNet) for edge detection. Among these, the PiDiNet proposed by Su et al. [32] achieves the best performance.
Based on the above analysis, we combine the design of a lightweight network with biological visual mechanisms and design a new lightweight network for contour detection (BLCDNet) by simulating the processing and transmission of visual information from retinal ganglion cells to the LGN to V1. BLCDNet has low complexity, a small number of parameters, and a small memory footprint, and achieves good results without pre-training. Compared with other bio-inspired contour detection models, its results are the most advanced. In addition, this approach of combining deep learning-based lightweight networks with biological vision mechanisms also provides a new direction for further research.

Proposed methods

Information processing and transmission mechanism from ganglion cells to LGN to V1

Physiological studies have revealed that ganglion cells in mammalian retinas can be categorized based on appearance, connectivity, and electrophysiological properties. In both the macaque monkey retina and the human retina, three primary types of ganglion cells have been identified: large M-type ganglion cells, smaller P-type ganglion cells, and non-M–non-P ganglion cells [33, 44–46], as shown in Fig. 2. They have different visual response characteristics and play different roles in visual perception. Among them, M-type ganglion cells have larger receptive fields, which are considered to be of great significance for the detection of moving stimuli. P-type ganglion cells have small receptive fields, which are very suitable for distinguishing fine details. Non-M–non-P cells are sensitive to the wavelength of light; they, together with some P-type ganglion cells, are also known as color-opponent cells, reflecting the phenomenon that the response of a neuron's receptive field center to one color is canceled out by another color in the receptive field surround. In non-M–non-P ganglion cells, the two opponent colors are blue and yellow [33]. The visual information processed by the different ganglion cells is then projected to the LGN.
Research shows that the LGN can be divided into six layers, stacked layer by layer starting from the most ventral layer [33, 47]; the detailed structure is shown in Fig. 2. The ventral layers 1 and 2 contain larger neurons and are called the large-cell LGN layers; correspondingly, they receive the output from M-type ganglion cells. The neurons of dorsal layers 3–6 form the small-cell LGN layers, which receive the output from P-type ganglion cells. Many tiny neurons on the ventral side of each of layers 1–6 make up the koniocellular LGN layers, which receive the output from non-M–non-P ganglion cells. Furthermore, through physiological experiments, researchers have concluded that neurons in the LGN have characteristics similar to their corresponding ganglion cells: large-cell LGN neurons share similarities with M-type ganglion cells, small-cell LGN neurons are akin to P-type ganglion cells, and koniocellular LGN neurons resemble non-M–non-P ganglion cells [33].
The LGN-processed visual information is projected to the primary visual cortex (V1) [44, 47]. Region V1 is divided into six layers according to its cell arrangement and structure, following Brodmann's convention that the neocortex has six layers of cells [33, 48]. As shown in the rightmost part of Fig. 2, layer IV contains three sub-layers (IVA, IVB, IVC), and the IVC sub-layer contains two sub-layers (IVCα, IVCβ). In the same way that the LGN receives output from ganglion cells, different layers in V1 receive output from different layers of the LGN: the IVCα layer receives the projection from the large-cell LGN layers, the IVCβ layer receives the projection from the small-cell LGN layers, and some cells in layer III receive the projection from the koniocellular LGN layers. Then, the visual information processed by the IVCα layer is transferred to the IVB layer, and the visual information processed by the IVCβ layer is transferred to layer III. It is noteworthy that the regions of V1 receiving visual information have characteristics similar to the corresponding LGN neurons. In addition, through relevant experiments, researchers found that visual information begins to mix only after being transmitted to the III and IVB sub-layers of the V1 region; before that, the three streams are processed and transmitted independently from ganglion cells to the LGN to V1.
In summary, visual information is processed by different channels in the processing and transmission from ganglion cells to the LGN to V1 [33, 34, 47]. That is, M-type ganglion cells, the large-cell LGN layers, and the IVCα layer of the V1 region form the large-cell channel, which has a large receptive field and is more sensitive to moving stimuli. P-type ganglion cells, the small-cell LGN layers, and the IVCβ layer of the V1 region constitute the small-cell channel, which has small receptive fields and is sensitive to detailed information. Non-M–non-P ganglion cells, the koniocellular LGN layers, and some regions in layer III of the V1 area constitute the yellow–blue antagonistic color channel, which is sensitive to blue–yellow opponent information. Inspired by this, this paper simulates the three parallel channels composed of ganglion cells, the LGN, and the V1 area and models the characteristics of these three channels in processing visual information to design a new lightweight contour detection network with commendable performance.

Overall structure of bionic lightweight contour detection model

Figure 3 shows the overall structure of BLCDNet, which includes two parts: the backbone network and the decoding network. The backbone network is responsible for extracting feature information at different scales and feeding the extracted features into the decoding network; it is inspired by the three parallel pathways from retinal ganglion cells to the LGN to the V1 region. In the decoding network, we design a new feature extraction module named DFEM (depth feature extraction module). It uses residual connections and depth separable convolution to further process the output of the backbone network, which realizes fuller feature extraction and fusion and improves the overall performance of the model.

Backbone network

Figure 4 shows the detailed structure of our backbone network, corresponding to the green section in Fig. 3. In the biological visual system, visual information processed by the retina is transmitted to the LGN through different types of ganglion cells. Upon receiving this visual information, the LGN processes it once again and transmits it to the primary visual cortex, V1. After that, the V1 region consciously processes the received visual information, and after the initial processing is completed, it is transmitted to the higher regions via the ventral and dorsal pathways. It is worth noting that the process of processing and transmitting visual information from ganglion cells to LGN to V1 is divided into three parallel channels, and each channel has different characteristics and features, which do not interfere with each other when processing visual information. As the end point of parallel pathways and the starting point of ventral and dorsal pathways, the V1 region also plays a crucial role in the conscious processing of visual stimuli.
Inspired by this, we design a new backbone network named the parallel path feature extraction network (PFENet), using a convolutional neural network to simulate the three parallel paths from ganglion cells to the LGN to V1. Their detailed composition is shown in a–e of Fig. 5. The large receptive field feature extraction network is composed of dilated convolutions with a kernel size of 3 × 3 and a dilation rate of 5 and max-pooling layers, simulating the magnocellular pathway from ganglion cells to the LGN to V1. The small receptive field feature extraction network consists of conventional 3 × 3 convolutions and pooling layers, simulating the parvocellular–interblob pathway from ganglion cells to the LGN to V1. The color adversarial feature extraction network is composed of conventional 3 × 3 convolutions, dilated convolutions with a kernel size of 3 × 3 and a dilation rate of 5, and pooling layers, simulating the blob pathway from ganglion cells to the LGN to V1. Although the conventional convolution and the dilated convolution have kernels of the same size, we set the dilation rate of the dilated convolution to 5, so the dilated convolution has a larger receptive field [49]. In addition, as in [17, 18, 50], we use the pooling layers to divide each of the large receptive field, small receptive field, and color adversarial feature extraction networks into three stages, corresponding to retinal ganglion cells, the LGN, and V1, respectively. This also reflects the step-by-step extraction of feature information in the biological vision system.
Blocks a–e in Fig. 5 are expressed by the following equations:
$$ {\text{Block\_b}} = C_{3 \times 3,5} * \left( {C_{3 \times 3,5} * \left( {C_{3 \times 3,5} * \left( {C_{3 \times 3,5} * I_{{{\text{input}}}} } \right)} \right)} \right) - C_{3 \times 3,5} * I_{{{\text{input}}}} , $$
(1)
$$ {\text{Block\_s}} = C_{3 \times 3} * \left( {C_{3 \times 3} * \left( {C_{3 \times 3} * \left( {C_{3 \times 3} * I_{{{\text{input}}}} } \right)} \right)} \right) - C_{3 \times 3} * I_{{{\text{input}}}} , $$
(2)
$$ {\text{Block\_c}} = C_{3 \times 3,5} * \left( {C_{3 \times 3} * \left( {C_{3 \times 3,5} * \left( {C_{3 \times 3} * I_{{{\text{input}}}} } \right)} \right)} \right) - C_{3 \times 3} * I_{{{\text{input}}}} , $$
(3)
$$ {\text{Block\_S\_C}} = {\text{Block\_s}}\left( {\left( {I_{{\text{G}}} - I_{{\text{R}}} } \right) + I_{{{\text{input}}}} } \right), $$
(4)
$$ {\text{Block\_C\_M}} = {\text{Block\_c}}\left( {\left( {\frac{{\left( {I_{{\text{G}}} + I_{{\text{R}}} } \right)}}{2} - I_{{\text{B}}} } \right) + I_{{{\text{input}}}} } \right), $$
(5)
\(I_{{{\text{input}}}}\) is the input image. \(C_{3 \times 3,5}\) represents a dilated convolution with a kernel size of 3 × 3 and a dilation rate of 5; its effective kernel size is \(3 + (3 - 1) \times (5 - 1) = 11\), so it is equivalent to a conventional convolution with a kernel size of 11 × 11 and has a large receptive field. \(C_{3 \times 3}\) is a conventional convolution with a kernel size of 3 × 3. Figure 6 compares a conventional convolution and a dilated convolution with the same kernel size. \(I_{{\text{R}}}\), \(I_{{\text{G}}}\) and \(I_{{\text{B}}}\) represent the three channels of the color image, and \(\frac{{\left( {I_{{\text{G}}} + I_{{\text{R}}} } \right)}}{2}\) represents the yellow information. “\(*\)” denotes convolution.
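To make the composition of Eqs. (1)–(5) concrete, the following PyTorch sketch shows one possible implementation of the three parallel blocks and the color-opponent inputs. It is a minimal sketch under our own assumptions: the pooling layers, activations, normalization, and actual channel widths of BLCDNet are omitted, and the class and function names (ParallelBlock, color_opponent_inputs) are ours rather than the authors'.

```python
import torch
import torch.nn as nn


def conv3x3(cin, cout, dilation=1):
    # "same" padding keeps the spatial size, so the subtraction in
    # Eqs. (1)-(3) is shape-compatible
    return nn.Conv2d(cin, cout, kernel_size=3, padding=dilation, dilation=dilation)


class ParallelBlock(nn.Module):
    """Four stacked 3x3 convolutions minus one 3x3 convolution of the input.

    `dilations` gives the dilation rate of each stacked convolution, applied
    from the innermost to the outermost term of the equations:
      (5, 5, 5, 5) -> Block_b, large receptive field (Eq. 1)
      (1, 1, 1, 1) -> Block_s, small receptive field (Eq. 2)
      (1, 5, 1, 5) -> Block_c, color pathway (Eq. 3)
    The skip convolution uses the same dilation as the innermost stacked
    convolution, matching the last term of each equation.
    """

    def __init__(self, in_ch, ch, dilations):
        super().__init__()
        d1, d2, d3, d4 = dilations
        self.stack = nn.Sequential(
            conv3x3(in_ch, ch, d1),
            conv3x3(ch, ch, d2),
            conv3x3(ch, ch, d3),
            conv3x3(ch, ch, d4),
        )
        self.skip = conv3x3(in_ch, ch, d1)

    def forward(self, x):
        return self.stack(x) - self.skip(x)


def color_opponent_inputs(img):
    """Build the block inputs of Eqs. (4) and (5) from an RGB batch (N, 3, H, W)."""
    r, g, b = img[:, 0:1], img[:, 1:2], img[:, 2:3]
    red_green = (g - r) + img               # input of Block_S_C, Eq. (4)
    yellow_blue = ((g + r) / 2 - b) + img   # input of Block_C_M, Eq. (5)
    return red_green, yellow_blue


if __name__ == "__main__":
    img = torch.randn(1, 3, 224, 224)
    block_b = ParallelBlock(3, 16, (5, 5, 5, 5))  # magnocellular pathway
    block_s = ParallelBlock(3, 16, (1, 1, 1, 1))  # parvocellular-interblob pathway
    block_c = ParallelBlock(3, 16, (1, 5, 1, 5))  # blob (color) pathway
    rg, yb = color_opponent_inputs(img)
    print(block_b(img).shape, block_s(rg).shape, block_c(yb).shape)
```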

Depth feature extraction module

As shown in Fig. 3, the right part is the new decoding network proposed in this paper. Different from previous decoding networks, we design a depth feature extraction module (DFEM) in the new decoding network to enhance the overall performance of the model. In previous methods [17, 18], the decoding network adjusts the number of channels with a 1 × 1 convolution after receiving the input from the backbone network and then restores the output feature maps of different scales to the original image size through deconvolution or other up-sampling methods before fusing them to obtain the final contour output. In our decoding network, the input from the backbone network is instead processed by DFEM to further extract and fuse feature information, while the number of channels is adjusted using a 3 × 3 convolution. Finally, the output feature maps of different scales are resized to the original image size through deconvolution and then fused to obtain the final contour output. The feature information processed by DFEM incorporates more effective details and less unnecessary background and texture, which contributes to an overall improvement in the model's performance. In the experiments conducted in “Ablation study”, we validate the effectiveness of this module.
As shown in Fig. 7, DFEM is the depth feature extraction module proposed in this paper. It consists of a 1 × 1 conventional convolution, a 3 × 3 conventional convolution, and a 3 × 3 depth separable convolution. The input from the backbone network is first processed by the 1 × 1 convolution to increase the number of channels and then by the 3 × 3 depth separable convolution to further extract feature information. After that, the output of the depth separable convolution is added to the result of the 1 × 1 convolution, and the final output is obtained after a 3 × 3 convolution.
The calculation formula of DFEM is shown as follows:
$$ F_{{{\text{out}}}} = C_{3 \times 3} * \left[ {{\text{DSC}}_{3 \times 3} * \left( {C_{1 \times 1} * {\text{Output}}_{i} } \right) + C_{1 \times 1} * {\text{Output}}_{i} } \right]. $$
(6)
Among them, \(F_{{{\text{out}}}}\) is the output after DFEM processing, \({\text{Output}}_{i}\) (i = 1, 2, 3) is the side output of the backbone network, \(C_{m \times n}\) is a conventional convolution and \({\text{DSC}}_{m \times n}\) is a depth separable convolution, where m × n (m, n = 1, 2, 3, …) denotes the kernel size, and “\(*\)” denotes convolution.
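For readability, here is a minimal PyTorch sketch of Eq. (6); the channel sizes and the per-side-output usage are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn


class DFEM(nn.Module):
    """Depth feature extraction module following Eq. (6): a 1x1 convolution
    expands channels, a 3x3 depthwise-separable convolution refines the
    features, a residual addition re-uses the 1x1 output, and a final 3x3
    convolution produces the module output."""

    def __init__(self, in_ch, mid_ch, out_ch):
        super().__init__()
        self.expand = nn.Conv2d(in_ch, mid_ch, kernel_size=1)
        # depthwise 3x3 followed by pointwise 1x1 = depthwise-separable conv
        self.dsc = nn.Sequential(
            nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=1, groups=mid_ch),
            nn.Conv2d(mid_ch, mid_ch, kernel_size=1),
        )
        self.fuse = nn.Conv2d(mid_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        expanded = self.expand(x)             # C_1x1 * Output_i
        refined = self.dsc(expanded)          # DSC_3x3 * (C_1x1 * Output_i)
        return self.fuse(refined + expanded)  # C_3x3 * [ ... + ... ]


if __name__ == "__main__":
    side_output = torch.randn(1, 16, 80, 80)   # a hypothetical side output
    print(DFEM(16, 32, 1)(side_output).shape)  # torch.Size([1, 1, 80, 80])
```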

Loss function

To illustrate the effectiveness of the method proposed in this paper, we choose the same strategy as the previous method [21] and use the class-balanced cross-entropy loss function to handle the unbalanced distribution of positive and negative samples. Because each image is labeled by multiple annotators, a threshold \(\eta\) is introduced to distinguish the positive and negative sample sets; \(\eta\) is set to 0.2. For a ground-truth edge map \(Y = \left( {y_{j} ,j = 1,...,\left| Y \right|} \right)\), \(y_{j} \in \left[ {0,1} \right]\), we define \(Y^{ + } = \left\{ {y_{j} ,y_{j} > \eta } \right\}\) and \(Y^{ - } = \left\{ {y_{j} ,y_{j} = 0} \right\}\). When \(0 < y_{j} \le \eta\), the point is considered controversial, so we ignore it; that is, it belongs to neither the positive nor the negative samples. \(Y^{ + }\) and \(Y^{ - }\) represent the positive and negative sample sets. Therefore, \(l\left( \cdot \right)\) is calculated as follows:
$$ l\left( {P,Y} \right) = - \alpha \sum\limits_{{j \in Y^{ - } }} {\log \left( {1 - p_{j} } \right)} - \beta \sum\limits_{{j \in Y^{ + } }} {\log \left( {p_{j} } \right)} . $$
(7)
In Eq. (7), P represents the predicted contour map, and \(p_{j}\) is the value at pixel j after the sigmoid function. \(\alpha = \lambda \cdot \frac{{\left| {Y^{ + } } \right|}}{{\left| {Y^{ + } } \right| + \left| {Y^{ - } } \right|}}\) and \(\beta = \frac{{\left| {Y^{ - } } \right|}}{{\left| {Y^{ + } } \right| + \left| {Y^{ - } } \right|}}\) are used to balance the positive and negative samples, and \(\lambda\) \(\left( {\lambda = 3.0} \right)\) is a weighting coefficient.
As can be seen from Fig. 3, the network uses multiple losses for training. We formulate the total loss as follows:
$$ L = \sum\limits_{i = 1}^{3} {\left( {\omega_{i} \cdot l\left( {P_{i} ,Y} \right)} \right) + } \omega_{{{\text{fuse}}}} \cdot l\left( {P_{{{\text{fuse}}}} ,Y} \right). $$
(8)
In the above formula, \(\omega_{i} \left( {i = 1,2,3} \right)\) and \(\omega_{{{\text{fuse}}}}\), respectively, represent the weight of loss of three side output results and the weight of loss of final prediction results, \(P_{i}\) represents three different outputs, \(P_{{{\text{fuse}}}}\) represents the final contour prediction, and \(Y\) represents the real contour map. \(\omega_{i} = \omega_{{{\text{fuse}}}} = 0.25\).
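The two losses can be written compactly in PyTorch as follows; this sketch assumes the predictions have already passed through a sigmoid and the ground-truth maps are consensus values in [0, 1], and the numerical-stability epsilon is our own addition.

```python
import torch


def class_balanced_bce(pred, label, eta=0.2, lam=3.0):
    """Class-balanced cross-entropy of Eq. (7).

    pred:  sigmoid output in (0, 1), shape (N, 1, H, W)
    label: consensus ground truth in [0, 1]; pixels with 0 < y <= eta are ignored
    """
    pos = (label > eta).float()
    neg = (label == 0).float()
    num_pos, num_neg = pos.sum(), neg.sum()
    alpha = lam * num_pos / (num_pos + num_neg)  # weight of the negative term
    beta = num_neg / (num_pos + num_neg)         # weight of the positive term
    eps = 1e-8                                   # numerical stability
    return (-alpha * (neg * torch.log(1 - pred + eps)).sum()
            - beta * (pos * torch.log(pred + eps)).sum())


def total_loss(side_preds, fused_pred, label, w_side=0.25, w_fuse=0.25):
    """Weighted sum of the three side losses and the fused loss, Eq. (8)."""
    loss = sum(w_side * class_balanced_bce(p, label) for p in side_preds)
    return loss + w_fuse * class_balanced_bce(fused_pred, label)
```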

Experiment

In this section, we introduce the implementation environment of the model and the related parameter settings, and carry out experimental analysis on several publicly available datasets, such as BSDS500 [23] and NYUD [51]. In addition, we validate the effectiveness of the proposed backbone network and depth feature extraction module through ablation experiments. Finally, we compare BLCDNet with existing lightweight contour detection models and deep learning-based contour detection models.

Datasets

BSDS500 and NYUDv2 are two publicly available datasets and the most commonly used datasets in the field of contour detection.
As one of the most commonly used datasets in the field of contour detection, the BSDS500 dataset contains 500 images in total: 200 training images, 100 validation images, and 200 test images. We adopt the same strategy as in [18–22] to augment the training and validation sets through rotation, flipping, and random scaling, and finally obtain the augmented BSDS500 dataset. In addition, to further enlarge the dataset, the augmented BSDS500 dataset is mixed with the flipped PASCAL VOC Context dataset [52] to obtain the mixed training set BSDS500-VOC.
For the NYUDv2 dataset, like the previous methods [17, 18, 20, 22], we rotate the 381 training images, 414 validation images, and their corresponding annotations by four angles (0°, 90°, 180°, 270°) and flip the rotated results, thus increasing the number of training images. In addition, because the NYUDv2 dataset contains RGB images and HHA images, we train and test BLCDNet on the two types of images separately and finally average the outputs of RGB and HHA as the final contour output. The NYUDv2 dataset has more test images than the BSDS500 dataset, containing 654 test images.
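A minimal sketch of the rotation-and-flip augmentation described above, using torchvision; the random scaling applied to BSDS500 and any resizing policy are omitted, and the function name is ours.

```python
import torchvision.transforms.functional as TF


def rotate_flip_augment(img, label):
    """Return 8 (image, label) pairs: 4 rotations (0, 90, 180, 270 degrees),
    each with its horizontally flipped copy."""
    pairs = []
    for angle in (0, 90, 180, 270):
        rot_img = TF.rotate(img, angle, expand=True)
        rot_lbl = TF.rotate(label, angle, expand=True)
        pairs.append((rot_img, rot_lbl))
        pairs.append((TF.hflip(rot_img), TF.hflip(rot_lbl)))
    return pairs
```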

Implementation details

Parameter setting

We implemented BLCDNet in the PyTorch environment. During training, we do not use transfer learning to load parameters from other models but train the model from scratch. We use the SGD optimizer to update the parameters, setting the global learning rate to \(1 \times 10^{ - 6}\), and the momentum and weight decay to 0.9 and \(2 \times 10^{ - 4}\), respectively. When training on the BSDS500-VOC dataset and the NYUD-v2 dataset, we use the original image size without cropping. During evaluation, the maximum allowable error distance for matching predicted contours to ground-truth contours is set according to the dataset: 0.0075 for BSDS500 and, since the images in NYUD-v2 are larger than those in BSDS500, 0.011 for NYUD-v2. We use the same loss function as [17, 18, 22] to ensure the fairness of the experiments. All experiments are conducted on an NVIDIA GeForce RTX 3090 GPU with 24 GB of memory.
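The stated optimizer settings correspond to the following PyTorch configuration; the placeholder module merely stands in for BLCDNet.

```python
import torch
import torch.nn as nn

# Placeholder module standing in for BLCDNet; only the optimizer
# hyperparameters below reflect the settings stated above.
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-6,
                            momentum=0.9, weight_decay=2e-4)
```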

Performance metrics

Similar to previous methods [17–22], we first perform non-maximum suppression on the network output to obtain the final contour maps. We then evaluate the final contours using common evaluation metrics, including the optimal dataset scale (ODS), optimal image scale (OIS), and average precision (AP).
Optimal dataset scale (ODS). The F-score of each image in the dataset is computed at a fixed threshold and averaged. Different average F-scores are obtained at different fixed thresholds, and the maximum of all average F-scores is the ODS. The threshold range for computing the F-score is [0, 1].
Optimal image scale (OIS). The F-score of each image in the dataset is computed at different thresholds, and the maximum F-score of each image is taken; the corresponding threshold is the optimal threshold for that image. OIS is the average of the F-scores at the optimal threshold of each image.
Average precision (AP). AP is the average precision over the threshold range [0, 1] and equals the area under the precision–recall (PR) curve.
Precision–recall curve. The abscissa and ordinate of the PR curve are Recall and Precision, respectively. Recall and precision are calculated as in Eqs. (11) and (10). PR curve can reflect the classification performance of the model [53].
The F-score is calculated as follows:
$$ F{\text{-score}} = \frac{{P \times R}}{{\left( {1 - \alpha } \right)P + \alpha R}}. $$
(9)
\(\alpha\) is the weight, generally 0.5. P and R stand for precision and recall, respectively.
P is calculated as follows:
$$ P = \frac{{{\text{TP}}}}{{\left( {{\text{TP}} + {\text{FP}}} \right)}}. $$
(10)
TP and FP are the numbers of correctly detected and falsely detected contour pixels, respectively.
R is calculated as follows:
$$ R = \frac{{{\text{TP}}}}{{\left( {{\text{TP}} + {\text{FN}}} \right)}} $$
(11)
TP and FN are the numbers of correctly detected and missed contour pixels, respectively.
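The metric definitions above can be summarized in a short sketch; it assumes per-image F-scores have already been computed over a grid of thresholds (the boundary matching with the maximum allowable error distance is omitted) and follows the ODS/OIS descriptions given in this section.

```python
import numpy as np


def precision_recall(tp, fp, fn):
    """Precision and recall of Eqs. (10) and (11)."""
    p = tp / max(tp + fp, 1e-8)
    r = tp / max(tp + fn, 1e-8)
    return p, r


def f_score(p, r, alpha=0.5):
    """F-score of Eq. (9); with alpha = 0.5 it is the harmonic mean of P and R."""
    denom = (1 - alpha) * p + alpha * r
    return (p * r) / denom if denom > 0 else 0.0


def ods_ois(f_scores):
    """ODS and OIS from per-image F-scores of shape (num_images, num_thresholds)."""
    f = np.asarray(f_scores, dtype=float)
    ods = f.mean(axis=0).max()  # best dataset-wide fixed threshold
    ois = f.max(axis=1).mean()  # best threshold chosen per image, then averaged
    return ods, ois
```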
In addition, recent lightweight methods [30–32] report the parameters, floating-point operations (FLOPs), and frames per second (FPS) of their models. To verify the competitiveness of our model, we also test the parameters, FLOPs, and FPS of BLCDNet in this paper.

Ablation study

In this section, we conduct a detailed experimental analysis and evaluation of the backbone network of BLCDNet using the BSDS500 dataset. First, we train only the large receptive field feature extraction network (LRF-FENet), the small receptive field feature extraction network (SRF-FENet), and the color adversarial feature extraction network (CC-FENet) under the same conditions and test their outputs; the experimental results are shown in Table 1. Subsequently, we train and test combinations of two different networks, and the outcomes are also presented in Table 1. Specifically, LSRF-FENet denotes the fusion of the large-cell and small-cell feature extraction networks; LCRF-FENet denotes the fusion of the large-cell and color adversarial feature extraction networks; and SCRF-FENet denotes the fusion of the small-cell and color adversarial feature extraction networks. Finally, we train and test the entire model with the three channels processed in parallel, in the same way as in biological vision. The results show that parallel processing of the three channels followed by fusion achieves the best result, ODS = 0.784.
Table 1
Test results of the networks on BSDS500 without using the mixed training set; SS denotes single scale

Method                    Scale   ODS     OIS     AP
Test results for a single network
 LRF-FENet                SS      0.769   0.783   0.612
 SRF-FENet                SS      0.758   0.776   0.610
 CC-FENet                 SS      0.779   0.796   0.597
Test results for two combined paths
 LSRF-FENet               SS      0.779   0.795   0.543
 LCRF-FENet               SS      0.780   0.794   0.555
 SCRF-FENet               SS      0.779   0.796   0.553
Test results for three combined paths
 BLCDNet                  SS      0.784   0.800   0.537
 BLCDNet-w/o-DFEM         SS      0.776   0.791   0.531
BLCDNet-w/o-DFEM indicates that the DFEM module is not used
In addition, we also verify the effectiveness of the proposed DFEM on the BSDS500 dataset; the results are shown in Table 1. BLCDNet indicates that the DFEM module is used in the decoding network, and BLCDNet-w/o-DFEM indicates that it is not. As can be seen from Table 1, BLCDNet with DFEM outperforms BLCDNet without DFEM, with an ODS that is 0.8% higher. The results show that the DFEM block achieves further feature extraction and improves the overall performance of the model. Figure 8 shows the outputs of BLCDNet and BLCDNet-w/o-DFEM. As marked by the red box in Fig. 8, the DFEM processing reduces the texture in the output and adds more useful details.

Comparison with other works

BSDS500

We trained BLCDNet on the BSDS500-VOC hybrid training set and conducted a detailed experimental analysis and evaluation of the test results. We compare BLCDNet with previous contour detection methods, including biologically inspired methods, lightweight methods, deep learning methods, and non-deep learning methods: Tang [8], multiscale integration [54], SCO [14], contrast dependent [55], multifeature based [56], SED [15], adaptive inhibition [57]; PiDiNet [32], FINED [30], TIN2 [31], BDCN2 [22], BDCN3 [22]; DeepContour [58], DeepEdge [59], HED [17], RCF [18], CED [19], LPCB [20], DRNet [21], DSCD [60], MI-Net [16], LRDNN [39], LLCED [40]; and gPb [23], OEF [38], SE [61], MCG [62], SCG [37], and sketch tokens [63]. Note that Tang [8], PiDiNet [32], FINED [30], and others can also be regarded as deep learning methods. Table 2 shows the quantitative comparison between BLCDNet and the other methods.
Table 2
The quantitative comparison results of the proposed method and other methods on the BSDS500 test set

Method                         ODS     OIS     AP
Bio-inspired contour detection methods
 BLCDNet-SS (ours)             0.799   0.816   0.697
 Tang [8]                      0.762   0.778   0.809
 Multiscale integration [54]   0.680   –       –
 SCO [14]                      0.670   0.710   0.710
 Contrast dependent [55]       0.630   –       –
 Multifeature based [56]       0.620   –       –
 SED [15]                      0.710   0.740   0.740
 Adaptive inhibition [57]      0.580   –       –
Lightweight contour detection methods
 PiDiNet [32]                  0.807   0.823   –
 BLCDNet-SS (ours)             0.799   0.816   0.697
 TIN2 [31]                     0.772   0.795   –
 FINED [30]                    0.790   0.808   –
 BDCN2 [22]                    0.766   0.787   –
 BDCN3 [22]                    0.796   0.817   –
Deep learning contour detection methods
 DeepContour [58]              0.757   0.776   0.790
 DeepEdge [59]                 0.753   0.772   0.787
 HED [17]                      0.788   0.808   0.840
 RCF [18]                      0.806   0.823   0.839
 CED [19]                      0.794   0.811   0.847
 LPCB [20]                     0.808   0.824   –
 DRNet [21]                    0.802   0.818   0.800
 DSCD [60]                     0.813   0.836   0.847
 MI-Net [16]                   0.820   0.837   0.873
 LRDNN [39]                    0.825   0.840   –
 LLCED [40]                    0.805   0.818   –
Non-deep learning contour detection methods
 gPb [23]                      0.729   0.755   0.745
 OEF [38]                      0.746   0.770   0.815
 SE [61]                       0.743   0.764   0.800
 MCG [62]                      0.744   0.777   –
 SCG [37]                      0.739   0.758   0.773
 Sketch tokens [63]            0.727   0.746   0.780
BLCDNet-SS is our result (bold in the original); “–” indicates a value not reported
According to the results in Table 2, BLCDNet achieves the best result among all biologically inspired contour detection models, with ODS = 0.799, exceeding Tang [8] by 3.7%. Combining the results in Table 2 with Figs. 9 and 10, BLCDNet also achieves good results among the lightweight models, second only to the best-performing PiDiNet. In addition, the results still exceed some deep learning-based contour detection methods even though the model has few parameters, simple computation, and uses no pre-trained model: the ODS exceeds HED and CED by 1.1% and 0.5%, respectively. This further proves that our model is highly competitive. The PR curves of our method and other methods are shown in Fig. 11. As can be seen from the figure, our method is closest to the human test results and is competitive among all methods. The vertical coordinate represents precision, the horizontal coordinate represents recall, and the area under the curve corresponds to the AP performance indicator.

NYUD

Like the previous methods [17, 18, 20, 22], we trained our model on RGB images and HHA feature maps and then tested them separately, obtaining RGB, HHA, and RGB–HHA results, where RGB–HHA is the average of the RGB and HHA outputs. We compare the three outputs with the results of other methods, including gPb-UCM [23], SE [61], gPb + NG [64], SE + NG+ [65], OEF [38], HED [17], RCF [18], LPCB [20], TIN2 [31], and PiDiNet [32]. The experimental results are shown in Table 3.
Table 3
Quantitative comparison results between the proposed method and other methods on the NYUD-v2 test set

Method            Input     ODS     OIS     AP
gPb-UCM [23]      RGB       0.631   0.661   0.562
SE [61]           RGB       0.695   0.708   0.719
gPb + NG [64]     RGB       0.687   0.716   0.629
SE + NG+ [65]     RGB       0.706   0.734   0.549
OEF [38]          RGB       0.651   0.667   0.653
HED [17]          RGB       0.717   0.732   0.704
                  HHA       0.681   0.695   0.674
                  RGB-HHA   0.741   0.757   0.749
RCF [18]          RGB       0.729   0.742   0.693
                  HHA       0.705   0.715   0.650
                  RGB-HHA   0.757   0.771   0.749
LPCB [20]         RGB       0.739   0.754   –
                  HHA       0.707   0.719   –
                  RGB-HHA   0.762   0.778   –
PiDiNet [32]      RGB       0.733   0.747   –
                  HHA       0.715   0.728   –
                  RGB-HHA   0.756   0.773   –
TIN1 [31]         RGB       0.706   0.723   –
                  HHA       0.661   0.681   –
                  RGB-HHA   0.729   0.750   –
TIN2 [31]         RGB       0.729   0.745   –
                  HHA       0.705   0.722   –
                  RGB-HHA   0.753   0.773   –
BLCDNet (ours)    RGB       0.726   0.741   0.696
                  HHA       0.703   0.717   0.650
                  RGB-HHA   0.751   0.766   0.758
BLCDNet is our result (bold in the original); “–” indicates a value not reported
According to the results in Table 3, our method also achieves good performance on the NYUD dataset. It surpasses the results of all biomimetic contour detection models and also exceeds some deep learning-based and lightweight methods, such as HED [17] and TIN1 [31]. This shows that our method performs consistently on different datasets and remains competitive. Figure 12 shows some randomly selected output results; it can be seen that BLCDNet extracts the contour information of the input images relatively completely. Figure 13 shows the PR curves of the proposed method and other methods.

Conclusion

In this paper, we propose a novel biologically inspired lightweight contour detection network, BLCDNet, by combining biological vision mechanisms and convolutional neural networks. We perform experiments on the publicly available BSDS500 and NYUD datasets, and the results show that BLCDNet obtains advanced performance among all biologically inspired models and is highly competitive among deep learning methods. In addition, the incorporation of biological vision mechanisms makes BLCDNet more interpretable than other methods, indicating the importance of visual mechanisms for future research. In BLCDNet, we design the network structure by simulating the three parallel pathways from ganglion cells to V1: a large receptive field network built with dilated convolutions simulates the large-cell channel from ganglion cells to the V1 region, a small receptive field network built with conventional convolutions simulates the small-cell channel, and a mixed network of conventional and dilated convolutions simulates the color channel. Finally, the three are combined as the backbone network to fully extract feature information. In addition, we design a depth feature extraction module using depth separable convolution and realize full fusion of contextual information by further processing the output features of the backbone network. Experiments on BSDS500 and NYUD show that our method achieves good performance and strong competitiveness. It is worth noting that although BLCDNet performs well, in this paper we pay more attention to the three parallel pathways from ganglion cells to V1 without further exploring the characteristics of the neuronal cells within them, which limits the performance of our model to a certain extent. In future work, fully considering the overall structure and neuronal properties of visual pathways will be the focus of our study. Furthermore, given the recent excellent performance of the Transformer and the connection between the selectivity mechanism in biological vision systems and the attention mechanism in the Transformer, using the Transformer to improve the proposed method is a direction for our future work.

Acknowledgements

The authors appreciate the anonymous reviewers for their helpful and constructive comments on an earlier draft of this paper.

Declarations

Conflict of interest

The authors declare that they have no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1.
Zurück zum Zitat Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 6:679–698CrossRef Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 6:679–698CrossRef
2.
Zurück zum Zitat Ferrari V, Fevrier L, Jurie F, Schmid C (2007) Groups of adjacent contour segments for object detection. IEEE Trans Pattern Anal Mach Intell 30(1):36–51CrossRef Ferrari V, Fevrier L, Jurie F, Schmid C (2007) Groups of adjacent contour segments for object detection. IEEE Trans Pattern Anal Mach Intell 30(1):36–51CrossRef
3.
Zurück zum Zitat Bertasius G, Shi J, Torresani L (2016) Semantic segmentation with boundary neural fields. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3602–3610 Bertasius G, Shi J, Torresani L (2016) Semantic segmentation with boundary neural fields. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3602–3610
4.
Zurück zum Zitat Wang Y, Zhao X, Hu X, Li Y, Huang K (2019) Focal boundary guided salient object detection. IEEE Trans Image Process 28(6):2813–2824ADSMathSciNetCrossRef Wang Y, Zhao X, Hu X, Li Y, Huang K (2019) Focal boundary guided salient object detection. IEEE Trans Image Process 28(6):2813–2824ADSMathSciNetCrossRef
5.
Zurück zum Zitat Sundberg P, Brox T, Maire M, Arbeláez P, Malik J (2011) Occlusion boundary detection and figure/ground assignment from optical flow. In: CVPR 2011. IEEE, pp 2233–2240 Sundberg P, Brox T, Maire M, Arbeláez P, Malik J (2011) Occlusion boundary detection and figure/ground assignment from optical flow. In: CVPR 2011. IEEE, pp 2233–2240
6.
Zurück zum Zitat Prewitt JM (1970) Object enhancement and extraction. Picture Process Psychopictorics 10(1):15–19 Prewitt JM (1970) Object enhancement and extraction. Picture Process Psychopictorics 10(1):15–19
7.
Zurück zum Zitat Sobel I, Feldman G (1968) A 3x3 isotropic gradient operator for image processing. In: A talk at the Stanford artificial project, pp 271–272 Sobel I, Feldman G (1968) A 3x3 isotropic gradient operator for image processing. In: A talk at the Stanford artificial project, pp 271–272
8.
Zurück zum Zitat Tang Q, Sang N, Liu H (2019) Learning nonclassical receptive field modulation for contour detection. IEEE Trans Image Process 29:1192–1203ADSMathSciNetCrossRef Tang Q, Sang N, Liu H (2019) Learning nonclassical receptive field modulation for contour detection. IEEE Trans Image Process 29:1192–1203ADSMathSciNetCrossRef
9.
Zurück zum Zitat Yang D, Peng B, Al-Huda Z, Malik A, Zhai D (2022) An overview of edge and object contour detection. Neurocomputing 488:470–493CrossRef Yang D, Peng B, Al-Huda Z, Malik A, Zhai D (2022) An overview of edge and object contour detection. Neurocomputing 488:470–493CrossRef
10.
Zurück zum Zitat Hubel DH, Wiesel TN (1962) Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J Physiol 160(1):106CrossRefPubMedPubMedCentral Hubel DH, Wiesel TN (1962) Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J Physiol 160(1):106CrossRefPubMedPubMedCentral
11.
Zurück zum Zitat Jones H, Grieve K, Wang W, Sillito A (2001) Surround suppression in primate V1. J Neurophysiol 86(4):2011–2028CrossRefPubMed Jones H, Grieve K, Wang W, Sillito A (2001) Surround suppression in primate V1. J Neurophysiol 86(4):2011–2028CrossRefPubMed
12.
13.
Zurück zum Zitat Grigorescu C, Petkov N, Westenberg MA (2003) Contour detection based on nonclassical receptive field inhibition. IEEE Trans Image Process 12(7):729–739ADSCrossRefPubMed Grigorescu C, Petkov N, Westenberg MA (2003) Contour detection based on nonclassical receptive field inhibition. IEEE Trans Image Process 12(7):729–739ADSCrossRefPubMed
14.
Zurück zum Zitat Yang K-F, Gao S-B, Guo C-F, Li C-Y, Li Y-J (2015) Boundary detection using double-opponency and spatial sparseness constraint. IEEE Trans Image Process 24(8):2565–2578ADSMathSciNetCrossRef Yang K-F, Gao S-B, Guo C-F, Li C-Y, Li Y-J (2015) Boundary detection using double-opponency and spatial sparseness constraint. IEEE Trans Image Process 24(8):2565–2578ADSMathSciNetCrossRef
15.
Zurück zum Zitat Akbarinia A, Parraga CA (2018) Feedback and surround modulated boundary detection. Int J Comput Vision 126(12):1367–1380CrossRef Akbarinia A, Parraga CA (2018) Feedback and surround modulated boundary detection. Int J Comput Vision 126(12):1367–1380CrossRef
16.
Zurück zum Zitat Lin C, Pang X, Hu Y (2023) Bio-inspired multi-level interactive contour detection network. Digit Signal Process 141:104155CrossRef Lin C, Pang X, Hu Y (2023) Bio-inspired multi-level interactive contour detection network. Digit Signal Process 141:104155CrossRef
17.
Zurück zum Zitat Xie S, Tu Z (2015) Holistically-nested edge detection. In: Proceedings of the IEEE international conference on computer vision, pp 1395–1403 Xie S, Tu Z (2015) Holistically-nested edge detection. In: Proceedings of the IEEE international conference on computer vision, pp 1395–1403
18.
Zurück zum Zitat Liu Y, Cheng M-M, Hu X, Wang K, Bai X (2017) Richer convolutional features for edge detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3000–3009 Liu Y, Cheng M-M, Hu X, Wang K, Bai X (2017) Richer convolutional features for edge detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3000–3009
19.
Zurück zum Zitat Wang Y, Zhao X, Huang K (2017) Deep crisp boundaries. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3892–3900 Wang Y, Zhao X, Huang K (2017) Deep crisp boundaries. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3892–3900
20.
Zurück zum Zitat Deng R, Shen C, Liu S, Wang H, Liu X (2018) Learning to predict crisp boundaries. In: Proceedings of the European conference on computer vision (ECCV), pp 562–578 Deng R, Shen C, Liu S, Wang H, Liu X (2018) Learning to predict crisp boundaries. In: Proceedings of the European conference on computer vision (ECCV), pp 562–578
21.
Zurück zum Zitat Lin C, Cui L, Li F, Cao Y (2020) Lateral refinement network for contour detection. Neurocomputing 409:361–371CrossRef Lin C, Cui L, Li F, Cao Y (2020) Lateral refinement network for contour detection. Neurocomputing 409:361–371CrossRef
22.
Zurück zum Zitat He J, Zhang S, Yang M, Shan Y, Huang T (2019) Bi-directional cascade network for perceptual edge detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3828–3837 He J, Zhang S, Yang M, Shan Y, Huang T (2019) Bi-directional cascade network for perceptual edge detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3828–3837
23. Arbelaez P, Maire M, Fowlkes C, Malik J (2010) Contour detection and hierarchical image segmentation. IEEE Trans Pattern Anal Mach Intell 33(5):898–916
24. Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5):603–619
25. Pu M, Huang Y, Liu Y, Guan Q, Ling H (2022) EDTER: edge detection with transformer. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1402–1412
26. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE, pp 248–255
27. Gao S-H, Tan Y-Q, Cheng M-M, Lu C, Chen Y, Yan S (2020) Highly efficient salient object detection with 100k parameters. In: Computer vision—ECCV 2020: 16th European conference, Glasgow, UK, August 23–28, 2020, proceedings, part VI. Springer, pp 702–721
28. Yu C, Wang J, Peng C, Gao C, Yu G, Sang N (2018) BiSeNet: bilateral segmentation network for real-time semantic segmentation. In: Proceedings of the European conference on computer vision (ECCV), pp 325–341
29. Howard AG et al. (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861
31. Wibisono JK, Hang H-M (2020) Traditional method inspired deep neural network for edge detection. In: 2020 IEEE international conference on image processing (ICIP). IEEE, pp 678–682
32. Su Z et al. (2021) Pixel difference networks for efficient edge detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 5117–5127
33. Bear M, Connors B, Paradiso MA (2020) Neuroscience: exploring the brain, enhanced. Jones & Bartlett Learning, Burlington
34. Zhong H, Wang R (2021) Neural mechanism of visual information degradation from retina to V1 area. Cogn Neurodyn 15:299–313
35. Fukushima K (1980) Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol Cybern 36(4):193–202
36. Martin DR, Fowlkes CC, Malik J (2004) Learning to detect natural image boundaries using local brightness, color, and texture cues. IEEE Trans Pattern Anal Mach Intell 26(5):530–549
37. Xiaofeng R, Bo L (2012) Discriminatively trained sparse code gradients for contour detection. Adv Neural Inf Process Syst 25:593–601
38. Hallman S, Fowlkes CC (2015) Oriented edge forests for boundary detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1732–1740
39. Al-Amaren A, Ahmad MO, Swamy M (2023) A low-complexity residual deep neural network for image edge detection. Appl Intell 53(9):11282–11299
40. Fang X-N, Zhang S-H (2023) Learning local contrast for crisp edge detection. J Comput Sci Technol 38(3):554–566
41. Nicholls JG, Martin AR, Wallace BG, Fuchs PA (2001) From neuron to brain. Sinauer Associates, Sunderland, MA
42. Zhang Q, Lin C, Li F (2021) Application of binocular disparity and receptive field dynamics: a biologically-inspired model for contour detection. Pattern Recogn 110:107657
43. Fan X, Jiang M, Shahid AR, Yan H (2022) Hierarchical scale convolutional neural network for facial expression recognition. Cogn Neurodyn 16(4):847–858
44. Kandel ER, Schwartz JH, Jessell TM, Siegelbaum S, Hudspeth AJ, Mack S (2000) Principles of neural science. McGraw-Hill, New York
45. Purves D et al. (2008) Cognitive neuroscience (no. 4). Sinauer Associates Inc, Sunderland
46. Stone J (2013) Parallel processing in the visual system: the classification of retinal ganglion cells and its impact on the neurobiology of vision. Springer Science & Business Media, Berlin
47. Briggs F, Usrey WM (2011) Corticogeniculate feedback and visual processing in the primate. J Physiol 589(1):33–40
48. Garey LJ (1999) Brodmann’s localisation in the cerebral cortex. World Scientific, Singapore
50. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
51. Silberman N, Hoiem D, Kohli P, Fergus R (2012) Indoor segmentation and support inference from RGBD images. ECCV 5(7576):746–760
52. Mottaghi R et al. (2014) The role of context for object detection and semantic segmentation in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 891–898
53. Davis J, Goadrich M (2006) The relationship between precision-recall and ROC curves. In: Proceedings of the 23rd international conference on machine learning, pp 233–240
54. Wei H, Lang B, Zuo Q (2013) Contour detection model with multi-scale integration based on non-classical receptive field. Neurocomputing 103:247–262
55. Tang Q, Sang N, Liu H (2016) Contrast-dependent surround suppression models for contour detection. Pattern Recogn 60:51–61
56. Yang K-F, Li C-Y, Li Y-J (2014) Multifeature-based surround inhibition improves contour detection in natural images. IEEE Trans Image Process 23(12):5020–5032
57. Zeng C, Li Y, Li C (2011) Center–surround interaction with adaptive inhibition: a computational model for contour detection. Neuroimage 55(1):49–66
58. Shen W, Wang X, Wang Y, Bai X, Zhang Z (2015) DeepContour: a deep convolutional feature learned by positive-sharing loss for contour detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3982–3991
59. Bertasius G, Shi J, Torresani L (2015) DeepEdge: a multi-scale bifurcated deep network for top-down contour detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4380–4389
60. Deng R, Liu S (2020) Deep structural contour detection. In: Proceedings of the 28th ACM international conference on multimedia, pp 304–312
61. Dollár P, Zitnick CL (2014) Fast edge detection using structured forests. IEEE Trans Pattern Anal Mach Intell 37(8):1558–1570
62. Arbeláez P, Pont-Tuset J, Barron JT, Marques F, Malik J (2014) Multiscale combinatorial grouping. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 328–335
63. Lim JJ, Zitnick CL, Dollár P (2013) Sketch tokens: a learned mid-level representation for contour and object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3158–3165
64. Gupta S, Arbelaez P, Malik J (2013) Perceptual organization and recognition of indoor scenes from RGB-D images. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 564–571
65. Gupta S, Girshick R, Arbeláez P, Malik J (2014) Learning rich features from RGB-D images for object detection and segmentation. In: Computer vision–ECCV 2014: 13th European conference, Zurich, Switzerland, September 6–12, 2014, proceedings, part VII 13. Springer, pp 345–360
Metadata
Title: A lightweight contour detection network inspired by biology
Authors: Chuan Lin, Zhenguang Zhang, Jiansheng Peng, Fuzhang Li, Yongcai Pan, Yuwei Zhang
Publication date: 04.03.2024
Publisher: Springer International Publishing
Published in: Complex & Intelligent Systems
Print ISSN: 2199-4536
Electronic ISSN: 2198-6053
DOI: https://doi.org/10.1007/s40747-024-01393-4